My first-day impressions of Codex 5.3 vs Opus 4.6:
Goal: can they actually do the job of an AI engineer/researcher?
TLDR:
- Yes, they (surprisingly) can.
- Opus 4.6 > Codex-5.3-xhigh for this task
- Both are a big jump over the last gen
Task: optimize @karpathy's nanochat “GPT-2 speedrun”, i.e. minimize wall-clock time to train a GPT-2-level model. The code is already heavily optimized; the #1 entry on the leaderboard hits 57.5% MFU on 8×H100. Beating it is genuinely hard.
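(Quick aside for anyone unfamiliar: MFU is just achieved model FLOPs/s divided by the hardware's peak FLOPs/s. A minimal sketch of the arithmetic below; the parameter count and tokens/s are made-up placeholders, and the ~6N FLOPs/token rule and 989 TFLOP/s dense-BF16 H100 peak are standard estimates, not numbers from the speedrun.)

```python
def mfu(n_params: float, tokens_per_sec: float, n_gpus: int,
        peak_flops_per_gpu: float = 989e12) -> float:
    """Model FLOPs Utilization.

    Uses the standard ~6*N FLOPs/token estimate for a dense
    transformer (forward + backward); 989e12 is the dense BF16
    peak of an H100 SXM.
    """
    achieved_flops_per_sec = 6 * n_params * tokens_per_sec
    return achieved_flops_per_sec / (n_gpus * peak_flops_per_gpu)

# Placeholder numbers, purely illustrative:
print(f"{mfu(n_params=560e6, tokens_per_sec=1.35e6, n_gpus=8):.1%}")
```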
Results:
1. Both behaved like real AI engineers. They read the code, explored ideas, ran mini benchmarks, wrote plans, and kicked off full end-to-end training while I slept.
2. I woke up to real wins from Opus 4.6:
- torch.compile mode="max-autotune-no-cudagraphs" (+1.3% speed)
- Muon optimizer ns_steps=3 (+0.3% speed)
- BF16 softcap, skipping the .float() cast (-1 GB memory)
Total training time: 174.42m → 171.40m (rough sketch of all three below)
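For flavor, here is roughly what those three changes look like in PyTorch. This is my sketch, not the actual nanochat diff: the Muon import/constructor and the softcap value of 15.0 are assumptions, while the torch.compile mode string is a real documented option.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

# 1) torch.compile's "max-autotune-no-cudagraphs" mode: full Triton
#    autotuning of kernels, but without CUDA-graph capture.
model = torch.compile(model, mode="max-autotune-no-cudagraphs")

# 2) Muon with ns_steps=3: in common Muon implementations, ns_steps is the
#    number of quintic Newton-Schulz iterations used to approximately
#    orthogonalize each momentum matrix (5 is the usual default), so fewer
#    steps means a cheaper optimizer step. Constructor shown as in Keller
#    Jordan's `muon` repo; treat the exact signature as assumed.
# from muon import Muon
# opt = Muon(model.parameters(), lr=0.02, momentum=0.95, ns_steps=3)

# 3) Logit softcap kept in BF16 (the cap value 15.0 is a stand-in):
def softcap(logits: torch.Tensor, cap: float = 15.0) -> torch.Tensor:
    # stays in logits.dtype (e.g. bf16) rather than doing a .float()
    # round-trip, so no float32 copy of the logits is materialized
    return cap * torch.tanh(logits / cap)
```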
Codex-5.3-xhigh had interesting ideas and higher MFU, but hurt final quality. I suspect context limits mattered. I saw it hit 0% context at one point.
3. I ran the same experiment earlier with Opus 4.5 and Codex 5.2; neither found meaningful gains. Both new models are clearly better.
Overall take:
I prefer Opus 4.6 for this specific task. The 1M context window matters. The UX is better.
People keep saying “Codex 5.3 > Opus 4.6”, but I believe different models shine in different codebases and tasks.
Two strong models is a win.
I’ll happily use both.