Julian Schrittwieser
Technical staff at Anthropic. AlphaGo, AlphaZero, MuZero, AlphaCode, AlphaTensor, AlphaProof, Gemini RL. Previously Principal Research Engineer at DeepMind.
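I had a lot of fun chatting with @mattturck on the MAD Podcast this week! We discussed trends in AI, reinforcement learning and why it unlocks agents, scaling, and more:
What we talked about, with links for further reading: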

Matt Turck, October 24, 2025
Failing to Understand the Exponential, Again?
My conversation with @Mononofu - Julian Schrittwieser (@AnthropicAI, AlphaGo Zero, MuZero) - on Move 37, Scaling RL, Nobel Prize for AI, and the AI frontier:
00:00 - Cold open: “We’re not seeing any slowdown.”
00:32 - Intro — Meet Julian
01:09 - The “exponential” from inside frontier labs
04:46 - 2026–2027: agents that work a full day; expert-level breadth
08:58 - Benchmarks vs reality: long-horizon work, GDP-Val, user value
10:26 - Move 37 — what actually happened and why it mattered
13:55 - Novel science: AlphaCode/AlphaTensor → when does AI earn a Nobel?
16:25 - Discontinuity vs smooth progress (and warning signs)
19:08 - Does pre-training + RL get us there? (AGI debates aside)
20:55 - Sutton’s “RL from scratch”? Julian’s take
23:03 - Julian’s path: Google → DeepMind → Anthropic
26:45 - AlphaGo (learn + search) in plain English
30:16 - AlphaGo Zero (no human data)
31:00 - AlphaZero (one algorithm: Go, chess, shogi)
31:46 - MuZero (planning with a learned world model)
33:23 - Lessons for today’s agents: search + learning at scale
34:57 - Do LLMs already have implicit world models?
39:02 - Why RL on LLMs took time (stability, feedback loops)
41:43 - Compute & scaling for RL — what we see so far
42:35 - Rewards frontier: human prefs, rubrics, RLVR, process rewards
44:36 - RL training data & the “flywheel” (and why quality matters)
48:02 - RL & Agents 101 — why RL unlocks robustness
50:51 - Should builders use RL-as-a-service? Or just tools + prompts?
52:18 - What’s missing for dependable agents (capability vs engineering)
53:51 - Evals & Goodhart — internal vs external benchmarks
57:35 - Mechanistic interpretability & “Golden Gate Claude”
1:00:03 - Safety & alignment at Anthropic — how it shows up in practice
1:03:48 - Jobs: human–AI complementarity (comparative advantage)
1:06:33 - Inequality, policy, and the case for 10× productivity → abundance
1:09:24 - Closing thoughts

