EigenPhi HQ 🎯 Wisdom of DeFi (🔭, 🎙) 🦇🔊
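Enterprise AI use cases often make verification complicated. But if you can leverage structured logs, economic intent, or agent behavior, you can strengthen the signal. Let's work together to bring these verifiable behaviors into model training.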

Salesforce AI Research · Sep 24, 08:57
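📣 Variation in Verification: Understanding Verification Dynamics in Large Language Models
📄 Paper:
🔗 Project:
Have you ever wondered whether your LLM verifier is actually reliable? Our analysis framework identifies three key factors that determine verification success: problem difficulty, generator capability, and verifier capability.
Key insights:
📈 Problem difficulty drives the recognition of correct responses: verifiers excel on easy problems but struggle on hard ones.
🔍 Generator strength affects error detection: weak generators make obvious mistakes, while strong generators produce elegant but wrong solutions.
⚖️ Scaling the verifier shows diminishing returns in some regimes: sometimes GPT-4o is only slightly better than smaller models.
💡 For test-time scaling: a weak generator plus verification can match the performance of a strong generator, and an expensive verifier is not always worth the cost.
Thanks to Yefan Zhou @LiamZhou98, Austin Xu @austinsxu, Yilun Zhou @YilunZhou, Janvijay Singh @iamjanvijay, Jiang Gui @JiangGui, and Shafiq Joty @JotyShafiq for the excellent work!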
#LLM #AIVerification #TestTimeScaling #FutureOfAI #EnterpriseAI
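The test-time-scaling claim can be made concrete with a back-of-the-envelope model. This is not the paper's framework; it is a minimal sketch assuming i.i.d. generator samples, a verifier with a fixed true-positive rate `tpr` and false-positive rate `fpr`, and a "return the first accepted sample, else sample 1" rule. The function name and all parameter values are hypothetical.

```python
def best_of_n_accuracy(p: float, n: int, tpr: float, fpr: float) -> float:
    """Chance that 'first verifier-accepted sample of n, else sample 1' is correct.

    Hypothetical model, not from the paper:
    p   -- probability that one generator sample is correct
    tpr -- probability the verifier accepts a correct sample
    fpr -- probability the verifier accepts an incorrect sample
    """
    hit = p * tpr                 # a sample is correct AND accepted
    accept = hit + (1 - p) * fpr  # a sample is accepted at all
    reject = 1 - accept
    if accept == 0:
        return p  # verifier never accepts, so we always fall back to sample 1
    # Sum over k of reject**k * hit: the first accepted sample is at position k and correct.
    first_accepted_correct = hit * (1 - reject**n) / accept
    # All n rejected (prob reject**n); sample 1 is then correct with prob p*(1-tpr)/reject.
    fallback_correct = reject ** (n - 1) * p * (1 - tpr)
    return first_accepted_correct + fallback_correct


if __name__ == "__main__":
    # Hypothetical numbers: weak generator (30%) with 8 samples vs strong generator (70%) with 1.
    print(f"weak + verifier:  {best_of_n_accuracy(0.3, 8, 0.9, 0.1):.3f}")
    print(f"strong, 1 sample: {best_of_n_accuracy(0.7, 1, 0.9, 0.1):.3f}")
```

Under these assumed numbers the weak setup comes out ahead (roughly 0.767 vs 0.700). As `n` grows, accuracy approaches `hit / accept`, the verifier's precision (about 0.794 here), so extra samples stop paying off once that ceiling is near.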

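Kudos to the TOOL team 👏 Turning Ethereum into a hyperscale coprocessor is a game changer. From our side, scaling infrastructure only thrives when it is matched with transparent, auditable transaction processing and prioritization data. Without that, low-latency finality opens the door to centralization.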

0xprincess · Sep 24, 22:26
1/ We're proud to announce the launch of the TOOL testnet!
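Verifier's law is a great lens, Jason. What's your take on domains like cryptography or on-chain records, where verification is nearly free but the complexity of finding a solution explodes? 💭🔐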
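The asymmetry in this question shows up in a toy hash puzzle loosely in the spirit of proof-of-work. This is a sketch, not any specific chain's format: `meets_target`, `solve`, and the 8-byte nonce encoding are hypothetical choices. Checking a candidate costs one SHA-256 call, while finding one takes about `2**zero_bits` attempts on average.

```python
import hashlib


def meets_target(data: bytes, nonce: int, zero_bits: int) -> bool:
    """Verifier: a single SHA-256 call checks for `zero_bits` leading zero bits."""
    # Hypothetical encoding: payload followed by an 8-byte big-endian nonce.
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - zero_bits) == 0


def solve(data: bytes, zero_bits: int) -> tuple[int, int]:
    """Solver: brute-force nonces; expected cost is about 2**zero_bits hashes."""
    nonce = 0
    while not meets_target(data, nonce, zero_bits):
        nonce += 1
    return nonce, nonce + 1  # (solution, number of hashes tried)


if __name__ == "__main__":
    nonce, tries = solve(b"example payload", zero_bits=16)
    ok = meets_target(b"example payload", nonce, zero_bits=16)
    print(f"solving took {tries} hashes; verifying took 1 hash and returned {ok}")
```

Each extra zero bit doubles the expected solving work while verification stays at one hash.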

Jason Wei · Jul 16, 2025
New blog post about asymmetry of verification and "verifier's law":
Asymmetry of verification, the idea that some tasks are much easier to verify than to solve, is becoming an important idea now that we have RL that finally works generally.
Great examples of asymmetry of verification are things like sudoku puzzles, writing the code for a website like Instagram, and BrowseComp problems (finding the answer takes visiting ~100 websites, but it's easy to verify once you have it).
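The sudoku case shows why checking can be far cheaper than solving: a completed grid is verified in one pass over its 81 cells, while finding it generally requires search. A minimal verifier sketch, assuming the grid is a list of nine rows of ints and that 0 marks a blank in the original puzzle (all function names are hypothetical):

```python
DIGITS = set(range(1, 10))


def is_valid_sudoku_solution(grid: list[list[int]]) -> bool:
    """Every row, column and 3x3 box of a completed 9x9 grid holds 1..9 exactly once."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # 9 cells whose set equals {1..9} means no digit repeats.
    return all(group == DIGITS for group in rows + cols + boxes)


def solves_puzzle(puzzle: list[list[int]], grid: list[list[int]]) -> bool:
    """Valid grid that also keeps every given clue (assumed convention: 0 = blank)."""
    keeps_givens = all(
        p == 0 or p == g for prow, grow in zip(puzzle, grid) for p, g in zip(prow, grow)
    )
    return keeps_givens and is_valid_sudoku_solution(grid)


if __name__ == "__main__":
    # A known-valid pattern grid, used here only as test data.
    grid = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
    print(is_valid_sudoku_solution(grid))
```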
Other tasks have near-symmetry of verification, like summing two 900-digit numbers or some data processing scripts. Yet other tasks are much easier to propose feasible solutions for than to verify them (e.g., fact-checking a long essay or stating a new diet like "only eat bison").
An important thing to understand about asymmetry of verification is that you can improve the asymmetry by doing some work beforehand, for example by having the answer key to a math problem or test cases for a LeetCode problem. This greatly increases the set of problems with desirable verification asymmetry.
"Verifier's law" states that the ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI. The ability to train AI to solve a task is proportional to how well the task has the following properties:
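Prepared test cases are what turn an open-ended coding task into a cheap pass/fail check. A minimal sketch, assuming a candidate solution is a plain Python callable and each test is an `(args, expected)` pair; `passes_all` and the palindrome task are hypothetical illustrations, not from the post.

```python
from typing import Any, Callable, Sequence

TestCase = tuple[tuple, Any]  # (positional args, expected output)


def passes_all(solution: Callable[..., Any], tests: Sequence[TestCase]) -> bool:
    """Binary verifier: accept only if every prepared test case passes."""
    for args, expected in tests:
        try:
            if solution(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failure
    return True


# Hypothetical task: "return True if s reads the same backwards".
def is_palindrome_correct(s: str) -> bool:
    return s == s[::-1]


def is_palindrome_buggy(s: str) -> bool:
    # Plausible-looking but wrong: only compares the end characters.
    return s[0] == s[-1]


TESTS: list[TestCase] = [(("racecar",), True), (("ab",), False), (("abca",), False), (("a",), True)]

if __name__ == "__main__":
    print(passes_all(is_palindrome_correct, TESTS), passes_all(is_palindrome_buggy, TESTS))
```

The `"abca"` case is the up-front work: without it, the buggy candidate would pass. An answer key for a math problem is the degenerate case of a single `(args, expected)` pair.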
1. Objective truth: everyone agrees what good solutions are
2. Fast to verify: any given solution can be verified in a few seconds
3. Scalable to verify: many solutions can be verified simultaneously
4. Low noise: verification is as tightly correlated to the solution quality as possible
5. Continuous reward: it’s easy to rank the goodness of many solutions for a single problem
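An optimization task with a numeric score makes criteria 1, 2, 3 and 5 easy to see. As a hypothetical illustration (a tiny travelling-salesman instance, not an example from the post), the verifier below is objective (tour length), fast, runs independently per candidate, and gives a continuous reward that ranks any number of solutions to one problem. The function names are assumptions for this sketch.

```python
import math

Point = tuple[float, float]


def tour_length(points: list[Point], order: list[int]) -> float:
    """Length of the closed tour visiting points in `order` and returning to the start."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def score(points: list[Point], order: list[int]) -> float:
    """Reward: -inf for an invalid tour, else negative length (higher is better)."""
    if sorted(order) != list(range(len(points))):
        return float("-inf")  # objective check: every point visited exactly once
    return -tour_length(points, order)


def rank(points: list[Point], candidates: list[list[int]]) -> list[list[int]]:
    """Best candidate first; each score is computed independently, so this parallelizes."""
    return sorted(candidates, key=lambda order: score(points, order), reverse=True)


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(rank(square, [[0, 1, 1, 3], [0, 2, 1, 3], [0, 1, 2, 3]]))
```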
One obvious instantiation of verifier's law is the fact that most benchmarks proposed in AI are easy to verify and so far have been solved. Notice that virtually all popular benchmarks in the past ten years fit criteria #1-4; benchmarks that don’t meet criteria #1-4 would struggle to become popular.
Why is verifiability so important? The amount of learning in AI is maximized when the above criteria are satisfied: you can take a lot of gradient steps where each step carries a lot of signal. Speed of iteration is critical; it's the reason that progress in the digital world has been so much faster than progress in the physical world.
AlphaEvolve from Google is one of the greatest examples of leveraging asymmetry of verification. It focuses on setups that fit all the above criteria, and has led to a number of advancements in mathematics and other fields. Unlike what we've been doing in AI for the last two decades, it's a new paradigm in that every problem is optimized in a setting where the train set is the test set.
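The core loop behind this kind of system is propose, evaluate, keep the best. The toy below is not AlphaEvolve, which uses LLMs to propose program edits; it is a minimal sketch that substitutes random point mutation for the proposer and an arbitrary scoring function for the automated evaluator (`evolve` and its parameters are hypothetical). Because selection only ever consults `score`, the quantity being optimized and the quantity being judged are the same, the "train set is the test set" situation.

```python
import random
from typing import Callable

Genome = list[int]


def evolve(score: Callable[[Genome], float], length: int, generations: int,
           population: int = 20, seed: int = 0) -> Genome:
    """Elitist (mu + lambda) evolution: the best score never decreases between generations."""
    rng = random.Random(seed)
    pool = [[rng.randint(0, 9) for _ in range(length)] for _ in range(population)]
    for _ in range(generations):
        children = []
        for parent in pool:
            child = parent[:]
            # Proposal step: a random point mutation stands in for an LLM-suggested edit.
            child[rng.randrange(length)] = rng.randint(0, 9)
            children.append(child)
        # Evaluation + selection: the scorer is the only judge of quality.
        pool = sorted(pool + children, key=score, reverse=True)[:population]
    return max(pool, key=score)


if __name__ == "__main__":
    # Hypothetical objective: maximize the digit sum (the optimum is all 9s).
    best = evolve(score=sum, length=8, generations=200)
    print(best, sum(best))
```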
Asymmetry of verification is everywhere and it's exciting to consider a world of jagged intelligence where anything we can measure will be solved.