Trending topics
#
Bonk Eco continues to show strength amid $USELESS rally
#
Pump.fun to raise $1B token sale, traders speculating on airdrop
#
Boop.Fun leading the way with a new launchpad on Solana.

Jakub Pachocki
OpenAI
Last week, our reasoning models took part in the 2025 International Collegiate Programming Contest (ICPC), the world’s premier university-level programming competition. Our system solved all 12 out of 12 problems, a performance that would have placed first in the world (the best human team solved 11 problems).
This milestone rounds off an intense 2 months of competition performances by our models:
- A second place finish in AtCoder Heuristics World Finals
- Gold medal at the International Mathematical Olympiad
- Gold medal at the International Olympiad in Informatics
- And now, a gold medal, first place finish at the ICPC World Finals.
I believe these results, coming from a family of general reasoning models rooted in our main research program, are perhaps the clearest benchmark of progress this year. These competitions are great self-contained, time-boxed tests for the ability to discover new ideas. Even before our models were proficient at simple arithmetic, we looked towards these contests as milestones of progress towards transformative artificial intelligence.
Our models now rank among the top humans in these domains, when posed with well-specified questions and restricted to ~5 hours. The challenge now is moving to more open-ended problems, and much longer time horizons. This level of reasoning ability, applied over months and years to problems that really matter, is what we’re after - automating scientific discovery.
This rapid progress also underscores the importance of safety & alignment research. We still need more understanding of the alignment properties of long-running reasoning models; in particular, I recommend reviewing the fascinating findings from the study of scheming in reasoning models that we released today (
Congratulations to my teammates that poured their hearts into getting these competition results, and to everyone contributing to the underlying fundamental research that enables them!

Mostafa RohaninejadSep 18, 01:06
1/n
I’m really excited to share that our @OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have placed it first among all human participants. 🥇🥇

172
I am extremely excited about the potential of chain-of-thought faithfulness & interpretability. It has significantly influenced the design of our reasoning models, starting with o1-preview.
As AI systems spend more compute working e.g. on long term research problems, it is critical that we have some way of monitoring their internal process. The wonderful property of hidden CoTs is that while they start off grounded in language we can interpret, the scalable optimization procedure is not adversarial to the observer's ability to verify the model's intent - unlike e.g. direct supervision with a reward model.
The tension here is that if the CoTs were not hidden by default, and we view the process as part of the AI's output, there is a lot of incentive (and in some cases, necessity) to put supervision on it. I believe we can work towards the best of both worlds here - train our models to be great at explaining their internal reasoning, but at the same time still retain the ability to occasionally verify it.
CoT faithfulness is part of a broader research direction, which is training for interpretability: setting objectives in a way that trains at least part of the system to remain honest & monitorable with scale. We are continuing to increase our investment in this research at OpenAI.

Bowen BakerJul 16, 2025
Modern reasoning models think in plain English.
Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems.
I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.

352
Top
Ranking
Favorites