Trending topics
#
Bonk Eco continues to show strength amid $USELESS rally
#
Pump.fun to raise $1B token sale, traders speculating on airdrop
#
Boop.Fun leading the way with a new launchpad on Solana.
Even after the steep progress of the past 3 months, it remains that AI performance is tied to task familiarity. In domains that can be densely sampled (via programmatic generation + verification), performance is effectively unbounded, and will keep increasing from current levels. In novel, unfamiliar domains, performance remains low and further progress still requires new ideas, not just more data and compute.

13 hours ago
Ok, I think my experiment leaving AI working on stuff 24/7 ends here. It doesn't work. Code explodes in complexity, results are not that great, the AI can't get past hard walls (it is still completely unable to even *grasp* SupGen), and it is insanely expensive (spent ~1k over the last 2 days). The best results are on the JS compiler, mostly because it is familiar (compared to inets), but not worth losing control over the codebase.
I think the dream of having AI's working on the background and making real progress on things that matter (i.e., truly new things) isn't here yet. It is still a machine hard-stuck on its own training data, incapable of thinking out of the box. It is great for building things that were already built. But not new things
Also coding normally has the under-appreciated advantage that you're doing two things at the same time: building a codebase *and* learning it. AI's do only half of that. The other half is obviously impossible 🤔
For benchmarks that target novel tasks, a common form of benchmark hacking that arbitrages this gap is to generate a dense sampling of potential tasks by manually parameterizing the space and then brute-forcing it. Very expensive but it works. There's little you can do to restore benchmark validity here besides increasing the dimensionality of the task space.
20
Top
Ranking
Favorites
