DApp Store | Web3 Hub for Events & Games

Even after the steep progress of the past 3 months, it remains true that AI performance is tied to task familiarity. In domains that can be densely sampled (via programmatic generation + verification), performance is effectively unbounded and will keep increasing from current levels. In novel, unfamiliar domains, performance remains low, and further progress still requires new ideas, not just more data and compute.

For benchmarks that target novel tasks, a common form of benchmark hacking arbitrages this gap: generate a dense sampling of potential tasks by manually parameterizing the task space and then brute-forcing it. This is very expensive, but it works. There is little you can do to restore benchmark validity here besides increasing the dimensionality of the task space.
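
To make the dimensionality argument concrete, here is a rough sketch under a strong simplifying assumption: the task space is a discrete grid with the same number of values along each hand-picked parameter. The function names (`enumerate_tasks`, `task_space_size`, `coverage`) and the numbers are hypothetical, chosen only for illustration and not taken from any real benchmark.

```python
from itertools import product


def enumerate_tasks(values_per_dim: int, num_dims: int):
    """Yield every point of a hypothetical discrete task space.

    Each task is a tuple of parameter indices; this is the
    'manually parameterize, then brute-force' step in miniature.
    """
    return product(range(values_per_dim), repeat=num_dims)


def task_space_size(values_per_dim: int, num_dims: int) -> int:
    """Number of distinct tasks in the grid: values_per_dim ** num_dims."""
    return values_per_dim ** num_dims


def coverage(budget: int, values_per_dim: int, num_dims: int) -> float:
    """Fraction of the grid that a fixed sampling budget can cover."""
    return min(1.0, budget / task_space_size(values_per_dim, num_dims))


if __name__ == "__main__":
    budget = 1_000_000  # assumed number of tasks an attacker can afford to generate
    for dims in (3, 6, 9, 12):
        size = task_space_size(10, dims)
        print(f"dims={dims:2d} size={size:>16,d} coverage={coverage(budget, 10, dims):.6f}")
```

Under these assumptions, each added dimension multiplies the cost of dense sampling by the number of values per dimension, so a fixed budget covers an exponentially shrinking fraction of the space.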
