This BullshitBench result goes a long way toward explaining the widespread intuition that Claude is the best daily driver, despite Google and OAI’s eye-popping benchmarks.
Contrast BullshitBench with the problem-solving benchmarks. All of the latter presuppose correct solutions.
But in real life, problems are poorly defined, and it’s often unclear which questions are worth asking or even have answers. You need a model that can steer you off the wrong path, i.e., call bullshit.

