the way to interpret it is that after pretraining the model's weights are sort of equidistant from all the tasks it saw during pretraining (the model has seen all the tasks, so each of them has pulled the weights towards it). so all this method does is perturb the weights and see which perturbations bring the network closer to task-specific weights. it's like a really cheap LoRA
this also connects to the observation that post-training doesn't add knowledge, but simply chisels the pretraining distribution
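A minimal toy sketch of this reading, in numpy: treat the pretrained model as a point roughly equidistant from several task optima, sample random perturbations, and keep the ones that land closer to a given task's weights. Everything here (dimensions, the fake task optima, `closer_to_task`) is an illustrative stand-in, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: pretrained weights sit roughly "equidistant"
# from a few task-specific optima (all values here are toys).
dim = 1000
pretrained = np.zeros(dim)
task_weights = rng.normal(0.0, 1.0, size=(3, dim))  # 3 imaginary task optima

def closer_to_task(weights, task_idx):
    """Distance from a model's weights to one task's optimum."""
    return np.linalg.norm(weights - task_weights[task_idx])

# Sample random perturbations; keep those that moved nearer task 0.
sigma = 0.1
baseline = closer_to_task(pretrained, 0)
keepers = []
for _ in range(64):
    candidate = pretrained + rng.normal(0.0, sigma, size=dim)
    if closer_to_task(candidate, 0) < baseline:
        keepers.append(candidate)
```

Each kept perturbation is, by construction, a small step towards the chosen task's weights without any gradient computation, which is the "really cheap LoRA" intuition.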

Mar 13, 23:41
Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt.
To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs.
What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets.
Paper:
Code:
Website:
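A toy sketch of the one-step-noise-plus-ensembling idea as the post describes it: add Gaussian noise to the weights once (no gradients, no learning rate), score each perturbed copy on the task, and ensemble the best performers. The linear "model", the `reward` proxy, and the choice to merge weights by averaging are all illustrative assumptions; the paper may ensemble outputs rather than weights.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for an LLM: a weight vector scored against a task.
dim = 256
pretrained = rng.normal(0.0, 0.02, size=dim)
target = rng.normal(0.0, 0.02, size=dim)        # imaginary task optimum

def reward(weights):
    """Higher is better; a toy proxy for task accuracy."""
    return -np.linalg.norm(weights - target)

# One step of Gaussian noise over many copies, then keep the top k.
sigma, n_samples, k = 0.01, 128, 8
candidates = pretrained + rng.normal(0.0, sigma, size=(n_samples, dim))
scores = np.array([reward(c) for c in candidates])
ensemble = candidates[np.argsort(scores)[-k:]]   # best k perturbed models

# One simple ensembling choice: average the top-k weights.
merged = ensemble.mean(axis=0)
```

By convexity of the norm, the averaged weights score at least as well as the worst member of the top-k set, so the merged model cannot fall below the ensemble's admission threshold.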

perturbing weights is really analogous to random rollouts at high temperature. i do think this can be made iterative (like GRPO):
perturb weights with a large radius -> select the better performers -> keep decreasing the radius
this *should* increase task accuracy
@yule_gan did you try this?