DApp Store | Web3 Hub for Events & Games

Trending topics

the way to interpret it is that after post training your weights are sort of equidistant from all tasks it saw during pretraining (model has seen all tasks so they've pulled it towards them). so all this method does is perturb the weights and see which perturbations bring the network closer to task-specific weights. it's like really cheap lora this also connects to the observation that post-training doesn't add knowledge, but simply chisels the pretraining distribution

perturbing weights is really analogous to random rollouts in high temperature. i do think this can be iterative (like grpo) perturb weights with large radius -> select better performers -> keep decreasing radius this *should* increase task accuracy @yule_gan did you try this?

31

Top

Ranking

Favorites