Pluralis Research
Protocol Learning
Pluralis Research reposted
Probably the biggest week in Decentralized Training to date, off the back of ICLR, and more is about to come out. Summary of the situation as it stands today:
1. Decentralized RL post-training is clearly working. @gensynai is the latest with great results here. The process takes a strong base model and gives copies to participants, who generate reasoning traces that are then collected and used to improve the base model. This obviously relies on the base models being available/open-weight, and it is significantly cheaper than pretraining: nodes only need to do inference. The drawback is that there is mounting evidence (and it is very intuitive) that you cannot RL your way past a bad base model, so you retain a dependency. We need to wait for the results of these runs, but the reality is this is going to work one way or another because the process is so trivially parallelizable. (A toy sketch of this loop follows the list.)
2. Data-Parallel (DP) pretraining looks good. Both @NousResearch and @PrimeIntellect already have results here at the 10B model scale. It will be very straightforward (but expensive for node operators) to extend this to the 100B case. In DP every node keeps a full copy of the model, so you need, for example, 8xH100s to train at the 10B size; you can't use small cards. So you can extend this technique by scaling up the nodes and doing cross-datacenter collaborative training (i.e. every node is comprised of 100 H100s or so, and you train a >100B model). You also have the problem that everyone sees a full copy of the model, so it is not clear how to monetize (Protocol Learning solves this). (A toy DP sketch also follows the list.)
3. Model-Parallel (where the model itself is split over nodes - think 1000 geographically separate MacBooks training a 100B-param model, where each device only holds a small portion of the total model) started to show the first inklings of being possible. We (@PluralisHQ) published the 'Beyond Top k' paper, which compresses comms between nodes by over 90%, as well as two other works showing you can use heterogeneous devices in a Pipeline Parallel (PP) setup. We also had our Nesterov method for PP accepted into ICML 2025, which as far as I'm aware is the first paper on decentralized training accepted into a major AI conference since the original SWARM paper, and should help catalyse interest from mainstream AI circles.
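To make item 1 concrete, here is a minimal, hedged sketch of the generate-collect-improve loop it describes, in the style of rejection-sampling post-training. Everything in it (the toy "model", the reward function, node counts, the update rule) is an invented stand-in for illustration; it is not @gensynai's actual protocol.

```python
# Toy sketch of decentralized RL post-training: nodes only do inference,
# traces are collected and filtered centrally, and the base model is improved.
# All details here are hypothetical stand-ins.
import random

random.seed(0)

# The "base model" is just a single bias parameter; higher bias -> better average reward.
base_model = {"bias": 0.0}

def node_generate_traces(model, n_traces=32):
    """Each participant node only runs inference: sample traces from its copy of the model."""
    return [model["bias"] + random.gauss(0, 1) for _ in range(n_traces)]

def reward(trace):
    """Toy verifier/reward: higher trace value is better (stand-in for a correctness check)."""
    return trace

NUM_NODES = 8
for rl_round in range(5):
    # 1. Every node receives a copy of the current base model and generates traces.
    all_traces = []
    for _ in range(NUM_NODES):
        all_traces.extend(node_generate_traces(base_model))

    # 2. Traces are collected and filtered by reward (keep the top quarter).
    all_traces.sort(key=reward, reverse=True)
    kept = all_traces[: len(all_traces) // 4]

    # 3. The base model is nudged toward the kept traces (stand-in for fine-tuning on them).
    base_model["bias"] += 0.5 * (sum(kept) / len(kept) - base_model["bias"])

    print(f"round {rl_round}: bias={base_model['bias']:.3f}")
```

Note how nothing in the loop requires training-grade hardware on the nodes, which is why this is so much cheaper than pretraining, and why a weak base model caps what the loop can recover.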
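Item 2 hinges on every node holding a full copy of the parameters while only gradients cross the network. A minimal toy sketch of that structure, with invented data and sizes (nothing to do with the actual Nous or Prime Intellect runs):

```python
# Toy sketch of data-parallel (DP) training on a least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.5])

NUM_NODES = 4
# Each node holds its own shard of the data...
shards = []
for _ in range(NUM_NODES):
    X = rng.normal(size=(256, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=256)
    shards.append((X, y))

# ...but every node holds the FULL parameter vector. At 10B+ parameters this is the
# memory cost the post refers to: "a node" has to be something like 8xH100.
w = np.zeros(3)

lr = 0.1
for step in range(200):
    # Each node computes a gradient on its local shard.
    local_grads = []
    for X, y in shards:
        err = X @ w - y
        local_grads.append(X.T @ err / len(y))
    # All-reduce: average the gradients across nodes (the only communication needed).
    g = np.mean(local_grads, axis=0)
    # Every node applies the same update, so all full copies stay in sync.
    w -= lr * g

print("recovered weights:", np.round(w, 3))
```

The same structure scales by making each "node" a whole datacenter-grade machine, which is the cross-datacenter collaborative training mentioned above; it also makes plain why every participant ends up with the full model weights.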
Is decentralized model-parallel solved → NO. The communication bandwidth is so much worse than in a datacenter that even 90% compression is not enough; we need around 300x compression to hit parity with centralised training. There remains a huge question as to whether this is even possible - you are destroying so much of the training signal by doing this. This is Pluralis's focus.
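For a feel of what is being compressed, here is a hedged sketch of plain top-k sparsification of the activations that cross the slow link between two pipeline stages, together with the 90%-vs-300x arithmetic from the paragraph above. The shapes and keep-fraction are invented, and this is explicitly not the 'Beyond Top k' method, just a generic illustration of the kind of communication it targets.

```python
# Toy sketch: top-k sparsification of inter-stage activations in pipeline parallelism.
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 on node A produces an activation tensor that must cross the slow link to node B.
activations = rng.normal(size=(32, 4096))  # (batch, hidden) - sizes are invented

def top_k_compress(x, keep_fraction):
    """Keep only the largest-magnitude entries; send their values plus indices."""
    flat = x.ravel()
    k = max(1, int(len(flat) * keep_fraction))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], x.shape

def decompress(idx, values, shape):
    flat = np.zeros(np.prod(shape))
    flat[idx] = values
    return flat.reshape(shape)

# Keeping 10% of entries is roughly the ">90% comms compression" regime
# (ignoring index overhead, which makes the real ratio somewhat worse).
idx, vals, shape = top_k_compress(activations, keep_fraction=0.10)
recovered = decompress(idx, vals, shape)
print("kept fraction:", len(vals) / activations.size)
print("relative error:", np.linalg.norm(recovered - activations) / np.linalg.norm(activations))

# The parity target in the post: ~300x compression means sending only ~1/300 of the bytes,
# i.e. roughly 0.33% of the original activation traffic per link.
print("payload left at 300x:", 1 / 300)
```

The relative-error print is the crux of the open question: pushing from 10x toward 300x means discarding far more of the signal, and whether training still converges at that point is exactly what remains unproven.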
However, what happens if this works? For the first time, you can do real collaborative pretraining. There is no dependence on DeepSeek or Meta. Individuals can combine compute to create models at this scale from scratch. We get actual community-driven innovation happening in a way that has never existed to date. Decentralised RL-based post-training can then be used to make these models even better.
The reality is we are at the earliest days of something hugely significant occurring here. This is going to be a major field. The above companies are firing on all cylinders, a bunch more are about to come out of the gate shortly, and I don't expect this to slow down at all from now until whatever happens happens. And if you're reading this, you're early.