We propose a new Momentum Look-Ahead algorithm, presented at ICLR'25 MCDC, that allows heterogeneous GPUs to be used with high utilization in decentralized pretraining. It outperforms the Async-DiLoCo and DyLU baselines.