torchtitan has built-in HSDP + DiLoCo support; it's probably the best place right now to start doing decentralized learning research. It also comes with support for many architectures (llama3, llama4, deepseekv3, ...) as well as pretty much every parallelism you could want (6D?). The PyTorch team cooked here.