1/ What if I told you that you could pre-train LLMs with a mixture of consumer-grade and datacenter GPUs, over low-bandwidth internet, with minimal performance loss? New paper: Heterogeneous Low‑Bandwidth Pre‑Training of LLMs