The last of our three ICLR workshop papers: compression in pipeline-parallel training has struggled to exceed 10% without hurting model performance. We get 90%.