500k samples of multilingual post-training data in five languages: French, Spanish, Italian, German, and Portuguese. We created these samples to address the lack of multilingual post-training datasets, and found that they improve performance on benchmarks such as Global MMLU, Belebele, and Multi-IF.