Training large language models requires a lot of compute. To reduce GPU cost and speed up both computation and communication, research labs have begun exploring training in lower precision. FP8 can speed up a kernel by up to 2x, but may introduce non-trivial quality degradation if done wrong. In this talk, Vlad from the YandexGPT pretraining team will give an overview of recent papers on FP8 training and related open-source software, and share the insights behind production-scale FP8 pretraining. Along the way, we will cover how GPUs work and implement Triton kernels to speed up computations (a minimal kernel sketch appears below the event details).

Agenda
18:15 | Doors open
19:00-20:00 | Talk by Vladislav + Q&A
20:00-21:00 | Networking
🗓 November 13, 7:00 PM
📍 Yandex Hall, Yerevan
📎 Language: English
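
For a taste of the material, here is a minimal sketch of the basic building block of FP8 training: a Triton kernel that down-casts a tensor to FP8 E4M3 with per-tensor scaling. This is illustrative only, not code from the talk; it assumes a recent Triton and PyTorch build with FP8 support (tl.float8e4nv, torch.float8_e4m3fn), and the function names and block size are ours.

import torch
import triton
import triton.language as tl

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

@triton.jit
def fp8_quantize_kernel(x_ptr, out_ptr, scale, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance quantizes one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # Rescale into FP8's representable range, then down-cast to E4M3.
    y = (x * scale).to(tl.float8e4nv)
    tl.store(out_ptr + offsets, y, mask=mask)

def fp8_quantize(x: torch.Tensor):
    # Per-tensor scaling: map the tensor's max magnitude onto FP8's max.
    assert x.is_cuda and x.is_contiguous()
    amax = x.abs().max().clamp_(min=1e-12).item()
    scale = FP8_E4M3_MAX / amax
    out = torch.empty_like(x, dtype=torch.float8_e4m3fn)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fp8_quantize_kernel[grid](x, out, scale, n, BLOCK_SIZE=1024)
    return out, scale  # dequantize later as out.float() / scale

Dividing by the saved scale recovers an approximation of the original tensor; production FP8 training recipes typically track this scale over time (e.g., via an amax history) rather than recomputing it from scratch each step.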
Event Venue
Moskovyan 35, Yerevan, Armenia