Name: The joy and pain of training an LLM from scratch
Start: 2026-06-04T18:30:00+02:00
End: 2026-06-04T20:30:00+02:00
Location: Talent Garden

The joy and pain of training an LLM from scratch

The talk presents Zagreus and Nesso, two small language models optimized for English and Romance languages, trained from scratch.
About this Event

ATTENTION: To stay up to date on this event and upcoming ones, follow our LinkedIn page (https://www.linkedin.com/company/kaggle-practical-ai-milano).

The event will be held in hybrid mode: in person and online.

>>> IN PERSON

Venue: Talent Garden Isola - Piazza Città di Lombardia, 1, Milano

GROUND FLOOR

Doors open at 18:15.

Free event with limited capacity; registration is required.

>>> IMPORTANT! REGISTER FOR THIS EVENT ONLY IF YOU INTEND TO ATTEND IN PERSON. If you have registered and for any reason cannot attend, CANCEL your registration.

If you want to attend the event ONLINE, follow the instructions below.

>>> ONLINE

PLEASE NOTE: THE LIVE STREAM FOR THIS EVENT IS NOT GUARANTEED.

Subscribe to our YouTube channel and follow the event via live stream at this address:

https://www.youtube.com/@kagglePracticalAIMilano/streams

PROGRAM

18:15-18:30 Participant registration

18:30-18:45 Introduction to our community

18:45-19:45 Talk

19:45-20:30 Q&A and Networking

SPEAKER

Alessandro Ercolani has studied Artificial Intelligence for many years, beginning with a Bachelor’s degree in Cognitive Science focused on Computational Neuroscience, followed by a Master’s degree in Computer Science and Natural Language Processing. He has worked as a functional programmer and as a Big Data and Machine Learning expert for the Italian Government. He is currently Head of Artificial Intelligence at PagoPA, where he develops AI-powered services for the public sector. In his spare time, he contributes to the open-source LLM ecosystem and trains language models at mii-llm, which he co-founded.

ABSTRACT

The current trajectory of large language models (LLMs) is dominated by scale: more parameters, more data, and more computation. However, this paradigm often overlooks a critical frontier: efficient, high-performing models designed for edge deployment and linguistic diversity. In this talk, we present the Zagreus and Nesso model families, a suite of ~0.4B-parameter state-of-the-art small language models (SLMs) trained entirely from scratch and optimized for English and Romance languages.

We provide a fully reproducible, end-to-end account of training an LLM from first principles: from large-scale data engineering and tokenization (~1 trillion tokens) to distributed pre-training on a 64×A100 GPU cluster using Hugging Face Nanotron, and post-training with Axolotl for instruction-following and agentic capabilities. We discuss the practical challenges of multi-node training (Slurm orchestration, NCCL/CUDA compatibility), as well as architectural decisions favoring dense transformer models over MoE in the sub-billion parameter regime.

Beyond methodology, we present extensive multilingual evaluations using lm-evaluation-harness, demonstrating that carefully engineered SLMs can achieve competitive performance relative to larger models in their class. We further introduce a set of bilingual benchmarks highlighting strengths and systematic failure modes across reasoning, generation, and factual tasks.

Our findings demonstrate that it is possible to train small language models that achieve state-of-the-art performance within specific classes of tasks, particularly when carefully engineered and aligned with high-quality data. Rather than relying on scale alone, we show how targeted design choices, efficient training pipelines, and focused post-training can unlock strong capabilities even in sub-billion parameter models. We conclude with actionable insights for practitioners building domain-specific or edge-deployable systems, and release all models, training recipes, and evaluation pipelines to support open and reproducible AI.