The joy and pain of training an LLM from scratch

Thu Jun 04 2026 at 06:30 pm to 08:30 pm UTC+02:00

Talent Garden | Milano

Publisher/Host: Kaggle & Practical AI - Milano
The talk presents Zagreus and Nesso, two small language models optimized for English and Romance languages, trained from scratch.
About this Event

ATTENTION: To stay up to date on this event and upcoming ones, follow our LinkedIn page (https://www.linkedin.com/company/kaggle-practical-ai-milano).


The event will be held in hybrid mode: in person and online.

>>> IN PERSON

Venue: Talent Garden Isola - Piazza Città di Lombardia, 1 Milano

GROUND FLOOR

Doors open at 18:15.

Free event with limited capacity and mandatory registration.

>>> IMPORTANT! REGISTER FOR THIS EVENT ONLY IF YOU INTEND TO ATTEND IN PERSON. If you have registered and for any reason cannot attend, CANCEL your registration.

If you want to attend the event ONLINE, follow the instructions below.

>>> ONLINE

PLEASE NOTE: LIVE STREAMING FOR THIS EVENT IS NOT GUARANTEED.

Subscribe to our YouTube channel and follow the event live stream, available at this address

https://www.youtube.com/@kagglePracticalAIMilano/streams


PROGRAM

18:15-18:30 Participant registration

18:30-18:45 Introduction to our community

18:45-19:45 Talk

19:45-20:30 Q&A and Networking


SPEAKER

Alessandro Ercolani has studied Artificial Intelligence for many years, beginning with a Bachelor’s degree in Cognitive Science focused on Computational Neuroscience, followed by a Master’s degree in Computer Science and Natural Language Processing. He has worked as a functional programmer and as a Big Data and Machine Learning expert for the Italian Government. He is currently Head of Artificial Intelligence at PagoPA, where he develops AI-powered services for the public sector. In his spare time, he contributes to the open-source LLM ecosystem and trains language models at mii-llm, which he co-founded.


ABSTRACT

The current trajectory of large language models (LLMs) is dominated by scale: more parameters, more data, and more computation. However, this paradigm often overlooks a critical frontier: efficient, high-performing models designed for edge deployment and linguistic diversity. In this talk, we present the Zagreus and Nesso model families, a suite of ~0.4B-parameter, state-of-the-art small language models (SLMs) trained entirely from scratch and optimized for English and Romance languages.


We provide a fully reproducible, end-to-end account of training an LLM from first principles: from large-scale data engineering and tokenization (~1 trillion tokens) to distributed pre-training on a 64×A100 GPU cluster using Hugging Face Nanotron, and post-training with Axolotl for instruction-following and agentic capabilities. We discuss the practical challenges of multi-node training (Slurm orchestration, NCCL/CUDA compatibility), as well as architectural decisions favoring dense transformer models over MoE in the sub-billion parameter regime.
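To give a feel for the scale quoted above, here is a rough back-of-envelope sketch, not a figure from the talk. It uses the common 6 × N × D rule of thumb for training compute and the A100's peak dense BF16 throughput of 312 TFLOP/s. It assumes the full ~1T-token corpus is seen once, and the 40% utilization value (`ASSUMED_MFU`) is purely hypothetical.

```python
# Back-of-envelope pre-training compute estimate (illustrative only).

A100_PEAK_BF16_FLOPS = 312e12  # NVIDIA A100 peak dense BF16 throughput, FLOP/s


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the common 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def wall_clock_days(total_flops: float, n_gpus: int,
                    peak_flops_per_gpu: float, mfu: float) -> float:
    """Days needed if the cluster sustains a fraction `mfu` of its peak."""
    sustained = n_gpus * peak_flops_per_gpu * mfu
    return total_flops / sustained / 86_400  # 86,400 seconds per day


# Figures from the abstract: ~0.4B parameters, ~1T tokens, 64 A100 GPUs.
N_PARAMS = 0.4e9
N_TOKENS = 1e12     # assumption: the full ~1T-token corpus is seen once
N_GPUS = 64
ASSUMED_MFU = 0.40  # hypothetical utilization, not a reported number

total = training_flops(N_PARAMS, N_TOKENS)
days = wall_clock_days(total, N_GPUS, A100_PEAK_BF16_FLOPS, ASSUMED_MFU)
print(f"{total:.2e} FLOPs, about {days:.1f} days")
```

Under these assumptions the estimate comes to roughly 2.4e21 FLOPs and about three and a half days of pure compute. Real runs also spend time on restarts, evaluation, and ablations, so this is best read as a lower bound.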


Beyond methodology, we present extensive multilingual evaluations using lm-evaluation-harness, demonstrating that carefully engineered SLMs can achieve competitive performance relative to larger models in their class. We further introduce a set of bilingual benchmarks highlighting strengths and systematic failure modes across reasoning, generation, and factual tasks.


Our findings demonstrate that it is possible to train small language models that achieve state-of-the-art performance within specific classes of tasks, particularly when carefully engineered and aligned with high-quality data. Rather than relying on scale alone, we show how targeted design choices, efficient training pipelines, and focused post-training can unlock strong capabilities even in sub-billion parameter models. We conclude with actionable insights for practitioners building domain-specific or edge-deployable systems, and release all models, training recipes, and evaluation pipelines to support open and reproducible AI.


Event Venue

Talent Garden, 1 Piazza Città di Lombardia, Milano, Italy

Tickets

EUR 0.00
