About this Event
NOTE: To stay up to date on this event and future ones, follow our LinkedIn page (https://www.linkedin.com/company/kaggle-practical-ai-milano).
The event will be held in hybrid format: in person and online.
>>> IN PERSON
Venue: Talent Garden Isola - Piazza Città di Lombardia, 1 Milano
GROUND FLOOR
Doors open at 18:15.
Free event with limited capacity; registration is required.
>>> IMPORTANT! REGISTER FOR THIS EVENT ONLY IF YOU INTEND TO ATTEND IN PERSON. If you have registered and cannot attend for any reason, please CANCEL your registration.
If you want to attend ONLINE, follow the instructions below.
>>> ONLINE
NOTE: LIVE STREAMING OF THIS EVENT IS NOT GUARANTEED.
Subscribe to our YouTube channel and follow the live stream at this address:
https://www.youtube.com/@kagglePracticalAIMilano/streams
PROGRAM
18:15-18:30 Attendee registration
18:30-18:45 Introduction to our community
18:45-19:45 Talk
19:45-20:30 Q&A and networking
SPEAKER
Alessandro Ercolani has studied Artificial Intelligence for many years, beginning with a Bachelor's degree in Cognitive Science focused on Computational Neuroscience, followed by a Master's degree in Computer Science and Natural Language Processing. He has worked as a functional programmer and as a Big Data and Machine Learning expert for the Italian Government. He is currently Head of Artificial Intelligence at PagoPA, where he develops AI-powered services for the public sector. In his spare time, he contributes to the open-source LLM ecosystem and trains language models at mii-llm, which he co-founded.
ABSTRACT
The current trajectory of large language models (LLMs) is dominated by scale: more parameters, more data, and more compute. However, this paradigm often overlooks a critical frontier: efficient, high-performing models designed for edge deployment and linguistic diversity. In this talk, we present the Zagreus and Nesso model families, a suite of ~0.4B-parameter, state-of-the-art small language models (SLMs) trained entirely from scratch and optimized for English and Romance languages.
We provide a fully reproducible, end-to-end account of training an LLM from first principles: from large-scale data engineering and tokenization (~1 trillion tokens), to distributed pre-training on a 64×A100 GPU cluster using Hugging Face Nanotron, to post-training with Axolotl for instruction-following and agentic capabilities. We discuss the practical challenges of multi-node training (Slurm orchestration, NCCL/CUDA compatibility), as well as the architectural decisions that favor dense transformer models over mixture-of-experts (MoE) models in the sub-billion-parameter regime.
Beyond methodology, we present extensive multilingual evaluations using lm-evaluation-harness, demonstrating that carefully engineered SLMs can achieve performance competitive with larger models in their class. We also introduce a set of bilingual benchmarks that highlight strengths and systematic failure modes across reasoning, generation, and factual tasks.
Our findings demonstrate that it is possible to train small language models that achieve state-of-the-art performance within specific classes of tasks, particularly when carefully engineered and aligned with high-quality data. Rather than relying on scale alone, we show how targeted design choices, efficient training pipelines, and focused post-training can unlock strong capabilities even in sub-billion parameter models. We conclude with actionable insights for practitioners building domain-specific or edge-deployable systems, and release all models, training recipes, and evaluation pipelines to support open and reproducible AI.
Event Venue
Talent Garden, 1 Piazza Città di Lombardia, Milano, Italy
EUR 0.00