About this Event
Join us at the Milano vLLM Meetup!
Excited about large language models? Come hang out in person with fellow practitioners in Milano. It’s a great chance to chat, learn, and exchange cool ideas about vLLM tech in a relaxed setting. Whether you're a pro or just curious, everyone's welcome!
What to Expect
- Deep technical sessions from vLLM maintainers, committers, and teams using vLLM at scale
- Live demos focused on real workflows
- Great networking with food and drinks
Who Should Attend
- vLLM users and contributors
- ML and infra engineers working on inference and serving
- Platform teams running GenAI in production
- Anyone curious about efficient inference across local, cloud, and Kubernetes
Don't miss out on this awesome meetup – see you there!
Agenda
🕑: 05:00 PM - 05:30 PM
Doors Open, Check-In
🕑: 05:30 PM - 05:40 PM
Welcome and Opening Remarks
Host: Andrea Ferretti, AI Researcher, Bending Spoons
🕑: 05:40 PM - 06:00 PM
Intro to vLLM and Project Update
Host: Nicolò Lucchesi, vLLM Maintainer, Red Hat
🕑: 06:00 PM - 06:30 PM
Powering Evernote AI features with vLLM
Host: Ludovico Papavassiliou, AI Engineer, Bending Spoons
🕑: 06:30 PM - 06:50 PM
vLLM & Mistral AI
Host: AI Engineers from Mistral AI
🕑: 06:50 PM - 07:10 PM
Distributed large-scale serving with llm-d
Host: AI Engineers from Red Hat AI
🕑: 07:10 PM - 07:20 PM
Coffee & Tea Break
🕑: 07:20 PM - 07:40 PM
The New Open-Weight Frontier: Hybrid Models in vLLM
Host: Thomas Parnell, vLLM Maintainer, IBM
🕑: 07:40 PM - 08:00 PM
Model Compression for Fast Inference on vLLM
Host: AI Engineers from Nvidia
🕑: 08:00 PM - 09:00 PM
Networking, Food and Drinks
Event Venue & Nearby Stays
Bending Spoons, 10 Via Nino Bonnet, Milano, Italy
EUR 0.00