ML-NYC Speaker Series and Happy Hour: Danqi Chen

Tue Apr 08 2025 at 04:00 pm to 06:00 pm UTC-04:00

Flatiron Institute | New York

Publisher/Host: ML-NYC Speaker Series and Happy Hour
About this Event

The ML in NYC Speaker Series + Happy Hour is excited to host Professor Danqi Chen as our April speaker! Her talk will take place on Tuesday, April 8th, at 4 pm at the Flatiron Institute. As always, there will be a reception afterward for all attendees.


Title: Optimizing Data Use for Pre-training Language Models

Abstract: Modern language models are trained on massive, unstructured data consisting of trillions of tokens, typically obtained by crawling the web. In this talk, I argue that we are still in the early stages of understanding pre-training data and unlocking its full potential, and that more effective use of data can lead to both more compute-efficient and more capable language models. I will present several perspectives on improving data curation, focusing on three general techniques. First, quality filtering aims to train classifiers that can distinguish high-quality from low-quality documents at scale (QuRating). Second, domain curation focuses on developing taxonomies of web data and leveraging domain mixing strategies to enhance pre-training (WebOrganizer). Third, I will introduce a simple pre-training approach that conditions on metadata, which both accelerates training and improves model steerability (MeCo). Together, these efforts highlight the importance of optimizing the use of pre-training data and point toward a more data-centric paradigm for training future language models.
Bio: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. She also serves as an Associate Director of Princeton Language and Intelligence (PLI), an initiative focused on developing fundamental research of large AI models. Her research centers on training, adapting, and understanding language models (LMs), with an emphasis on making them more accessible to academia. She also works at the intersection of LMs and retrieval, exploring how retrieval can serve as a foundational component of LMs. Before joining Princeton, Danqi was a visiting scientist at Facebook AI Research in Seattle. She earned her Ph.D. from Stanford University (2018) and her B.E. from Tsinghua University (2012), both in Computer Science. Her work has been recognized with a Sloan Fellowship, an NSF CAREER Award, a Samsung AI Researcher of the Year Award, and multiple outstanding paper awards from ACL and EMNLP.

Event Venue

Flatiron Institute, 162 5th Avenue, New York, United States

Tickets

USD 0.00
