Exploring datasets for RAG and fine-tuning: how to refine textual data?

Fri Jun 28 2024 at 01:00 pm to 05:00 pm UTC+02:00

3 Rue Rossini | Paris

Publisher/Host: datacraft
Workshop - Exploring datasets for RAG and fine-tuning: how to refine textual data?
About this Event

Speakers


  • Charles de Dampierre, CEO, Bunka.ai
  • Louis-Marie Lorin, COO, Bunka.ai


Prerequisites
Level in Machine Learning
Good knowledge of Machine Learning/DA/AI
Level in Python
Good knowledge of Python

Talk description

Bunka provides an intuitive and innovative way of exploring textual datasets, using visual maps to give an overview of your data. Bunka enhances traditional visualizations by adding a fully customizable dimensional layer. Whether ML engineers want to analyze perplexity, toxicity (which can be defined by the user), or various other text quality metrics, Bunka has them covered.

Bunka's technology is fully adaptive and connects to various frameworks, whether it is the latest LLM, a new state-of-the-art embedder, or another clustering algorithm.

Since RAG and fine-tuning require clean datasets to perform optimally, being able to visually analyze the quality of a dataset and evaluate the impact of poor-quality data on these models is crucial.

In this workshop, we will use an open-source dataset consisting of conversations between users and ChatGPT, which could represent generic customer service chats. The interest lies in its generality, as it contains standard chats applicable to any use case.

First, we will embed the ChatGPT chat titles to represent and visualize them, then repeat the same process for the chat content.
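To make the embedding step concrete, here is a minimal sketch of turning titles into vectors and comparing them. It assumes a toy bag-of-words embedding built from the titles themselves, standing in for a real embedder; the function names (`build_vocab`, `embed`, `nearest_neighbor`) and the sample titles are hypothetical and are not part of Bunka.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts):
    """Assign one vector dimension to each distinct token in the corpus."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def embed(text, vocab):
    """Toy embedding (assumption): L2-normalized token counts."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbor(i, vectors):
    """Index of the vector most similar to vectors[i], excluding itself."""
    others = (j for j in range(len(vectors)) if j != i)
    return max(others, key=lambda j: cosine(vectors[i], vectors[j]))


# Hypothetical chat titles, for illustration only.
titles = [
    "Python sorting algorithm help",
    "Help with a sorting algorithm in Python",
    "Recipe for chocolate cake",
]
vocab = build_vocab(titles)
vectors = [embed(t, vocab) for t in titles]
```

With a real embedder, only `embed` would change; the similarity and neighbor logic, which is what a visual map is built on, stays the same.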

Next, we'll analyze the toxicity of the responses using a RoBERTa classifier, as a baseline to compare against Bunka. Toxicity will be defined as the extent to which a response contains offensive, discriminatory, or otherwise harmful content, according to a metric we will establish during the workshop.
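As an illustration of how a toxicity metric can be applied once it is defined, the sketch below flags responses whose score reaches a threshold. The scorer is pluggable: in the workshop, a RoBERTa classifier would play the role of `score_fn`. Here a tiny keyword lexicon is used as a stand-in so the example runs on its own; the lexicon, the threshold of 0.1, and all function names are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon; a trained classifier would replace this.
TOXIC_TERMS = {"idiot", "stupid", "hate"}


def lexicon_score(text):
    """Stand-in toxicity score: fraction of tokens found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in TOXIC_TERMS)
    return hits / len(tokens)


def flag_toxic(responses, score_fn=lexicon_score, threshold=0.1):
    """Return (response, score) pairs whose score is at least the threshold."""
    flagged = []
    for response in responses:
        score = score_fn(response)
        if score >= threshold:
            flagged.append((response, score))
    return flagged


# Hypothetical responses, for illustration only.
responses = [
    "Here is how to sort a list in Python.",
    "That is a stupid question, you idiot.",
]
flagged = flag_toxic(responses)
```

Keeping the scorer as a parameter makes it straightforward to compare two toxicity measures on the same responses.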

After that, we will evaluate the quality of responses using Bunka, a tool designed to improve AI model performance by enhancing data quality through advanced visual topic modeling and cluster analysis.

Finally, we'll identify and eliminate redundant data, resulting in a filtered and optimized dataset, ready for LLM fine-tuning or RAG applications.
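One simple way to picture the redundancy-removal step is near-duplicate filtering. The sketch below is not Bunka's method; it assumes Jaccard similarity over word 3-grams and a threshold of 0.8, both chosen only for illustration, and the function names are hypothetical.

```python
import re


def shingles(text, n=3):
    """Set of word n-grams; short texts fall back to a single tuple."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(texts, threshold=0.8):
    """Keep a text only if it is not too similar to any text already kept."""
    kept, kept_shingles = [], []
    for text in texts:
        current = shingles(text)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(text)
            kept_shingles.append(current)
    return kept


# Hypothetical chat messages, for illustration only.
texts = [
    "How do I reset my password?",
    "how do i reset my password",
    "What is the capital of France?",
]
filtered = deduplicate(texts)
```

This pairwise comparison is quadratic in the number of texts, so it only suits small datasets; larger ones would need an approximate method.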


This event is reserved for our members, but we keep a few seats for those who would like to discover the club. Feel free to register: you will be placed on the waiting list, and we will confirm your participation 3 days before the event.

2406-Bunka

Event Venue

3 Rue Rossini, Paris, France

Tickets

EUR 0.00
