Talk

How to build your own Tiny Language Model from scratch

Language: English
Audience level: Intermediate
Elevator pitch

Join us to discover Tiny Language Models: ultra-compact, resource-efficient AI tools that deliver impressive results. In this session, you’ll learn how to create your own model, from data preparation to training. These models combine innovation with efficiency, making AI more accessible than ever.

Abstract

Over the past year, Small Language Models (SLMs) such as Microsoft’s Phi-3 and Hugging Face’s SmolLM have gained a lot of attention. These models have demonstrated impressive results, often rivaling much larger models, particularly in task-specific applications. But what if we went even smaller? In this talk, we’ll explore Tiny Language Models: ultra-compact models that are an order of magnitude smaller than SLMs. Despite their limited capacity and narrower scope, these models show surprising versatility and offer practical solutions across different use cases. Their greatest advantage is that they can be trained with minimal resources in just a few days, making them accessible to a wide audience.

This session takes a hands-on approach to training your own tiny language model. We’ll begin by showing how to synthetically generate and curate a training dataset tailored to specific tasks using libraries like distilabel. Next, we’ll delve into training these models with repositories such as nanoGPT or llama2.c. You’ll learn how to prototype and train a model in just a few hours, opening the door to fast experimentation and quick deployment.

We’ll also discuss the real-world advantages of these ultra-compact models, from deploying them on edge devices for on-the-go inference to using them in agent-based modeling for simulations. Beyond standalone use, tiny models can also serve as tools to better understand and refine larger models, bridging the gap between innovation and efficiency.

By the end of this session, you’ll have a deeper appreciation for the potential of tiny language models, not just as standalone solutions but as key players in advancing AI accessibility. Whether you’re a developer, researcher, or AI enthusiast, this talk will equip you with the knowledge to make the most of these unassuming yet powerful models.
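To give a rough sense of the scale gap the abstract refers to, the sketch below estimates parameter counts for a GPT-style decoder. The configurations are hypothetical (not from the talk): the "tiny" one resembles the smallest models trainable with nanoGPT or llama2.c, the "small" one is in the ballpark of a ~4B-parameter SLM. The formula is a back-of-the-envelope approximation that ignores biases, layer norms, and rotary embeddings.

```python
def gpt_param_count(vocab_size: int, n_layer: int, n_embd: int, block_size: int) -> int:
    """Approximate parameter count of a GPT-style decoder-only transformer."""
    token_emb = vocab_size * n_embd       # token embedding table
    pos_emb = block_size * n_embd         # learned positional embeddings
    per_layer = 12 * n_embd ** 2          # attention (~4*d^2) + MLP (~8*d^2) per block
    return token_emb + pos_emb + n_layer * per_layer

# Hypothetical configs: a tiny character/BPE-level model vs. a Phi-3-sized SLM.
tiny = gpt_param_count(vocab_size=4096, n_layer=6, n_embd=288, block_size=256)
small = gpt_param_count(vocab_size=32000, n_layer=32, n_embd=3072, block_size=4096)
print(f"tiny: ~{tiny / 1e6:.1f}M params, small: ~{small / 1e9:.2f}B params")
```

Under these assumptions the tiny model lands around 7M parameters versus several billion for the SLM, i.e. two to three orders of magnitude smaller, which is why it can be trained on modest hardware in hours rather than weeks.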

Tags: Machine Learning, Deep Learning, Natural Language Processing
Participant

Luca Gilli

I am the co-founder of Clearbox AI, a company specializing in synthetic data generation solutions to drive innovation and enhance data privacy. Outside of work, I enjoy hiking in the beautiful Piedmontese mountains. I’m also passionate about farming, with a special interest in experimenting with hydroponic techniques.