Learn the inner workings of deep learning frameworks by building one from scratch. We will start with a simple autograd engine and progress to modules, optimizers, and even GPT-style architectures, all in just three hours.
Data scientists typically concentrate on the mathematical foundations when designing and training neural networks, treating the process by which deep learning frameworks link high-level code to lower-level mathematical operations as a black box. As a result, the internal workings of these frameworks are frequently overlooked.
This workshop aims to open that black box by having participants construct a small deep learning framework from scratch. We will begin by creating a simple automatic differentiation engine, followed by more advanced components such as modules and optimizers.
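To make the starting point concrete, here is a minimal sketch of the kind of scalar autograd engine the first exercises build toward. The `Value` class and its methods are illustrative stand-ins, not the workshop's actual API:

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back.
    (Illustrative sketch; not the workshop's actual API.)"""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # fills in children's grads via the chain rule
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad       # d(a+b)/da = 1
            other.grad += out.grad      # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the computation graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Quick check: for z = x*y + x, dz/dx = y + 1 = 4 and dz/dy = x = 2.
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```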
In the second half of the workshop, we will use our newly created framework to build both simple neural networks and advanced architectures such as GPT-style transformers.
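The GPT-2 exercise centers on causal self-attention. As a taste of what that part involves, here is a rough NumPy sketch of a single attention head; all names, shapes, and the use of NumPy are illustrative assumptions, and in the workshop the same computation would be expressed with the framework's own tensor type:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (T, d) token embeddings; Wq/Wk/Wv: (d, d_head) projection matrices.
    (Illustrative sketch, not the workshop's actual code.)"""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv           # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])    # scaled dot-product scores, (T, T)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                     # causal mask: token t attends only to <= t
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                         # weighted sum of values, (T, d_head)

rng = np.random.default_rng(0)
T, d, d_head = 5, 16, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d_head)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```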
A detailed text guide and solutions for all of the exercises will be provided in a public GitHub repository.
After constructing the framework from scratch, participants will have a comprehensive understanding of how the core pieces of a modern deep learning framework fit together: tensors, automatic differentiation, modules, and optimizers.
Target audience
This workshop is primarily intended for those with some experience building deep learning models in popular frameworks like PyTorch, TensorFlow, or JAX. However, prior experience is not strictly required, as the essential fundamentals will be covered briefly.
Outline
- Introduction, motivation, and essential theory [20 min]
- Tensors + autograd engine [30 min]
- Modules and layers [30 min]
- Optimizers [15 min] (see the sketch after this outline)
- Using the framework to build and train a simple model [15 min]
- Using the framework to build a working copy of GPT-2 [60 min]
- Concluding remarks + sharing bonus exercises [10 min]
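As a preview of the optimizer segment, below is a minimal sketch of the kind of optimizer abstraction such a framework typically exposes. The `SGD` and `Param` names and the `.data`/`.grad` convention are assumptions for illustration, not the workshop's actual API:

```python
class SGD:
    """Vanilla stochastic gradient descent over a list of parameters.
    (Illustrative sketch; parameter objects are assumed to carry .data and .grad.)"""
    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad

# Usage with a stand-in parameter: minimize (w - 3)^2 with hand-computed grads.
class Param:
    def __init__(self, data):
        self.data, self.grad = data, 0.0

w = Param(0.0)
opt = SGD([w], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    w.grad = 2 * (w.data - 3)  # d/dw of (w - 3)^2
    opt.step()
print(round(w.data, 3))  # ~3.0
```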