Transformers have transformed fields from computer vision to protein folding. But what about forecasting? Come learn how titans like Amazon and Google train their forecasting foundation models and, most importantly, how to read the public benchmarks without falling for the hype.
This is a pragmatic talk, tailored for generative AI and ML practitioners. While the talk explains how transformer-based architectures are adapted to forecasting problems, the true goal is to get the lay of the land: the publicly available datasets, the data used to train these models, and how to see through the benchmarks that have been published.
In other words, the objective of the talk is to make attendees more familiar with the literature and give them the tools to evaluate the authors' claims beyond the hype.
The first part of the talk explains how researchers discretise time series into a finite "dictionary" of tokens, and the limitations of this approach, such as handling different sampling frequencies during training.
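To give a flavour of this discretisation, here is a minimal sketch in the spirit of Chronos-style tokenisation. The bin count, clipping range, and mean-absolute scaling rule are illustrative assumptions, not the exact published implementation:

```python
import numpy as np

def tokenize(series: np.ndarray, n_bins: int = 4094, clip: float = 15.0) -> np.ndarray:
    """Map a real-valued series to integer tokens from a finite "dictionary".

    Sketch only: scale by the mean absolute value, clip, then bucket into
    uniform bins. The constants here are assumptions for illustration.
    """
    scale = np.abs(series).mean()
    scale = scale if scale > 0 else 1.0     # avoid dividing by zero on flat series
    scaled = np.clip(series / scale, -clip, clip)
    edges = np.linspace(-clip, clip, n_bins - 1)  # uniform bin edges
    return np.digitize(scaled, edges)             # token ids in [0, n_bins - 1]

tokens = tokenize(np.array([10.0, 12.5, 9.8, 11.2]))
```

Once a series is mapped to token ids, it can be fed to a standard transformer exactly as text tokens would be, which is what makes this approach attractive.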
We will then draw a parallel with Large Language Model (LLM) scaling laws to evaluate the data strategies used to train such universal forecasting models. In other words, we will ask whether there is enough publicly available time-series data to train foundation models, and how this scarcity can affect their evaluation.
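To illustrate the style of back-of-envelope reasoning this involves, consider the roughly 20-tokens-per-parameter heuristic from the Chinchilla scaling-law literature. The corpus and model sizes below are placeholder assumptions, not the talk's actual figures:

```python
# Back-of-envelope check: can a public time-series corpus support a model size?
# The 20-tokens-per-parameter rule of thumb is borrowed from the LLM
# (Chinchilla) literature; the numbers below are hypothetical placeholders.
TOKENS_PER_PARAM = 20
corpus_observations = 100e9   # hypothetical: 100B public time-series observations
model_params = 200e6          # hypothetical: a 200M-parameter forecaster

needed = TOKENS_PER_PARAM * model_params
print(f"needed ~{needed:.1e} observations, available ~{corpus_observations:.1e}")
```

Whether such LLM-derived heuristics transfer to time-series tokens at all is itself one of the questions the talk examines.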
Finally, we will show how we benchmarked those models against a robust set of baselines, and how our experimental results compare to the publicly available ones.
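As an example of what a "robust baseline" can look like in practice, here is a minimal sketch using a seasonal-naive forecaster scored with MASE. Both are standard choices in forecasting evaluation, though not necessarily the exact ones used in our experiments:

```python
import numpy as np

def seasonal_naive(history: np.ndarray, horizon: int, season: int) -> np.ndarray:
    """Forecast by repeating the last observed seasonal cycle."""
    cycle = history[-season:]
    return np.tile(cycle, horizon // season + 1)[:horizon]

def mase(actual: np.ndarray, forecast: np.ndarray,
         history: np.ndarray, season: int) -> float:
    """Mean Absolute Scaled Error: forecast error relative to the
    in-sample seasonal-naive error, so values below 1 beat the naive model."""
    naive_err = np.abs(history[season:] - history[:-season]).mean()
    return np.abs(actual - forecast).mean() / naive_err
```

A foundation model that cannot consistently beat this kind of baseline across datasets is a warning sign the talk teaches attendees to look for.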
No specific prior knowledge is required. While forecasting practitioners stand to gain the most from this talk, any machine learning or generative AI practitioner can follow along and take away the key conclusions.
Outline
Minutes 1-3. Problem statement: challenges in adapting transformers to the time-series domain.
Minutes 3-10. How Chronos and TimesFM are implemented.
Minutes 10-15. Do we have enough data to train universal forecasters?
Minutes 15-25. Lessons learnt from evaluating transformer-based models for a specific use case.
Minutes 25-30. Wrap-up and Q&A.