As scikit-learn maintainers, we would love to use PyPI download stats and other similar metrics to help inform some of our decisions.
In this talk we will highlight a number of caveats we discovered while trying to understand the complex reality behind these seemingly simple metrics.
We all love to tell stories with data and we all love to listen to them.
As package maintainers or package users, we resort to proxy metrics (GitHub stars, PyPI download stats, website analytics, …) to try to answer inherently hard questions like these:
In the context of scikit-learn, we will present the kind of surprises and caveats we discovered when trying to make sense of the PyPI download stats.
Highlights include:
We will then zoom out a bit and talk about other metrics we looked at, for example scikit-learn.org website analytics, GitHub stars and “Used by” stats. After presenting the inherent biases of these data sources, we will summarize the kind of insights we gained by combining them.
During the presentation, we will also highlight a few tools and websites we used along the way to make it easier to look at PyPI download numbers in more detail.
We will conclude with some thoughts on how to combine these kinds of metrics to inform some of our decisions, without falling too much in love with the stories we tell with them.
Loïc has a Particle Physics background, which is how he discovered Python towards the end of his PhD.
He is a scikit-learn and joblib core contributor and has been involved in a number of Python open-source projects over the past 10 years, including Pyodide, dask-jobqueue, sphinx-gallery and nilearn.