Co-founder & CSO | Hugging Face
Co-founded Hugging Face and created the Transformers and Datasets libraries. Led BigScience/BLOOM, the largest open AI research collaboration. Now pushing open-source robotics with LeRobot.
Biography
Thomas Wolf is co-founder and Chief Science Officer (CSO) of Hugging Face, the collaborative open-source platform for machine learning that hosts over 1 million public models and is valued at $4.5 billion. After graduating from Ecole Polytechnique (Paris), he researched laser-plasma interactions at the BELLA Center of Lawrence Berkeley National Laboratory, then completed a Ph.D. in statistical and quantum physics at Sorbonne University and ESPCI, working on superconducting materials.

He then changed fields entirely, earning a law degree from Pantheon Sorbonne University and spending five years as a European patent attorney at Cabinet Plasseraud. In 2015, while consulting for deep-learning startups, he recognized that many AI methods were re-branded statistical-physics approaches and taught himself modern machine learning. In 2016, alongside Clement Delangue and Julien Chaumond, he co-founded Hugging Face in New York City; the company began as a chatbot app before pivoting to become the central hub for open-source AI.

At Hugging Face, Wolf created the Transformers library (158k+ GitHub stars) and the Datasets library, co-authored the O'Reilly book 'Natural Language Processing with Transformers,' and initiated and led the BigScience research workshop that produced BLOOM, a 176-billion-parameter multilingual LLM. He now leads Hugging Face's push into open-source robotics with LeRobot. His papers have been cited over 55,000 times.
Created the Transformers library (158k+ GitHub stars), providing a unified API for state-of-the-art pretrained models across PyTorch, TensorFlow, and JAX. Used by 5,000+ research organizations worldwide.
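For a sense of what that unified API looks like in practice, here is a minimal sketch using the library's high-level pipeline helper; the task string is real, but the input sentence and printed output are only illustrative.

# pip install transformers
from transformers import pipeline

# pipeline() hides the tokenizer and pretrained model behind one call;
# with no model argument it downloads a default checkpoint on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Open-source AI keeps getting better."))
# Illustrative output: [{'label': 'POSITIVE', 'score': 0.99}]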
Created the Datasets library for efficient access to thousands of ML datasets. Won EMNLP 2021 Best Demonstration Paper.
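A minimal sketch of that access pattern, using the well-known "imdb" dataset as a stand-in for any Hub dataset:

from datasets import load_dataset

# A single call resolves, downloads, and caches a dataset from the Hub.
ds = load_dataset("imdb", split="train")
print(ds[0]["text"][:100])  # first 100 characters of the first review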
Initiated and led BigScience, the largest open research collaboration in AI (1,000+ researchers, 60+ countries). Produced BLOOM, a 176B-parameter multilingual open LLM trained on 46 languages.
Co-authored DistilBERT (NeurIPS 2019), showing that knowledge distillation can produce a BERT model 40% smaller while retaining 97% of its performance, enabling wider deployment.
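The mechanism behind that result can be sketched as a generic knowledge-distillation loss in PyTorch; the temperature value below is illustrative, not the paper's exact training recipe.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both output distributions, then pull the student's
    # predictions toward the teacher's with a KL-divergence penalty.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2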
Co-authored the reference book on building language applications with Hugging Face, published by O'Reilly (2022).
Leading Hugging Face's open-source robotics initiative, bringing community-driven development to physical AI with affordable hardware like the $100 SO100 robotic arm.
Co-authored FineWeb (15T tokens) and FineWeb2 (3T+ words, multilingual): open pretraining datasets that produce better-performing LLMs than other open data sources.
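A corpus that size is typically streamed rather than downloaded; here is a hedged sketch with the datasets library, where the dataset ID and the "text" column are assumed from the Hugging Face Hub listing.

from datasets import load_dataset

# streaming=True iterates over shards lazily instead of fetching 15T tokens.
fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
print(next(iter(fw))["text"][:200])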
Led the release of the SmolLM family of small language models (135M to 1.7B parameters) with fully open training data, demonstrating that small open models can rival larger proprietary ones.
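A short generation sketch with the smallest model; the checkpoint ID is assumed from the Hub's HuggingFaceTB organization.

from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM-135M"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Small open models can", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))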
It's nice to give a fish to someone to feed them; it's even better to teach them to fish.
Everyone feels like they can build with AI and not just consume AI.
All of these people also become roboticists in a way, if you give them the tools.
The missing brick was really software that could adapt, that could be dynamic.
The big bet was, can you build a big community in robotics as well?
Research generated March 19, 2026