Machine Learning Engineer | Jane Street
Creator of Hugging Face Accelerate, core maintainer of Transformers, and co-author of the fastai book with Jeremy Howard.
Biography
Sylvain Gugger is a Machine Learning Engineer at Jane Street on the ML-infra team, where he helps traders and researchers accelerate their model training and inference. Before Jane Street, he spent three years at Hugging Face (2020-2023) as a core maintainer of the Transformers library (1,250+ commits) and creator of the Accelerate library (290+ commits, 9.5k+ GitHub stars), which simplifies distributed PyTorch training across GPUs and TPUs, with mixed-precision support, at the cost of only minimal code changes. Prior to Hugging Face, he worked at fast.ai (2017-2019) alongside Jeremy Howard, co-authoring "Deep Learning for Coders with fastai and PyTorch" (O'Reilly, 2020) and helping build the fastai library and courses.

Gugger began his career as a mathematics and computer science teacher in France, spending seven years teaching in CPGE (Classes Préparatoires aux Grandes Écoles) and authoring ten math textbooks published by Dunod. He discovered machine learning through Jeremy Howard's fast.ai MOOC after relocating to New York City around 2015. A key moment in his early ML career was the Stanford DAWNBench competition, where his implementation of Leslie Smith's super-convergence method helped the fast.ai team take first place on CIFAR-10 for both fastest and cheapest training. His widely read blog posts on the 1-cycle learning rate policy and learning rate finders have become foundational references in the deep learning community.

He holds a master's degree in Mathematics and Computer Science from the École normale supérieure (2003-2007).
Highlights

Created the Accelerate library (9.5k+ GitHub stars), a thin PyTorch wrapper that simplifies distributed training across GPUs and TPUs, including mixed-precision setups, with only about five lines of changed code. Supports FSDP, DeepSpeed, and data/pipeline/tensor parallelism.
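As a rough illustration of that pattern, here is a plain PyTorch loop adapted to run under Accelerate; the lines marked "# +" are the additions, and the toy model, data, and hyperparameters are illustrative, not taken from the source.

```python
# A plain PyTorch training loop adapted for Accelerate.
# Lines marked "# +" are the additions; everything else is standard PyTorch.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator  # +

accelerator = Accelerator()  # + detects devices, processes, mixed precision

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(  # + wraps for the setup
    model, optimizer, dataloader
)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # + replaces loss.backward()
    optimizer.step()
```

The same script then runs unchanged on a single GPU, multiple GPUs, or a TPU, with the launch configuration handled outside the code.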
Contributed 1,250+ commits to the Transformers library as a core maintainer, working on the Trainer API, model implementations, and training infrastructure that powers millions of ML workflows.
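A minimal sketch of fine-tuning through the Trainer API he maintained; the checkpoint, dataset, and hyperparameters below are placeholder assumptions, not details from the source.

```python
# Fine-tuning a sequence classifier through the Transformers Trainer API.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

dataset = load_dataset("imdb")

def tokenize(batch):
    # Pad to a fixed length so the default collator can batch examples.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    )

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    num_train_epochs=1,
)
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()
```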
Co-authored "Deep Learning for Coders with fastai and PyTorch" (O'Reilly, 2020) with Jeremy Howard, along with its accompanying Jupyter notebooks (fastbook), making deep learning accessible to programmers without a PhD. Cited by 16,500+ researchers.
Popularized and implemented Leslie Smith's 1-cycle policy and super-convergence technique, enabling dramatically faster training. His blog posts on these methods became foundational references, and the techniques were adopted across the PyTorch, TensorFlow, and Keras ecosystems.
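The 1-cycle schedule now ships in PyTorch itself as torch.optim.lr_scheduler.OneCycleLR; a toy sketch, with an assumed model and step counts chosen only for illustration:

```python
# The 1-cycle policy via PyTorch's built-in OneCycleLR scheduler:
# the learning rate ramps up to max_lr, then anneals back down,
# while momentum follows the inverse cycle by default.
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epochs, steps_per_epoch = 3, 100
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, epochs=epochs, steps_per_epoch=steps_per_epoch
)

for _ in range(epochs):
    for _ in range(steps_per_epoch):
        inputs = torch.randn(8, 10)
        loss = model(inputs).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # step once per batch, not per epoch
```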
Key contributor to the fast.ai team that won Stanford's DAWNBench CIFAR-10 competition for the fastest and cheapest training on publicly available infrastructure, beating entries from Google and Intel that ran on large clusters.
Co-developed the fastai deep learning library with Jeremy Howard, introducing progressive resizing, learning rate finders, and one-cycle training as default practices that influenced the broader PyTorch ecosystem.
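A sketch of those defaults as they appear in recent fastai versions; exact names such as vision_learner and the valley suggestion vary across releases.

```python
# Learning rate finder plus one-cycle training with fastai defaults.
# Uses fastai's small bundled MNIST sample so the example is self-contained.
from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = vision_learner(dls, resnet18, metrics=accuracy)

suggested = learn.lr_find()               # sweep LRs and suggest a good value
learn.fit_one_cycle(1, suggested.valley)  # train with the 1-cycle policy
```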
Quotes

I created a new open source library to make it much more lightweight to help people with our trainings.
No one knows anything about machine learning. Like, it's really just a cooking sense.
Making sure that your code is there... you can change a small line of code in your model and think 'Oh, this is totally harmless,' but then it actually destroys the performance.
Yesterday was my last day at Hugging Face. The past three years have been exhilarating and I am very proud of what the team has accomplished during that time!
The way we tune all the other hyper-parameters of the model will impact the best learning rate.