ML Engineer & Post-Training Lead | Hugging Face
Co-authored 'NLP with Transformers' (O'Reilly), leads alignment and post-training at Hugging Face. Created Zephyr, SetFit, Alignment Handbook, and co-built TRL.
Biography
Lewis Tunstall is a Machine Learning Engineer at Hugging Face, where he leads post-training efforts and builds open-source tools to align language models with human preferences. He co-authored the O'Reilly bestseller "Natural Language Processing with Transformers" (2022) with Leandro von Werra and Thomas Wolf, and is a core contributor to TRL (Transformer Reinforcement Learning) and the Alignment Handbook. Before transitioning to ML, Tunstall earned a PhD in Theoretical Physics from the University of Adelaide, was a 2010 Fulbright Scholar, and held research positions in Australia, the USA, and Switzerland (University of Bern). He grew up in northwest Tasmania, Australia, and is now based in Bern, Switzerland.
Bestselling book co-authored with Leandro von Werra and Thomas Wolf, providing a hands-on guide to building NLP applications with Hugging Face Transformers. Revised edition published in full color.
Core contributor to Hugging Face's library for fine-tuning and aligning language models via RLHF, DPO, and other preference learning methods. One of the most widely used alignment libraries in the open-source ecosystem.
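To make the preference-learning objective concrete, here is a minimal sketch of the DPO loss for a single preference pair. The function and variable names are my own, and this is a toy illustration of the formula, not TRL's implementation:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed log-probability of a response under
    the policy or the frozen reference model; beta scales how far the
    policy may drift from the reference.
    """
    # Log-ratios of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # DPO objective: -log sigmoid(beta * (chosen_ratio - rejected_ratio))
    logits = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When policy and reference agree exactly, the loss sits at log 2; it falls as the policy assigns relatively more probability to the chosen response than the rejected one, which is the signal that drives preference alignment without an explicit reward model.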
Led the Zephyr project demonstrating that distilled DPO (dDPO) with AI feedback can produce a 7B-parameter chat model surpassing Llama2-Chat-70B on MT-Bench, without requiring human annotation.
Open-source collection of robust recipes to align language models with human and AI preferences, covering SFT, DPO, Constitutional AI, and model-specific fine-tuning guides.
Efficient few-shot learning framework that fine-tunes Sentence Transformers contrastively on small labeled sets, achieving competitive accuracy with orders of magnitude fewer parameters than prompt-based methods.
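The contrastive stage can be illustrated with a toy sketch of how training pairs are built from a small labeled set: same-label texts become positive pairs and different-label texts become negative pairs. The helper name is hypothetical and this is not SetFit's actual API, just the underlying idea:

```python
from itertools import combinations

def generate_contrastive_pairs(examples):
    """Build (text_a, text_b, similar) triples from a few labeled examples.

    examples: list of (text, label) tuples.
    similar is 1 when the two texts share a label, else 0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1 if label_a == label_b else 0))
    return pairs
```

Because every pair of examples yields a training signal, even a handful of labeled texts produces many contrastive pairs, which is what lets the sentence-embedding fine-tuning work in the few-shot regime.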
Teamed up with Numina to create an 850k math problem-solution dataset and won the first AI Math Olympiad progress prize, demonstrating the power of high-quality open datasets for mathematical reasoning.
Co-authored SmolLM2 and SmolLM3 small language models and co-launched the Open-R1 initiative to reproduce open reasoning model training, including OlympicCoder for competitive programming.
Co-developed the free Hugging Face NLP course bridging software engineers into the Transformers ecosystem, covering tokenization, fine-tuning, and deployment across multiple modalities.
A few lines of code could outperform features that had been carefully designed by physicists over many years.
You could skip the costly human annotation step altogether and focus on generating data for specific tasks.
Just start. Just start coding. Just start contributing if you want to do open-source. You can always find reasons not to do it but you just have to get your hands dirty.
The biggest lesson I learned when I was starting out in the field is to use baseline models... you can actually get quite far with regular expressions and linear models like logistic regression.
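In that spirit, a minimal sketch of what such a baseline might look like before reaching for a neural model: a regex keyword classifier for toy sentiment labels (the keyword lists and function name are my own invention for illustration):

```python
import re

# Hypothetical keyword lists for a toy sentiment baseline.
POSITIVE = re.compile(r"\b(great|excellent|love|wonderful)\b", re.IGNORECASE)
NEGATIVE = re.compile(r"\b(terrible|awful|hate|broken)\b", re.IGNORECASE)

def baseline_sentiment(text):
    """Classify text by counting positive vs. negative keyword matches."""
    pos = len(POSITIVE.findall(text))
    neg = len(NEGATIVE.findall(text))
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

A baseline like this (or a logistic regression over bag-of-words features) takes minutes to build and gives a floor that any more complex model has to beat, which is exactly the point of the quote.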
History generally shows that one shouldn't bet against deep learning!
Although there are several open weight models for mathematics, the training datasets are rarely, if ever, made public.
Research generated March 19, 2026