Research Scientist | Hugging Face
RL and LLM alignment researcher. Co-created the Open LLM Leaderboard, co-authored Zephyr and the Alignment Handbook, and built Godot RL Agents.
Biography
Edward Beeching is a Research Scientist at Hugging Face in Lyon, France, specializing in reinforcement learning for LLM alignment and embodied learning. He holds a PhD from INSA Lyon / INRIA CHROMA on deep RL for robotic navigation and planning. At Hugging Face he co-created the Open LLM Leaderboard, co-authored the Zephyr alignment model and the Alignment Handbook, integrated the Decision Transformer into the Transformers library, and co-authored TRL (Transformer Reinforcement Learning). He built Godot RL Agents (1,400 stars), an open-source framework connecting the Godot game engine to deep RL algorithms. His recent work includes SmolLM3, a 3B dual-mode reasoning model, the JAT multi-purpose transformer agent, and a survey of 16 open-source async RL training libraries. He has published 381 models and 103 datasets on Hugging Face.
Open-source framework connecting the Godot game engine to deep RL algorithms (Stable Baselines3, Sample Factory, Ray RLlib, CleanRL). 1,400 stars on GitHub.
Co-created the Hugging Face Open LLM Leaderboard for benchmarking open-source large language models on IFEval, BBH, MATH, GPQA, MuSR, and MMLU-Pro.
Co-authored Zephyr-7B, a distilled chat model that surpassed Llama2-Chat-70B on MT-Bench using distilled direct preference optimization without human annotation.
Co-authored robust recipes for aligning language models with human and AI preferences, including fine-tuning recipes for SmolLM2-Instruct and Zephyr.
Co-author of Hugging Face's open-source library for training transformer language models with RL techniques including SFT, GRPO, and DPO.
Integrated the Decision Transformer model into Hugging Face's transformers library, enabling offline RL via conditional sequence modeling.
Co-authored SmolLM3, a 3B parameter dual-mode reasoning model with multilingual support, 128k context, and state-of-the-art performance at 3B scale.
Co-authored the first fully open-sourced multi-task RL agent capable of sequential decision-making, CV, and NLP with a single set of weights.
Published the Mixture-of-Thoughts dataset on Hugging Face, with 349k examples used for reasoning model training.
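The Decision Transformer integration above can be exercised directly from the transformers library. A minimal sketch, constructing a small model from a config and running it on random tensors; the state/action dimensions here are illustrative, not from any released checkpoint:

```python
# Sketch: forward pass through transformers' DecisionTransformerModel.
# Dimensions (state_dim=17, act_dim=6) are illustrative placeholders.
import torch
from transformers import DecisionTransformerConfig, DecisionTransformerModel

config = DecisionTransformerConfig(state_dim=17, act_dim=6, max_ep_len=1000)
model = DecisionTransformerModel(config)
model.eval()

batch, seq_len = 1, 20
states = torch.randn(batch, seq_len, config.state_dim)
actions = torch.randn(batch, seq_len, config.act_dim)
returns_to_go = torch.randn(batch, seq_len, 1)   # target return conditioning
timesteps = torch.arange(seq_len).unsqueeze(0)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

with torch.no_grad():
    out = model(
        states=states,
        actions=actions,
        returns_to_go=returns_to_go,
        timesteps=timesteps,
        attention_mask=attention_mask,
    )

# The model predicts the next action at each step, conditioned on the
# desired return-to-go: one act_dim-sized prediction per timestep.
print(out.action_preds.shape)
```

In practice the pretrained Gym checkpoints on the Hub are loaded with `DecisionTransformerModel.from_pretrained(...)` and rolled out by feeding back the predicted actions.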
I had the opportunity to spend the last month building an open-source, state-of-the-art, dual-mode reasoning model at the 3B scale, building on the amazing work of Hugging Face's pretraining team. It was tough, but we managed to get on the Pareto front with the Qwen3 models.
A month ago I joined Hugging Face as a Research Scientist. They're great: opening an office in Lyon, letting me work on open-source projects, and trusting me to define my own schedule. I am proud to have added the Decision Transformer to transformers.
For those of you who don't have time to read 5,000 words about async RL plumbing (we get it, you have models to train).
Research generated March 19, 2026