Research Scientist | Hugging Face
RL and LLM alignment researcher. Co-created the Open LLM Leaderboard, co-authored Zephyr and the Alignment Handbook, and built Godot RL Agents.
Biography
Edward Beeching is a Research Scientist at Hugging Face in Lyon, France, specializing in reinforcement learning for LLM alignment and embodied learning. He holds a PhD from INSA Lyon / INRIA CHROMA on deep RL for robotic navigation and planning. At Hugging Face he co-created the Open LLM Leaderboard, co-authored the Zephyr alignment model and the Alignment Handbook, integrated the Decision Transformer into the Transformers library, and co-authored TRL (Transformer Reinforcement Learning). He built Godot RL Agents (1,400 stars), an open-source framework connecting the Godot game engine to deep RL algorithms. His recent work includes SmolLM3, a 3B dual-mode reasoning model, the JAT multi-purpose transformer agent, and a survey of 16 open-source async RL training libraries. He has published 381 models and 103 datasets on Hugging Face.
Open-source framework connecting the Godot game engine to deep RL algorithms (Stable Baselines3, Sample Factory, Ray RLlib, CleanRL). 1,400 stars on GitHub.
Co-created the Hugging Face Open LLM Leaderboard for benchmarking open-source large language models on IFEval, BBH, MATH, GPQA, MuSR, and MMLU-Pro.
Co-authored Zephyr-7B, a distilled chat model that surpassed Llama2-Chat-70B on MT-Bench using distilled direct preference optimization without human annotation.
Co-authored robust recipes for aligning language models with human and AI preferences, including fine-tuning recipes for SmolLM2-Instruct and Zephyr.
Co-author of Hugging Face's open-source library for training transformer language models with RL techniques including SFT, GRPO, and DPO.
Integrated the Decision Transformer model into Hugging Face's transformers library, enabling offline RL via conditional sequence modeling.
Co-authored SmolLM3, a 3B parameter dual-mode reasoning model with multilingual support, 128k context, and state-of-the-art performance at 3B scale.
Co-authored the first fully open-sourced multi-task RL agent capable of sequential decision-making, CV, and NLP with a single set of weights.
Published the Mixture-of-Thoughts dataset on Hugging Face, with 349k examples used for reasoning model training.
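The Decision Transformer integration above can be exercised directly from the transformers library. A minimal sketch, constructing a small model from a config and running it on random tensors; the state/action dimensions here are illustrative, not from any released checkpoint:

```python
# Sketch: forward pass through transformers' DecisionTransformerModel.
# Dimensions (state_dim=17, act_dim=6) are illustrative placeholders.
import torch
from transformers import DecisionTransformerConfig, DecisionTransformerModel

config = DecisionTransformerConfig(state_dim=17, act_dim=6, max_ep_len=1000)
model = DecisionTransformerModel(config)
model.eval()

batch, seq_len = 1, 20
states = torch.randn(batch, seq_len, config.state_dim)
actions = torch.randn(batch, seq_len, config.act_dim)
returns_to_go = torch.randn(batch, seq_len, 1)   # target return conditioning
timesteps = torch.arange(seq_len).unsqueeze(0)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

with torch.no_grad():
    out = model(
        states=states,
        actions=actions,
        returns_to_go=returns_to_go,
        timesteps=timesteps,
        attention_mask=attention_mask,
    )

# The model predicts the next action at each step, conditioned on the
# desired return-to-go: one act_dim-sized prediction per timestep.
print(out.action_preds.shape)
```

In practice the pretrained Gym checkpoints on the Hub are loaded with `DecisionTransformerModel.from_pretrained(...)` and rolled out by feeding back the predicted actions.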
I had the opportunity to spend the last month building an open-source, state-of-the-art, dual-mode reasoning model at the 3B scale, building on the amazing work of Hugging Face's pretraining team. It was tough, but we managed to get on the Pareto front with the Qwen3 models.
A month ago I joined Hugging Face as a Research Scientist. They're great: opening an office in Lyon, letting me work on open-source projects, and trusting me to define my own schedule. I am proud to have added the Decision Transformer to transformers.
For those of you who don't have time to read 5,000 words about async RL plumbing (we get it, you have models to train).
Research generated March 19, 2026