Research Scientist | Meta Superintelligence Labs
Pioneered chain-of-thought prompting and instruction tuning (FLAN), and characterized the emergent abilities of LLMs, at Google Brain. Co-created OpenAI's o1 reasoning model. Now at Meta Superintelligence Labs working on reasoning and reinforcement learning.
Biography
Jason Wei is an American AI researcher currently at Meta Superintelligence Labs, known for pioneering chain-of-thought prompting, instruction tuning (FLAN), and the concept of emergent abilities in large language models. He earned his undergraduate degree from Dartmouth College (2020). He joined Google as an AI Resident in October 2020, rising to Research Scientist at Google Brain, where he authored three of the most influential papers in modern LLM research: chain-of-thought prompting (5,000+ citations), FLAN instruction tuning, and emergent abilities of LLMs. In February 2023 he joined OpenAI, where he co-created the o1 reasoning model and contributed to o3 and Deep Research. In July 2025 he and colleague Hyung Won Chung left OpenAI to join Meta's newly formed Superintelligence Labs, where he focuses on reasoning and reinforcement learning for frontier AI systems.
Key Contributions
Introduced chain-of-thought (CoT) prompting in January 2022, demonstrating that large language models can solve complex reasoning tasks by generating intermediate reasoning steps. The paper has been cited over 5,000 times and fundamentally changed how practitioners interact with LLMs, directly inspiring OpenAI's o1 reasoning model.
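The technique is simple to illustrate: each few-shot exemplar pairs a question with worked-out intermediate steps, so the model continues in the same step-by-step style. A minimal sketch (the exemplar is the well-known tennis-ball problem used in the paper; the follow-up question here is an illustrative assumption):

```python
# One few-shot exemplar whose answer spells out intermediate reasoning steps.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 "
    "tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model imitates the reasoning format."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

# Illustrative question, not from the paper.
prompt = build_cot_prompt(
    "If a train travels 60 miles in 1.5 hours, what is its average speed?"
)
```

Because the prompt ends at "A:", the model's continuation naturally produces the intermediate steps before the final answer.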
Led the FLAN series of research showing that finetuning language models on tasks described via natural language instructions dramatically improves zero-shot and few-shot performance. The initial FLAN (2021) used 60+ tasks on a 137B model; Flan-PaLM (2022) scaled to 1,800 tasks on PaLM 540B, achieving SOTA on MMLU. This work laid the foundation for instruction-tuned models like ChatGPT.
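The core data-preparation idea can be sketched as rendering each supervised example through varied natural-language instruction templates; the templates below are illustrative stand-ins, not FLAN's exact ones:

```python
import random

# Illustrative instruction templates for a sentiment task (assumed, not FLAN's).
TEMPLATES = [
    "Is the sentiment of this review positive or negative?\n\nReview: {text}",
    "{text}\n\nWhat is the sentiment of the review above? Positive or negative?",
    "Classify the following review as positive or negative: {text}",
]

def format_example(text: str, label: str, rng: random.Random) -> dict:
    """Render one example through a randomly chosen instruction template,
    mirroring how instruction-tuning mixtures vary phrasing per example."""
    template = rng.choice(TEMPLATES)
    return {"input": template.format(text=text), "target": label}

example = format_example(
    "The movie was a delight from start to finish.", "positive", random.Random(0)
)
```

Training on many tasks phrased this way teaches the model to follow unseen instructions at zero-shot time, rather than memorizing one task format.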
Defined and cataloged 137 emergent abilities — capabilities that appear unpredictably as models scale — providing a conceptual framework for understanding why larger models exhibit qualitatively new behaviors. Published in TMLR 2022.
Co-created OpenAI's o1 model (launched September 2024), which uses reinforcement learning to train models to perform chain-of-thought reasoning at inference time. This shifted the field from prompt-based CoT to learned reasoning, enabling adaptive test-time compute scaling.
His first paper, EDA (Easy Data Augmentation, 2019), presented four simple text augmentation techniques (synonym replacement, random insertion, random swap, random deletion) that boosted text classification performance, especially on small datasets. Accepted at EMNLP 2019.
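The four operations are easy to sketch. The original paper draws synonyms from WordNet; the tiny hand-written synonym table below is an assumed stand-in to keep the example self-contained:

```python
import random

# Stand-in synonym table; EDA itself uses WordNet synonyms.
SYNONYMS = {"quick": ["fast"], "jumps": ["leaps"]}

def synonym_replacement(words, n, rng):
    """Replace up to n words that have entries in the synonym table."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, n, rng):
    """Insert a synonym of a random word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        syn = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randrange(len(out) + 1), syn)
    return out

def random_swap(words, n, rng):
    """Swap the words at two random positions, n times."""
    out = list(words)
    for _ in range(n):
        i, j = rng.randrange(len(out)), rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p, rng):
    """Delete each word with probability p, keeping at least one word."""
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]

rng = random.Random(0)
sentence = "the quick brown fox jumps over the lazy dog".split()
augmented = random_swap(sentence, 2, rng)
```

Each operation perturbs a sentence while mostly preserving its label, which is why the gains are largest on small training sets.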
Quotes
Don't do chain of thought purely via prompting. Train models to do better chain of thought using RL.
In the history of deep learning we have always tried to scale training compute, but chain of thought is a form of adaptive compute that can also be scaled at inference time.
o1-mini is the most surprising research result I've seen in the past year.
Beating the teacher requires walking your own path and taking risks and rewards from the environment.
There will be no fast takeoff, because there is a jagged edge of intelligence capability and rate of improvement.
Research generated March 19, 2026