Associate Professor / Researcher | Cornell Tech & Cursor
Creator of The Annotated Transformer, GPU Puzzles, and MiniTorch. NLP researcher turned frontier RL engineer at Cursor, building the Composer coding model.
Biography
Alexander Rush (known as Sasha) is a computer scientist, educator, and engineer known for pioneering work in Natural Language Processing education and open-source ML tooling. He is an Associate Professor of Computer Science at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science, and currently works at Cursor building frontier RL models for code generation. Rush earned his PhD from MIT (2014) in Electrical Engineering and Computer Science under Michael Collins, focusing on Lagrangian relaxation methods for natural language decoding. He was an Assistant Professor at Harvard SEAS (2015-2019) before moving to Cornell Tech, and also served as a part-time researcher at Hugging Face. His group's work has been recognized with an NSF CAREER Award, a Sloan Fellowship, a NeurIPS 2023 Outstanding Main Track Runner-Up award for "Scaling Data-Constrained Language Models," and paper awards at conferences in NLP, hardware, and visualization, including EMNLP. He is widely known for The Annotated Transformer (a literate-programming walkthrough of "Attention is All You Need"), GPU Puzzles (12k+ stars), Tensor Puzzles, MiniTorch, and a series of educational puzzle repositories that have become standard learning resources in the ML community. In March 2025 he joined Cursor, where he co-designed the Composer model, a frontier MoE model trained with RL for real-world coding. He is based in New York City, where he teaches at Cornell Tech.
A literate-programming blog post that walks through the "Attention is All You Need" paper line-by-line with a complete PyTorch implementation. Became the definitive educational resource for understanding Transformer internals, refreshed with community contributions in 2022.
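The computation at the heart of that walkthrough is scaled dot-product attention. A minimal NumPy sketch of the formula softmax(QKᵀ/√d_k)V (the post itself uses PyTorch; this is only an illustration, not the post's code):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention from "Attention is All You Need":
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # numerically stable row softmax
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```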
A collection of 14 interactive CUDA puzzles with a visual debugger, requiring no background knowledge. The most popular GPU programming tutorial on GitHub with 12k+ stars, teaching parallel programming through hands-on problem solving.
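The puzzle format centers on writing a kernel body that runs once per thread. A pure-Python stand-in conveying that per-thread model (the actual puzzles use Numba's CUDA simulator; the kernel name here is made up for illustration):

```python
# Puzzle-style exercise: a "kernel" runs once per thread index and each
# thread handles exactly one element -- no loops inside the kernel body.
def map_kernel(out, a, thread_idx):
    out[thread_idx] = a[thread_idx] + 10

a = [0, 1, 2, 3]
out = [0] * len(a)
for i in range(len(a)):   # a GPU would launch these in parallel; we simulate serially
    map_kernel(out, a, i)
print(out)  # [10, 11, 12, 13]
```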
A set of 21 puzzles for learning PyTorch tensor operations through constraint-based challenges. Companion to GPU Puzzles with 4k stars, training developers to think in terms of vectorized operations.
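The flavor of these challenges: reimplement a familiar function using only broadcasting and indexing, with no loops. A hypothetical example in NumPy (the real puzzles use a restricted PyTorch subset):

```python
import numpy as np

def outer_loops(a, b):
    # Loopy reference implementation of the outer product.
    out = np.zeros((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            out[i, j] = a[i] * b[j]
    return out

def outer_vectorized(a, b):
    # Puzzle-style solution: one broadcasted expression, zero loops.
    return a[:, None] * b[None, :]

a, b = np.arange(3), np.arange(4)
assert np.array_equal(outer_loops(a, b), outer_vectorized(a, b))
```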
A pure Python re-implementation of the PyTorch API designed for education. Used in Cornell Tech's ML Engineering course, it teaches deep learning internals through incremental implementation of autodiff, tensors, and neural networks from scratch.
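The starting point of that incremental build is scalar reverse-mode autodiff. In miniature it looks something like the sketch below (an illustrative toy, not MiniTorch's actual API):

```python
class Scalar:
    """Tiny reverse-mode autodiff node: stores a value plus local derivatives."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents    # nodes this one was computed from
        self._grad_fns = grad_fns  # chain-rule factor w.r.t. each parent

    def __add__(self, other):
        return Scalar(self.value + other.value, (self, other),
                      (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Scalar(self.value * other.value, (self, other),
                      (lambda g, o=other: g * o.value,
                       lambda g, s=self: g * s.value))

    def backward(self, grad=1.0):
        # Accumulate incoming gradient, then push it to parents (chain rule).
        self.grad += grad
        for parent, fn in zip(self._parents, self._grad_fns):
            parent.backward(fn(grad))

x, y = Scalar(3.0), Scalar(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```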
Award-winning research with Hugging Face studying the impact of training dataset size on LLM scaling laws. Won Outstanding Main Track Runner-Up at NeurIPS 2023 out of 13,321 submissions.
Literate-code walkthroughs of state-space models (S4 and Mamba), continuing the Annotated Transformer tradition for next-generation sequence architectures.
A frontier MoE model trained with RL for real-world coding, designed in tandem with the Cursor platform itself. Represents a new approach to building specialized AI coding models with reinforcement learning at scale.
A fast llama2 decoder implementation in pure Rust, demonstrating efficient inference techniques for large language models (1k+ stars).
If you can do half as good a job, but do it on a topic that a lot of people are trying to learn right now, that can be a valuable service.
A bottleneck for young CS researchers is internalizing the core scale and unit calculations of LLMs. This video is about learning to think in a 'physics' style.
Cursor is a small, ambitious team, and they've created my favorite AI systems. We're now building frontier RL models at scale in real-world coding environments. Excited for how good coding is going to be.
One personal reflection is how interesting a challenge RL is. Unlike other ML systems, you can't abstract much from the full-scale system. Roughly, we co-designed this project and Cursor together in order to allow running the agent at the necessary scale.
Research generated March 19, 2026