Associate Professor / Researcher | Cornell Tech & Cursor
Creator of The Annotated Transformer, GPU Puzzles, and MiniTorch. NLP researcher turned frontier RL engineer at Cursor, building the Composer coding model.
Biography
Alexander Rush (known as Sasha) is a computer scientist, educator, and engineer known for pioneering work in Natural Language Processing education and open-source ML tooling. He is an Associate Professor of Computer Science at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science, and currently works at Cursor building frontier RL models for code generation. Rush earned his PhD from MIT (2014) in Electrical Engineering and Computer Science under Michael Collins, focusing on Lagrangian relaxation methods for natural language decoding. He was an Assistant Professor at Harvard SEAS (2015-2019) before moving to Cornell Tech, and also served as a part-time researcher at Hugging Face. His group's work has been recognized with an NSF CAREER Award, a Sloan Fellowship, a NeurIPS 2023 Outstanding Main Track Runner-Up award for "Scaling Data-Constrained Language Models," and paper awards at conferences in NLP, hardware, and visualization, including EMNLP. He is widely known for The Annotated Transformer (a literate-programming walkthrough of "Attention is All You Need"), GPU Puzzles (12k+ stars), Tensor Puzzles, MiniTorch, and a series of educational puzzle repositories that have become standard learning resources in the ML community. In March 2025 he joined Cursor, where he co-designed the Composer model, a frontier MoE model trained with RL for real-world coding. He is based in New York City, where he teaches at Cornell Tech.
A literate-programming blog post that walks through the "Attention is All You Need" paper line-by-line with a complete PyTorch implementation. Became the definitive educational resource for understanding Transformer internals, refreshed with community contributions in 2022.
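The computation at the heart of that walkthrough is scaled dot-product attention. A minimal NumPy sketch of the formula softmax(QKᵀ/√d_k)V (the post itself uses PyTorch; this is only an illustration, not the post's code):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention from "Attention is All You Need":
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # numerically stable row softmax
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out = attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```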
A collection of 14 interactive CUDA puzzles with a visual debugger, requiring no background knowledge. The most popular GPU programming tutorial on GitHub with 12k+ stars, teaching parallel programming through hands-on problem solving.
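The puzzle format centers on writing a kernel body that runs once per thread. A pure-Python stand-in conveying that per-thread model (the actual puzzles use Numba's CUDA simulator; the kernel name here is made up for illustration):

```python
# Puzzle-style exercise: a "kernel" runs once per thread index and each
# thread handles exactly one element -- no loops inside the kernel body.
def map_kernel(out, a, thread_idx):
    out[thread_idx] = a[thread_idx] + 10

a = [0, 1, 2, 3]
out = [0] * len(a)
for i in range(len(a)):   # a GPU would launch these in parallel; we simulate serially
    map_kernel(out, a, i)
print(out)  # [10, 11, 12, 13]
```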
A set of 21 puzzles for learning PyTorch tensor operations through constraint-based challenges. Companion to GPU Puzzles with 4k stars, training developers to think in terms of vectorized operations.
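The flavor of these challenges: reimplement a familiar function using only broadcasting and indexing, with no loops. A hypothetical example in NumPy (the real puzzles use a restricted PyTorch subset):

```python
import numpy as np

def outer_loops(a, b):
    # Loopy reference implementation of the outer product.
    out = np.zeros((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            out[i, j] = a[i] * b[j]
    return out

def outer_vectorized(a, b):
    # Puzzle-style solution: one broadcasted expression, zero loops.
    return a[:, None] * b[None, :]

a, b = np.arange(3), np.arange(4)
assert np.array_equal(outer_loops(a, b), outer_vectorized(a, b))
```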
A pure Python re-implementation of the PyTorch API designed for education. Used in Cornell Tech's ML Engineering course, it teaches deep learning internals through incremental implementation of autodiff, tensors, and neural networks from scratch.
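The starting point of that incremental build is scalar reverse-mode autodiff. In miniature it looks something like the sketch below (an illustrative toy, not MiniTorch's actual API):

```python
class Scalar:
    """Tiny reverse-mode autodiff node: stores a value plus local derivatives."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents    # nodes this one was computed from
        self._grad_fns = grad_fns  # chain-rule factor w.r.t. each parent

    def __add__(self, other):
        return Scalar(self.value + other.value, (self, other),
                      (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Scalar(self.value * other.value, (self, other),
                      (lambda g, o=other: g * o.value,
                       lambda g, s=self: g * s.value))

    def backward(self, grad=1.0):
        # Accumulate incoming gradient, then push it to parents (chain rule).
        self.grad += grad
        for parent, fn in zip(self._parents, self._grad_fns):
            parent.backward(fn(grad))

x, y = Scalar(3.0), Scalar(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```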
Award-winning research with Hugging Face studying the impact of training dataset size on LLM scaling laws. Won Outstanding Main Track Runner-Up at NeurIPS 2023 out of 13,321 submissions.
Literate-code walkthroughs of state-space models (S4 and Mamba), continuing the Annotated Transformer tradition for next-generation sequence architectures.
A frontier MoE model trained with RL for real-world coding, designed in tandem with the Cursor platform itself. Represents a new approach to building specialized AI coding models with reinforcement learning at scale.
A fast llama2 decoder implementation in pure Rust, demonstrating efficient inference techniques for large language models (1k+ stars).
If you can do half as good a job, but do it on a topic that a lot of people are trying to learn right now, that can be a valuable service.
A bottleneck for young CS researchers is internalizing the core scale and unit calculations of LLMs. This video is about learning to think in a 'physics' style.
Cursor is a small, ambitious team, and they've created my favorite AI systems. We're now building frontier RL models at scale in real-world coding environments. Excited for how good coding is going to be.
One personal reflection is how interesting a challenge RL is. Unlike other ML systems, you can't abstract much from the full-scale system. Roughly, we co-designed this project and Cursor together in order to allow running the agent at the necessary scale.
Research generated March 19, 2026