Omar Khattab

Omar Khattab is an Assistant Professor at MIT EECS & CSAIL (TIBCO Founders' Career Development Professor) and the creator of DSPy and ColBERT. He completed his CS PhD at Stanford NLP, advised by Matei Zaharia and Christopher Potts, supported by the Apple Scholars in AI/ML Fellowship. Before joining MIT in July 2025, he was a Research Scientist at Databricks. His research investigates how to program intelligent software systems that are partly specified in natural language, that process natural language at scale, and whose quality and cost can be optimized using language models. DSPy has exceeded 500,000 downloads per month and sparked applications at Google, Amazon, IBM, VMware, Databricks, Baidu, AliExpress, and dozens of startups. ColBERT has won five SIGIR paper awards since 2020, including the SIGIR 2025 Best Paper for the WARP retrieval engine.

Declarative AI Programming · Language Model Program Optimization · Neural Information Retrieval · Late Interaction Retrieval (ColBERT) · Multi-Vector Search · Recursive Language Models · Natural Language Programs · Retrieval-Augmented Generation · Foundation Model Programming · Scalable IR Infrastructure

Timeline


2026

2026-01 · Research

Published an updated Recursive Language Models (RLM) paper with Alex Zhang and Tim Kraska; RLM-Qwen3-8B outperforms vanilla GPT-5 on long-context tasks

2025

2025-01 · Research

Completed PhD thesis 'Building more reliable and scalable AI systems with foundation model programming' at Stanford

2025-07 · Research

Joined MIT EECS & CSAIL as Assistant Professor (TIBCO Founders' Career Development Professor)

2025-07 · Research

WARP: An Efficient Engine for Multi-Vector Retrieval wins the SIGIR 2025 Best Paper Award

2024

2024-01 · Research

DSPy paper accepted as ICLR 2024 Spotlight ('Compiling Declarative Language Model Calls into Self-Improving Pipelines')

2024-01 · Research

STORM knowledge curation system published at NAACL 2024, built on DSPy

2024-06 · Research

Joined Databricks as a Research Scientist, increasing investment in the open-source DSPy community

2024-10 · Research

MIPRO optimizer paper published at EMNLP 2024, achieving up to 13% accuracy gains on multi-stage LM programs

2023

2023-01 · Research

DSPy framework open-sourced under stanfordnlp on GitHub

2022

2022-04 · Research

ColBERTv2 published at NAACL 2022, advancing efficient neural retrieval with residual compression

2020

2020-07 · Research

ColBERT published at SIGIR 2020, introducing late interaction for efficient passage search over BERT (1,500+ citations)

Key Contributions

DSPy

The framework for programming -- not prompting -- language models. A declarative programming model for composing natural language programs from modular signatures and automatically optimizing them with optimizers like MIPRO and GEPA. 32,900+ GitHub stars, 500K+ monthly downloads.

ColBERT

Late interaction retrieval model that independently encodes queries and documents with BERT, then uses fine-grained MaxSim for efficient scoring. Foundational to modern neural IR with 1,500+ citations and five SIGIR paper awards.
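As a rough illustration of the MaxSim scoring described above, here is a minimal pure-Python sketch. The 2-dimensional "token embeddings" are made up for the example and the helper names (`normalize`, `maxsim_score`) are hypothetical; a real ColBERT model produces the vectors with BERT and scores them with batched tensor operations.

```python
from math import sqrt

def normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token vector, take its maximum
    dot product over all document token vectors, then sum those maxima."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total

# Toy token embeddings (assumed values, not output of any real encoder).
query = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
doc_a = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
doc_b = [normalize(v) for v in [[1.0, 1.0], [-1.0, 0.0]]]

# Documents are encoded independently of the query; only this cheap
# max-and-sum step happens at query time.
scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Because each query token only needs its single best-matching document token, document vectors can be computed and indexed offline, which is where the efficiency of late interaction comes from.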

ColBERTv2 & PLAID

Improved ColBERT with residual compression and the PLAID indexing engine, dramatically reducing storage and latency while preserving retrieval quality.
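The idea behind residual compression can be sketched as follows: each token vector is stored as the id of its nearest centroid plus a coarsely quantized residual. This is a simplified sketch under stated assumptions, not the ColBERTv2 implementation: the centroids are hand-picked rather than learned, the residual is reduced to one sign bit per dimension, and the fixed `step` reconstruction value is a hypothetical stand-in for learned quantization buckets.

```python
def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))

def compress(vec, centroids):
    """Encode vec as (centroid id, one sign bit per dimension of the residual)."""
    cid = nearest_centroid(vec, centroids)
    residual = [a - b for a, b in zip(vec, centroids[cid])]
    bits = [1 if r >= 0 else 0 for r in residual]
    return cid, bits

def decompress(cid, bits, centroids, step=0.1):
    """Approximate the original vector as centroid plus or minus `step` per dimension."""
    return [c + (step if b else -step) for c, b in zip(centroids[cid], bits)]

# Assumed toy centroids and vector, chosen only for illustration.
centroids = [[0.0, 1.0], [1.0, 0.0]]
vec = [0.9, 0.15]

cid, bits = compress(vec, centroids)
approx = decompress(cid, bits, centroids)
max_error = max(abs(a - b) for a, b in zip(vec, approx))
print(cid, bits, approx, max_error)
```

Storing a small integer id and a few bits per dimension instead of full floating-point values is what shrinks the index, at the cost of a bounded reconstruction error.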

WARP

Efficient engine for multi-vector retrieval that achieves a 3x speedup over ColBERTv2/PLAID through dynamic similarity imputation and implicit decompression. Won the SIGIR 2025 Best Paper Award.

Recursive Language Models (RLM)

General inference paradigm that lets LLMs programmatically examine and decompose long inputs and recursively call themselves over the pieces, processing contexts up to two orders of magnitude beyond model context windows.
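The recursive decomposition idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a language model call with a hard input limit, and the fixed split-in-half strategy is a simplification swapped in for the model deciding programmatically how to examine and decompose its input.

```python
def toy_model(prompt, window=40):
    """Hypothetical stand-in for an LM call: rejects inputs longer than its
    window and otherwise 'answers' by counting occurrences of 'x'."""
    if len(prompt) > window:
        raise ValueError("input exceeds model window")
    return prompt.count("x")

def recursive_count(text, window=40):
    """If the input fits the window, call the model directly; otherwise
    split it in half, recurse on each half, and combine the sub-answers."""
    if len(text) <= window:
        return toy_model(text, window)
    mid = len(text) // 2
    return recursive_count(text[:mid], window) + recursive_count(text[mid:], window)

# 500 characters of input, far beyond the 40-character toy window.
long_input = "ab x cd x " * 50
answer = recursive_count(long_input)
print(answer)
```

The single direct call would fail on this input, while the recursive version answers it using only calls that fit inside the window.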

MIPRO Optimizer

Algorithm for optimizing instructions and few-shot demonstrations in multi-stage language model programs, achieving up to 13% accuracy improvement over baselines.

STORM

Knowledge curation system built on DSPy that automatically researches and writes Wikipedia-style articles from scratch. Published at NAACL 2024.

Notable Quotes

“DSPy is basically a programming language for building AI systems. We ask you to write your system in normal Python, but express your AI steps in the form of DSPy signatures. If you do that well, we give you two things automatically.”

X/Twitter post

“We need higher-level programming languages. We have only one: DSPy.”

X/Twitter post responding to OpenAI

“One of the biggest challenges facing DSPy is the lack of competition. As far as I can tell, there's just no other serious programming model for general-purpose, declarative AI programming.”

X/Twitter post

Sources

Omar Khattab - Personal Website
Omar Khattab - MIT EECS Faculty Profile
okhat (Omar Khattab) on GitHub
DSPy: The framework for programming -- not prompting -- language models (GitHub)
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT (SIGIR 2020)
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (ICLR 2024 Spotlight)
WARP: An Efficient Engine for Multi-Vector Retrieval (SIGIR 2025 Best Paper)
Recursive Language Models (arXiv 2025)
MIPRO: Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs (EMNLP 2024)
Omar Khattab PhD Thesis - Stanford Digital Repository
Omar Khattab on Google Scholar
MIT School of Engineering welcomes new faculty in 2024-25

Research generated March 19, 2026
