Assistant Professor | MIT EECS & CSAIL
Creator of DSPy (32.9k stars) and ColBERT. Pioneered declarative AI programming and late-interaction retrieval. Stanford NLP PhD, former Databricks Research Scientist. SIGIR 2025 Best Paper for WARP.
Biography
Omar Khattab is an Assistant Professor at MIT EECS & CSAIL (TIBCO Founders' Career Development Professor) and the creator of DSPy and ColBERT. He completed his CS PhD at Stanford NLP, advised by Matei Zaharia and Christopher Potts, supported by the Apple Scholars in AI/ML Fellowship. Before joining MIT in July 2025, he was a Research Scientist at Databricks. His research investigates how to program intelligent software systems that are partly specified in natural language, that process natural language at scale, and whose quality and cost can be optimized using language models. DSPy has exceeded 500,000 downloads per month and sparked applications at Google, Amazon, IBM, VMware, Databricks, Baidu, AliExpress, and dozens of startups. ColBERT has won five SIGIR paper awards since 2020, including the SIGIR 2025 Best Paper for the WARP retrieval engine.
The framework for programming, not prompting, language models. A declarative programming model for composing and automatically optimizing natural language programs using modular signatures, with optimizers like MIPRO and GEPA. 32,900+ stars, 500K+ monthly downloads.
Late interaction retrieval model that independently encodes queries and documents with BERT, then uses fine-grained MaxSim for efficient scoring. Foundational to modern neural IR with 1,500+ citations and five SIGIR paper awards.
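The MaxSim scoring described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the ColBERT codebase: the token vectors here are hand-written toy values standing in for BERT outputs, and the names `maxsim_score` and `rank` are hypothetical. The idea is that each query token keeps only its best-matching document token, and those per-token maxima are summed into the document score.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum, over query tokens, of the highest similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    scores = {doc_id: maxsim_score(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Toy example: two query tokens, two candidate documents (assumed values).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "b": [[1.0, 0.0]],              # matches only the first query token
}
ranking = rank(query, docs)
```

Because documents are encoded independently of the query, their token vectors can be computed and indexed offline; only the cheap max-and-sum step runs at query time.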
Improved ColBERT with residual compression and the PLAID indexing engine, dramatically reducing storage and latency while preserving retrieval quality.
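As a rough illustration of residual compression, the sketch below stores each vector as the id of its nearest centroid plus a few bits per dimension for the leftover residual. It rests on simplifying assumptions that are not taken from the source: the centroids are given rather than learned, the residual is quantized uniformly over a fixed range `max_abs`, and the function names `compress` and `decompress` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid(vec: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to vec."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
    )


def compress(vec: Vector, centroids: List[Vector], bits: int = 2,
             max_abs: float = 0.5) -> Tuple[int, List[int]]:
    """Encode vec as (centroid id, per-dimension bucket codes for the residual)."""
    cid = nearest_centroid(vec, centroids)
    levels = 2 ** bits
    step = 2 * max_abs / levels
    codes = []
    for v, c in zip(vec, centroids[cid]):
        residual = min(max(v - c, -max_abs), max_abs)  # clamp to the assumed range
        codes.append(min(int((residual + max_abs) / step), levels - 1))
    return cid, codes


def decompress(cid: int, codes: List[int], centroids: List[Vector],
               bits: int = 2, max_abs: float = 0.5) -> List[float]:
    """Approximate the vector as centroid plus the midpoint of each residual bucket."""
    step = 2 * max_abs / (2 ** bits)
    return [c + (-max_abs + (k + 0.5) * step) for c, k in zip(centroids[cid], codes)]


# Toy example with two assumed centroids in two dimensions.
centroids = [[0.0, 0.0], [1.0, 1.0]]
vec = [0.9, 1.2]
cid, codes = compress(vec, centroids)
approx = decompress(cid, codes, centroids)
```

With 2 bits per dimension, each stored value shrinks from a float to a 2-bit code plus a shared centroid id, at the cost of a bounded reconstruction error (at most half a bucket width for residuals inside the assumed range).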
Efficient engine for multi-vector retrieval achieving 3x speedup over ColBERTv2/PLAID with dynamic similarity imputation and implicit decompression. Won SIGIR 2025 Best Paper Award.
General inference paradigm allowing LLMs to programmatically examine, decompose, and recursively call themselves over long inputs, processing contexts up to two orders of magnitude beyond model windows.
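The sketch below gives a feel for recursive calls over an input longer than the model window. It is a deliberate simplification: it uses a fixed split-in-half-and-combine recursion, whereas the paradigm described above lets the model itself decide how to examine and decompose the input. The `toy_model` function is a hypothetical stand-in for a language model call, and the window is counted in whitespace-separated tokens.

```python
from typing import Callable


def recursive_call(model: Callable[[str], str], text: str, window: int) -> str:
    """Answer over `text` without passing the model more than `window` tokens.

    Assumes each model answer is a single token, so the combine step fits in the window.
    """
    if window < 2:
        raise ValueError("window must hold at least two tokens to combine sub-answers")
    tokens = text.split()
    if len(tokens) <= window:
        return model(text)  # base case: the input fits in one call
    mid = len(tokens) // 2
    left = recursive_call(model, " ".join(tokens[:mid]), window)
    right = recursive_call(model, " ".join(tokens[mid:]), window)
    return model(left + " " + right)  # combine the two sub-answers


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an LLM: returns the largest integer in its input."""
    return str(max(int(t) for t in text.split()))


# A 1,000-token input processed through an 8-token window.
long_input = " ".join(str(i) for i in range(1000))
answer = recursive_call(toy_model, long_input, window=8)
```

Every individual call sees at most `window` tokens, yet the final answer reflects the entire input; the cost is a larger number of model calls.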
Algorithm for optimizing instructions and few-shot demonstrations in multi-stage language model programs, achieving up to 13% accuracy improvement over baselines.
Knowledge curation system built on DSPy that automatically researches and writes Wikipedia-style articles from scratch. Published at NAACL 2024.
DSPy is basically a programming language for building AI systems. We ask you to write your system in normal Python, but express your AI steps in the form of DSPy signatures. If you do that well, we give you two things automatically.
We need higher-level programming languages. We have only one: DSPy.
One of the biggest challenges facing DSPy is the lack of competition. As far as I can tell, there's just no other serious programming model for general-purpose, declarative AI programming.
Research generated March 19, 2026