ML Engineer & Consultant | Parlance Labs
Co-creator of nbdev and fastpages, led CodeSearchNet at GitHub, now teaches AI evals to 3,000+ engineers. 20+ years in ML across DataRobot, Airbnb, and GitHub.
Biography
Hamel Husain is a machine learning engineer with over 20 years of experience, currently an independent consultant at Parlance Labs. He previously held applied ML roles at GitHub (2017-2022) and Airbnb (2016-2017), and earlier at DataRobot. At GitHub he led CodeSearchNet, a large-scale benchmark for semantic code search that laid groundwork later used by OpenAI for code understanding models. He co-created nbdev (5.3k stars), a literate-programming system built on Jupyter Notebooks, and fastpages (3.6k stars), an open-source notebook blogging platform. In January 2026 he published 'Why I Stopped Using nbdev', explaining how AI coding assistants changed the trade-offs of literate programming. Husain now focuses on AI evaluation methodology, co-teaching the 'AI Evals for Engineers & PMs' course on Maven with Shreya Shankar, which has enrolled over 3,000 students from 500+ companies including OpenAI, Anthropic, and Google.
Literate programming framework co-created with fast.ai, with the tagline 'Create delightful software with Jupyter Notebooks'. 5.3k stars on GitHub. Led the v2 rewrite in 2022.
Open-source blogging platform with enhanced Jupyter Notebook support, enabling data scientists to publish notebooks as blog posts. 3.6k stars (archived).
Large-scale benchmark and dataset for semantic code search, enabling representation learning of code across multiple programming languages. 2.4k stars. Led at GitHub.
Co-taught with Shreya Shankar; has enrolled 3,000+ students from 500+ companies including OpenAI, Anthropic, and Google. Teaches systematic AI product evaluation methodology.
Skills resource complementing the AI Evals course, providing practical evaluation templates and examples. 891 stars on GitHub.
Automated code review tool using Claude for iterative review feedback. 590 stars on GitHub.
Influential blog post making the case that unsuccessful AI products almost always share a common root cause: failure to create robust evaluation systems.
Unsuccessful products almost always share a common root cause: a failure to create robust evaluation systems.
Don't rely on generic evaluation frameworks to measure the quality of your AI. Instead, create an evaluation system specific to your problem.
Like software engineering, success with AI hinges on how fast you can iterate.
The abuse of generic metrics is endemic. Many eval vendors promote off-the-shelf metrics, which ensnare engineers into superfluous tasks.
I write software to solve problems, not to write code. I want to work in an environment where AI has the highest chance of success.
99% of the labor involved with fine-tuning is assembling high-quality data that covers your AI product's surface area.
Research generated March 19, 2026