ML Engineer & Consultant | Parlance Labs
Co-creator of nbdev and fastpages, led CodeSearchNet at GitHub, now teaches AI evals to 3,000+ engineers. 20+ years in ML across DataRobot, Airbnb, and GitHub.
Biography
Hamel Husain is a machine learning engineer with over 20 years of experience, currently an independent consultant at Parlance Labs. He previously held applied ML roles at GitHub (2017-2022) and Airbnb (2016-2017), and earlier at DataRobot. At GitHub he led CodeSearchNet, a large-scale benchmark for semantic code search that laid groundwork later used by OpenAI for code understanding models. He co-created nbdev (5.3k stars), a literate-programming system built on Jupyter Notebooks, and fastpages (3.6k stars), an open-source notebook blogging platform. In January 2026 he published 'Why I Stopped Using nbdev', explaining how AI coding assistants changed the trade-offs of literate programming. Husain now focuses on AI evaluation methodology, co-teaching the 'AI Evals for Engineers & PMs' course on Maven with Shreya Shankar, which has enrolled over 3,000 students from 500+ companies including OpenAI, Anthropic, and Google.
Literate programming framework co-created with fast.ai, with the tagline 'Create delightful software with Jupyter Notebooks'. 5.3k stars on GitHub. Led the v2 rewrite in 2022.
Open-source blogging platform with enhanced Jupyter Notebook support, enabling data scientists to publish notebooks as blog posts. 3.6k stars (archived).
Large-scale benchmark and dataset for semantic code search, enabling representation learning of code across multiple programming languages. 2.4k stars. Led at GitHub.
Co-taught with Shreya Shankar; has enrolled 3,000+ students from 500+ companies including OpenAI, Anthropic, and Google. Teaches systematic AI product evaluation methodology.
Skills resource complementing the AI Evals course, providing practical evaluation templates and examples. 891 stars on GitHub.
Automated code review tool using Claude for iterative review feedback. 590 stars on GitHub.
Influential blog post making the case that unsuccessful AI products almost always share a common root cause: failure to create robust evaluation systems.
Unsuccessful products almost always share a common root cause: a failure to create robust evaluation systems.
Don't rely on generic evaluation frameworks to measure the quality of your AI. Instead, create an evaluation system specific to your problem.
Like software engineering, success with AI hinges on how fast you can iterate.
The abuse of generic metrics is endemic. Many eval vendors promote off-the-shelf metrics, which ensnare engineers into superfluous tasks.
I write software to solve problems, not to write code. I want to work in an environment where AI has the highest chance of success.
99% of the labor involved with fine-tuning is assembling high-quality data that covers your AI product's surface area.
Research generated March 19, 2026