Research Scientist | Anthropic
Adversarial ML and LLM security researcher. Led the 16-Claude-agent C compiler experiment. Best paper awards at IEEE S&P, USENIX Security (2x), and ICML (3x).
Biography
Nicholas Carlini is a Research Scientist at Anthropic (since March 2025), previously at Google DeepMind (2023-2025) and Google Brain (2018-2023). He holds a Ph.D. in Computer Science from UC Berkeley (advisor: David Wagner), where he also earned a B.A. in Computer Science and Mathematics. His research sits at the intersection of machine learning and computer security, focusing on adversarial robustness, training-data extraction, membership inference, model stealing, and LLM safety. His papers have received best paper awards at IEEE S&P (1), USENIX Security (2), and ICML (3), and his work has been covered by the New York Times, BBC, Nature, Science, Wired, and Popular Science.

In February 2026 he led the landmark experiment in which 16 parallel Claude agents autonomously built a 100,000-line Rust-based C compiler capable of compiling the Linux kernel. On GitHub (carlini, 1.9k followers, 42 public repos) his projects range from foundational adversarial-ML tooling (nn_robust_attacks, obfuscated-gradients) to creative hacks (printf-tac-toe, regex-chess) and LLM evaluation (yet-another-applied-llm-benchmark).
Foundational adversarial robustness evaluation methodology for neural networks. The C&W attack became the standard benchmark for evaluating adversarial defenses. Best Student Paper at IEEE S&P 2017.
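As context for why this formulation became the standard, here is the attack's L2 objective in compact form (notation follows the paper: Z denotes the logits, t the target class, and κ a tunable confidence margin):

```latex
% C&W L2 attack: trade off perturbation size against a margin-based
% targeted-misclassification loss f, weighted by a constant c that the
% paper selects via binary search.
\begin{aligned}
  \min_{\delta}\ & \lVert \delta \rVert_2^2 + c \cdot f(x + \delta)
      \quad \text{s.t. } x + \delta \in [0, 1]^n \\
  f(x') &= \max\Bigl( \max_{i \neq t} Z(x')_i - Z(x')_t,\ -\kappa \Bigr)
\end{aligned}
```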
Landmark ICML 2018 Best Paper showing that 7 of the 9 adversarial-example defenses examined from ICLR 2018 relied on obfuscated gradients, a form of gradient masking, and could be circumvented. 909 stars on GitHub.
Series of papers demonstrating that language models memorize and leak individual training examples, including PII. Showed adversaries can extract verbatim training data at scale from production models like GPT-2 and ChatGPT.
ICML 2024 Best Paper demonstrating model-stealing attacks on production LLMs, recovering the final embedding projection layer of OpenAI production models for under $20.
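A minimal, self-contained sketch of the observation underlying the attack, with random matrices standing in for a real model (the paper itself recovers full logit vectors through the API; all names and sizes below are illustrative):

```python
import numpy as np

# Toy model: an LLM's final layer maps hidden states h to logits W @ h.
# A stack of logit vectors from many prompts therefore has numerical
# rank equal to the hidden dimension, which leaks through the API.
vocab_size, hidden_dim, num_queries = 1000, 64, 256
rng = np.random.default_rng(0)

W = rng.normal(size=(vocab_size, hidden_dim))   # embedding projection layer
H = rng.normal(size=(num_queries, hidden_dim))  # one hidden state per query
logits = H @ W.T                                # what an attacker observes

s = np.linalg.svd(logits, compute_uv=False)
recovered = int(np.sum(s > 1e-6 * s[0]))        # numerical rank
print(recovered)  # 64 -- the hidden dimension, recovered from outputs alone
```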
Led the experiment in which 16 parallel Claude agents autonomously wrote a 100,000-line Rust-based C compiler capable of compiling Linux 6.9, achieving a 99% pass rate on the GCC torture tests. Cost ~$20K over ~2,000 agent sessions.
Evaluation benchmark that tests language models on the kinds of practical problem-solving tasks Carlini has actually encountered. 1,049 stars on GitHub.
Rust tool for deduplicating text datasets, accompanying the ACL 2022 paper showing that removing near-duplicate training data reduces emitted memorization by roughly 10x. 1,300 stars on GitHub (archived).
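To illustrate what the tool computes, here is a hash-based sketch of exact-substring deduplication (purely illustrative: the real repo uses suffix arrays in Rust for scale, and `duplicate_spans`, the parameter `k`, and the character-level granularity are simplifications):

```python
from collections import defaultdict

def duplicate_spans(docs, k=50):
    """Return every length-k character span that occurs in two or more
    documents -- candidates for removal before training."""
    seen = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - k + 1):
            seen[text[i:i + k]].add(doc_id)
    return {span for span, ids in seen.items() if len(ids) > 1}
```

The paper's criterion is a verbatim repeated span of 50 tokens; a suffix-array approach finds all such spans without materializing every window, which is what lets the Rust implementation scale to large corpora.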
Reference implementation of the C&W attacks for crafting adversarial examples against neural networks, accompanying the IEEE S&P 2017 paper. 860 stars on GitHub.
The science of security and flaws in machine learning should be open by default.
Ensuring that the benefits of LLMs outweigh the risks is literally your job.
It's worth being really careful and honest with yourself about whether or not what you're doing is actually net positive.
The LLMs that we're developing are already sufficiently advanced that we probably would have called them science fiction just five or ten years ago.
We are building new AI systems because they are ruthlessly efficient, and this efficiency directly enables harms previously constrained by human limitations.
Agent teams show the possibility of implementing entire, complex projects autonomously.
Research generated March 19, 2026