Research Scientist | Anthropic
Adversarial ML and LLM security researcher. Led the 16-Claude-agent C compiler experiment. Best paper awards at IEEE S&P, USENIX Security (2x), and ICML (3x).
Biography
Nicholas Carlini is a Research Scientist at Anthropic (since March 2025), previously at Google DeepMind (2023-2025) and Google Brain (2018-2023). He holds a Ph.D. in Computer Science from UC Berkeley (advisor: David Wagner), where he also earned a B.A. in Computer Science and Mathematics. His research sits at the intersection of machine learning and computer security, focusing on adversarial robustness, training-data extraction, membership inference, model stealing, and LLM safety. His papers have received best paper awards at IEEE S&P (1), USENIX Security (2), and ICML (3), and his work has been covered by the New York Times, BBC, Nature, Science, Wired, and Popular Science.

In February 2026 he led the landmark experiment in which 16 parallel Claude agents autonomously built a 100,000-line Rust-based C compiler capable of compiling the Linux kernel. On GitHub (carlini, 1.9k followers, 42 public repos) his projects range from foundational adversarial-ML tooling (nn_robust_attacks, obfuscated-gradients) to creative hacks (printf-tac-toe, regex-chess) and LLM evaluation (yet-another-applied-llm-benchmark).
Foundational adversarial robustness evaluation methodology for neural networks. The C&W attack became the standard benchmark for evaluating adversarial defenses. Best Student Paper at IEEE S&P 2017.
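As context for why this formulation became the standard, here is the attack's L2 objective in compact form (notation follows the paper: Z denotes the logits, t the target class, and κ a tunable confidence margin):

```latex
% C&W L2 attack: trade off perturbation size against a margin-based
% targeted-misclassification loss f, weighted by a constant c that the
% paper selects via binary search.
\begin{aligned}
  \min_{\delta}\ & \lVert \delta \rVert_2^2 + c \cdot f(x + \delta)
      \quad \text{s.t. } x + \delta \in [0, 1]^n \\
  f(x') &= \max\Bigl( \max_{i \neq t} Z(x')_i - Z(x')_t,\ -\kappa \Bigr)
\end{aligned}
```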
Landmark ICML 2018 Best Paper showing that 7 of the 9 adversarial-example defenses examined from ICLR 2018 relied on obfuscated gradients, a form of gradient masking, and could be circumvented. 909 stars on GitHub.
Series of papers demonstrating that language models memorize and leak individual training examples, including PII. Showed adversaries can extract verbatim training data at scale from production models like GPT-2 and ChatGPT.
ICML 2024 Best Paper demonstrating model-stealing attacks on production LLMs, recovering the final embedding projection layer of OpenAI production models for under $20.
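A minimal, self-contained sketch of the observation underlying the attack, with random matrices standing in for a real model (the paper itself recovers full logit vectors through the API; all names and sizes below are illustrative):

```python
import numpy as np

# Toy model: an LLM's final layer maps hidden states h to logits W @ h.
# A stack of logit vectors from many prompts therefore has numerical
# rank equal to the hidden dimension, which leaks through the API.
vocab_size, hidden_dim, num_queries = 1000, 64, 256
rng = np.random.default_rng(0)

W = rng.normal(size=(vocab_size, hidden_dim))   # embedding projection layer
H = rng.normal(size=(num_queries, hidden_dim))  # one hidden state per query
logits = H @ W.T                                # what an attacker observes

s = np.linalg.svd(logits, compute_uv=False)
recovered = int(np.sum(s > 1e-6 * s[0]))        # numerical rank
print(recovered)  # 64 -- the hidden dimension, recovered from outputs alone
```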
Led the experiment in which 16 parallel Claude agents autonomously wrote a 100,000-line Rust-based C compiler capable of compiling Linux 6.9, achieving a 99% pass rate on the GCC torture tests. Cost ~$20K over ~2,000 agent sessions.
Evaluation benchmark that tests language models on the kinds of practical problem-solving tasks Carlini has actually encountered. 1,049 stars on GitHub.
Rust tool for deduplicating text datasets, accompanying the ACL 2022 paper showing that removing near-duplicate training data reduces emitted memorization by roughly 10x. 1,300 stars on GitHub (archived).
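To illustrate what the tool computes, here is a hash-based sketch of exact-substring deduplication (purely illustrative: the real repo uses suffix arrays in Rust for scale, and `duplicate_spans`, the parameter `k`, and the character-level granularity are simplifications):

```python
from collections import defaultdict

def duplicate_spans(docs, k=50):
    """Return every length-k character span that occurs in two or more
    documents -- candidates for removal before training."""
    seen = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for i in range(len(text) - k + 1):
            seen[text[i:i + k]].add(doc_id)
    return {span for span, ids in seen.items() if len(ids) > 1}
```

The paper's criterion is a verbatim repeated span of 50 tokens; a suffix-array approach finds all such spans without materializing every window, which is what lets the Rust implementation scale to large corpora.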
Reference implementation of the C&W attacks for crafting adversarial examples against neural networks, accompanying the IEEE S&P 2017 paper. 860 stars on GitHub.
The science of security and flaws in machine learning should be open by default.
Ensuring that the benefits of LLMs outweigh the risks is literally your job.
It's worth being really careful and honest with yourself about whether or not what you're doing is actually net positive.
The LLMs that we're developing are already sufficiently advanced that we probably would have called them science fiction just five or ten years ago.
We are building new AI systems because they are ruthlessly efficient, and this efficiency directly enables harms previously constrained by human limitations.
Agent teams show the possibility of implementing entire, complex projects autonomously.
Research generated March 19, 2026