Research Engineer | Mistral AI
Creator of Hugging Face Diffusers, the leading open-source library for diffusion models. Former core maintainer of Hugging Face Transformers. Now building multimodal models at Mistral AI.
Biography
Patrick von Platen is a Research Engineer at Mistral AI, formerly at Hugging Face, where he was the creator and core developer of the Diffusers library -- the leading open-source toolbox for state-of-the-art diffusion models powering image, video, and audio generation in PyTorch (33k+ GitHub stars). He holds a PhD in Computer Engineering from the University of Cambridge (2019) and previously worked at Uber AI and RWTH Aachen University. At Hugging Face he was also a core maintainer of the Transformers library, specializing in encoder-decoder models, long-range sequence modeling, and speech recognition (Wav2Vec2). He has published influential papers on cross-lingual speech representation learning (XLS-R), speech benchmarking (XTREME-S, ESB), and knowledge distillation (Distil-Whisper, LCM-LoRA). At Mistral AI he co-authored the Pixtral 12B multimodal model paper. His work consistently focuses on democratizing advanced ML through open-source software and community-driven development.
Created and led development of the Diffusers library -- the go-to open-source toolbox for state-of-the-art pretrained diffusion models for image, video, audio, and 3D generation in PyTorch. Over 33,000 GitHub stars.
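A minimal sketch of the library's core text-to-image workflow (the checkpoint ID and prompt are illustrative, and a CUDA GPU is assumed):

```python
# Minimal text-to-image generation with Diffusers.
import torch
from diffusers import DiffusionPipeline

# Illustrative checkpoint; any compatible Hub checkpoint works.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```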
XLS-R: large-scale self-supervised model trained on 128 languages using nearly 500K hours of speech data, achieving state-of-the-art results on speech translation, recognition, and language identification tasks.
Contributed the Wav2Vec2 model to Hugging Face Transformers and authored tutorials that made self-supervised speech recognition accessible to the broader community.
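A short sketch of that workflow, assuming a 16 kHz mono waveform (a silent placeholder here) and the public English checkpoint:

```python
# CTC transcription with Wav2Vec2 in Transformers.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.zeros(16000, dtype=np.float32)  # placeholder: 1 s of silence
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
```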
Co-authored at Mistral AI -- a 12-billion-parameter multimodal language model trained to understand both natural images and documents, with a novel vision encoder.
Distil-Whisper: robust knowledge distillation approach for speech recognition via large-scale pseudo labelling, enabling faster and smaller Whisper models.
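A sketch of running one of the published distilled checkpoints through the Transformers ASR pipeline (the audio path is a placeholder):

```python
# Transcribing audio with a Distil-Whisper checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
)
print(asr("sample.wav")["text"])  # placeholder audio file
```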
LCM-LoRA: universal Stable Diffusion acceleration module that dramatically reduces the number of inference steps while maintaining quality, enabling near-real-time image generation.
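A sketch of the documented Diffusers recipe: swap in the LCM scheduler, load the distilled LoRA weights, and sample in about four steps (checkpoint IDs are the public Hub ones; the prompt is illustrative):

```python
# Few-step sampling with LCM-LoRA in Diffusers.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

image = pipe(
    "a photo of a mountain lake at sunrise",
    num_inference_steps=4,   # vs. the usual 25-50 steps
    guidance_scale=1.0,      # LCM-LoRA expects little or no CFG
).images[0]
```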
XTREME-S: evaluation benchmark covering 102 languages across speech recognition, classification, translation, and retrieval tasks for cross-lingual speech representations.
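A tentative sketch of loading one benchmark subset through Datasets; the `task.language` config name is assumed from the benchmark's dataset card, and the loading script may require opting in to remote code:

```python
# Loading one XTREME-S subset (FLEURS, Afrikaans) via Datasets.
from datasets import load_dataset

fleurs_af = load_dataset(
    "google/xtreme_s",
    "fleurs.af_za",          # assumed config name (task.language)
    split="train",
    trust_remote_code=True,  # the benchmark ships a loading script
)
```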
Co-authored the Datasets community library for NLP, providing 650+ datasets and tools for efficient data loading and processing, published at EMNLP 2021.
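A minimal sketch of the library's one-call loading pattern (the dataset and split are illustrative):

```python
# Loading and filtering a Hub dataset with Datasets.
from datasets import load_dataset

ds = load_dataset("glue", "mrpc", split="train")
print(ds[0])        # one example as a plain dict
print(ds.features)  # typed schema
positives = ds.filter(lambda ex: ex["label"] == 1)
```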
The most recent research on diffusion models like DALL-E 2 and Imagen has not been made accessible and remains behind the closed doors of large tech companies. This is why, at Hugging Face, we decided to build and open-source Diffusers.
We want Diffusers to be built by and for the community -- if you want to build the future of the hottest ML models, come join us!
The power of open-source is impressive -- Diffusers is one month old and has over 20 external contributors.
The speed at which diffusion models are getting better and faster is mind-blowing.