Research Engineer | Mistral AI
Creator of Hugging Face Diffusers, the leading open-source library for diffusion models. Former core maintainer of Hugging Face Transformers. Now building multimodal models at Mistral AI.
Biography
Patrick von Platen is a Research Engineer at Mistral AI, formerly at Hugging Face, where he was the creator and core developer of the Diffusers library -- the leading open-source toolbox for state-of-the-art diffusion models powering image, video, and audio generation in PyTorch (33k+ GitHub stars). He holds a PhD in Computer Engineering from the University of Cambridge (2019) and previously worked at Uber AI and RWTH Aachen University. At Hugging Face he was also a core maintainer of the Transformers library, specializing in encoder-decoder models, long-range sequence modeling, and speech recognition (Wav2Vec2). He has published influential papers on cross-lingual speech representation learning (XLS-R), speech benchmarking (XTREME-S, ESB), and knowledge distillation (Distil-Whisper, LCM-LoRA). At Mistral AI he co-authored the Pixtral 12B multimodal model paper. His work consistently focuses on democratizing advanced ML through open-source software and community-driven development.
Created and led development of the Diffusers library -- the go-to open-source toolbox for state-of-the-art pretrained diffusion models for image, video, audio, and 3D generation in PyTorch. Over 33,000 GitHub stars.
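A minimal sketch of the library's core text-to-image workflow (the checkpoint ID and prompt are illustrative, and a CUDA GPU is assumed):

```python
# Minimal text-to-image generation with Diffusers.
import torch
from diffusers import DiffusionPipeline

# Illustrative checkpoint; any compatible Hub checkpoint works.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```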
XLS-R: large-scale self-supervised model trained on 128 languages using nearly 500K hours of speech data, achieving state-of-the-art results on speech translation, recognition, and language identification tasks.
Contributed the Wav2Vec2 model to Hugging Face Transformers and authored tutorials that made self-supervised speech recognition accessible to the broader community.
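A short sketch of that workflow, assuming a 16 kHz mono waveform (a silent placeholder here) and the public English checkpoint:

```python
# CTC transcription with Wav2Vec2 in Transformers.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech = np.zeros(16000, dtype=np.float32)  # placeholder: 1 s of silence
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, time, vocab)

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
```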
Co-authored at Mistral AI -- a 12-billion-parameter multimodal language model trained to understand both natural images and documents, with a novel vision encoder.
Distil-Whisper: robust knowledge distillation approach for speech recognition via large-scale pseudo labelling, enabling faster and smaller Whisper models.
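A sketch of running one of the published distilled checkpoints through the Transformers ASR pipeline (the audio path is a placeholder):

```python
# Transcribing audio with a Distil-Whisper checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",
)
print(asr("sample.wav")["text"])  # placeholder audio file
```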
LCM-LoRA: universal Stable Diffusion acceleration module that dramatically reduces the number of inference steps while maintaining quality, enabling near-real-time image generation.
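A sketch of the documented Diffusers recipe: swap in the LCM scheduler, load the distilled LoRA weights, and sample in about four steps (checkpoint IDs are the public Hub ones; the prompt is illustrative):

```python
# Few-step sampling with LCM-LoRA in Diffusers.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

image = pipe(
    "a photo of a mountain lake at sunrise",
    num_inference_steps=4,   # vs. the usual 25-50 steps
    guidance_scale=1.0,      # LCM-LoRA expects little or no CFG
).images[0]
```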
XTREME-S: evaluation benchmark covering 102 languages across speech recognition, classification, translation, and retrieval tasks for cross-lingual speech representations.
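A tentative sketch of loading one benchmark subset through Datasets; the `task.language` config name is assumed from the benchmark's dataset card, and the loading script may require opting in to remote code:

```python
# Loading one XTREME-S subset (FLEURS, Afrikaans) via Datasets.
from datasets import load_dataset

fleurs_af = load_dataset(
    "google/xtreme_s",
    "fleurs.af_za",          # assumed config name (task.language)
    split="train",
    trust_remote_code=True,  # the benchmark ships a loading script
)
```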
Co-authored the Datasets community library for NLP, providing 650+ datasets and tools for efficient data loading and processing, published at EMNLP 2021.
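A minimal sketch of the library's one-call loading pattern (the dataset and split are illustrative):

```python
# Loading and filtering a Hub dataset with Datasets.
from datasets import load_dataset

ds = load_dataset("glue", "mrpc", split="train")
print(ds[0])        # one example as a plain dict
print(ds.features)  # typed schema
positives = ds.filter(lambda ex: ex["label"] == 1)
```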
The most recent research on diffusion models like DALL-E 2 and Imagen has not been made accessible and remains behind the closed doors of large tech companies. This is why, at Hugging Face, we decided to build and open-source Diffusers.
We want Diffusers to be built by and for the community -- if you want to build the future of the hottest ML models, come join us!
The power of open-source is impressive -- Diffusers is one month old and has over 20 external contributors.
The speed at which diffusion models are getting better and faster is mind-blowing.