ML Engineer & Core Maintainer | Hugging Face
Core maintainer of Hugging Face Transformers with 1,370+ merged commits. Added TFViT, TFCLIP, FlaxVisionEncoderDecoder, and Kosmos-2. Owns CI/CD testing infrastructure.
Biography
Yih-Dar Shieh is a Machine Learning Engineer at Hugging Face, where he has been a core maintainer of the Transformers library since February 2022. With over 1,370 merged commits and 1,530+ pull requests in huggingface/transformers, he is one of the most prolific contributors to the project. His work spans three major areas: adding vision and multimodal model implementations (TFViT, TFCLIPModel, FlaxVisionEncoderDecoderModel, Kosmos-2), building and maintaining CI/CD testing infrastructure (tiny model creation, PR comment CI, failure reporting, CircleCI and GitHub Actions workflows), and ensuring cross-framework parity between PyTorch, TensorFlow, and JAX/Flax models. Before joining Hugging Face, he was an AI Engineer at Biggerpan (2018-2021) working on NLP intent/entity classification. He holds a Ph.D. in Mathematics (number theory) from Aix-Marseille University (2015), with a dissertation on 'Arithmetic Aspects of Point Counting and Frobenius Distributions' supervised by David Kohel and Gilles Lachaud, and an engineering degree in Computer Science from Polytech Marseille (2018). He is based in Paris, France.
Implemented Microsoft's Kosmos-2 grounding multimodal LLM in Hugging Face Transformers (v4.35), enabling object-level image-text interaction via bounding boxes. Acknowledged by Microsoft for the Hugging Face implementation and online demo.
Added TFViTModel, TFCLIPModel, TFVisionEncoderDecoderModel, and FlaxVisionEncoderDecoderModel to Transformers, bringing vision and multimodal capabilities to TensorFlow and JAX/Flax frameworks.
Built and maintains the CI testing infrastructure for huggingface/transformers: PR comment CI feedback, new failure reporting, CircleCI and GitHub Actions workflows, tiny model creation scripts, and cross-framework equivalence tests.
Co-authored the influential Hugging Face blog post identifying and fixing a bug where gradient accumulation was not mathematically equivalent to full batch training across popular ML frameworks.
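The accumulation mismatch can be illustrated with a toy example (the numbers below are hypothetical, not taken from the blog post): with a token-averaged loss, averaging the per-micro-batch mean losses is not equivalent to taking one mean over the whole batch whenever the micro-batches contain different numbers of tokens.

```python
# Sketch of why naive gradient accumulation diverges from full-batch
# training under a token-averaged loss (illustrative numbers only).

# Per-token losses for two micro-batches of unequal token count.
micro_batch_1 = [2.0, 4.0]             # 2 tokens
micro_batch_2 = [1.0, 1.0, 1.0, 1.0]   # 4 tokens

# Full-batch training: a single mean over all 6 tokens.
all_tokens = micro_batch_1 + micro_batch_2
full_batch_loss = sum(all_tokens) / len(all_tokens)

# Naive accumulation: average the per-micro-batch mean losses.
# This over-weights tokens from the shorter micro-batch.
naive_accumulated = (
    sum(micro_batch_1) / len(micro_batch_1)
    + sum(micro_batch_2) / len(micro_batch_2)
) / 2

# Corrected accumulation: sum the raw per-token losses across
# micro-batches, then divide once by the total token count.
corrected = sum(all_tokens) / len(all_tokens)

print(full_batch_loss)    # ~1.667
print(naive_accumulated)  # 2.0 -- not equal to the full-batch loss
print(corrected)          # matches the full-batch loss
```

The same weighting error applies to the gradients, which is why the fix propagated across frameworks required scaling each micro-batch's loss by its token count rather than its micro-batch mean.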
Fixed critical upsampling, downsampling, and cross-attention bugs in the Hugging Face Diffusers library's core architecture.
Systematically fixed discrepancies between PyTorch, TensorFlow, and Flax model implementations across dozens of model architectures, including loss calculation, hidden states, and attention outputs.
Created the Flax image captioning example and published the ViT-GPT2 proof-of-concept model for the FlaxVisionEncoderDecoder framework, demonstrating vision-language generation.
Today is my 1st day on the Hugging Face open source team! My main focus is on the reliability of the ecosystem, the testing, and the production readiness: tools that are used & loved by a large community and 10,000+ organizations.
Very proud (and surprised) to see the demo of my work on the (Flax) Vision Encoder Decoder model being featured.
Microsoft's KOSMOS-2 model is now available in Transformers v4.35! It is a grounding multimodal large language model (MLLM) that enables interaction between text and images at the object level (via bounding boxes).
Joining an open source startup has converted me from a silent user to an active contributor.
Research generated March 19, 2026