Research Scientist | ByteDance
Co-creator of Mixup (11K+ citations), one of the most influential data augmentation techniques in deep learning. PhD from MIT under Suvrit Sra on Riemannian optimization. Created Fixup Initialization for training deep ResNets without normalization. Now leads Monetization GenAI research at ByteDance.
Biography
Hongyi Zhang is a machine learning researcher and co-creator of Mixup, one of the most influential data augmentation techniques in deep learning (11,000+ citations). He earned his PhD in Brain and Cognitive Sciences from MIT in 2019 under Suvrit Sra at LIDS, with a thesis on non-convex optimization and learning covering Riemannian optimization on manifolds and deep neural network training. His research at Facebook AI Research (FAIR) produced Mixup (ICLR 2018) and Fixup Initialization (ICLR 2019). He holds a BS in Machine Intelligence from Peking University (2009-2013). Since 2019, he has been a Research Scientist at ByteDance, working on Monetization GenAI, LLM post-training, reinforcement learning for LLMs, and the ByteBrain AI-for-Infrastructure platform. His team co-developed ChatTS, a time-series multimodal LLM accepted to VLDB 2025.
Introduced Mixup, a data augmentation technique that trains neural networks on convex combinations of pairs of examples and their labels. With 11,000+ citations, it became one of the most influential regularization methods in deep learning, improving generalization, adversarial robustness, and GAN training stability.
Proposed Fixup (Fixed-update Initialization), enabling training of very deep residual networks (up to 10,000 layers) without batch normalization by properly rescaling standard initialization. Achieved state-of-the-art performance in image classification and machine translation.
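The Fixup rescaling rule can be sketched in a few lines. This is an illustrative NumPy sketch, not the paper's code: weights inside each residual branch are drawn from a standard (He-style) initialization, scaled by L^(-1/(2m-2)) for L residual branches of m weight layers each, and the last layer of each branch starts at zero. The helper names (`fixup_scale`, `init_branch`) are assumptions for this sketch.

```python
import numpy as np

def fixup_scale(num_residual_branches, layers_per_branch):
    """Fixup rescaling factor: L ** (-1 / (2m - 2)), where L is the number
    of residual branches and m the number of weight layers per branch."""
    L, m = num_residual_branches, layers_per_branch
    return L ** (-1.0 / (2 * m - 2))

def init_branch(shapes, num_residual_branches, rng):
    """Initialize one residual branch (helper is illustrative):
    He-style init scaled by the Fixup factor, last layer zeroed."""
    scale = fixup_scale(num_residual_branches, len(shapes))
    weights = [rng.standard_normal(s) * np.sqrt(2.0 / s[0]) * scale
               for s in shapes[:-1]]
    weights.append(np.zeros(shapes[-1]))  # zero-init the branch's last layer
    return weights

# Example: one 2-layer branch in a network with 4 residual branches,
# so each branch's first layer is scaled by 4 ** (-1/2) = 0.5.
branch = init_branch([(3, 3), (3, 3)], num_residual_branches=4,
                     rng=np.random.default_rng(0))
```

With m = 2 the factor reduces to L^(-1/2), which is the scaling most often quoted for standard two-layer ResNet blocks.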
Developed the first variance-reduced stochastic optimization method for Riemannian manifolds, achieving global linear convergence rates for geodesically strongly convex functions.
Adapted the SPIDER algorithm to Riemannian manifolds, achieving curvature-independent convergence rates for both nonconvex and strongly convex optimization problems.
Co-developed by his team at ByteDance with Tsinghua University, ChatTS is the first multimodal LLM that takes multivariate time series as input for understanding and reasoning, achieving a 46% improvement in alignment and a 25.8% improvement in reasoning over GPT-4o. Accepted to VLDB 2025.
Open-source reference implementation of the Mixup training method with code for CIFAR and GAN experiments. 469 stars on GitHub.
mixup extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of targets.
mixup regularizes the neural network to favor simple linear behavior in-between training examples.
mixup can be implemented in a few lines of code and introduces minimal computational overhead.
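The "few lines of code" above can be sketched as follows. This is a minimal NumPy illustration of the mixup idea, not the reference implementation: draw a mixing coefficient from a Beta(α, α) distribution and form convex combinations of a batch with a shuffled copy of itself, for both inputs and one-hot labels. The function name `mixup_batch` is an assumption for this sketch.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Return convex combinations of a batch (x, y) with a shuffled
    partner batch, per the mixup recipe (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))      # pair each example with a random partner
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed

# Example: mix a toy batch of 4 two-dimensional inputs with one-hot labels.
x = np.arange(8, dtype=np.float64).reshape(4, 2)
y = np.eye(3)[[0, 1, 2, 0]]
xm, ym = mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng(0))
```

Because the labels are mixed with the same coefficient as the inputs, each mixed label row remains a valid probability distribution, which is what trains the network toward linear behavior between examples.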
Research generated March 19, 2026