Systems Lead, Sora | OpenAI
Core DeepSpeed contributor at Microsoft (FastGen, ZeRO++, DeepSpeed-Chat). PhD in HPC from Colorado School of Mines. Now Systems Lead for Sora at OpenAI.
Biography
Connor Holmes is a systems researcher specializing in high-performance GPU computing for deep learning training and inference. He holds a PhD in High Performance Computing and a BS in Electrical and Electronics Engineering, both from the Colorado School of Mines (2016-2022). During his PhD he published GRNN, a GPU-based RNN inference library that achieved up to 17.5x speedups over CPU baselines, at EuroSys 2019. He joined Microsoft Research full-time in June 2022 after multiple internships (2019-2021) and became a core member of the DeepSpeed team, co-authoring key papers including DeepSpeed-FastGen (as lead author), ZeRO++, DeepSpeed-Chat, Random-LTD, and the DeepSpeed4Science Initiative. His DeepSpeed-FastGen system introduced Dynamic SplitFuse, delivering up to 2.3x higher throughput and 3.7x lower tail latency for LLM serving. In December 2023 he moved to OpenAI as a researcher and became Systems Lead for Sora, OpenAI's video generation model, where he has supported the launches of both Sora and Sora 2.
Lead author of DeepSpeed-FastGen, a high-throughput LLM text generation system combining DeepSpeed-MII and DeepSpeed-Inference. Introduced Dynamic SplitFuse, a novel prompt and generation composition strategy delivering up to 2.3x higher effective throughput, 2x lower average latency, and 3.7x lower tail latency compared to existing systems.
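The core idea of Dynamic SplitFuse, splitting long prompts into chunks and fusing them with ongoing generation tokens so every forward pass runs at a consistent token budget, can be illustrated with a minimal Python sketch. This is not DeepSpeed's implementation; `Request`, `schedule_step`, and the token-budget policy shown here are illustrative simplifications of the published strategy.

```python
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    prompt_len: int
    processed: int = 0  # prompt tokens already run through the model

    @property
    def in_decode(self) -> bool:
        # once the whole prompt is prefilled, the request is generating
        return self.processed >= self.prompt_len

def schedule_step(requests, token_budget):
    """Compose one forward pass containing at most `token_budget` tokens.

    Decoding requests contribute one token each; the remaining budget is
    filled with chunks of pending prompts, so long prompts are split
    across passes and short ones are fused with ongoing generation.
    """
    batch, budget = [], token_budget
    for r in requests:  # decode tokens first: one per in-flight generation
        if r.in_decode and budget > 0:
            batch.append((r.rid, "decode", 1))
            budget -= 1
    for r in requests:  # fill the rest of the budget with prompt chunks
        if not r.in_decode and budget > 0:
            chunk = min(r.prompt_len - r.processed, budget)
            batch.append((r.rid, "prefill", chunk))
            r.processed += chunk
            budget -= chunk
    return batch
```

Because each pass carries the same fixed token budget regardless of the prefill/decode mix, latency per step stays predictable, which is the source of the tail-latency improvement the bullet describes.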
Designed and implemented high-performance CUDA quantization kernels for ZeRO++, enabling 4x reduction in communication volume for distributed LLM training. The block-quantization approach was 3x more accurate and 5x faster than basic quantization, yielding up to 2.16x better throughput at 384 GPU scale.
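The accuracy gain of block quantization comes from giving each block its own scale, so an outlier in one block does not inflate the rounding error of every other value. A minimal pure-Python sketch of symmetric per-block int8 quantization follows; the real ZeRO++ kernels are fused CUDA implementations, and `block_quantize`/`block_dequantize` are illustrative names, not DeepSpeed APIs.

```python
def block_quantize(values, block_size=4):
    """Symmetric int8 quantization with one scale per block.

    Each block's scale tracks its local dynamic range, so quantization
    error stays proportional to the block max rather than the tensor max.
    """
    quantized, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # scale maps the block's largest magnitude onto the int8 range
        scale = max(abs(v) for v in block) / 127.0 or 1.0  # guard all-zero blocks
        scales.append(scale)
        quantized.append(
            [max(-127, min(127, round(v / scale))) for v in block]
        )
    return quantized, scales

def block_dequantize(quantized, scales):
    """Reconstruct approximate floats from int8 blocks and their scales."""
    out = []
    for block, scale in zip(quantized, scales):
        out.extend(q * scale for q in block)
    return out
```

In a communication-compression setting the int8 payload plus one scale per block is what crosses the wire, which is where the 4x reduction in volume relative to fp32 gradients comes from.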
Co-authored the DeepSpeed-Chat system that democratized RLHF training of ChatGPT-like models at all scales, combining training and inference optimizations to enable training of models with hundreds of billions of parameters at a fraction of the cost.
Created GRNN during his PhD at Colorado School of Mines, a CUDA library for low-latency, scalable RNN inference on GPUs. Achieved up to 17.5x speedup over state-of-the-art CPU inference and 9x over GPU inference libraries through novel data reorganization, thread mapping, and performance modeling. Published at EuroSys 2019.
Serves as Systems Lead for Sora at OpenAI, building the systems infrastructure that enables generation of up to one minute of 1080p video. Supported the launches of both Sora (Feb 2024) and Sora 2 (Sept 2025).
Co-authored the DeepSpeed4Science initiative enabling large-scale scientific discovery, addressing challenges like memory explosion in protein-structure prediction (Evoformer) and very-long sequence support for virus evolutionary landscape modeling.
Excited to share Sora today from @OpenAI !!! Sora can generate up to a minute of 1080p video. @_tim_brooks and @billpeeb have been pushing hard on this and it's been wonderful supporting their work!
So proud of @LiLiunian, @DmytroOk, Avi and the rest of the team. @billpeeb has an incredible vision. Best place to work in the world.
Research generated March 19, 2026