Systems Lead, Sora | OpenAI
Core DeepSpeed contributor at Microsoft (FastGen, ZeRO++, DeepSpeed-Chat). PhD in HPC from Colorado School of Mines. Now Systems Lead for Sora at OpenAI.
Biography
Connor Holmes is a systems researcher specializing in high-performance GPU computing for deep learning training and inference. He holds a PhD in High Performance Computing and a BS in Electrical and Electronics Engineering, both from the Colorado School of Mines (2016-2022). During his PhD he published GRNN, a GPU-based RNN inference library that achieved up to 17.5x speedups over CPU baselines, at EuroSys 2019. He joined Microsoft Research full-time in June 2022 after multiple internships (2019-2021) and became a core member of the DeepSpeed team, co-authoring key papers including DeepSpeed-FastGen (as lead author), ZeRO++, DeepSpeed-Chat, Random-LTD, and the DeepSpeed4Science Initiative. His DeepSpeed-FastGen system introduced Dynamic SplitFuse, delivering up to 2.3x higher throughput and 3.7x lower tail latency for LLM serving. In December 2023 he moved to OpenAI as a researcher and became Systems Lead for Sora, OpenAI's video generation model, where he has supported the launches of both Sora and Sora 2.
Lead author of DeepSpeed-FastGen, a high-throughput LLM text generation system combining DeepSpeed-MII and DeepSpeed-Inference. Introduced Dynamic SplitFuse, a novel prompt and generation composition strategy delivering up to 2.3x higher effective throughput, 2x lower average latency, and 3.7x lower tail latency compared to existing systems.
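The core idea of Dynamic SplitFuse, splitting long prompts into chunks and fusing them with ongoing generation tokens so every forward pass runs at a consistent token budget, can be illustrated with a minimal Python sketch. This is not DeepSpeed's implementation; `Request`, `schedule_step`, and the token-budget policy shown here are illustrative simplifications of the published strategy.

```python
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    prompt_len: int
    processed: int = 0  # prompt tokens already run through the model

    @property
    def in_decode(self) -> bool:
        # once the whole prompt is prefilled, the request is generating
        return self.processed >= self.prompt_len

def schedule_step(requests, token_budget):
    """Compose one forward pass containing at most `token_budget` tokens.

    Decoding requests contribute one token each; the remaining budget is
    filled with chunks of pending prompts, so long prompts are split
    across passes and short ones are fused with ongoing generation.
    """
    batch, budget = [], token_budget
    for r in requests:  # decode tokens first: one per in-flight generation
        if r.in_decode and budget > 0:
            batch.append((r.rid, "decode", 1))
            budget -= 1
    for r in requests:  # fill the rest of the budget with prompt chunks
        if not r.in_decode and budget > 0:
            chunk = min(r.prompt_len - r.processed, budget)
            batch.append((r.rid, "prefill", chunk))
            r.processed += chunk
            budget -= chunk
    return batch
```

Because each pass carries the same fixed token budget regardless of the prefill/decode mix, latency per step stays predictable, which is the source of the tail-latency improvement the bullet describes.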
Designed and implemented high-performance CUDA quantization kernels for ZeRO++, enabling 4x reduction in communication volume for distributed LLM training. The block-quantization approach was 3x more accurate and 5x faster than basic quantization, yielding up to 2.16x better throughput at 384 GPU scale.
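The accuracy gain of block quantization comes from giving each block its own scale, so an outlier in one block does not inflate the rounding error of every other value. A minimal pure-Python sketch of symmetric per-block int8 quantization follows; the real ZeRO++ kernels are fused CUDA implementations, and `block_quantize`/`block_dequantize` are illustrative names, not DeepSpeed APIs.

```python
def block_quantize(values, block_size=4):
    """Symmetric int8 quantization with one scale per block.

    Each block's scale tracks its local dynamic range, so quantization
    error stays proportional to the block max rather than the tensor max.
    """
    quantized, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # scale maps the block's largest magnitude onto the int8 range
        scale = max(abs(v) for v in block) / 127.0 or 1.0  # guard all-zero blocks
        scales.append(scale)
        quantized.append(
            [max(-127, min(127, round(v / scale))) for v in block]
        )
    return quantized, scales

def block_dequantize(quantized, scales):
    """Reconstruct approximate floats from int8 blocks and their scales."""
    out = []
    for block, scale in zip(quantized, scales):
        out.extend(q * scale for q in block)
    return out
```

In a communication-compression setting the int8 payload plus one scale per block is what crosses the wire, which is where the 4x reduction in volume relative to fp32 gradients comes from.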
Co-authored the DeepSpeed-Chat system that democratized RLHF training of ChatGPT-like models at all scales, combining training and inference optimizations to enable training of models with hundreds of billions of parameters at a fraction of the cost.
Created GRNN during his PhD at Colorado School of Mines, a CUDA library for low-latency, scalable RNN inference on GPUs. Achieved up to 17.5x speedup over state-of-the-art CPU inference and 9x over GPU inference libraries through novel data reorganization, thread mapping, and performance modeling. Published at EuroSys 2019.
Serves as Systems Lead for Sora at OpenAI, building the systems infrastructure that enables generation of up to one minute of 1080p video. Supported the launches of both Sora (Feb 2024) and Sora 2 (Sept 2025).
Co-authored the DeepSpeed4Science initiative enabling large-scale scientific discovery, addressing challenges like memory explosion in protein-structure prediction (Evoformer) and very-long sequence support for virus evolutionary landscape modeling.
Excited to share Sora today from @OpenAI !!! Sora can generate up to a minute of 1080p video. @_tim_brooks and @billpeeb have been pushing hard on this and it's been wonderful supporting their work!
So proud of @LiLiunian, @DmytroOk, Avi and the rest of the team. @billpeeb has an incredible vision. Best place to work in the world.
Research generated March 19, 2026