Creator of llama.cpp & GGML | Hugging Face
Bulgarian engineer who democratized local LLM inference with llama.cpp and the GGML tensor library, enabling AI on consumer hardware.
Biography
Georgi Gerganov is a Bulgarian software engineer based in Sofia, best known as the creator of llama.cpp and the GGML tensor library, which together democratized local LLM inference on consumer hardware. He first gained prominence with whisper.cpp, a pure C/C++ port of OpenAI's Whisper speech-to-text model, before releasing llama.cpp in March 2023 to run Meta's LLaMA models without heavy dependencies. In 2023 he founded ggml.ai with pre-seed funding from Nat Friedman and Daniel Gross, and in February 2026 he and the ggml team joined Hugging Face while retaining full autonomy over the llama.cpp project. His work emphasizes minimal dependencies, integer quantization, and cross-platform efficiency, enabling AI inference on everything from Raspberry Pis to server GPUs.
llama.cpp
LLM inference engine in pure C/C++ with no dependencies, supporting quantization from 2-bit to 8-bit, multiple GPU backends (Metal, CUDA, Vulkan), and running on commodity hardware. Over 91,000 GitHub stars and 14,000 forks.
GGML
Minimalist C tensor library for machine learning that enables large model inference on commodity hardware with integer quantization, zero runtime memory allocations, and no third-party dependencies. Released under the MIT license.
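The block-wise integer quantization described above can be illustrated with a simplified sketch: values are grouped into fixed-size blocks, and each block stores one scale factor plus small integers. This mirrors the spirit of GGML's Q8_0 format (blocks of 32 values, one scale, int8 quants), but the function names and plain-list layout here are illustrative assumptions, not GGML's actual packed C structs.

```python
# Simplified block-wise 8-bit quantization in the spirit of GGML's Q8_0:
# split values into blocks of 32, store one float scale per block plus
# int8 quants. Layout and names are illustrative, not GGML's real structs.

BLOCK = 32

def quantize_q8(values):
    """Return a list of (scale, quants) blocks approximating `values`."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk) or 1.0  # avoid divide-by-zero
        scale = amax / 127.0
        quants = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, quants))
    return blocks

def dequantize_q8(blocks):
    """Reconstruct approximate floats from quantized blocks."""
    return [q * scale for scale, quants in blocks for q in quants]
```

Round-tripping a tensor through quantize/dequantize loses at most about half a scale step per element, which is why 8-bit block quantization preserves model quality well while cutting memory use roughly fourfold versus float32.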
whisper.cpp
Pure C/C++ port of OpenAI's Whisper automatic speech recognition model, enabling on-device speech-to-text without Python dependencies. Pioneered the approach later applied to LLMs with llama.cpp.
GGUF
Binary file format for storing quantized ML models, supporting 2-bit to 8-bit integer quantization, float16, bfloat16, float32, and 1.58-bit quantization. Replaced the original GGML format for better architecture extensibility.
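The fixed part of a GGUF file is simple enough to parse with a few struct unpacks: a 4-byte magic (`GGUF`), a uint32 format version, then little-endian uint64 counts for tensors and metadata key/value pairs. A minimal sketch, assuming these field widths from the GGUF spec; the helper name is ours, not part of any GGUF library:

```python
import struct

def read_gguf_header(buf):
    """Parse the fixed GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

The metadata key/value section that follows the header is what gives GGUF its extensibility: new architectures add new keys without changing the container layout.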
ggml.ai
Company founded in 2023 to support open-source ML inference development, backed by Nat Friedman and Daniel Gross. Acquired by Hugging Face in February 2026.
Research generated March 19, 2026