Creator of llama.cpp & GGML | Hugging Face
Bulgarian engineer who democratized local LLM inference with llama.cpp and the GGML tensor library, enabling AI on consumer hardware.
Biography
Georgi Gerganov is a Bulgarian software engineer based in Sofia, best known as the creator of llama.cpp and the GGML tensor library, which together democratized local LLM inference on consumer hardware. He first gained prominence with whisper.cpp, a pure C/C++ port of OpenAI's Whisper speech-to-text model, before releasing llama.cpp in March 2023 to run Meta's LLaMA models without heavy dependencies. In 2023 he founded ggml.ai with pre-seed funding from Nat Friedman and Daniel Gross, and in February 2026 he and the ggml team joined Hugging Face while retaining full autonomy over the llama.cpp project. His work emphasizes minimal dependencies, integer quantization, and cross-platform efficiency, enabling AI inference on everything from Raspberry Pis to server GPUs.
llama.cpp
LLM inference engine in pure C/C++ with no dependencies, supporting quantization from 2-bit to 8-bit, multiple GPU backends (Metal, CUDA, Vulkan), and running on commodity hardware. Over 91,000 GitHub stars and 14,000 forks.
GGML
Minimalist C tensor library for machine learning that enables large model inference on commodity hardware with integer quantization, zero runtime memory allocations, and no third-party dependencies. Released under the MIT license.
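The block-wise integer quantization described above can be illustrated with a simplified sketch: values are grouped into fixed-size blocks, and each block stores one scale factor plus small integers. This mirrors the spirit of GGML's Q8_0 format (blocks of 32 values, one scale, int8 quants), but the function names and plain-list layout here are illustrative assumptions, not GGML's actual packed C structs.

```python
# Simplified block-wise 8-bit quantization in the spirit of GGML's Q8_0:
# split values into blocks of 32, store one float scale per block plus
# int8 quants. Layout and names are illustrative, not GGML's real structs.

BLOCK = 32

def quantize_q8(values):
    """Return a list of (scale, quants) blocks approximating `values`."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk) or 1.0  # avoid divide-by-zero
        scale = amax / 127.0
        quants = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, quants))
    return blocks

def dequantize_q8(blocks):
    """Reconstruct approximate floats from quantized blocks."""
    return [q * scale for scale, quants in blocks for q in quants]
```

Round-tripping a tensor through quantize/dequantize loses at most about half a scale step per element, which is why 8-bit block quantization preserves model quality well while cutting memory use roughly fourfold versus float32.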
whisper.cpp
Pure C/C++ port of OpenAI's Whisper automatic speech recognition model, enabling on-device speech-to-text without Python dependencies. Pioneered the approach later applied to LLMs with llama.cpp.
GGUF
Binary file format for storing quantized ML models, supporting 2-bit to 8-bit integer quantization, float16, bfloat16, float32, and 1.58-bit quantization. Replaced the original GGML format for better architecture extensibility.
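The fixed part of a GGUF file is simple enough to parse with a few struct unpacks: a 4-byte magic (`GGUF`), a uint32 format version, then little-endian uint64 counts for tensors and metadata key/value pairs. A minimal sketch, assuming these field widths from the GGUF spec; the helper name is ours, not part of any GGUF library:

```python
import struct

def read_gguf_header(buf):
    """Parse the fixed GGUF header: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count (little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", buf, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

The metadata key/value section that follows the header is what gives GGUF its extensibility: new architectures add new keys without changing the container layout.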
ggml.ai
Company founded in 2023 to support open-source ML inference development, backed by Nat Friedman and Daniel Gross. Acquired by Hugging Face in February 2026.
Research generated March 19, 2026