Local LLMs · C++

llama.cpp

ggml-org/llama.cpp

118k

The engine that put LLMs on laptops and phones.

High-performance LLM inference in plain C/C++ with quantization — the runtime under much of the local-model ecosystem, from laptops to Raspberry Pis.