84k
Models & inference · Python
vLLM
vllm-project/vllm
High-throughput serving engine for LLMs.
A fast, memory-efficient inference and serving engine using PagedAttention — the production server behind many open-model deployments.