Best Open-Source LLM Inference Servers on GitHub (2026)

This list covers the top 12 open-source LLM inference servers on GitHub, ranked by the RepoRadar scoring engine across five quality dimensions. The top-ranked repo is jundot/omlx with 16.8k stars. Projects are written in Python, Shell, Go, C++, Swift, and Rust. Data last updated 2026-06-19.

Updated 2026-06-19 · 12 repos · Data: GitHub public API

1. jundot/omlx
Score: 85/100 (Strong)

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

16.8k 1,423 574 Python updated yesterday
Tags: apple-silicon, inference-server, llm, macos

2. Light-Heart-Labs/DreamServer
Score: 84/100 (Strong)

Turn your PC, Mac, or Linux box into an AI server. LLM inference, chat UI, voice, agents, workflows, RAG, and image generation.

2.1k 319 52 Shell updated 20 hours ago
Tags: ai-agents, amd, comfyui, docker

3. OpenCSGs/csghub-server
Score: 84/100 (Strong)

csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and also run Model Inference, Finetune, and Application Spaces.

1.1k 231 7 Go updated yesterday
Tags: ai, datasets, golang, huggingface

All Results

predibase/lorax
Score: 82/100 (Strong)

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

3.8k 320 180 Python updated 21 days ago
Tags: fine-tuning, gpt, llama, llm

Luce-Org/lucebox-hub
Score: 82/100 (Strong)

Fast LLM speculative inference server for consumer hardware.

2.6k 240 52 C++ updated 19 hours ago
Tags: cuda, cuda-kernels, dflash, kernel

superlinked/sie
Score: 81/100 (Strong)

Open-source inference server and production cluster for all the models your agent needs.

2.1k 183 12 Python updated 2 days ago
Tags: bge, colbert, data-pipeline, deep-learning

waybarrios/vllm-mlx
Score: 77/100 (Strong)

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal su…

1.4k 188 64 Python updated 6 days ago
Tags: anthropic, apple-silicon, audio-processing, claude-code

SharpAI/SwiftLM
Score: 74/100 (Solid)

⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, MACOS + iOS iPhone app.

694 39 7 Swift updated 1 month ago
Tags: apple-sili, inference, ios, llm

brycewang-stanford/StatsPAI
Score: 74/100 (Solid)

StatsPAI is the first agent-native Python library for causal inference and applied econometrics — unified API, broad cross-method coverage, structured result objects, machine-read…

234 44 4 Python updated 12 hours ago
Tags: agent-native, ai-agents, causal-discovery, causal-inference

EricLBuehler/candle-vllm
Score: 72/100 (Solid)

Efficient platform for inference and serving local LLMs including an OpenAI compatible API server.

676 80 28 Rust updated 7 days ago
toverainc
Score: 64/100 (Solid)

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

505 59 41 Python updated 4 months ago
Tags: cuda, deep-learning, llama, llm

diffbot
Score: 49/100 (Mixed)

Diffbot LLM Inference Server

240 27 1 Python updated 9 months ago

Frequently Asked Questions

What is the best open-source LLM inference server on GitHub?

Based on the RepoRadar scoring engine, jundot/omlx is currently the top-ranked option with 16.8k stars and a score of 85/100.

How are open-source LLM inference server repositories ranked?

Repositories are ranked by the RepoRadar score — a composite of five dimensions: Popularity (35%), Freshness (25%), Maintenance (20%), Community (10%), and Completeness (10%). Scores range from 0–100.
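
The page gives the weights but not how each dimension is measured or how they are combined. As a rough illustration only, the sketch below assumes each dimension is already a sub-score from 0 to 100 and that the composite is a plain weighted sum. The function name `composite_score` and the example sub-scores are hypothetical and not taken from RepoRadar or from any listed repository.

```python
# Weights as stated in the answer above; they sum to 1.0.
WEIGHTS = {
    "popularity": 0.35,
    "freshness": 0.25,
    "maintenance": 0.20,
    "community": 0.10,
    "completeness": 0.10,
}


def composite_score(subscores: dict[str, float]) -> float:
    """Weighted sum of per-dimension sub-scores.

    Assumption: every sub-score is already normalized to 0-100.
    How RepoRadar derives the sub-scores is not described on the page.
    """
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= subscores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Hypothetical sub-scores, chosen only to show the arithmetic.
example = {
    "popularity": 90,
    "freshness": 100,
    "maintenance": 80,
    "community": 60,
    "completeness": 70,
}

if __name__ == "__main__":
    print(round(composite_score(example), 1))
```

Under this assumption, Popularity and Freshness together carry 60% of the score, so a recently updated repository with many stars can outrank a more complete but quieter one.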

When was this list last updated?

This list was last updated on 2026-06-19. Data is sourced directly from GitHub's public API. No cached or fabricated repositories are used.
