Best Open-Source LLM Inference Servers on GitHub (2026)

This list contains the top 12 open-source LLM inference servers on GitHub, ranked by the RepoRadar scoring engine across five quality dimensions. The top-ranked repo is jundot/omlx with 16.8k stars. Projects are written in languages including Python, Shell, Go, and C++. Data last updated 2026-06-19.

Updated 2026-06-19 · 12 repos · Data: GitHub public API

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

★16.8k ⑂ 1,423 ⚠ 574 Python updated yesterday

apple-silicon · inference-server · llm · macos

Turn your PC, Mac, or Linux box into an AI server. LLM inference, chat UI, voice, agents, workflows, RAG, and image generation.

★2.1k ⑂ 319 ⚠ 52 Shell updated 20 hours ago

ai-agents · amd · comfyui · docker

csghub-server is the backend server for CSGHub, which helps users manage datasets and models, and also run Model Inference, Finetune, and Application Spaces.

★1.1k ⑂ 231 ⚠ 7 Go updated yesterday

ai · datasets · golang · huggingface

All Results

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

★3.8k ⑂ 320 ⚠ 180 Python updated 21 days ago

fine-tuning · gpt · llama · llm

Fast LLM speculative inference server for consumer hardware.

★2.6k ⑂ 240 ⚠ 52 C++ updated 19 hours ago

cuda · cuda-kernels · dflash · kernel

Open-source inference server and production cluster for all the models your agent needs.

★2.1k ⑂ 183 ⚠ 12 Python updated 2 days ago

bge · colbert · data-pipeline · deep-learning

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal su…

★1.4k ⑂ 188 ⚠ 64 Python updated 6 days ago

anthropic · apple-silicon · audio-processing · claude-code

⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, macOS + iOS iPhone app.

★694 ⑂ 39 ⚠ 7 Swift updated 1 month ago

apple-sili · inference · ios · llm

StatsPAI is the first agent-native Python library for causal inference and applied econometrics — unified API, broad cross-method coverage, structured result objects, machine-read…

★234 ⑂ 44 ⚠ 4 Python updated 12 hours ago

agent-native · ai-agents · causal-discovery · causal-inference

Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.

★676 ⑂ 80 ⚠ 28 Rust updated 7 days ago

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

★505 ⑂ 59 ⚠ 41 Python updated 4 months ago

cuda · deep-learning · llama · llm

Diffbot LLM Inference Server

★240 ⑂ 27 ⚠ 1 Python updated 9 months ago

Frequently Asked Questions

What is the best open-source LLM inference server on GitHub?

Based on the RepoRadar scoring engine, jundot/omlx is currently the top-ranked option with 16.8k stars and a score of 85/100.

How are open-source LLM inference server repositories ranked?

Repositories are ranked by the RepoRadar score — a composite of five dimensions: Popularity (35%), Freshness (25%), Maintenance (20%), Community (10%), and Completeness (10%). Scores range from 0–100.
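The weighting above can be sketched as a simple weighted sum. This is only an illustration: the page gives the five weights and the 0–100 range, but not how each dimension is measured or whether the combination is strictly linear. The function name `composite_score` and the assumption that every dimension is itself scored 0–100 are hypothetical.

```python
# Weights in percent, as listed on this page. They sum to 100.
WEIGHTS = {
    "popularity": 35,
    "freshness": 25,
    "maintenance": 20,
    "community": 10,
    "completeness": 10,
}


def composite_score(dimensions):
    """Combine per-dimension scores (assumed 0-100 each) into one 0-100 score.

    Hypothetical helper: a plain weighted average, not RepoRadar's actual code.
    """
    missing = WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")

    weighted_total = 0
    for name, weight in WEIGHTS.items():
        value = dimensions[name]
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
        weighted_total += weight * value

    # Divide by the weight total (100) to return to the 0-100 scale.
    return round(weighted_total / 100, 1)


# Made-up dimension values, for illustration only.
example = {
    "popularity": 90,
    "freshness": 80,
    "maintenance": 70,
    "community": 60,
    "completeness": 50,
}
print(composite_score(example))
```

Under these assumptions, a repository with many stars but stale commits still loses up to 25 points on Freshness, which is one way a smaller but more active project could outrank a larger one.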

When was this list last updated?

This list was last updated on 2026-06-19. Data is sourced directly from GitHub's public API. No cached or fabricated repositories are used.
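As a rough sketch of how figures like those in the repo cards could be read from GitHub's public REST API, the example below fetches `GET /repos/{owner}/{repo}` and formats a stats line. The field names (`stargazers_count`, `forks_count`, `open_issues_count`, `language`) are part of GitHub's repository response. The helper names, the mapping of ⚠ to open issues, and the sample payload are assumptions for illustration, not a description of how this page is actually built.

```python
import json
import urllib.request


def fetch_repo(full_name):
    """Fetch repository metadata, e.g. fetch_repo("jundot/omlx").

    Unauthenticated requests to the GitHub API are rate limited.
    """
    request = urllib.request.Request(
        f"https://api.github.com/repos/{full_name}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def abbreviate(count):
    """Render 16800 as '16.8k'; leave counts below 1000 unchanged."""
    if count >= 1000:
        return f"{count / 1000:.1f}k"
    return str(count)


def stats_line(repo):
    """Format a card line in the style used on this page (assumed mapping)."""
    return (
        f"★{abbreviate(repo['stargazers_count'])} "
        f"⑂ {repo['forks_count']:,} "
        f"⚠ {repo['open_issues_count']} "
        f"{repo['language']}"
    )


# Hand-written sample payload; the exact counts are illustrative only.
sample = {
    "stargazers_count": 16800,
    "forks_count": 1423,
    "open_issues_count": 574,
    "language": "Python",
}
print(stats_line(sample))
```

Note that GitHub's `open_issues_count` includes open pull requests as well as issues, so a figure derived from it may be higher than the number shown on a repository's Issues tab.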
