llama.cpp - 开源替代 OpenAI API

📖 项目简介

C++ 实现的高性能 LLM 推理引擎，支持在消费级硬件上运行大模型。

🔗 GitHub 项目地址

🔄 可替代的商用软件

OpenAI API

📝 项目原文介绍（英文）

llama.cpp Manifesto / ggml / ops / maintainer PRs%20sort%3Aupdated-desc) LLM inference in C/C++ Recent API changes - Changelog for libllama API - Changelog for llama-server REST API Hot topics - Hugging Face cache migration: models downloaded with -hf are now stored in the standard Hugging Face cache directory, enabling sharing with other HF tools. - guide : using the new WebUI of llama.cpp - guide : running gpt-oss with llama.cpp - [[FEEDBACK] Better packaging for llama.cpp to support downstrea

本地部署 LLM推理

💬 社区讨论

📌 I have implemented speculative drafting via TCP/IP. 50%+ speedup of main model. Any interest in this for llama.cpp? — 由 xkmire 发布于 2026-07-15

📌 Getting 1.09 t/s on gtx 1650 is normal ???? — 由 Satyam1Vishwakarma 发布于 2026-07-15

📌 Testing Improvements — 由 am17an 发布于 2026-07-15

📌 Towards Reconfigurable llama-server Runtimes — 由 wadealexc 发布于 2026-07-14

📌 Struggling to get Vulkan to work for 285hx iGPU — 由 StudioNirin 发布于 2026-07-13

⚠️ 免责声明：本文内容整理自 GitHub 开源社区，旨在分享和介绍优秀的开源替代方案。