Run a local LLM via Llama.cpp
Last updated: August 20, 2025

This guide demonstrates how to install and run a large language model (LLM) on your local workstation via Llama.cpp and, optionally, Open WebUI.
Choose a Model
There are many excellent models to choose from, but ultimately the optimal choice depends on your target use case as well as your available system resources.
For text generation — as opposed to image/video/voice — Hugging Face offers a fairly comprehensive list of popular models.
For the purposes of this guide, we are going to use DeepSeek R1 0528 8B, an 8-billion-parameter model that requires roughly five gigabytes of disk space once downloaded.
Install Llama.cpp
As noted in the relevant installation documentation, Llama.cpp can be installed on Linux and macOS via the Homebrew package manager:
brew install llama.cpp
Alternatively, you can download a pre-built binary from the releases page, or if you prefer, build from source.
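If the installation succeeded, the llama-server binary should now be on your PATH. You can confirm this by printing its version and build information (the --version flag is part of Llama.cpp's standard command-line options):

llama-server --version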
Run Llama.cpp Server
Once Llama.cpp is installed, we can run its built-in server on any port other than 8080, which we will reserve for Open WebUI. Upon first invocation, the Llama.cpp server downloads and then runs the specified model:
llama-server --port 8888 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
The Llama.cpp server should now be running on the specified port. Visit http://localhost:8888 in your browser to load the web interface and start submitting prompts.
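In addition to the web interface, llama-server exposes an OpenAI-compatible HTTP API, which is what Open WebUI will connect to later in this guide. As a quick sanity check, you can send a chat completion request with curl; the prompt below is just a placeholder:

curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'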
Open WebUI (optional)
While the web interface provided by llama-server is great for accessing models managed by Llama.cpp, Open WebUI is a web front-end that also makes it easy to access models not managed by Llama.cpp, as well as remote LLMs. To install it, I suggest using pipx (in a separate terminal tab):
pipx install open-webui
If you prefer uv, the corresponding installation command would be:
uv tool install open-webui
Start Open WebUI:
open-webui serve
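By default, Open WebUI listens on port 8080, which is why we avoided that port when starting llama-server. If 8080 is already taken on your machine, you should be able to choose another port (the --port flag is assumed here; run open-webui serve --help to confirm):

open-webui serve --port 3000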
Open your browser and visit: http://localhost:8080
Tap “Get Started” and enter any values you like for name, email, and password. This data is only used for your local account, so you can enter arbitrary dummy data if you prefer.
(Note: Open WebUI claims to not make any external connections, but on startup I noticed that it connected to Hugging Face, which I personally do not mind since that is where many models are hosted. Upon account creation, however, Open WebUI also tried to connect to OpenAI and GitHub, both of which I blocked at the network level. These connections are presumably harmless and may even be useful, but I blocked them since I do not need those services.)
Open WebUI Configuration
User avatar icon > Admin Panel > Connections > (disable the OpenAI and Ollama APIs)
User avatar icon > Settings > Connections > + (Add Connection)
URL: http://127.0.0.1:8888/v1
Key: (leave blank)
Tap the Save button and close the Settings modal.
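If the connection does not seem to work, confirm that the URL you entered is reachable from the same machine. The OpenAI-compatible API exposed by llama-server should list the loaded model at the /v1/models endpoint:

curl http://127.0.0.1:8888/v1/models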
Select a model from the top navigation if one is not already selected. Select Temporary Chat if you are testing or otherwise want to prevent your queries and responses from being logged.
Tap the field labeled “How can I help you today?” and enter your prompt to query the LLM.
For more information about using Open WebUI with Llama.cpp, refer to the related documentation.
Summary
Now you can run LLMs on your own workstation. If you have any questions, comments, or suggestions, please reach out via the Fediverse!