Run a local LLM via Llama.cpp
Last updated: August 20, 2025

This guide demonstrates how to install and run a large language model (LLM) on your local workstation via Llama.cpp and, optionally, Open WebUI.
Choose a Model
There are many excellent models to choose from, but ultimately the optimal choice depends on your target use case as well as your available system resources.
For text generation — as opposed to image/video/voice — Hugging Face offers a fairly comprehensive list of popular models.
For the purposes of this guide, we are going to use DeepSeek R1 0528 8B, an 8-billion-parameter model that requires roughly five gigabytes of disk space once downloaded.
Install Llama.cpp
As noted in the relevant installation documentation, Llama.cpp can be installed on Linux and macOS via the Homebrew package manager:
brew install llama.cpp
Alternatively, you can download a pre-built binary from the releases page, or if you prefer, build from source.
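If the installation succeeded, the llama-server binary should now be on your PATH. You can confirm this by printing its version and build information (the --version flag is part of Llama.cpp's standard command-line options):

llama-server --version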
Run Llama.cpp Server
Once Llama.cpp is installed, we can run its built-in server on any port other than 8080, which we will reserve for Open WebUI. Upon first invocation, the Llama.cpp server downloads and then runs the specified model:
llama-server --port 8888 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
The Llama.cpp server should now be running on the specified port. Visit http://localhost:8888 in your browser to load the web interface and start submitting prompts.
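In addition to the web interface, llama-server exposes an OpenAI-compatible HTTP API, which is what Open WebUI will connect to later in this guide. As a quick sanity check, you can send a chat completion request with curl; the prompt below is just a placeholder:

curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'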
Open WebUI (optional)
While the web interface provided by llama-server is great for accessing models managed by Llama.cpp, Open WebUI is a web front-end that also makes it easy to access models not managed by Llama.cpp, as well as remote LLMs. To install it, I suggest using pipx (in a separate terminal tab):
pipx install open-webui
If you prefer uv, the corresponding installation command would be:
uv tool install open-webui
Start Open WebUI:
open-webui serve
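By default, Open WebUI listens on port 8080, which is why we avoided that port when starting llama-server. If 8080 is already taken on your machine, you should be able to choose another port (the --port flag is assumed here; run open-webui serve --help to confirm):

open-webui serve --port 3000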
Open your browser and visit: http://localhost:8080
Tap “Get Started” and enter any values you like for name, email, and password. This data is only used for your local account, so you can enter arbitrary dummy data if you prefer.
(Note: Open WebUI claims to not make any external connections, but on startup I noticed that it connected to Hugging Face, which I personally do not mind since that is where many models are hosted. Upon account creation, however, Open WebUI also tried to connect to OpenAI and GitHub, both of which I blocked at the network level. These connections are presumably harmless and may even be useful, but I blocked them since I do not need those services.)
Open WebUI Configuration
User avatar icon > Admin Panel > Connections > (disable the OpenAI and Ollama APIs)
User avatar icon > Settings > Connections > + (Add Connection)
URL: http://127.0.0.1:8888/v1
Key: (leave blank)
Tap the Save button and close the Settings modal.
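If the connection does not seem to work, confirm that the URL you entered is reachable from the same machine. The OpenAI-compatible API exposed by llama-server should list the loaded model at the /v1/models endpoint:

curl http://127.0.0.1:8888/v1/models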
Select a model from the top navigation if one is not already selected. Select Temporary Chat if you are testing or otherwise want to prevent your queries and responses from being logged.
Tap the field labeled “How can I help you today?” and enter your prompt to query the LLM.
For more information about using Open WebUI with Llama.cpp, refer to the related documentation.
Summary
Now you can run LLMs on your own workstation. If you have any questions, comments, or suggestions, please reach out via the Fediverse!