_                    _
  ___ | |__   __ _ _______ | |_
 / _ \| '_ \ / _` |_  / _ \| __|
| (_) | | | | (_| |/ / (_) | |_
 \___/|_| |_|\__,_/___\___/ \__|

llama : run language models locally

clone:
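The source can be fetched with git; the URL below is the current upstream home of llama.cpp on GitHub (the project moved from the ggerganov user to the ggml-org organisation):

```shell
# clone the llama.cpp repository and enter it
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
```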
compile:
- For Vulkan (AMD GPU): cmake -B build -DGGML_VULKAN=1 ; cmake --build build --config Release [-j MAX_CORES]
- When updating, before compiling:
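One way to refresh an existing checkout before rebuilding, a sketch assuming the build tree can simply be regenerated from scratch:

```shell
# pull the latest sources
git pull
# drop the old build tree so cmake reconfigures cleanly
rm -rf build
```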

(back to top)

The binaries are located at: LLAMA.CPP_PATH/build/bin/

(back to top)

The following options apply to both the server and the CLI.

-m MODEL_PATH : path to the model file to use.
--no-mmap : do not memory-map the model; useful with small models if the GPU can hold them entirely.
-ngl N : offload N layers to the GPU.
--temp N : sampling temperature.
-c N : context size, in tokens.
-t N : number of threads.

--host HOST : host to use instead of 127.0.0.1.
--port PORT : port to use instead of 8080.

--color [on|off|auto] : colour the chat output on/off/auto.
-f FILEPATH : read a prompt (e.g. a system prompt) from FILEPATH.
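Putting the options together, a hypothetical invocation could look like the following; MODEL_PATH and the layer, context, and thread counts are placeholders to adjust for the machine at hand:

```shell
# interactive chat: 32 layers offloaded to the GPU, 8192-token context
./build/bin/llama-cli -m MODEL_PATH -ngl 32 -c 8192 --temp 0.7

# server reachable from the LAN instead of only 127.0.0.1
./build/bin/llama-server -m MODEL_PATH -ngl 32 --host 0.0.0.0 --port 8080
```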

(back to top)

Models can be downloaded from Hugging Face; some of the tested models are listed in the links below.
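llama.cpp can also fetch a GGUF model straight from Hugging Face with the -hf flag; HF_USER/MODEL-GGUF below is a placeholder repository id, not a real repo:

```shell
# download (and cache) the model from Hugging Face, then start chatting
./build/bin/llama-cli -hf HF_USER/MODEL-GGUF
```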

(back to top)

ai, whisper

- llama.cpp - GitHub
- Hugging Face - model repositories

- GLM-4.6V-Flash-Q8_0-GGUF
- TheDrummer_Cydonia-24B-v4.3-GGUF
- Impish_Nemo_12B_GGUF
- Llama-3.2-3B-Instruct-GGUF
- Llama-3.2-3B-Instruct-uncensored-GGUF
- Qwen_Qwen3-0.6B-GGUF

(back to top)

ohazot | about | ohazot.com <admin@ohazot.com>

This document applies to: Linux, OpenBSD 7.8 | Created: 2026-04-02 | Updated: 2026-04-02