llama — run language models locally
COMPILE
| clone:                             | git clone https://github.com/ggml-org/llama.cpp |
| compile:                           | cmake -B build -DLLAMA_CURL=OFF ; cmake --build build --config Release [-j MAX_CORES] |
| - For Vulkan (AMD GPU):            | cmake -B build -DGGML_VULKAN=ON -DLLAMA_CURL=OFF ; cmake --build build --config Release [-j MAX_CORES] |
| - When updating, before compiling: | rm -r build |
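The steps above can be wrapped in a small dry-run helper. This is only a sketch: BACKEND and the fallback core count are assumptions of this script, not llama.cpp options, and nothing is compiled here; the cmake invocations are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the build recipe. Set BACKEND=vulkan for the Vulkan build;
# the cmake commands are printed, not executed.
BACKEND="${BACKEND:-cpu}"
FLAGS="-DLLAMA_CURL=OFF"
if [ "$BACKEND" = "vulkan" ]; then
    FLAGS="-DGGML_VULKAN=ON $FLAGS"
fi
# Use all cores when nproc is available; fall back to an assumed 4.
JOBS="$(nproc 2>/dev/null || echo 4)"
echo "cmake -B build $FLAGS"
echo "cmake --build build --config Release -j $JOBS"
```

Remove the echoes to turn the printed commands into the real build.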
BINARIES
The binaries are located at: LLAMA.CPP_PATH/build/bin/
OPTIONS
The following options apply to both llama-server and llama-cli.
| -m MODEL_PATH | : model to use. |
| --no-mmap | : disable memory mapping; useful with small models if the GPU can hold them. |
| -ngl N | : offload N layers to the GPU. |
| --temp N | : sampling temperature. |
| -c N | : context size, in tokens. |
| -t N | : number of threads. |
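As an illustration of how the shared flags combine (model path, layer count, context size, thread count and temperature are all made-up values), a sketch that only prints the command, since no model or binary is assumed present:

```shell
#!/bin/sh
# Illustrative shared-flag combination; printed, not executed.
CMD="./build/bin/llama-server -m models/model.gguf -ngl 24 -c 4096 -t 6 --temp 0.8"
echo "$CMD"
```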
llama-server
| --host HOST | : host to use instead of 127.0.0.1. |
| --port PORT | : port to use instead of 8080. |
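llama-server exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the default --host/--port above; the prompt and temperature are illustrative, and the request body is only built and printed here (send it with curl or urllib.request once the server is running):

```python
import json

# URL assumes llama-server's defaults: --host 127.0.0.1 --port 8080.
URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_chat_request(prompt, temperature=0.7):
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Build and print the body; POST it to URL to get a completion.
print(json.dumps(build_chat_request("Hello!")))
```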
llama-cli
| --color [on|off|auto] | : coloured output on/off/auto. |
| -f FILEPATH | : read the prompt from FILEPATH. |
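A sketch of the -f flag: the prompt lives in a plain-text file (path and wording are illustrative), and the llama-cli call itself is left commented out since it needs a compiled binary and a model:

```shell
#!/bin/sh
# Write an illustrative prompt to a file, then pass it with -f.
printf 'You are a concise assistant.\n' > /tmp/prompt.txt
cat /tmp/prompt.txt
# ./build/bin/llama-cli -m MODEL_PATH -f /tmp/prompt.txt --color auto
```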
MODELS
Models can be downloaded from Hugging Face; some of the tested models are listed in the links below.
SEE ALSO
links
| - llama.cpp - Github |
| - Hugging Face - Model repositories |
models
| - GLM-4.6V-Flash-Q8_0-GGUF |
| - TheDrummer_Cydonia-24B-v4.3-GGUF |
| - Impish_Nemo_12B_GGUF |
| - Llama-3.2-3B-Instruct-GGUF |
| - Llama-3.2-3B-Instruct-uncensored-GGUF |
| - Qwen_Qwen3-0.6B-GGUF |