.Dd Created:2026-04-02|Updated:2026-04-02|
.de ocsi
\\$* ,
..
.de oxr
.ocsi .Xr \\$*
..
.de oit
.It \\$*
..
.de obdi
.Bl -dash -compact
.oit \\$*
..
.de obdl
.Bd -literal -compact
\\$*
..
.de onote
.Bl -hang -compact
.oit \\$*
.El
..
.de ocomm
.Bl -diag -compact
.oit \\$*
.El
..
.de opsy
.Pp
.Sy - \\$*
..
.de obc
.Bl -column \\$*
..
.de obc2
.obc opt desc
..
.de obc3
.obc option arguments description
..
.nr r 0
.if ( \nr == 1 ) \{\
.de cos
.Os OpenBSD 7.8 , linux |
..
.de dm
.opsy OpenBSD manpages: \\$*
..
.\}
.if ( \nr == 0 ) \{\
.de cos
.Os linux , OpenBSD 7.8 |
..
.de dm
.opsy Archlinux manpages: \\$*
..
.\}
.Dt LLAMA oh
.cos
.Nm llama
.Nd run language models locally
.Sh COMPILE
.obc compiling_get_models
.It clone : Ta Li git clone https://github.com/ggml-org/llama.cpp
.It compile : Ta Li cmake -B build -DLLAMA_CURL=OFF \; cmake --build build --config Release [-j MAX_CORES]
.It - For Vulkan (AMD GPU) : Ta Li cmake -B build -DGGML_VULKAN=ON -DLLAMA_CURL=OFF \; cmake --build build --config Release [-j MAX_CORES]
.It - When updating, before compiling : Ta Li rm -r build
.El
.Sh BINARIES
The binaries are located at:
.Pa LLAMA.CPP_PATH/build/bin/
.Sh OPTIONS
The following options apply to both the server and the CLI.
.obc _m_model_p
.It -m MODEL_PATH Ta : model to use.
.It --no-mmap Ta : disable memory mapping; useful with small models if the GPU can hold the whole model.
.It -ngl N Ta : offload N layers to the GPU.
.It --temp N Ta : sampling temperature.
.It -c N Ta : context size in tokens.
.It -t N Ta : number of threads.
.El
.Sh llama-server
.obc host_hos
.It --host HOST Ta : host to bind instead of 127.0.0.1.
.It --port PORT Ta : port to listen on instead of 8080.
.El
.Sh llama-cli
.obc __color__on_off_au
.It --color [on|off|auto] Ta : coloured chat on/off/auto.
.It -f FILEPATH Ta : read the prompt from FILEPATH.
.El
.Sh MODELS
Models can be downloaded from
.Lk https://huggingface.co/ HuggingFace ;
some of the tested models are listed in the links below.
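.Sh EXAMPLES
A minimal sketch of a typical session.
The model file, prompt file, port and layer count are illustrative
assumptions, not part of the distribution; substitute your own paths.
.Bd -literal -compact
# serve a model on all interfaces, offloading 99 layers to the GPU
./build/bin/llama-server -m models/Llama-3.2-3B-Instruct-Q8_0.gguf \e
    -ngl 99 -c 4096 --host 0.0.0.0 --port 8081

# query the running server (it speaks the OpenAI-compatible chat API)
curl http://127.0.0.1:8081/v1/chat/completions \e
    -H "Content-Type: application/json" \e
    -d '{"messages":[{"role":"user","content":"Hello"}]}'

# interactive coloured chat in the terminal, prompt read from a file
./build/bin/llama-cli -m models/Llama-3.2-3B-Instruct-Q8_0.gguf \e
    --temp 0.7 -c 4096 --color on -f prompt.txt
.Ed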
.Sh SEE ALSO
.oxr ai oh
.Xr whisper oh
.Ss links
.obc2
.It - Lk https://github.com/ggml-org/llama.cpp llama.cpp - GitHub
.It - Lk https://huggingface.co/ HuggingFace - model repositories
.El
.Ss models
.obc2
.It - Lk https://huggingface.co/NikolayKozloff/GLM-4.6V-Flash-Q8_0-GGUF GLM-4.6V-Flash-Q8_0-GGUF
.It - Lk https://huggingface.co/bartowski/TheDrummer_Cydonia-24B-v4.3-GGUF TheDrummer_Cydonia-24B-v4.3-GGUF
.It - Lk https://huggingface.co/SicariusSicariiStuff/Impish_Nemo_12B_GGUF Impish_Nemo_12B_GGUF
.It - Lk https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF Llama-3.2-3B-Instruct-GGUF
.It - Lk https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-uncensored-GGUF Llama-3.2-3B-Instruct-uncensored-GGUF
.It - Lk https://huggingface.co/bartowski/Qwen_Qwen3-0.6B-GGUF Qwen_Qwen3-0.6B-GGUF
.El
.Sh AUTHORS
.An -nosplit
.Xr ohazot oh |
.Xr about oh |
.Lk https://ohazot.com ohazot.com
.Aq Mt admin@ohazot.com