LLAMA(oh)                              LOCAL                              LLAMA(oh)

llama - run language models locally

COMPILE
    clone:   git clone https://github.com/ggml-org/llama.cpp
    compile: cmake -B build -DLLAMA_CURL=OFF ; cmake --build build --config Release [-j MAX_CORES]
    - For Vulkan (AMD GPUs): cmake -B build -DGGML_VULKAN=ON -DLLAMA_CURL=OFF ; cmake --build build --config Release [-j MAX_CORES]
    - When updating, remove the previous build before compiling: rm -r build

BINARIES
    The binaries are located at: LLAMA.CPP_PATH/build/bin/

OPTIONS
    The following options apply to both llama-server and llama-cli.
    -m MODEL_PATH : model to use.
    --no-mmap : disable memory mapping; useful with small models if the GPU can hold them.
    -ngl N : offload N layers to the GPU.
    --temp N : model temperature.
    -c N : context size.
    -t N : number of threads.

    llama-server
    --host HOST : host to use instead of 127.0.0.1.
    --port PORT : port to use instead of 8080.

    llama-cli
    --color [on|off|auto] : coloured chat on/off/auto.
    -f FILEPATH : file containing the prompt.

MODELS
    Models can be downloaded from HuggingFace: https://huggingface.co/ .
    Some of the tested models are listed under the links below.

SEE ALSO
    ai(oh) , whisper(oh)

links
    - llama.cpp - Github: https://github.com/ggml-org/llama.cpp
    - Huggingface - Models repositories: https://huggingface.co/

models
    - GLM-4.6V-Flash-Q8_0-GGUF: https://huggingface.co/NikolayKozloff/GLM-4.6V-Flash-Q8_0-GGUF
    - TheDrummer_Cydonia-24B-v4.3-GGUF: https://huggingface.co/bartowski/TheDrummer_Cydonia-24B-v4.3-GGUF
    - Impish_Nemo_12B_GGUF: https://huggingface.co/SicariusSicariiStuff/Impish_Nemo_12B_GGUF
    - Llama-3.2-3B-Instruct-GGUF: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF
    - Llama-3.2-3B-Instruct-uncensored-GGUF: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-uncensored-GGUF
    - Qwen_Qwen3-0.6B-GGUF: https://huggingface.co/bartowski/Qwen_Qwen3-0.6B-GGUF

AUTHORS
    ohazot(oh) | about(oh) | ohazot.com: https://ohazot.com

linux , OpenBSD 7.8 | Created: 2026-04-02 | Updated: 2026-04-02 | LLAMA(oh)
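EXAMPLES
    The options above can be combined as follows. This is a sketch, not runnable as-is: it assumes a compiled llama.cpp tree (paths as in BINARIES) and a GGUF model already downloaded; the model filename is hypothetical, substitute your own.

```shell
# Hypothetical model path; point this at any downloaded GGUF file.
MODEL="$HOME/models/Llama-3.2-3B-Instruct-Q4_K_M.gguf"

# Serve on the default 127.0.0.1:8080 with an 8192-token context,
# offloading 99 layers to the GPU (a value above the model's layer
# count simply offloads all of them):
./build/bin/llama-server -m "$MODEL" -c 8192 -ngl 99

# One-off interactive chat with the same model, colourised,
# with a slightly creative temperature:
./build/bin/llama-cli -m "$MODEL" -ngl 99 --temp 0.7 --color
```

    Run from LLAMA.CPP_PATH; otherwise use the full path to build/bin/.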
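    Once llama-server is running it can be queried over HTTP; it exposes an OpenAI-compatible chat endpoint. A sketch, assuming the server is already running on the default host and port from OPTIONS:

```shell
# Query a running llama-server instance (default 127.0.0.1:8080).
# The messages are placeholders; the server answers with a JSON
# chat-completion object.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user",   "content": "Say hello."}
        ],
        "temperature": 0.7
      }'
```

    Use --host and --port in the URL if the server was started with non-default values.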