llama — run language models locally
COMPILE
| clone:                             | git clone https://github.com/ggml-org/llama.cpp |
| compile:                           | cmake -B build -DLLAMA_CURL=OFF ; cmake --build build --config Release [-j MAX_CORES] |
| - For Vulkan (AMD GPU):            | cmake -B build -DGGML_VULKAN=ON -DLLAMA_CURL=OFF ; cmake --build build --config Release [-j MAX_CORES] |
| - When updating, before compiling: | rm -r build |
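The steps above can be wrapped in a small dry-run helper. This is only a sketch: BACKEND and the fallback core count are assumptions of this script, not llama.cpp options, and nothing is compiled here; the cmake invocations are only printed.

```shell
#!/bin/sh
# Dry-run sketch of the build recipe. Set BACKEND=vulkan for the Vulkan build;
# the cmake commands are printed, not executed.
BACKEND="${BACKEND:-cpu}"
FLAGS="-DLLAMA_CURL=OFF"
if [ "$BACKEND" = "vulkan" ]; then
    FLAGS="-DGGML_VULKAN=ON $FLAGS"
fi
# Use all cores when nproc is available; fall back to an assumed 4.
JOBS="$(nproc 2>/dev/null || echo 4)"
echo "cmake -B build $FLAGS"
echo "cmake --build build --config Release -j $JOBS"
```

Remove the echoes to turn the printed commands into the real build.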
BINARIES
The binaries are located at: LLAMA.CPP_PATH/build/bin/
OPTIONS
The following options apply to both llama-server and llama-cli.
| -m MODEL_PATH | : model to use. |
| --no-mmap | : disable memory mapping; useful with small models if the GPU can hold them. |
| -ngl N | : offload N layers to the GPU. |
| --temp N | : sampling temperature. |
| -c N | : context size, in tokens. |
| -t N | : number of threads. |
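As an illustration of how the shared flags combine (model path, layer count, context size, thread count and temperature are all made-up values), a sketch that only prints the command, since no model or binary is assumed present:

```shell
#!/bin/sh
# Illustrative shared-flag combination; printed, not executed.
CMD="./build/bin/llama-server -m models/model.gguf -ngl 24 -c 4096 -t 6 --temp 0.8"
echo "$CMD"
```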
llama-server
| --host HOST | : host to use instead of 127.0.0.1. |
| --port PORT | : port to use instead of 8080. |
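llama-server exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the default --host/--port above; the prompt and temperature are illustrative, and the request body is only built and printed here (send it with curl or urllib.request once the server is running):

```python
import json

# URL assumes llama-server's defaults: --host 127.0.0.1 --port 8080.
URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_chat_request(prompt, temperature=0.7):
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Build and print the body; POST it to URL to get a completion.
print(json.dumps(build_chat_request("Hello!")))
```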
llama-cli
| --color [on|off|auto] | : coloured output on/off/auto. |
| -f FILEPATH | : read the prompt from FILEPATH. |
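A sketch of the -f flag: the prompt lives in a plain-text file (path and wording are illustrative), and the llama-cli call itself is left commented out since it needs a compiled binary and a model:

```shell
#!/bin/sh
# Write an illustrative prompt to a file, then pass it with -f.
printf 'You are a concise assistant.\n' > /tmp/prompt.txt
cat /tmp/prompt.txt
# ./build/bin/llama-cli -m MODEL_PATH -f /tmp/prompt.txt --color auto
```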
MODELS
Models can be downloaded from Hugging Face; some of the tested models are listed in the links below.
SEE ALSO
links
| - llama.cpp - Github |
| - Hugging Face - Model repositories |
models
| - GLM-4.6V-Flash-Q8_0-GGUF |
| - TheDrummer_Cydonia-24B-v4.3-GGUF |
| - Impish_Nemo_12B_GGUF |
| - Llama-3.2-3B-Instruct-GGUF |
| - Llama-3.2-3B-Instruct-uncensored-GGUF |
| - Qwen_Qwen3-0.6B-GGUF |