...
| Code Block |
|---|
(base) root@server1:~/llama-cpp# ./ollama list
NAME                    ID              SIZE      MODIFIED
phi3:14b                cf611a26b048    7.9 GB    3 minutes ago
llama3.3:70b            a6eb4748fd29    42 GB     16 minutes ago
mistral-small3.1:24b    b9aaf0c2586a    15 GB     23 minutes ago
llama4:scout            4f01ed6b6e01    67 GB     56 minutes ago
openchat:7b             537a4e03b649    4.1 GB    About an hour ago
qwen3:32b               e1c9f234c6eb    20 GB     2 hours ago
gemma3:27b              a418f5838eaf    17 GB     2 hours ago
deepseek-r1:70b         0c1615a8ca32    42 GB     3 hours ago |
pull model
| Code Block |
|---|
(base) root@server1:~/llama-cpp# ./ollama list
NAME               ID              SIZE      MODIFIED
qwen3:32b          e1c9f234c6eb    20 GB     28 minutes ago
gemma3:27b         a418f5838eaf    17 GB     37 minutes ago
deepseek-r1:70b    0c1615a8ca32    42 GB     About an hour ago
(base) root@server1:~/llama-cpp# ./ollama pull openchat:7b
pulling manifest
pulling 1cecc26325a1... 100% ▕████████████████████████████████████████████████████████████████████████████████▏ 4.1 GB/4.1 GB  102 MB/s  0s
pulling 43070e2d4e53... 100% ▕████████████████████████████████████████████████████████████████████████████████▏  11 KB
pulling d68706c17530... 100% ▕████████████████████████████████████████████████████████████████████████████████▏   98 B
pulling 415f0f6b43dd... 100% ▕████████████████████████████████████████████████████████████████████████████████▏   65 B
pulling 278996753456... 100% ▕████████████████████████████████████████████████████████████████████████████████▏  483 B
verifying sha256 digest
writing manifest
success |
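A freshly pulled model can be sanity-checked right away with the same CLI; a minimal sketch, assuming the ./ollama binary in this directory and the openchat:7b tag pulled above:
| Code Block |
|---|
# quick smoke test: load the model and send a single prompt
./ollama run openchat:7b "Say hello in one sentence."
# show what is currently loaded in memory (model, size, GPU/CPU split)
./ollama ps |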
...
| Model | Time to load model (s) | Layers offloaded to GPU |
|---|---|---|
| DeepSeek R1 Distill Llama 70B (deepseek-r1:70b) | 54.25 | 81/81 |
| llama3.3:70b | 53.34 | 81/81 |
| Qwen3 32B | 28.04 | 65/65 |
| phi3:14b | 19.09 | 41/41 |
| openchat:7b | 6.53 | 33/33 |
| llama4:scout | | |
| Llama 3.1 70B Instruct 2024-12 | | |
| gemma3:27b | | |
| mistral-small3.1:24b | | |
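The load times and layer splits in this table come from the runner's startup log; a rough way to pull them out, assuming the server output was redirected to a log file (the file name ollama-server.log is only an assumption) and contains the usual llama.cpp/ollama lines:
| Code Block |
|---|
# "llama runner started in N seconds" and "offloaded N/N layers to GPU" are the lines of interest
grep -E "llama runner started in|offloaded .*/.* layers to GPU" ollama-server.log |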
ollama CPU
install
| Code Block |
|---|
curl -fsSL https://ollama.com/install.sh | sh |
| Code Block |
|---|
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode. |
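After the install it is worth confirming that the service is running and the API answers; a small check, assuming the default systemd unit and the port reported by the installer:
| Code Block |
|---|
# service state created by the installer
systemctl status ollama --no-pager
# the HTTP API from the install message; /api/version and /api/tags are standard endpoints
curl http://127.0.0.1:11434/api/version
curl http://127.0.0.1:11434/api/tags |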
pull model
| Code Block |
|---|
ollama pull mistral-small3.1:24b |
If the models were already downloaded with the Intel GPU version, reuse them for the CPU install instead of pulling again:
| Code Block |
|---|
# remove the empty models directory created by the CPU install
rm -Rf /usr/share/ollama/.ollama/models/
# move the models previously pulled as root (Intel GPU setup) into the service's directory
mv /root/.ollama/models/ /usr/share/ollama/.ollama/models/
# keep /root/.ollama/models working via a symlink
ln -s /usr/share/ollama/.ollama/models/ /root/.ollama/models
ollama list |
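Note that the moved files keep root ownership while the installed service runs as the ollama user; if `ollama list` comes back empty or the service cannot read the blobs, fixing ownership is worth trying (this step is an assumption, not part of the original procedure):
| Code Block |
|---|
# assumption: the CPU install runs the server as the "ollama" user, so the moved blobs must be readable by it
chown -R ollama:ollama /usr/share/ollama/.ollama/models/ |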
| Code Block |
|---|
(base) root@server1:~# ollama list
NAME ID SIZE MODIFIED
phi3:14b cf611a26b048 7.9 GB 23 minutes ago
llama3.3:70b a6eb4748fd29 42 GB 36 minutes ago
mistral-small3.1:24b b9aaf0c2586a 15 GB 43 minutes ago
llama4:scout 4f01ed6b6e01 67 GB About an hour ago
openchat:7b 537a4e03b649 4.1 GB 2 hours ago
qwen3:32b e1c9f234c6eb 20 GB 3 hours ago
gemma3:27b a418f5838eaf 17 GB 3 hours ago
deepseek-r1:70b 0c1615a8ca32 42 GB 4 hours ago
|
| Code Block |
|---|
(base) root@server1:~# ollama --version
ollama version is 0.7.0 |
| Model | llama runner started in (s) | Params | Size | CPU model buffer size (MiB) | tokens/s |
|---|---|---|---|---|---|
| deepseek-r1:70b | | | 42 GB | | |
| llama3.3:70b | | | 42 GB | | |
| Qwen3 32B | | | 20 GB | | |
| phi3:14b | 3.52 | 14B | 7.9 GB | 7530.58 | 15.12 |
| openchat:7b | | | 4.1 GB | | |
| llama4:scout | | | | | |
| gemma3:27b | | | 17 GB | | |
| mistral-small3.1:24b | | | 15 GB | | |
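The tokens/s figure can be reproduced with the CLI's verbose mode, which prints load duration, prompt eval rate and eval rate after the response; a minimal sketch using phi3:14b as in the table (the prompt is just an example):
| Code Block |
|---|
# --verbose prints timing statistics (load duration, prompt eval rate, eval rate in tokens/s)
ollama run phi3:14b --verbose "Explain what a GGUF file is in two sentences." |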
llama.cpp
https://github.com/ggml-org/llama.cpp
...
| Code Block |
|---|
# libcurl development headers (needed for model download support)
apt install -y libcurl4-openssl-dev
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
cd build
make install
ldconfig |
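A quick way to confirm the build and install landed where expected; a small sketch, nothing specific to this server:
| Code Block |
|---|
# binaries are installed to /usr/local/bin by "make install"
which llama-cli llama-server llama-bench
llama-cli --version |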
Intel oneMKL
...
tbd
use
| Code Block |
|---|
llama-cli -m model.gguf
llama-server -m model.gguf --port 8080
llama-bench -m model.gguf
llama-run |
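llama-server exposes an OpenAI-compatible HTTP endpoint on the chosen port; a hedged example of querying it, assuming the server was started as above on port 8080 and model.gguf stands in for a real model file:
| Code Block |
|---|
# chat completion against the OpenAI-compatible endpoint served by llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello, what model are you?"}],
        "max_tokens": 64
      }'
|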
...