...
| Code Block |
|---|
(base) root@server1:~/llama-cpp# ./ollama list
NAME                    ID              SIZE      MODIFIED
phi3:14b                cf611a26b048    7.9 GB    3 minutes ago
llama3.3:70b            a6eb4748fd29    42 GB     16 minutes ago
mistral-small3.1:24b    b9aaf0c2586a    15 GB     23 minutes ago
llama4:scout            4f01ed6b6e01    67 GB     56 minutes ago
openchat:7b             537a4e03b649    4.1 GB    About an hour ago
qwen3:32b               e1c9f234c6eb    20 GB     2 hours ago
gemma3:27b              a418f5838eaf    17 GB     2 hours ago
deepseek-r1:70b         0c1615a8ca32    42 GB     3 hours ago |
pull model
| Code Block |
|---|
(base) root@server1:~/llama-cpp# ./ollama list
NAME               ID              SIZE      MODIFIED
qwen3:32b          e1c9f234c6eb    20 GB     28 minutes ago
gemma3:27b         a418f5838eaf    17 GB     37 minutes ago
deepseek-r1:70b    0c1615a8ca32    42 GB     About an hour ago
(base) root@server1:~/llama-cpp# ./ollama pull openchat:7b
pulling manifest
pulling 1cecc26325a1... 100% ▕████████████████████████████████████████████████████████████████████████████████▏ 4.1 GB/4.1 GB  102 MB/s  0s
pulling 43070e2d4e53... 100% ▕████████████████████████████████████████████████████████████████████████████████▏  11 KB
pulling d68706c17530... 100% ▕████████████████████████████████████████████████████████████████████████████████▏   98 B
pulling 415f0f6b43dd... 100% ▕████████████████████████████████████████████████████████████████████████████████▏   65 B
pulling 278996753456... 100% ▕████████████████████████████████████████████████████████████████████████████████▏  483 B
verifying sha256 digest
writing manifest
success |
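A freshly pulled model can be sanity-checked right away with the same CLI; a minimal sketch, assuming the ./ollama binary in this directory and the openchat:7b tag pulled above:
| Code Block |
|---|
# quick smoke test: load the model and send a single prompt
./ollama run openchat:7b "Say hello in one sentence."
# show what is currently loaded in memory (model, size, GPU/CPU split)
./ollama ps |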
...
| Model | Time to load model (s) | Layers offloaded to GPU |
|---|---|---|
| DeepSeek R1 Distill Llama 70B (deepseek-r1:70b) | 54.25 | 81/81 |
| llama3.3:70b | 53.34 | 81/81 |
| Qwen3 32B | 28.04 | 65/65 |
| phi3:14b | 19.09 | 41/41 |
| openchat:7b | 6.53 | 33/33 |
| llama4:scout | | |
| Llama 3.1 70B Instruct 2024-12 | | |
| gemma3:27b | | |
| mistral-small3.1:24b | | |
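The load times and layer splits in this table come from the runner's startup log; a rough way to pull them out, assuming the server output was redirected to a log file (the file name ollama-server.log is only an assumption) and contains the usual llama.cpp/ollama lines:
| Code Block |
|---|
# "llama runner started in N seconds" and "offloaded N/N layers to GPU" are the lines of interest
grep -E "llama runner started in|offloaded .*/.* layers to GPU" ollama-server.log |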
ollama CPU
install
| Code Block |
|---|
curl -fsSL https://ollama.com/install.sh | sh |
| Code Block |
|---|
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode. |
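After the install it is worth confirming that the service is running and the API answers; a small check, assuming the default systemd unit and the port reported by the installer:
| Code Block |
|---|
# service state created by the installer
systemctl status ollama --no-pager
# the HTTP API from the install message; /api/version and /api/tags are standard endpoints
curl http://127.0.0.1:11434/api/version
curl http://127.0.0.1:11434/api/tags |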
pull model
| Code Block |
|---|
ollama pull mistral-small3.1:24b |
If the models were already downloaded with the Intel GPU version, reuse them for the CPU install instead of pulling again:
| Code Block |
|---|
# remove the empty models directory created by the CPU install
rm -Rf /usr/share/ollama/.ollama/models/
# move the models previously pulled as root (Intel GPU setup) into the service's directory
mv /root/.ollama/models/ /usr/share/ollama/.ollama/models/
# keep /root/.ollama/models working via a symlink
ln -s /usr/share/ollama/.ollama/models/ /root/.ollama/models
ollama list |
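Note that the moved files keep root ownership while the installed service runs as the ollama user; if `ollama list` comes back empty or the service cannot read the blobs, fixing ownership is worth trying (this step is an assumption, not part of the original procedure):
| Code Block |
|---|
# assumption: the CPU install runs the server as the "ollama" user, so the moved blobs must be readable by it
chown -R ollama:ollama /usr/share/ollama/.ollama/models/ |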
| Code Block |
|---|
(base) root@server1:~# ollama list
NAME ID SIZE MODIFIED
phi3:14b cf611a26b048 7.9 GB 23 minutes ago
llama3.3:70b a6eb4748fd29 42 GB 36 minutes ago
mistral-small3.1:24b b9aaf0c2586a 15 GB 43 minutes ago
llama4:scout 4f01ed6b6e01 67 GB About an hour ago
openchat:7b 537a4e03b649 4.1 GB 2 hours ago
qwen3:32b e1c9f234c6eb 20 GB 3 hours ago
gemma3:27b a418f5838eaf 17 GB 3 hours ago
deepseek-r1:70b 0c1615a8ca32 42 GB 4 hours ago
|
| Code Block |
|---|
(base) root@server1:~# ollama --version
ollama version is 0.7.0 |
| Model | llama runner started in (s) | Params | Size | CPU model buffer size (MiB) | tokens/s |
|---|---|---|---|---|---|
| deepseek-r1:70b | | | 42 GB | | |
| llama3.3:70b | | | 42 GB | | |
| Qwen3 32B | | | 20 GB | | |
| phi3:14b | 3.52 | 14B | 7.9 GB | 7530.58 | 15.12 |
| openchat:7b | | | 4.1 GB | | |
| llama4:scout | | | | | |
| gemma3:27b | | | 17 GB | | |
| mistral-small3.1:24b | | | 15 GB | | |
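The tokens/s figure can be reproduced with the CLI's verbose mode, which prints load duration, prompt eval rate and eval rate after the response; a minimal sketch using phi3:14b as in the table (the prompt is just an example):
| Code Block |
|---|
# --verbose prints timing statistics (load duration, prompt eval rate, eval rate in tokens/s)
ollama run phi3:14b --verbose "Explain what a GGUF file is in two sentences." |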
llama.cpp
https://github.com/ggml-org/llama.cpp
...
| Code Block |
|---|
# libcurl development headers (needed for model download support)
apt install -y libcurl4-openssl-dev
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
cd build
make install
ldconfig |
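A quick way to confirm the build and install landed where expected; a small sketch, nothing specific to this server:
| Code Block |
|---|
# binaries are installed to /usr/local/bin by "make install"
which llama-cli llama-server llama-bench
llama-cli --version |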
Intel oneMKL
...
tbd
use
| Code Block |
|---|
llama-cli -m model.gguf
llama-server -m model.gguf --port 8080
llama-bench -m model.gguf
llama-run |
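llama-server exposes an OpenAI-compatible HTTP endpoint on the chosen port; a hedged example of querying it, assuming the server was started as above on port 8080 and model.gguf stands in for a real model file:
| Code Block |
|---|
# chat completion against the OpenAI-compatible endpoint served by llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "Hello, what model are you?"}],
        "max_tokens": 64
      }'
|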
...