...
| Model | Startup time (s, vs CPU) | Layers on GPU | prompt eval rate | eval rate | vs CPU (prompt eval / eval) |
|---|---|---|---|---|---|
| deepseek-r1:70b | 54.25 (2.5x slower) | 81/81 | 0.89 tokens/s | 1.62 tokens/s | -2.5x/+1.3x |
| llama3.3:70b | 53.34 (2.5x slower) | 81/81 | 1.52 tokens/s | 1.44 tokens/s | -1.5x/+1.2x |
| qwen3:32b | 28.04 (2.8x slower) | 65/65 | 3.76 tokens/s | 2.93 tokens/s | -1.5x/+1.1x |
| phi3:14b | 19.09 (5.4x slower) | 41/41 | 10.48 tokens/s | 7.70 tokens/s | -1.4x/+1.9x |
| deepseek-v2:16b | 14.56 (3.6x slower) | 28/28 | 4.96 tokens/s | 11.26 tokens/s | -11.8x/-2.2x |
| openchat:7b | 6.53 (2.6x slower) | 33/33 | 29.24 tokens/s | 16.35 tokens/s | 1x/+1.5x |
| llama4:scout | N/A | N/A | N/A | N/A | N/A |
| gemma3:27b | N/A | N/A | N/A | N/A | N/A |
| mistral-small3.1:24b | N/A | N/A | N/A | N/A | N/A |
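The "Layers on GPU" column counts how many of a model's layers were offloaded to VRAM. One way to verify this on your own box (a sketch, assuming the systemd install shown below, so the server log lands in journald; the exact log-line wording can vary between llama.cpp versions):

```sh
# The Ollama server log includes llama.cpp's offload summary for each model load.
journalctl -u ollama --no-pager | grep -i "offloaded"
# expected shape: "llm_load_tensors: offloaded 81/81 layers to GPU"

# `ollama ps` also reports whether a currently loaded model sits in CPU or GPU memory.
ollama ps
```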
ollama CPU
[Chart: time to start each model in seconds, CPU-only; the values also appear in the results table below]
install
```sh
curl -fsSL https://ollama.com/install.sh | sh
```

```
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.
```
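Since the installer reports the API on 127.0.0.1:11434, a quick post-install sanity check can go straight against the HTTP API (endpoints per Ollama's API docs; the exact JSON will differ by version):

```sh
# Service up? /api/version returns the running server version.
curl -s http://127.0.0.1:11434/api/version
# {"version":"0.7.0"}

# Locally available models (empty right after a fresh install).
curl -s http://127.0.0.1:11434/api/tags
```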
...
```
(base) root@server1:~# ollama --version
ollama version is 0.7.0
```
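The "prompt eval rate" and "eval rate" columns below match the statistics that `ollama run --verbose` prints after each response, so a run like the following reproduces them per model (the prompt used for the benchmark is not stated, so the one here is only an example):

```sh
ollama pull openchat:7b
ollama run --verbose openchat:7b "Why is the sky blue?"
# among the printed stats:
#   prompt eval rate:     30.37 tokens/s
#   eval rate:            11.19 tokens/s
```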
| Model | Startup time (s) | Params | Size | prompt eval rate | eval rate |
|---|---|---|---|---|---|
| deepseek-r1:70b | 21.34 | 70B | 42 GB | 2.20 tokens/s | 1.24 tokens/s |
| llama3.3:70b | 21.34 | 70B | 42 GB | 2.39 tokens/s | 1.23 tokens/s |
| qwen3:32b | 10.04 | 32B | 20 GB | 5.63 tokens/s | 2.54 tokens/s |
| gemma3:27b | 1.76 | 27B | 17 GB | 6.66 tokens/s | 3.03 tokens/s |
| mistral-small3.1:24b | 3.26 | 24B | 15 GB | 7.72 tokens/s | 3.60 tokens/s |
| llama4:scout | 13.55 | 17B | 67 GB | 11.47 tokens/s | 4.76 tokens/s |
| deepseek-v2:16b | 4.02 | 16B | 8.9 GB | 58.75 tokens/s | 24.50 tokens/s |
| phi3:14b | 3.52 | 14B | 7.9 GB | 15.12 tokens/s | 6.05 tokens/s |
| openchat:7b | 2.51 | 7B | 4.1 GB | 30.37 tokens/s | 11.19 tokens/s |
llama.cpp
https://github.com/ggml-org/llama.cpp
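For reference, a minimal CPU-only build following the llama.cpp README (the stock CMake flow; binary names and flags can change upstream, and the model path below is a placeholder):

```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# binaries such as llama-cli and llama-bench end up in build/bin/
./build/bin/llama-bench -m path/to/model.gguf   # hypothetical GGUF path
```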
...