...

Model | sec to load the model | layers to GPU | prompt eval rate | eval rate | compared to CPU (prompt/eval)

deepseek-r1:70b | 54.25 (2.5x slower) | 81/81 | 0.89 tokens/s | 1.62 tokens/s | -2.5x/+1.3x
llama3.3:70b | 53.34 (2.5x slower) | 81/81 | 1.52 tokens/s | 1.44 tokens/s | -1.5x/+1.2x
qwen3:32b | 28.04 (2.8x slower) | 65/65 | 3.76 tokens/s | 2.93 tokens/s | -1.5x/+1.1x
phi3:14b | 19.09 (5.4x slower) | 41/41 | 10.48 tokens/s | 7.70 tokens/s | -1.4x/+1.9x
deepseek-v2:16b | 14.56 (3.6x slower) | 28/28 | 4.96 tokens/s | 11.26 tokens/s | -11.8x/-2.2x
openchat:7b | 6.53 (2.6x slower) | 33/33 | 29.24 tokens/s | 16.35 tokens/s | 1x/+1.5x
llama4:scout | N/A | N/A | N/A | N/A | N/A
gemma3:27b | N/A | N/A | N/A | N/A | N/A
mistral-small3.1:24b | N/A | N/A | N/A | N/A | N/A
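The prompt eval rate and eval rate columns above are the figures Ollama itself reports (e.g. when a prompt is run with `ollama run <model> --verbose`, or via the REST API, which returns token counts plus durations in nanoseconds in fields such as `eval_count` and `eval_duration`). A minimal sketch of turning those counters into tokens/s; the field names and illustrative numbers are assumptions, not taken from this page's measurements:

```python
# Convert Ollama's nanosecond counters into tokens/s.
# Field names (prompt_eval_count, prompt_eval_duration, eval_count,
# eval_duration) follow the /api/generate JSON response; verify them
# against your Ollama version.

def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Rate in tokens/s from a token count and a duration in nanoseconds."""
    return token_count / (duration_ns / 1e9)

# Illustrative response fragment (made-up numbers, not from the table):
resp = {
    "prompt_eval_count": 26,
    "prompt_eval_duration": 2_000_000_000,   # 2 s
    "eval_count": 290,
    "eval_duration": 100_000_000_000,        # 100 s
}

prompt_rate = tokens_per_second(resp["prompt_eval_count"],
                                resp["prompt_eval_duration"])
eval_rate = tokens_per_second(resp["eval_count"], resp["eval_duration"])
print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 13.00
print(f"eval rate:        {eval_rate:.2f} tokens/s")    # 2.90
```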


ollama CPU

...

...

deepseek-r1:70b

...

llama3.3:70b

...

qwen3:32b

...

gemma3:27b

...

1.76

...

mistral-small3.1:24b

...

3.26

...

llama4:scout

...

13.55

...

deepseek-v2:16b

...

4.02

...

phi3:14b

...

openchat:7b

...

install

Code Block
curl -fsSL https://ollama.com/install.sh | sh
Code Block
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink /etc/systemd/system/default.target.wants/ollama.service → /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA/AMD GPU detected. Ollama will run in CPU-only mode.

...

Code Block
(base) root@server1:~# ollama --version
ollama version is 0.7.0



Model | started in (seconds) | param | size | prompt eval rate | eval rate

deepseek-r1:70b | 21.34 | 70B | 42 GB | 2.20 tokens/s | 1.24 tokens/s
llama3.3:70b | 21.34 | 70B | 42 GB | 2.39 tokens/s | 1.23 tokens/s
qwen3:32b | 10.04 | 32B | 20 GB | 5.63 tokens/s | 2.54 tokens/s
gemma3:27b | 1.76 | 27B | 17 GB | 6.66 tokens/s | 3.03 tokens/s
mistral-small3.1:24b | 3.26 | 24B | 15 GB | 7.72 tokens/s | 3.60 tokens/s
llama4:scout | 13.55 | 17B | 67 GB | 11.47 tokens/s | 4.76 tokens/s
deepseek-v2:16b | 4.02 | 16B | 8.9 GB | 58.75 tokens/s | 24.50 tokens/s
phi3:14b | 3.52 | 14B | 7.9 GB | 15.12 tokens/s | 6.05 tokens/s
openchat:7b | 2.51 | 7B | 4.1 GB | 30.37 tokens/s | 11.19 tokens/s
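The "compared to CPU" column in the GPU table earlier on this page is simply the ratio between the GPU and CPU rates, with the sign marking slower (-) or faster (+) on GPU. A quick check of a few ratios using values from the two tables (pure arithmetic, no Ollama required):

```python
# Ratios behind the "compared to CPU" column, using rates from the
# GPU and CPU tables on this page.

def ratio(a: float, b: float) -> float:
    """How many times larger a is than b, rounded to one decimal."""
    return round(a / b, 1)

# deepseek-r1:70b: prompt eval, GPU 0.89 vs CPU 2.20 -> 2.5x slower on GPU
print(ratio(2.20, 0.89))    # 2.5
# deepseek-r1:70b: eval, GPU 1.62 vs CPU 1.24 -> 1.3x faster on GPU
print(ratio(1.62, 1.24))    # 1.3
# deepseek-v2:16b: prompt eval, GPU 4.96 vs CPU 58.75 -> 11.8x slower on GPU
print(ratio(58.75, 4.96))   # 11.8
# deepseek-v2:16b: eval, GPU 11.26 vs CPU 24.50 -> 2.2x slower on GPU
print(ratio(24.50, 11.26))  # 2.2
```

The deepseek-v2:16b row is the notable outlier: on this hardware it evaluates prompts almost 12x faster on CPU than on GPU.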


llama.cpp

https://github.com/ggml-org/llama.cpp

...