...

https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main

https://github.com/AUTOMATIC1111/stable-diffusion-webui



Intel tools

Intel oneAPI

oneMKL (oneAPI Math Kernel Library), oneDNN (oneAPI Deep Neural Network Library)

...

Model | sec to load the model | layers to GPU | prompt eval rate | eval rate
deepseek-r1:70b | 54.25 (2.5x slower) | 81/81 | 0.89 tokens/s | 1.62 tokens/s
llama3.3:70b | 53.34 (2.5x slower) | 81/81 | 1.52 tokens/s | 1.44 tokens/s
qwen3:32b | 28.04 (2.8x slower) | 65/65 | 3.76 tokens/s | 2.93 tokens/s
phi3:14b | 19.09 (5.4x slower) | 41/41 | 10.48 tokens/s | 7.70 tokens/s
deepseek-v2:16b | 14.56 (3.6x slower) | 28/28 | 4.96 tokens/s | 11.26 tokens/s
openchat:7b | 6.53 (2.6x slower) | 33/33 | 29.24 tokens/s | 16.35 tokens/s
llama4:scout | N/A | N/A | N/A | N/A
gemma3:27b | N/A | N/A | N/A | N/A
mistral-small3.1:24b | N/A | N/A | N/A | N/A
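The per-model load times and token rates above match the statistics ollama prints when a prompt is run with the `--verbose` flag. A minimal sketch of pulling one rate out of that output; the sample text and exact label layout are assumptions, with the numbers taken from the qwen3:32b row:

```shell
# Hypothetical sample of ollama's --verbose statistics block; the label
# layout is assumed, the numbers come from the qwen3:32b row above.
stats='prompt eval rate:     5.63 tokens/s
eval rate:            2.54 tokens/s'

# Split fields on "colon plus spaces" and print only the generation speed.
printf '%s\n' "$stats" | awk -F': *' '/^eval rate/ { print $2 }'
# -> 2.54 tokens/s
```

The `^eval rate` anchor keeps the `prompt eval rate` line from matching as well.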

CPU vs GPU

Model | started in (seconds) | params | size | prompt eval rate | eval rate
deepseek-r1:70b | 21.34 | 70B | 42 GB | 2.20 tokens/s | 1.24 tokens/s
llama3.3:70b | 21.34 | 70B | 42 GB | 2.39 tokens/s | 1.23 tokens/s
qwen3:32b | 10.04 | 32B | 20 GB | 5.63 tokens/s | 2.54 tokens/s
gemma3:27b | 1.76 | 27B | 17 GB | 6.66 tokens/s | 3.03 tokens/s
mistral-small3.1:24b | 3.26 | 24B | 15 GB | 7.72 tokens/s | 3.60 tokens/s
llama4:scout | 13.55 | 17B | 67 GB | 11.47 tokens/s | 4.76 tokens/s
deepseek-v2:16b | 4.02 | 16B | 8.9 GB | 58.75 tokens/s | 24.50 tokens/s
phi3:14b | 3.52 | 14B | 7.9 GB | 15.12 tokens/s | 6.05 tokens/s
openchat:7b | 2.51 | 7B | 4.1 GB | 30.37 tokens/s | 11.19 tokens/s
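The "(2.8x slower)" style annotations in the first table appear to be the ratio between the two load times for the same model. A quick arithmetic check for qwen3:32b, using the 28.04 s and 10.04 s values from the two tables:

```shell
# Ratio of the two qwen3:32b load times (values taken from the tables above).
awk 'BEGIN { printf "%.1fx slower\n", 28.04 / 10.04 }'
# -> 2.8x slower
```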

ollama CPU

install

Code Block
curl -fsSL https://ollama.com/install.sh | sh
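On Linux the install script typically registers ollama as a systemd service, so the server should already be running afterwards. A hedged sketch of the usual next steps; the model name is taken from the benchmark tables above:

```shell
ollama --version           # confirm the CLI is on PATH
ollama pull openchat:7b    # fetch the smallest model benchmarked above
# --verbose prints the load-duration and eval-rate statistics used in the tables
ollama run openchat:7b --verbose "Say hello"
```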

...