...
| Model | Startup time (s) | Parameters | Size | CPU model buffer size | Prompt eval rate | Eval rate |
|---|---|---|---|---|---|---|
| deepseek-r1:70b | 21.34 | 70B | 42 GB | | 2.20 tokens/s | 1.24 tokens/s |
| llama3.3:70b | 21.34 | 70B | 42 GB | | 2.39 tokens/s | 1.23 tokens/s |
| qwen3:32b | 10.04 | 32B | 20 GB | 19259.71 MiB | 5.63 tokens/s | 2.54 tokens/s |
| phi3:14b | 3.52 | 14B | 7.9 GB | 7530.58 MiB | 15.12 tokens/s | 6.05 tokens/s |
| openchat:7b | 10.04 | 7B | 4.1 GB | | | |
| llama4:scout | 13.55 | | 67 GB | | 11.47 tokens/s | 4.76 tokens/s |
| gemma3:27b | 1.76 | 27B | 17 GB | | | |
| mistral-small3.1:24b | | 24B | 15 GB | | | |
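The "prompt eval rate" and "eval rate" labels in the table match the per-run statistics that Ollama prints when a model is run with `--verbose`. Below is a minimal sketch of how numbers like these could be collected, assuming that source; the `bench` helper, the model list, and the test prompt are placeholders, not part of the original measurements:

```python
import re
import subprocess

def bench(model: str, prompt: str = "Why is the sky blue?") -> dict:
    """Run one prompt through Ollama and scrape the --verbose timing stats."""
    out = subprocess.run(
        ["ollama", "run", "--verbose", model, prompt],
        capture_output=True, text=True, check=True,
    )
    stats = {}
    # The verbose stats are printed one "label: value" per line; scan both
    # streams since the stats typically land on stderr.
    for line in (out.stdout + out.stderr).splitlines():
        m = re.match(r"(prompt eval rate|eval rate):\s+([\d.]+) tokens/s", line.strip())
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats

if __name__ == "__main__":
    for model in ["phi3:14b", "gemma3:27b"]:
        print(model, bench(model))
```

A single prompt gives noisy numbers; averaging several runs per model would make the rates in the table more comparable.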
llama.cpp
https://github.com/ggml-org/llama.cpp
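llama.cpp ships its own benchmarking tool, `llama-bench`, which reports prompt-processing and token-generation throughput directly. A minimal sketch of driving it from Python, assuming a locally built binary on `PATH`; the GGUF path is a placeholder:

```python
import json
import subprocess

# -p / -n set the prompt and generation token counts for the benchmark;
# -o json asks llama-bench for machine-readable output.
result = subprocess.run(
    ["llama-bench", "-m", "models/model.gguf", "-p", "512", "-n", "128", "-o", "json"],
    capture_output=True, text=True, check=True,
)
# Print the result rows as-is; exact field names depend on the llama-bench build.
for row in json.loads(result.stdout):
    print(row)
```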
...