...
| Model | Startup time (s) | Params | Size | Prompt eval rate | Eval rate |
|---|---|---|---|---|---|
| deepseek-r1:70b | 21.34 | 70B | 42 GB | 2.20 tokens/s | 1.24 tokens/s |
| llama3.3:70b | 21.34 | 70B | 42 GB | 2.39 tokens/s | 1.23 tokens/s |
| qwen3:32b | 10.04 | 32B | 20 GB | 5.63 tokens/s | 2.54 tokens/s |
| phi3:14b | 3.52 | 14B | 7.9 GB | 15.12 tokens/s | 6.05 tokens/s |
| openchat:7b | 10.04 | 7B | 4.1 GB | | |
| llama4:scout | 13.55 | | 67 GB | 11.47 tokens/s | 4.76 tokens/s |
| gemma3:27b | 1.76 | 27B | 17 GB | 6.66 tokens/s | 3.03 tokens/s |
| mistral-small3.1:24b | 3.26 | 24B | 15 GB | | |
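The "prompt eval rate" and "eval rate" columns match the per-request statistics Ollama prints with `--verbose` and returns from its REST API (`eval_count` and `eval_duration`, with durations in nanoseconds). As a minimal sketch of how such rates are derived, assuming those API field names and using made-up sample numbers (not measurements from the table):

```python
def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens/s."""
    return token_count / (duration_ns / 1e9)

# Hypothetical stats in the shape of an Ollama /api/generate response;
# the numbers below are illustrative, not benchmark results.
stats = {
    "prompt_eval_count": 26,
    "prompt_eval_duration": 4_000_000_000,   # 4 s
    "eval_count": 298,
    "eval_duration": 120_000_000_000,        # 120 s
}

prompt_rate = tokens_per_second(stats["prompt_eval_count"],
                                stats["prompt_eval_duration"])
eval_rate = tokens_per_second(stats["eval_count"],
                              stats["eval_duration"])
print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 6.50
print(f"eval rate: {eval_rate:.2f} tokens/s")           # 2.48
```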
llama.cpp: https://github.com/ggml-org/llama.cpp
...