...
| Code Block |
|---|
| source llm_env/bin/activate |
| #pip install open-webui==0.2.5 |
| pip install open-webui # 0.6.10 |
| open-webui serve |
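Note that the first line assumes the llm_env virtual environment already exists. A minimal sketch to create it beforehand (the environment name matches the block above; the rest is a standard venv setup):

| Code Block |
|---|
| # assumes python3 with the venv module is installed |
| python3 -m venv llm_env |
| source llm_env/bin/activate |
| pip install --upgrade pip |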
Benchmark LLMs
| Code Block |
|---|
| git clone https://github.com/tabletuser-blogspot/ollama-benchmark |
| cd ollama-benchmark/ |
| chmod +x obench.sh |
| time ./obench.sh |
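The prompt eval and eval rates reported in the tables below can also be read directly from Ollama by passing --verbose to a single run. A minimal sketch (the model name and prompt are just examples):

| Code Block |
|---|
| # prints total duration, prompt eval rate, and eval rate after the response |
| ollama run gemma3:27b --verbose "Why is the sky blue?" |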
GPU backend model performance
...
| Code Block |
|---|
| (base) root@server1:~# ollama --version |
| ollama version is 0.7.0 |
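Each model has to be pulled locally before it can be benchmarked. A minimal sketch, using one model name from the table below:

| Code Block |
|---|
| # downloads the model weights to the local Ollama store |
| ollama pull gemma3:27b |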
| Model | Started in (s) | Params | Size | Prompt eval rate | Eval rate |
|---|---|---|---|---|---|
| deepseek-r1:70b | 21.34 | 70B | 42 GB | 2.20 tokens/s | 1.24 tokens/s |
| llama3.3:70b | 21.34 | 70B | 42 GB | 2.39 tokens/s | 1.23 tokens/s |
| qwen3:32b | 10.04 | 32B | 20 GB | 5.63 tokens/s | 2.54 tokens/s |
| gemma3:27b | 1.76 | 27B | 17 GB | 6.66 tokens/s | 3.03 tokens/s |
| mistral-small3.1:24b | 3.26 | 24B | 15 GB | 7.72 tokens/s | 3.60 tokens/s |
| llama4:scout | 13.55 | 17B | 67 GB | 11.47 tokens/s | 4.76 tokens/s |
| deepseek-v2:16b | 4.02 | 16B | 8.9 GB | 58.75 tokens/s | 24.50 tokens/s |
| phi3:14b | 3.52 | 14B | 7.9 GB | 15.12 tokens/s | 6.05 tokens/s |
| openchat:7b | 2.51 | 7B | 4.1 GB | 30.37 tokens/s | 11.19 tokens/s |
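Prompt eval rate covers prompt ingestion and eval rate covers token generation, so a rough end-to-end estimate for one response is prompt_tokens / prompt eval rate + output_tokens / eval rate. A sketch using gemma3:27b's rates from the table (the token counts are assumptions):

| Code Block |
|---|
| # rough response-time estimate: ingestion time + generation time |
| awk 'BEGIN { pt=512; ot=256; printf "est. %.0f s\n", pt/6.66 + ot/3.03 }' |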
llama.cpp
https://github.com/ggml-org/llama.cpp
...


