...
| Code Block |
|---|
| source llm_env/bin/activate |
| #pip install open-webui==0.2.5 |
| pip install open-webui # 0.6.10 |
| open-webui serve |
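Note that the first line assumes the llm_env virtual environment already exists. A minimal sketch to create it beforehand (the environment name matches the block above; the rest is a standard venv setup):

| Code Block |
|---|
| # assumes python3 with the venv module is installed |
| python3 -m venv llm_env |
| source llm_env/bin/activate |
| pip install --upgrade pip |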
Benchmark LLMs
| Code Block |
|---|
| git clone https://github.com/tabletuser-blogspot/ollama-benchmark |
| cd ollama-benchmark/ |
| chmod +x obench.sh |
| time ./obench.sh |
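The prompt eval and eval rates reported in the tables below can also be read directly from Ollama by passing --verbose to a single run. A minimal sketch (the model name and prompt are just examples):

| Code Block |
|---|
| # prints total duration, prompt eval rate, and eval rate after the response |
| ollama run gemma3:27b --verbose "Why is the sky blue?" |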
GPU backend model performance
...
| Code Block |
|---|
| (base) root@server1:~# ollama --version |
| ollama version is 0.7.0 |
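Each model has to be pulled locally before it can be benchmarked. A minimal sketch, using one model name from the table below:

| Code Block |
|---|
| # downloads the model weights to the local Ollama store |
| ollama pull gemma3:27b |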
| Model | Started in (s) | Params | Size | Prompt eval rate | Eval rate |
|---|---|---|---|---|---|
| deepseek-r1:70b | 21.34 | 70B | 42 GB | 2.20 tokens/s | 1.24 tokens/s |
| llama3.3:70b | 21.34 | 70B | 42 GB | 2.39 tokens/s | 1.23 tokens/s |
| qwen3:32b | 10.04 | 32B | 20 GB | 5.63 tokens/s | 2.54 tokens/s |
| gemma3:27b | 1.76 | 27B | 17 GB | 6.66 tokens/s | 3.03 tokens/s |
| mistral-small3.1:24b | 3.26 | 24B | 15 GB | 7.72 tokens/s | 3.60 tokens/s |
| llama4:scout | 13.55 | 17B | 67 GB | 11.47 tokens/s | 4.76 tokens/s |
| deepseek-v2:16b | 4.02 | 16B | 8.9 GB | 58.75 tokens/s | 24.50 tokens/s |
| phi3:14b | 3.52 | 14B | 7.9 GB | 15.12 tokens/s | 6.05 tokens/s |
| openchat:7b | 2.51 | 7B | 4.1 GB | 30.37 tokens/s | 11.19 tokens/s |
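Prompt eval rate covers prompt ingestion and eval rate covers token generation, so a rough end-to-end estimate for one response is prompt_tokens / prompt eval rate + output_tokens / eval rate. A sketch using gemma3:27b's rates from the table (the token counts are assumptions):

| Code Block |
|---|
| # rough response-time estimate: ingestion time + generation time |
| awk 'BEGIN { pt=512; ot=256; printf "est. %.0f s\n", pt/6.66 + ot/3.03 }' |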
llama.cpp
https://github.com/ggml-org/llama.cpp
...


