With Power Limits 95/110, Ubuntu 24.04, Intel Core Ultra 9 185H,
Crucial 128GB (2x64GB) 5600MT/s DDR5 SODIMM, WD_BLACK SN850x 8TB
GPU Version: intel-ollama-0.6.2 for GPU SYCL0 (Intel(R) Arc(TM) Graphics) - 120187 MiB free for GPU
CPU Version: ollama version is 0.7.0 for CPU 123.5 GiB available
All rates are in tokens per second.

| Model | Params, B | Size, GB | Prompt eval rate, CPU | Prompt eval rate, GPU | Second prompt, CPU | Second prompt, GPU | Eval rate, CPU | Eval rate, GPU |
|---|---|---|---|---|---|---|---|---|
| phi4:14b-q4_K_M | 14 | 9.1 | 12.98 | 9.87 | 100.02 | 113.45 | 5.78 | 6.77 |
| phi4-mini:3.8b-q4_K_M | 3.8 | 2.5 | 49.26 | - | 186.63 | - | 18.83 | - |
| phi4:14b-fp16 | 14 | 29 | 12.45 | 11.53 | 35.53 | 40.16 | 2.08 | 2.09 |
| openthinker:32b-v2-fp16 | 32 | 65 | 4.45 | 7.31 | 20.24 | 22.31 | 0.81 | 0.90 |
| openthinker:32b | 32 | 19 | 5.93 | 4.59 | 66.31 | 69.74 | 2.60 | 2.78 |
| dolphin-phi:2.7b | 2.7 | 1.6 | 85.67 | 86.81 | 744.07 | 649.43 | 25.42 | 21.73 |
| dolphin3:8b | 8 | 4.9 | 26.04 | 30.97 | 325.85 | 373.30 | 10.76 | 12.58 |
| tinyllama:1.1b | 1.1 | 0.6 | 198.18 | 112.98 | 2595.12 | 2211.21 | 62.99 | 57.53 |
| deepseek-v2:16b | 16 | 8.9 | 59.47 | 15.83 | 361.51 | 175.02 | 24.39 | 12.00 |
| phi3:14b | 14 | 7.9 | 15.60 | 10.51 | 101.53 | 128.59 | 6.07 | 7.67 |
| llama3.3:70b | 70 | 42 | 2.60 | 1.54 | 21.35 | 23.37 | 1.25 | 1.37 |
| mistral-small3.1:24b | 24 | 15 | 7.71 | - | 1321.32 | - | 3.64 | - |
| llama4:scout | 17 | 67 | 11.14 | - | 1683.33 | - | 4.81 | - |
| openchat:7b | 7 | 4.1 | 30.47 | 27.15 | 273.39 | 361.04 | 11.10 | 14.81 |
| qwen3:32b | 32 | 20 | 5.67 | 2.84 | 38.88 | 41.60 | 2.53 | 2.73 |
| gemma3:27b | 27 | 17 | 6.60 | - | 49.38 | - | 3.04 | - |
| deepseek-r1:70b | 70 | 42 | 2.63 | 0.89 | 12.39 | 14.13 | 1.24 | 1.38 |
...
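To compare the CPU and GPU columns row by row, a GPU/CPU ratio is a convenient summary. The following is a minimal sketch; `gpu_cpu_ratio` is a hypothetical helper name, not part of the benchmark script, and the inputs are eval rates copied from the table above.

```shell
#!/bin/bash
# Hypothetical helper: ratio of a GPU rate to a CPU rate (both in tokens/s).
# A result above 1 means the GPU run was faster than the CPU run.
gpu_cpu_ratio() {
  awk -v cpu="$1" -v gpu="$2" 'BEGIN { printf "%.2f\n", gpu / cpu }'
}

# Eval rate of deepseek-r1:70b from the table: CPU 1.24, GPU 1.38
gpu_cpu_ratio 1.24 1.38     # prints 1.11
# Eval rate of deepseek-v2:16b from the table: CPU 24.39, GPU 12.00
gpu_cpu_ratio 24.39 12.00   # prints 0.49
```

On these two rows the GPU is about 11% faster for deepseek-r1:70b and about half as fast for deepseek-v2:16b.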
top (deepseek-r1:70b execution on CPU)
```
top - 13:18:48 up 1:12, 2 users, load average: 6.01, 5.95, 5.94
Tasks: 326 total, 1 running, 325 sleeping, 0 stopped, 0 zombie
%Cpu0 : 68.3 us, 0.0 sy, 0.0 ni, 31.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 38.7 us, 0.0 sy, 0.0 ni, 61.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 61.7 us, 0.0 sy, 0.0 ni, 38.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 99.7 us, 0.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 29.0 us, 0.0 sy, 0.0 ni, 71.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 85.7 us, 0.0 sy, 0.0 ni, 14.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 11.3 us, 0.0 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 69.0 us, 0.0 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 26.6 us, 0.0 sy, 0.0 ni, 73.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 66.7 us, 0.0 sy, 0.0 ni, 33.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 31.2 us, 0.0 sy, 0.0 ni, 68.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 1.0 us, 0.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 1.7 us, 0.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 1.3 us, 0.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 3.3 us, 0.0 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 4.3 us, 0.0 sy, 0.0 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 35.4/128337.6 [||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
59943 ollama 20 0 45.7g 42.2g 23152 S 598.3 33.7 38:14.56 ollama
1 root 20 0 22116 12508 9340 S 0.0 0.0 0:00.72 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-rcu_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-sync_wq
6 root 0 -20 0 0 0 I 0.0
```
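The per-CPU lines above can be summarized by counting how many logical CPUs are busy. The sketch below is one way to do that; `busy_cpus` is a hypothetical helper and the 50% threshold is an arbitrary assumption. It reads `%CpuN` lines on stdin and extracts the number in front of `us`.

```shell
#!/bin/bash
# Hypothetical helper: count logical CPUs whose user time ("us") is at or
# above a threshold (default 50), reading per-CPU lines of top output on stdin.
busy_cpus() {
  awk -v min="${1:-50}" '
    /^%Cpu[0-9]+/ && match($0, /[0-9.]+ us/) {
      us = substr($0, RSTART, RLENGTH - 3)   # drop the trailing " us"
      if (us + 0 >= min) n++
    }
    END { print n + 0 }'
}

# Three lines copied from the CPU run above
printf '%s\n' \
  '%Cpu3 : 99.7 us, 0.0 sy, 0.0 ni, 0.3 id' \
  '%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id' \
  '%Cpu6 : 85.7 us, 0.0 sy, 0.0 ni, 14.3 id' | busy_cpus 50   # prints 2
```

Applied to all 22 per-CPU lines of the CPU run, the same threshold counts six CPUs, which is in line with the 598.3 %CPU reported for the `ollama` process and the load average of about 6.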
top (deepseek-r1:70b execution on GPU)
```
top - 14:20:49 up 2:14, 4 users, load average: 1.75, 2.91, 2.01
Tasks: 344 total, 2 running, 342 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 0.3 id, 99.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 44.0 us, 56.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 54.3/128337.6 [||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
68407 root 20 0 4472460 1.3g 369680 R 100.3 1.0 2:31.49 ollama-lib
1 root 20 0 22136 12508 9340 S 0.0 0.0 0:00.81 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
```
script
```bash
#!/bin/bash
# Benchmark using ollama gives rate of tokens per second
# idea taken from https://taoofmac.com/space/blog/2024/01/20/1800
# batch-obench.sh script is a modification of obench.sh from https://github.com/tabletuser-blogspot/ollama-benchmark
# done by liutyi for https://wiki.liutyi.info test

# set -e stops on the first failing command (used for troubleshooting)
set -e

# ANSI colors used below. Other codes, for reference:
# Black 0;30, Dark Gray 1;30, Light Red 1;31, Light Green 1;32,
# Light Blue 1;34, Light Purple 1;35, Cyan 0;36, Light Cyan 1;36,
# Light Gray 0;37, White 1;37
borange='\e[0;33m'
yellow='\e[1;33m'
purple='\e[0;35m'
green='\e[0;32m'
red='\e[0;31m'
blue='\e[0;34m'
NC='\e[0m' # No Color

# Remember the current governor, then switch every core to "performance"
cpu_def=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
echo "Setting cpu governor to"
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
gpu_avail=$(sudo lshw -C display | grep product: | head -1 | cut -c17-)
cpugover=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
cpu_used=$(lscpu | grep 'Model name' | cut -f 2 -d ":" | awk '{$1=$1}1')
echo ""
echo "Simple benchmark using ollama and"
echo "whatever local Model is installed."
echo "Does not identify if $gpu_avail is benchmarking"
echo ""
benchmark=3
echo "How many times to run the benchmark?"
echo "$benchmark"
echo ""

# Benchmark every locally installed model (first column of `ollama ls`, header skipped)
for model in $(ollama ls | awk '{print $1}' | grep -v NAME); do
  echo -e "Total runs ${purple}$benchmark${NC}"
  echo ""
  echo "$model"
  ollama show "$model" --system
  echo "" | tee -a results.txt
  echo -e "Will use model: ${green}$model${NC}" | tee -a results.txt
  echo "" | tee -a results.txt
  echo -e "Will benchmark the tokens per second for ${cpu_used} and or ${gpu_avail}" | tee -a results.txt
  echo "" | tee -a results.txt
  echo "" | tee -a results.txt
  echo -e "Running benchmark ${purple}$benchmark${NC} times for ${cpu_used} and or ${gpu_avail}" | tee -a results.txt
  echo -e "with ${borange}$cpugover${NC} setting for cpu governor" | tee -a results.txt
  echo "" | tee -a results.txt
  for run in $(seq 1 "$benchmark"); do
    # --verbose prints the timing statistics on stderr; keep only the rate lines
    echo "Why is the blue sky blue?" | ollama run "$model" --verbose 2>&1 >/dev/null | grep "eval rate:" | tee -a results.txt
  done
  # Average the last $benchmark "eval rate:" lines, skipping the first run of the model
  avg=$(grep "eval rate:" results.txt | grep -v "prompt eval rate:" | tail -n "$benchmark" | awk '{print $3}' | awk 'NR>1{ tot+=$1 } END{ print tot/(NR-1) }')
  echo "" | tee -a results.txt
  echo -e "${red}$avg${NC} is the average ${blue}tokens per second${NC} using ${green}$model${NC} model" | tee -a results.txt
  echo "for $cpu_used and or $gpu_avail" | tee -a results.txt
done
echo
echo -e "using ${borange}$cpugover${NC} for cpu governor."
echo ""
# Restore the governor that was active before the benchmark
echo "Setting cpu governor to"
echo "$cpu_def" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#comment this out if you are repeating the same model
#this clears model from Vram
sudo systemctl stop ollama; sudo systemctl start ollama
#EOF
echo .
```
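The averaging step in the script is easy to misread, so here it is in isolation. This is a minimal sketch, not the author's code: `avg_eval_rate` is a hypothetical function name, the sample rates are made up for illustration, and the guard for a single run is an addition. Like the script, it keeps only the generation `eval rate:` lines, drops the first run, and averages the rest.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's awk pipeline: read ollama
# --verbose rate lines on stdin, keep the generation "eval rate:" lines,
# skip the first run, and print the mean of the remaining runs.
avg_eval_rate() {
  grep 'eval rate:' | grep -v 'prompt eval rate:' |
    awk '{ print $3 }' |
    awk 'NR > 1 { tot += $1 }
         END { if (NR > 1) print tot / (NR - 1); else print "n/a" }'
}

# Made-up sample of three runs (not measured values)
printf '%s\n' \
  'prompt eval rate:     12.00 tokens/s' \
  'eval rate:            5.00 tokens/s' \
  'prompt eval rate:     100.00 tokens/s' \
  'eval rate:            6.00 tokens/s' \
  'prompt eval rate:     110.00 tokens/s' \
  'eval rate:            7.00 tokens/s' | avg_eval_rate   # prints 6.5
```

With `benchmark=3` the reported average therefore covers only runs two and three. With a single run the original pipeline would divide by zero, which is why the sketch prints `n/a` in that case.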
From Reddit
```
Old quant types (some base model types require these):
- Q4_0: small, very high quality loss - legacy, prefer using Q3_K_M
- Q4_1: small, substantial quality loss - legacy, prefer using Q3_K_L
- Q5_0: medium, balanced quality - legacy, prefer using Q4_K_M
- Q5_1: medium, low quality loss - legacy, prefer using Q5_K_M

New quant types (recommended):
- Q2_K: smallest, extreme quality loss - not recommended
- Q3_K: alias for Q3_K_M
- Q3_K_S: very small, very high quality loss
- Q3_K_M: very small, very high quality loss
- Q3_K_L: small, substantial quality loss
- Q4_K: alias for Q4_K_M
- Q4_K_S: small, significant quality loss
- Q4_K_M: medium, balanced quality - recommended
- Q5_K: alias for Q5_K_M
- Q5_K_S: large, low quality loss - recommended
- Q5_K_M: large, very low quality loss - recommended
- Q6_K: very large, extremely low quality loss
- Q8_0: very large, extremely low quality loss - not recommended
- F16: extremely large, virtually no quality loss - not recommended
- F32: absolutely huge, lossless - not recommended
```
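The size labels in this list can be related to the benchmark table by estimating effective bits per weight as size in GB times 8, divided by parameters in billions. The following is a rough sketch under that assumption; `bits_per_weight` is a hypothetical helper, and because both the sizes and the nominal parameter counts in the table are rounded, the results are only approximate.

```shell
#!/bin/bash
# Hypothetical helper: effective bits per weight from model size (GB)
# and nominal parameter count (billions). Approximate: both inputs are rounded.
bits_per_weight() {
  awk -v gb="$1" -v params="$2" 'BEGIN { printf "%.1f\n", gb * 8 / params }'
}

bits_per_weight 9.1 14   # phi4:14b-q4_K_M from the table, prints 5.2
bits_per_weight 29 14    # phi4:14b-fp16 from the table, prints 16.6
```

The fp16 result coming out slightly above 16 suggests the nominal 14B figure understates the real parameter count, so the q4_K_M figure is probably a little high as well.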
...