With Power Limits 95/110, Ubuntu 24.04, Intel Core Ultra 9 185H,
Crucial 2x64GB (128GB total) 5600MT/s DDR5 SODIMM, WD_BLACK SN850X 8TB
Version: intel-ollama-0.6.2 for GPU SYCL0 (Intel(R) Arc(TM) Graphics) - 120187 MiB free for GPU
All rates are in tokens per second.

| Name | Params, B | Size, GB | Prompt eval rate, CPU | Prompt eval rate, GPU | Second prompt, CPU | Second prompt, GPU | Eval rate, CPU | Eval rate, GPU |
|---|---|---|---|---|---|---|---|---|
| openthinker:32b-v2-fp16 | 32 | 19 | ||||||
| openthinker:32b | 32 | 19 | ||||||
| dolphin-phi:2.7b | 2.7 | 1.6 | 85.67 | 86.81 | 744.07 | 649.43 | 25.42 | 21.73 |
| dolphin3:8b | 8 | 4.9 | 26.04 | 30.97 | 325.85 | 373.30 | 10.76 | 12.58 |
| tinyllama:1.1b | 1.1 | 0.6 | 198.18 | 112.98 | 2595.12 | 2211.21 | 62.99 | 57.53 |
| deepseek-v2:16b | 16 | 8.9 | 59.47 | 15.83 | 361.51 | 175.02 | 24.39 | 12.00 |
| phi3:14b | 14 | 7.9 | 15.60 | 10.51 | 101.53 | 128.59 | 6.07 | 7.67 |
| llama3.3:70b | 70 | 42 | 2.60 | 1.54 | 21.35 | 23.37 | 1.25 | 1.37 |
| mistral-small3.1:24b | 24 | 15 | 7.71 | - | 1321.32 | - | 3.64 | - |
| llama4:scout | 109 (17 active) | 67 | 11.14 | - | 1683.33 | - | 4.81 | - |
| openchat:7b | 7 | 4.1 | 30.47 | 27.15 | 273.39 | 361.04 | 11.10 | 14.81 |
| qwen3:32b | 32 | 20 | 5.67 | 2.84 | 38.88 | 41.60 | 2.53 | 2.73 |
| gemma3:27b | 27 | 17 | 6.60 | - | 49.38 | - | 3.04 | - |
| deepseek-r1:70b | 70 | 42 | 2.63 | 0.89 | 12.39 | 14.13 | 1.24 | |
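
To compare the two backends at a glance, the GPU/CPU ratio of the eval rate can be computed from the table. This is a minimal sketch in bash and awk. The helper name `gpu_cpu_ratio` and the three-column input format (name, CPU rate, GPU rate) are assumptions made for this example; the sample rows are copied from the table.

```shell
#!/bin/bash
# Print the GPU/CPU eval-rate ratio for each "name cpu gpu" line on stdin.
# Rows without a numeric GPU value (the "-" entries in the table) are skipped.
gpu_cpu_ratio() {
    awk '$3 ~ /^[0-9.]+$/ && $2 > 0 { printf "%s %.2f\n", $1, $3 / $2 }'
}

# Eval rates (tokens/s) copied from a few rows of the table.
gpu_cpu_ratio <<'EOF'
dolphin3:8b 10.76 12.58
deepseek-v2:16b 24.39 12.00
phi3:14b 6.07 7.67
gemma3:27b 3.04 -
EOF
```

A ratio above 1 means the GPU run generated tokens faster. For these rows dolphin3:8b and phi3:14b come out ahead on the GPU, deepseek-v2:16b is faster on the CPU, and gemma3:27b is skipped because it has no GPU result.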
root@server1:~# ollama list
NAME                    ID              SIZE      MODIFIED
dolphin-phi:2.7b        c5761fc77240    1.6 GB    About an hour ago
dolphin3:8b             d5ab9ae8e1f2    4.9 GB    About an hour ago
tinyllama:1.1b          2644915ede35    637 MB    About an hour ago
deepseek-v2:16b         7c8c332f2df7    8.9 GB    18 hours ago
phi3:14b                cf611a26b048    7.9 GB    20 hours ago
llama3.3:70b            a6eb4748fd29    42 GB     21 hours ago
mistral-small3.1:24b    b9aaf0c2586a    15 GB     21 hours ago
llama4:scout            4f01ed6b6e01    67 GB     21 hours ago
openchat:7b             537a4e03b649    4.1 GB    22 hours ago
qwen3:32b               e1c9f234c6eb    20 GB     23 hours ago
gemma3:27b              a418f5838eaf    17 GB     23 hours ago
deepseek-r1:70b         0c1615a8ca32    42 GB     23 hours ago
Batch run on CPU
Execution time: ~65 minutes
Batch run on GPU
Execution time: 53 minutes, with 3 of 12 models skipped (mistral-small3.1:24b, llama4:scout and gemma3:27b, the "-" entries in the table)
sensors (deepseek-r1:70b execution on CPU) at power consumption ~80W
sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +78.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +99.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +73.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +73.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +56.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +70.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +47.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +46.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
sensors (deepseek-r1:70b execution on GPU) at power consumption ~60W
(base) root@server1:~# sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +82.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +68.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +54.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +77.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +56.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +61.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +63.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +63.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +59.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +73.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +50.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +49.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
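
The two dumps are single snapshots (package at 101°C on the CPU run, 97°C on the GPU run). To follow the temperature over a whole batch, the package reading can be pulled out of the `sensors` output and logged. This is a rough sketch; the helper name `package_temp` is made up for this example, and it assumes the `Package id 0:` line format shown in the dumps.

```shell
#!/bin/bash
# Print the CPU package temperature (digits only, degrees C) found in
# `sensors` output supplied on stdin.
package_temp() {
    awk '/^Package id 0:/ {
        t = $4                      # e.g. "+101.0°C"
        sub(/^\+/, "", t)           # drop the leading plus sign
        sub(/[^0-9.].*$/, "", t)    # drop the unit suffix
        print t
    }'
}

# Sample line taken from the CPU-run dump.
echo 'Package id 0: +101.0°C (high = +110.0°C, crit = +110.0°C)' | package_temp
# Live use (assumes lm-sensors is installed): sensors | package_temp
```

Run in a loop next to the benchmark, for example once every few seconds with a timestamp, this would show whether the package stays near its 110°C limit for the whole run or only peaks briefly.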
top (deepseek-r1:70b execution on CPU)
top - 13:18:48 up 1:12, 2 users, load average: 6.01, 5.95, 5.94
Tasks: 326 total, 1 running, 325 sleeping, 0 stopped, 0 zombie
%Cpu0 : 68.3 us, 0.0 sy, 0.0 ni, 31.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 38.7 us, 0.0 sy, 0.0 ni, 61.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 61.7 us, 0.0 sy, 0.0 ni, 38.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 99.7 us, 0.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 29.0 us, 0.0 sy, 0.0 ni, 71.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 85.7 us, 0.0 sy, 0.0 ni, 14.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 11.3 us, 0.0 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 69.0 us, 0.0 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 26.6 us, 0.0 sy, 0.0 ni, 73.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 66.7 us, 0.0 sy, 0.0 ni, 33.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 31.2 us, 0.0 sy, 0.0 ni, 68.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 1.0 us, 0.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 1.7 us, 0.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 1.3 us, 0.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 3.3 us, 0.0 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 4.3 us, 0.0 sy, 0.0 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 35.4/128337.6 [||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
59943 ollama 20 0 45.7g 42.2g 23152 S 598.3 33.7 38:14.56 ollama
1 root 20 0 22116 12508 9340 S 0.0 0.0 0:00.72 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-rcu_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-sync_wq
6 root 0 -20 0 0 0 I 0.0
top (execution on GPU)
top - 14:20:49 up 2:14, 4 users, load average: 1.75, 2.91, 2.01
Tasks: 344 total, 2 running, 342 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 0.3 id, 99.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 44.0 us, 56.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 54.3/128337.6 [||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
68407 root 20 0 4472460 1.3g 369680 R 100.3 1.0 2:31.49 ollama-lib
1 root 20 0 22136 12508 9340 S 0.0 0.0 0:00.81 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
script
#!/bin/bash
# Benchmark using ollama gives rate of tokens per second
# idea taken from https://taoofmac.com/space/blog/2024/01/20/1800
# other colors
#Black 0;30 Dark Gray 1;30
#Red 0;31 Light Red 1;31
#Green 0;32 Light Green 1;32
#Brown/Orange 0;33 Yellow 1;33
#Blue 0;34 Light Blue 1;34
#Purple 0;35 Light Purple 1;35
#Cyan 0;36 Light Cyan 1;36
#Light Gray 0;37 White 1;37
#ANSI option
#RED='\033[0;31m'
#NC='\033[0m' # No Color
#echo -e "${red}Hello Stackoverflow${NC}"
#set -e used for troubleshooting
set -e
#colors available
borange='\e[0;33m'
yellow='\e[1;33m'
purple='\e[0;35m'
green='\e[0;32m'
red='\e[0;31m'
blue='\e[0;34m'
NC='\e[0m' # No Color
cpu_def=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
echo "Setting cpu governor to"
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
gpu_avail=$(sudo lshw -C display | grep product: | head -1 | cut -c17-)
cpugover=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
cpu_used=$(lscpu | grep 'Model name' | cut -f 2 -d ":" | awk '{$1=$1}1')
echo ""
echo "Simple benchmark using ollama and"
echo "whatever local Model is installed."
echo "Does not detect whether $gpu_avail is actually used for the benchmark"
echo ""
benchmark=3
echo "How many times to run the benchmark?"
echo $benchmark
echo ""
for model in $(ollama ls | awk 'NR>1 {print $1}'); do
    echo -e "Total runs ${purple}${benchmark}${NC}"
    echo ""
    #echo "Current models available locally"
    #echo ""
    #ollama list
    #echo ""
    #echo "Example enter tinyllama or dolphin-phi"
    echo ""
    echo "$model"
    ollama show "$model" --system
    echo "" | tee -a results.txt
    echo -e "Will use model: ${green}${model}${NC}" | tee -a results.txt
    echo "" | tee -a results.txt
    echo -e "Will benchmark the tokens per second for ${cpu_used} and or ${gpu_avail}" | tee -a results.txt
    echo "" | tee -a results.txt
    echo "" | tee -a results.txt
    echo -e "Running benchmark ${purple}${benchmark}${NC} times for ${cpu_used} and or ${gpu_avail}" | tee -a results.txt
    echo -e "with ${borange}${cpugover}${NC} setting for cpu governor" | tee -a results.txt
    echo "" | tee -a results.txt
    for run in $(seq 1 "$benchmark"); do
        # keep only the stats ollama writes to stderr; the model's answer on stdout is discarded
        echo "Why is the blue sky blue?" | ollama run "$model" --verbose 2>&1 >/dev/null | grep "eval rate:" | tee -a results.txt
    done
    # average the eval rate over this model's runs; the first run is excluded from the average
    avg=$(grep -v "prompt eval rate:" results.txt | tail -n "$benchmark" | awk '{print $3}' | awk 'NR>1{ tot+=$1 } END{ if (NR>1) print tot/(NR-1); else print "n/a" }')
    echo "" | tee -a results.txt
    echo -e "${red}${avg}${NC} is the average ${blue}tokens per second${NC} using ${green}${model}${NC} model" | tee -a results.txt
    echo "for $cpu_used and or $gpu_avail" | tee -a results.txt
done
echo
echo -e using ${borange}$cpugover${NC} for cpu governor.
echo ""
echo "Setting cpu governor to"
echo "$cpu_def" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#comment this out if you are repeating the same model
#this clears model from Vram
sudo systemctl stop ollama; sudo systemctl start ollama
#EOF
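
The averaging step in the script relies on `tail` picking exactly the last runs out of results.txt, which also holds the banner lines. A possibly more robust variant is sketched below: it reads the `--verbose` stat lines of one model directly, ignores the `prompt eval rate:` lines, and leaves the first run out of the average, as the script does. The function name `avg_eval_rate` is hypothetical, and the rates in the sample input are made-up illustration values, not measurements.

```shell
#!/bin/bash
# Average the "eval rate:" values on stdin, excluding the first run.
# Prints "n/a" when fewer than two runs are present.
avg_eval_rate() {
    awk '/^[ \t]*eval rate:/ {
             n++
             if (n > 1) { sum += $3; k++ }   # skip run 1
         }
         END {
             if (k > 0) printf "%.2f\n", sum / k
             else print "n/a"
         }'
}

# Illustrative input in the shape printed by `ollama run --verbose`.
avg_eval_rate <<'EOF'
prompt eval rate:     30.00 tokens/s
eval rate:            12.00 tokens/s
prompt eval rate:     370.00 tokens/s
eval rate:            12.50 tokens/s
prompt eval rate:     372.00 tokens/s
eval rate:            12.70 tokens/s
EOF
# prints 12.60
```

Because the pattern is anchored to the start of the line, `prompt eval rate:` lines never match, so no separate `grep -v` is needed.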