With Power Limits 95/110, Ubuntu 24.04, Intel Core Ultra 9 185H,
Crucial 128GB (2x64GB) 5600MT/s DDR5 SODIMM, WD_BLACK SN850x 8TB
GPU Version: intel-ollama-0.6.2 GPU SYCL0 (Intel(R) Arc(TM) Graphics) - 120187 MiB
CPU Version: ollama version is 0.7.0 CPU 123.5 GiB available
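All rates are in tokens/s as reported by ollama; "-" marks models with no GPU result.

| Name | Params, B | Size, GB | Prompt eval rate, CPU | Prompt eval rate, GPU | Second prompt, CPU | Second prompt, GPU | Eval rate, CPU | Eval rate, GPU |
|---|---|---|---|---|---|---|---|---|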
| phi4:14b-q4_K_M | 14 | 9.1 | 12.98 | 9.87 | 100.02 | 113.45 | 5.78 | 6.77 |
| phi4-mini:3.8b-q4_K_M | 3.8 | 2.5 | 49.26 | - | 186.63 | - | 18.83 | - |
| phi4:14b-fp16 | 14 | 29 | 12.45 | 11.53 | 35.53 | 40.16 | 2.08 | 2.09 |
| openthinker:32b-v2-fp16 | 32 | 65 | 4.45 | 7.31 | 20.24 | 22.31 | 0.81 | 0.90 |
| openthinker:32b | 32 | 19 | 5.93 | 4.59 | 66.31 | 69.74 | 2.60 | 2.78 |
| dolphin-phi:2.7b | 2.7 | 1.6 | 85.67 | 86.81 | 744.07 | 649.43 | 25.42 | 21.73 |
| dolphin3:8b | 8 | 4.9 | 26.04 | 30.97 | 325.85 | 373.30 | 10.76 | 12.58 |
| tinyllama:1.1b | 1.1 | 0.6 | 198.18 | 112.98 | 2595.12 | 2211.21 | 62.99 | 57.53 |
| deepseek-v2:16b | 16 | 8.9 | 59.47 | 15.83 | 361.51 | 175.02 | 24.39 | 12.00 |
| phi3:14b | 14 | 7.9 | 15.60 | 10.51 | 101.53 | 128.59 | 6.07 | 7.67 |
| llama3.3:70b | 70 | 42 | 2.60 | 1.54 | 21.35 | 23.37 | 1.25 | 1.37 |
| mistral-small3.1:24b | 24 | 15 | 7.71 | - | 1321.32 | - | 3.64 | - |
| llama4:scout | 17 | 67 | 11.14 | - | 1683.33 | - | 4.81 | - |
| openchat:7b | 7 | 4.1 | 30.47 | 27.15 | 273.39 | 361.04 | 11.10 | 14.81 |
| qwen3:32b | 32 | 20 | 5.67 | 2.84 | 38.88 | 41.60 | 2.53 | 2.73 |
| gemma3:27b | 27 | 17 | 6.60 | - | 49.38 | - | 3.04 | - |
| deepseek-r1:70b | 70 | 42 | 2.63 | 0.89 | 12.39 | 14.13 | 1.24 | 1.38 |
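
To compare the two back ends at a glance, the eval-rate columns can be turned into a GPU/CPU ratio. The sketch below does this for a few rows copied from the table above; the function name `gpu_speedup` and the choice of rows are only illustrative.

```python
# Eval rates (tokens/s) copied from the table above: (CPU, GPU).
EVAL_RATES = {
    "phi4:14b-q4_K_M": (5.78, 6.77),
    "deepseek-v2:16b": (24.39, 12.00),
    "openchat:7b": (11.10, 14.81),
    "deepseek-r1:70b": (1.24, 1.38),
}


def gpu_speedup(cpu_rate: float, gpu_rate: float) -> float:
    """Return GPU rate divided by CPU rate; a value above 1 means the GPU was faster."""
    return gpu_rate / cpu_rate


for name, (cpu, gpu) in EVAL_RATES.items():
    print(f"{name:20s} {gpu_speedup(cpu, gpu):.2f}x")
```

On these rows the integrated GPU generates roughly 1.1x to 1.3x faster than the CPU, except for deepseek-v2:16b, where it runs at about half the CPU rate.
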
root@server1:~# ollama list
NAME                       ID              SIZE      MODIFIED
phi4:14b-q4_K_M            ac896e5b8b34    9.1 GB    About an hour ago
phi4-mini:3.8b-q4_K_M      78fad5d182a7    2.5 GB    About an hour ago
phi4:14b-fp16              227695f919b5    29 GB     5 hours ago
openthinker:32b-v2-fp16    bedb555dcf18    65 GB     5 hours ago
openthinker:32b            04b5937dcb16    19 GB     5 hours ago
dolphin-phi:2.7b           c5761fc77240    1.6 GB    8 hours ago
dolphin3:8b                d5ab9ae8e1f2    4.9 GB    8 hours ago
tinyllama:1.1b             2644915ede35    637 MB    8 hours ago
deepseek-v2:16b            7c8c332f2df7    8.9 GB    26 hours ago
phi3:14b                   cf611a26b048    7.9 GB    28 hours ago
llama3.3:70b               a6eb4748fd29    42 GB     28 hours ago
mistral-small3.1:24b       b9aaf0c2586a    15 GB     28 hours ago
llama4:scout               4f01ed6b6e01    67 GB     29 hours ago
openchat:7b                537a4e03b649    4.1 GB    29 hours ago
qwen3:32b                  e1c9f234c6eb    20 GB     30 hours ago
gemma3:27b                 a418f5838eaf    17 GB     30 hours ago
deepseek-r1:70b            0c1615a8ca32    42 GB     31 hours ago
Switch to GPU
systemctl stop ollama.service
source llm_env/bin/activate
pip install --pre --upgrade ipex-llm[cpp]
cd llama-cpp
# Run Ollama Serve with Intel GPU
export OLLAMA_NUM_GPU=999
export OLLAMA_THREADS=22
export OMP_NUM_THREADS=22
export ZES_ENABLE_SYSMAN=1
export no_proxy=localhost,127.0.0.1
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1
OLLAMA_HOST=0.0.0.0 ./ollama serve
Switch back to CPU
# CTRL + C
systemctl start ollama.service
Run batch on CPU
Execution time: ~65 minutes.
Run batch on GPU
Execution time: 53 minutes, with 3/12 models skipped.
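
The batch script itself is not reproduced here. As a rough sketch of how such rates can be collected, the code below assumes Ollama's REST endpoint `/api/generate` on the default port 11434, whose non-streaming response carries `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration` (durations in nanoseconds). The helper names `rates` and `run_once` are hypothetical, and this is not necessarily how the numbers in the table were produced.

```python
import json
import urllib.request

# Assumed default Ollama address; adjust if the server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def rates(resp: dict) -> dict:
    """Convert Ollama's token counts and nanosecond durations to tokens/s."""
    ns_per_s = 1e9
    return {
        "prompt_eval_rate": resp["prompt_eval_count"]
        / (resp["prompt_eval_duration"] / ns_per_s),
        "eval_rate": resp["eval_count"] / (resp["eval_duration"] / ns_per_s),
    }


def run_once(model: str, prompt: str) -> dict:
    """Send one non-streaming request to the server and return the measured rates."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return rates(json.load(r))
```

With the server running, a call such as `run_once("tinyllama:1.1b", "Why is the sky blue?")` would return both rates for one model; looping over the model names from `ollama list` gives a batch run.
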
sensors (deepseek-r1:70b execution on CPU) at power consumption ~80W
sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +78.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +99.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +73.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +73.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +56.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +70.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +47.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +46.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
sensors (deepseek-r1:70b execution on GPU) at power consumption ~60W
(base) root@server1:~# sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +82.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +68.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +54.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +77.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +56.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +61.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +63.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +63.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +59.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +73.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +50.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +49.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
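
The two dumps are easier to compare once the per-core readings are pulled out. The following is a minimal sketch that parses the plain-text `sensors` format shown above with a regular expression; the excerpts are copied from the two dumps, and `core_temps` is an illustrative helper name.

```python
import re

# Matches lines such as "Core 8: +101.0°C (high = ...)".
CORE_RE = re.compile(r"^Core (\d+):\s+([+-]?\d+(?:\.\d+)?)°C", re.MULTILINE)


def core_temps(sensors_text: str) -> dict[int, float]:
    """Map core number to temperature in °C from `sensors` text output."""
    return {int(n): float(t) for n, t in CORE_RE.findall(sensors_text)}


# Three cores excerpted from each dump above.
CPU_RUN = """\
Core 0: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +100.0°C (high = +110.0°C, crit = +110.0°C)
"""
GPU_RUN = """\
Core 0: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +54.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +97.0°C (high = +110.0°C, crit = +110.0°C)
"""

for label, text in (("CPU run", CPU_RUN), ("GPU run", GPU_RUN)):
    temps = core_temps(text)
    mean = sum(temps.values()) / len(temps)
    print(f"{label}: hottest core {max(temps.values()):.1f}°C, mean {mean:.1f}°C")
```

Feeding the full output of `sensors` to `core_temps` works the same way, since lines that do not start with `Core` are ignored.
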
top (deepseek-r1:70b execution on CPU)
top (deepseek-r1:70b execution on GPU)
script
From Reddit
Old quant types (some base model types require these):
- Q4_0: small, very high quality loss - legacy, prefer using Q3_K_M
- Q4_1: small, substantial quality loss - legacy, prefer using Q3_K_L
- Q5_0: medium, balanced quality - legacy, prefer using Q4_K_M
- Q5_1: medium, low quality loss - legacy, prefer using Q5_K_M

New quant types (recommended):
- Q2_K: smallest, extreme quality loss - not recommended
- Q3_K: alias for Q3_K_M
- Q3_K_S: very small, very high quality loss
- Q3_K_M: very small, very high quality loss
- Q3_K_L: small, substantial quality loss
- Q4_K: alias for Q4_K_M
- Q4_K_S: small, significant quality loss
- Q4_K_M: medium, balanced quality - recommended
- Q5_K: alias for Q5_K_M
- Q5_K_S: large, low quality loss - recommended
- Q5_K_M: large, very low quality loss - recommended
- Q6_K: very large, extremely low quality loss
- Q8_0: very large, extremely low quality loss - not recommended
- F16: extremely large, virtually no quality loss - not recommended
- F32: absolutely huge, lossless - not recommended
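
The size side of this trade-off can be read off the benchmark table: dividing a model's size by its parameter count gives an approximate number of stored bits per weight. The sketch below does that for the fp16 and default-quant variants of two models, using sizes and parameter counts from the table; it treats 1 GB as 10^9 bytes, and `bits_per_weight` is an illustrative name.

```python
def bits_per_weight(size_gb: float, params_billion: float) -> float:
    """Approximate stored bits per parameter (treats 1 GB as 10**9 bytes)."""
    return size_gb * 8 / params_billion


# (size in GB, parameters in billions) copied from the benchmark table.
MODELS = {
    "phi4:14b-fp16": (29, 14),
    "phi4:14b-q4_K_M": (9.1, 14),
    "openthinker:32b-v2-fp16": (65, 32),
    "openthinker:32b": (19, 32),
}

for name, (size_gb, params_b) in MODELS.items():
    print(f"{name:26s} {bits_per_weight(size_gb, params_b):5.2f} bits/weight")
```

The fp16 rows come out slightly above 16 bits, most likely because the parameter counts in the table are rounded, so the results are only a rough guide.
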