...

                                              prompt eval rate   second prompt       eval rate
Name                     Params, B  Size, GB   CPU      GPU       CPU       GPU       CPU     GPU
orca-mini:3b             3          2.0
orca-mini:7b             7          3.8
orca-mini:13b            13         7.4
orca-mini:70b            70         38
phi4:14b-q4_K_M          14         9.1        12.98    9.87      100.02    113.45    5.78    6.77
phi4-mini:3.8b-q4_K_M    3.8        2.5        49.26    -         186.63    -         18.83   -
phi4:14b-fp16            14         29         12.45    11.53     35.53     40.16     2.08    2.09
openthinker:32b-v2-fp16  32         65         4.45     7.31      20.24     22.31     0.81    0.90
openthinker:32b          32         19         5.93     4.59      66.31     69.74     2.60    2.78
dolphin-phi:2.7b         2.7        1.6        85.67    86.81     744.07    649.43    25.42   21.73
dolphin3:8b              8          4.9        26.04    30.97     325.85    373.30    10.76   12.58
tinyllama:1.1b           1.1        0.6        198.18   112.98    2595.12   2211.21   62.99   57.53
deepseek-v2:16b          16         8.9        59.47    15.83     361.51    175.02    24.39   12.00
phi3:14b                 14         7.9        15.60    10.51     101.53    128.59    6.07    7.67
llama3.3:70b             70         42         2.60     1.54      21.35     23.37     1.25    1.37
mistral-small3.1:24b     24         15         7.71     -         1321.32   -         3.64    -
llama4:scout             17         67         11.14    -         1683.33   -         4.81    -
openchat:7b              7          4.1        30.47    27.15     273.39    361.04    11.10   14.81
qwen3:32b                32         20         5.67     2.84      38.88     41.60     2.53    2.73
gemma3:27b               27         17         6.60     -         49.38     -         3.04    -
deepseek-r1:70b          70         42         2.63     0.89      12.39     14.13     1.24    1.38

All rates are in tokens/s; "second prompt" is the prompt eval rate on a follow-up prompt (warm cache). "-" means not measured.
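
Each row above can be filled in from the stats that `ollama run <model> "<prompt>" --verbose` prints after a response. A minimal sketch of pulling the rates out with awk (the sample text stands in for real output, and the figures are illustrative):

```shell
#!/bin/sh
# Stand-in for the stats block that `ollama run ... --verbose` prints
# after a response (illustrative figures only).
sample='prompt eval count:    26 token(s)
prompt eval rate:     100.02 tokens/s
eval count:           298 token(s)
eval rate:            5.78 tokens/s'

# Keep only the two rate lines; $(NF-1) is the numeric field before "tokens/s".
printf '%s\n' "$sample" | awk '/eval rate:/ { print $(NF-1) }'
# -> 100.02
#    5.78
```

For a live measurement, replace the sample with something like `ollama run phi4:14b-q4_K_M "Why is the sky blue?" --verbose 2>&1`, and run the same prompt twice to get the "second prompt" column.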



Code Block
root@server1:~/ollama-benchmark# ollama list
NAME                       ID              SIZE      MODIFIED
orca-mini:3b               2dbd9f439647    2.0 GB    19 seconds ago
orca-mini:7b               9c9618e2e895    3.8 GB    5 minutes ago
orca-mini:13b              1b4877c90807    7.4 GB    8 minutes ago
orca-mini:70b              f184c0860491    38 GB     12 minutes ago
phi4:14b-q4_K_M            ac896e5b8b34    9.1 GB    12 hours ago
phi4-mini:3.8b-q4_K_M      78fad5d182a7    2.5 GB    12 hours ago
phi4:14b-fp16              227695f919b5    29 GB     15 hours ago
openthinker:32b-v2-fp16    bedb555dcf18    65 GB     16 hours ago
openthinker:32b            04b5937dcb16    19 GB     16 hours ago
dolphin-phi:2.7b           c5761fc77240    1.6 GB    19 hours ago
dolphin3:8b                d5ab9ae8e1f2    4.9 GB    19 hours ago
tinyllama:1.1b             2644915ede35    637 MB    19 hours ago
deepseek-v2:16b            7c8c332f2df7    8.9 GB    36 hours ago
phi3:14b                   cf611a26b048    7.9 GB    38 hours ago
llama3.3:70b               a6eb4748fd29    42 GB     39 hours ago
mistral-small3.1:24b       b9aaf0c2586a    15 GB     39 hours ago
llama4:scout               4f01ed6b6e01    67 GB     39 hours ago
openchat:7b                537a4e03b649    4.1 GB    40 hours ago
qwen3:32b                  e1c9f234c6eb    20 GB     41 hours ago
gemma3:27b                 a418f5838eaf    17 GB     41 hours ago
deepseek-r1:70b            0c1615a8ca32    42 GB     42 hours ago
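
As a quick capacity check, the SIZE column of this listing can be totalled directly. A sketch, assuming the four-column `ollama list` layout shown above ($3 is the size value, $4 its unit):

```shell
#!/bin/sh
# Total the SIZE column of an `ollama list` listing, normalising MB to GB.
sum_sizes() {
  awk 'NR > 1 { v = $3; if ($4 == "MB") v /= 1024; total += v }
       END { printf "%.1f GB total\n", total }'
}

# Live usage (assumes ollama is on PATH):
#   ollama list | sum_sizes
# Demonstrated here on a two-row sample:
printf 'NAME ID SIZE MODIFIED\na 1 2.0 GB now\nb 2 512 MB now\n' | sum_sizes
# -> 2.5 GB total
```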


Switch to GPU

Code Block
# Stop the system Ollama service so it does not hold the port
systemctl stop ollama.service
# Activate the Python env that has ipex-llm installed
source llm_env/bin/activate
pip install --pre --upgrade ipex-llm[cpp]
cd llama-cpp
# Run Ollama Serve with Intel GPU
export OLLAMA_NUM_GPU=999        # offload all layers to the GPU
export OLLAMA_THREADS=22
export OMP_NUM_THREADS=22
export ZES_ENABLE_SYSMAN=1       # expose GPU telemetry via Level Zero Sysman
export no_proxy=localhost,127.0.0.1
source /opt/intel/oneapi/setvars.sh   # load the oneAPI toolchain environment
export SYCL_CACHE_PERSISTENT=1        # keep the SYCL kernel cache across runs
OLLAMA_HOST=0.0.0.0 ./ollama serve    # listen on all interfaces
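
Before launching `./ollama serve`, it is easy to lose one of the exports when switching shells. A small sanity-check sketch (the variable list simply mirrors the exports above; `check_gpu_env` is a hypothetical helper name):

```shell
#!/bin/sh
# Confirm the GPU-related variables from the steps above are set in this shell.
check_gpu_env() {
  missing=""
  for v in OLLAMA_NUM_GPU OLLAMA_THREADS OMP_NUM_THREADS \
           ZES_ENABLE_SYSMAN SYCL_CACHE_PERSISTENT; do
    eval "val=\${$v:-}"                 # indirect lookup of the variable named $v
    [ -n "$val" ] || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "environment OK"
}

# Usage: run check_gpu_env just before `OLLAMA_HOST=0.0.0.0 ./ollama serve`.
```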

...