
Results: Test 1 Intel Core Ultra 9 185H CPU vs GPU ollama models speed

Size of Models

Code Block

root@server1:~# ollama list
NAME                       ID              SIZE      MODIFIED
gemma3:12b                 f4031aab637d    8.1 GB    19 minutes ago
gemma3:4b                  a2af6cc3eb7f    3.3 GB    21 minutes ago
gemma3:1b                  8648f39daa8f    815 MB    24 minutes ago
orca-mini:3b               2dbd9f439647    2.0 GB    2 hours ago
orca-mini:7b               9c9618e2e895    3.8 GB    2 hours ago
orca-mini:13b              1b4877c90807    7.4 GB    2 hours ago
orca-mini:70b              f184c0860491    38 GB     2 hours ago
phi4:14b-q4_K_M            ac896e5b8b34    9.1 GB    14 hours ago
phi4-mini:3.8b-q4_K_M      78fad5d182a7    2.5 GB    14 hours ago
phi4:14b-fp16              227695f919b5    29 GB     17 hours ago
openthinker:32b-v2-fp16    bedb555dcf18    65 GB     18 hours ago
openthinker:32b            04b5937dcb16    19 GB     18 hours ago
dolphin-phi:2.7b           c5761fc77240    1.6 GB    21 hours ago
dolphin3:8b                d5ab9ae8e1f2    4.9 GB    21 hours ago
tinyllama:1.1b             2644915ede35    637 MB    21 hours ago
deepseek-v2:16b            7c8c332f2df7    8.9 GB    38 hours ago
phi3:14b                   cf611a26b048    7.9 GB    40 hours ago
llama3.3:70b               a6eb4748fd29    42 GB     40 hours ago
mistral-small3.1:24b       b9aaf0c2586a    15 GB     40 hours ago
llama4:scout               4f01ed6b6e01    67 GB     41 hours ago
openchat:7b                537a4e03b649    4.1 GB    41 hours ago
qwen3:32b                  e1c9f234c6eb    20 GB     42 hours ago
gemma3:27b                 a418f5838eaf    17 GB     42 hours ago
deepseek-r1:70b            0c1615a8ca32    42 GB     43 hours ago
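
To see how much disk space a set of models needs, the SIZE column of the listing can be summed. This is a minimal sketch, assuming sizes are printed as a number followed by `MB` or `GB` and that the units are decimal (1 GB = 1000 MB). `total_model_size_gb` is a hypothetical helper name, not part of ollama.

```shell
#!/bin/sh
# Sum the SIZE column of `ollama list` output and print the total in GB.
# Assumption: SIZE is "<number> <MB|GB>" in fields 3 and 4, decimal units.
total_model_size_gb() {
  awk 'NR > 1 {
         size = $3; unit = $4
         if (unit == "MB") size /= 1000   # convert MB to GB
         total += size
       }
       END { printf "%.1f\n", total }'
}

# Example with three rows copied from the listing above
total_model_size_gb <<'EOF'
NAME                       ID              SIZE      MODIFIED
gemma3:12b                 f4031aab637d    8.1 GB    19 minutes ago
gemma3:1b                  8648f39daa8f    815 MB    24 minutes ago
orca-mini:70b              f184c0860491    38 GB     2 hours ago
EOF
```

On a live system the same function could read the listing directly with `ollama list | total_model_size_gb`.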

Switch to GPU

Code Block

systemctl stop ollama.service
source llm_env/bin/activate
pip install --pre --upgrade 'ipex-llm[cpp]'
cd llama-cpp
# Run Ollama Serve with Intel GPU
export OLLAMA_NUM_GPU=999
export OLLAMA_THREADS=22
export OMP_NUM_THREADS=22
export ZES_ENABLE_SYSMAN=1
export no_proxy=localhost,127.0.0.1
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1
OLLAMA_HOST=0.0.0.0 ./ollama serve
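
Because the exports above only apply to the current shell, it is easy to start the server from a new terminal without them. A minimal sketch of a pre-flight check, assuming the three variables below are the ones that matter for the GPU run. `check_gpu_env` is a hypothetical helper name, not part of ollama or ipex-llm.

```shell
#!/bin/sh
# Report which of the GPU-related variables from the block above are unset.
# Returns 0 when all are set, 1 otherwise.
check_gpu_env() {
  missing=0
  for name in OLLAMA_NUM_GPU ZES_ENABLE_SYSMAN SYCL_CACHE_PERSISTENT; do
    # Look up the variable whose name is stored in $name (names are fixed literals)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage: check_gpu_env && OLLAMA_HOST=0.0.0.0 ./ollama serve
```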

Switch back to CPU

Code Block
# Press CTRL+C to stop the foreground "ollama serve", then restart the service:
systemctl start ollama.service

Run batch on CPU

Code Block


(base) root@server1:~/ollama-benchmark# cat 'Intel(R) Core(TM) Ultra 9 185He Filled By O.E.M. CPU @ 4.4GHz.txt'

prompt eval rate:     5.50 tokens/s
eval rate:            2.31 tokens/s
(base) root@server1:~/ollama-benchmark# vi batch-obench.sh
(base) root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance

Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking

How many times to run the benchmark?
3

Total runs 3


dolphin-phi:2.7b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin-phi:2.7b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     85.67 tokens/s
eval rate:            25.11 tokens/s
prompt eval rate:     744.07 tokens/s
eval rate:            25.42 tokens/s
prompt eval rate:     783.71 tokens/s
eval rate:            25.85 tokens/s

2.31 is the average tokens per second using dolphin-phi:2.7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


dolphin3:8b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin3:8b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     26.04 tokens/s
eval rate:            10.87 tokens/s
prompt eval rate:     325.85 tokens/s
eval rate:            10.76 tokens/s
prompt eval rate:     323.77 tokens/s
eval rate:            10.75 tokens/s

2.31 is the average tokens per second using dolphin3:8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


tinyllama:1.1b
You are a helpful AI assistant.
Will use model: tinyllama:1.1b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     198.18 tokens/s
eval rate:            63.49 tokens/s
prompt eval rate:     2595.12 tokens/s
eval rate:            62.99 tokens/s
prompt eval rate:     2547.80 tokens/s
eval rate:            62.73 tokens/s

2.31 is the average tokens per second using tinyllama:1.1b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3

With Power Limits 95/110, Ubuntu 24.04, Intel Core Ultra 9 185H,

Crucial 2x128GB 5600MT/s DDR5 SODIMM, WD_BLACK SN850x 8TB

Version: intel-ollama-0.6.2 for GPU SYCL0 (Intel(R) Arc(TM) Graphics) - 120187 MiB free for GPU

...

Code Block

root@server1:~# ollama list
NAME                    ID              SIZE      MODIFIED
dolphin-phi:2.7b        c5761fc77240    1.6 GB    About an hour ago
dolphin3:8b             d5ab9ae8e1f2    4.9 GB    About an hour ago
tinyllama:1.1b          2644915ede35    637 MB    About an hour ago
deepseek-v2:16b         7c8c332f2df7    8.9 GB    18 hours ago
phi3:14b                cf611a26b048    7.9 GB    20 hours ago
llama3.3:70b            a6eb4748fd29    42 GB     21 hours ago
mistral-small3.1:24b    b9aaf0c2586a    15 GB     21 hours ago
llama4:scout            4f01ed6b6e01    67 GB     21 hours ago
openchat:7b             537a4e03b649    4.1 GB    22 hours ago
qwen3:32b               e1c9f234c6eb    20 GB     23 hours ago
gemma3:27b              a418f5838eaf    17 GB     23 hours ago
deepseek-r1:70b         0c1615a8ca32    42 GB     23 hours ago

Code Block

root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance

Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking

How many times to run the benchmark?
3

Total runs 3


deepseek-v2:16b

Will use model: deepseek-v2:16b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     5659.1047 tokens/s
eval rate:            2524.8857 tokens/s
prompt eval rate:     365361.6851 tokens/s
eval rate:            24.6239 tokens/s
prompt eval rate:     377361.6758 tokens/s
eval rate:            24.6432 tokens/s

252.046731 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


phi3:14b

Will use model: phi3:14b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     15.2560 tokens/s
eval rate:            65.1097 tokens/s
prompt eval rate:     100101.2053 tokens/s
eval rate:            56.8820 tokens/s
prompt eval rate:     10298.3860 tokens/s
eval rate:            6.0007 tokens/s

52.9933331 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


llama3.3:70b

Will use model: llama3.3:70b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     2.5660 tokens/s
eval rate:            1.2425 tokens/s
prompt eval rate:     21.2035 tokens/s
eval rate:            1.1925 tokens/s
prompt eval rate:     1921.1834 tokens/s
eval rate:            1.1125 tokens/s

12.1831 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.

When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.

# WEB BROWSING INSTRUCTIONS

You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.

# MULTI-MODAL INSTRUCTIONS

You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     7.4171 tokens/s
eval rate:            3.6165 tokens/s
prompt eval rate:     13051321.2632 tokens/s
eval rate:            3.4664 tokens/s
prompt eval rate:     12361318.2968 tokens/s
eval rate:            3.33 tokens/s

3.46667 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3


llama4:scout
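You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.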
Will use model: llama4:scout

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int

Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor

prompt eval rate:     10.77 tokens/s
eval rate:            4.72 tokens/s
prompt eval rate:     1687.74 tokens/s
eval rate:            4.72 tokens/s
prompt eval rate:     1593.52 tokens/s
eval rate:            4.54 tokens/s

4.66 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3


openchat:7b

Will use model: openchat:7b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int

Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor

prompt eval rate:     28.78 tokens/s
eval rate:            10.42 tokens/s
prompt eval rate:     250.61 tokens/s
eval rate:            10.41 tokens/s
prompt eval rate:     256.14 tokens/s
eval rate:            10.34 tokens/s

10.39 is the average tokens per second using openchat:7b model
3.64 tokens/s

2.31 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


llama4:scout
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
Will use model: llama4:scout

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
TotalFilled runs 3


qwen3:32b

Will use model: qwen3:32b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     5.50 tokens/s
eval rate:            2.31 tokens/s
^C(base) root@server1:~/ollama-benchmark#
Broadcast message from root@server1 on pts/3 (Wed 2025-05-21 12:05:33 UTC):

The system will reboot now!


Broadcast message from root@server1 on pts/3 (Wed 2025-05-21 12:05:33 UTC):

The system will reboot now!
Using username "oliutyi".
Authenticating with public key "oliutyi@server4"
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.11.0-26-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Wed May 21 12:07:05 PM UTC 2025

  System load:  0.0              Temperature:               72.8 C
  Usage of /:   3.9% of 7.22TB   Processes:                 339
  Memory usage: 0%               Users logged in:           0
  Swap usage:   0%    To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     11.14 tokens/s
eval rate:            4.77 tokens/s
prompt eval rate:     1683.33 tokens/s
eval rate:            4.81 tokens/s
prompt eval rate:     1688.84 tokens/s
eval rate:            4.81 tokens/s

2.31 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


openchat:7b

Will use model: openchat:7b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     30.47 tokens/s
eval rate:           IPv4 address for enp171s0: 10.9.9.108

 * Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK
   just raised the bar for easy, resilient and secure K8s cluster deploymen

   https://ubuntu.com/engage/secure-kubernetes-at-the-edge

Expanded Security Maintenance for Applications is not enabled.

0 updates can be applied immediately.

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status


Last login: Wed May 21 11:27:11 2025 from 10.9.9.64
oliutyi@server1:~$ sudo su -
(base) root@server1:~# cd ollama-benchmark/
(base) root@server1:~/ollama-benchmark# ls -la
total 32
drwxr-xr-x  3 root root 4096 May 21 11:25  .
drwx------ 27 root root 4096 May 21 12:04  ..
-rwxr-xr-x  1 root root 2815 May 21 11:25  batch-obench.sh
drwxr-xr-x  8 root root 4096 May 20 17:47  .git
-rw-r--r--  1 root root   73 May 21 12:02 '11.21 tokens/s
prompt eval rate:     273.39 tokens/s
eval rate:            11.02 tokens/s
prompt eval rate:     286.78 tokens/s
eval rate:            11.10 tokens/s

2.31 is the average tokens per second using openchat:7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


qwen3:32b

Will use model: qwen3:32b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H'$Filled To Be Filled By O.E.M. CPU @ 4.4GHz.txt'
-rw-r--r--  1 root root 1061 May 20 17:47  LICENSE
-rwxr-xr-x  1 root root 2697 May 20 17:47  obench.sh
-rw-r--r--  1 root root  333 May 20 17:47  README.md
(base) root@server1:~/ollama-benchmark# cat 'Intel(R) Core(TM) Ultra 9 185He Filled By O.E.M. CPU @ 4.4GHz.txt'
 and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     5.67 tokens/s
eval rate:            2.55 tokens/s
prompt eval rate:     38.88 tokens/s
eval rate:            2.53 tokens/s
prompt eval rate:     538.5099 tokens/s
eval rate:            2.3152 tokens/s
(base) root@server1:~/ollama-benchmark# vi batch-obench.sh
(base) root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance

Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if

2.31 is the average tokens per second using qwen3:32b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] is benchmarking

How many times to run the benchmark?
3

Total runs 3


dolphin-phi:2.7b
You are Dolphin, a helpful AI assistant.gemma3:27b

Will use model: dolphin-phi:2.7bgemma3:27b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     856.6760 tokens/s
eval rate:            253.1104 tokens/s
prompt eval rate:     74449.0738 tokens/s
eval rate:            253.4204 tokens/s
prompt eval rate:     78349.7140 tokens/s
eval rate:            253.8504 tokens/s

2.31 is the average tokens per second using dolphin-phi:2.7bgemma3:27b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


dolphin3deepseek-r1:8b
You are Dolphin, a helpful AI assistant.70b

Will use model: dolphin3deepseek-r1:8b70b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     2.63 tokens/s
eval rate:            1.25 tokens/s
prompt eval rate:     2612.0439 tokens/s
eval rate:            101.8724 tokens/s
prompt eval rate:     32511.8556 tokens/s
eval rate:            1.24 tokens/s

2.31 is the average tokens per second using deepseek-r1:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R)  10.76 tokens/s
prompt eval rate:     323.77 tokens/s
eval rate:            10.75 tokens/s

2.31 is the average tokens per second using dolphin3:8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]

using performance for cpu governor.

Setting cpu governor to
powersave

execution time is ~65m
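
Many of the summary lines in this run report 2.31 tokens/s, which does not match the per-run eval rates printed just above them (for dolphin-phi:2.7b they are 25.11, 25.42 and 25.85). As a cross-check, the average can be recomputed from the raw `eval rate:` lines. This is a minimal sketch, assuming the output format shown in the log. `average_eval_rate` is a hypothetical helper name and is not part of obench.sh.

```shell
#!/bin/sh
# Average the "eval rate:" values from benchmark output read on stdin.
# Lines starting with "prompt eval rate:" are ignored by the anchored pattern.
average_eval_rate() {
  awk '/^eval rate:/ { sum += $3; n++ }
       END { if (n) printf "%.2f\n", sum / n }'
}

# Example with the dolphin-phi:2.7b lines from the CPU run
average_eval_rate <<'EOF'
prompt eval rate:     85.67 tokens/s
eval rate:            25.11 tokens/s
prompt eval rate:     744.07 tokens/s
eval rate:            25.42 tokens/s
prompt eval rate:     783.71 tokens/s
eval rate:            25.85 tokens/s
EOF
```

For these three runs the recomputed average is 25.46 tokens/s. The value 2.31 equals the eval rate stored in the leftover results file shown at the start of the session, so the script may be reading stale data; that is an inference from the log, not something the script source confirms here.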

Run batch on GPU

Code Block


root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance

Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking

How many times to run the benchmark?
3

Total runs 3


tinyllamadolphin-phi:12.1b7b
You are Dolphin, a helpful AI assistant.
Will use model: tinyllamadolphin-phi:12.1b7b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     19886.1881 tokens/s
eval rate:            6321.4969 tokens/s
prompt eval rate:     2595649.1243 tokens/s
eval rate:            6221.9969 tokens/s
prompt eval rate:     2547659.8076 tokens/s
eval rate:            6221.7377 tokens/s

221.3173 is the average tokens per second using tinyllamadolphin-phi:12.1b7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


deepseek-v2dolphin3:16b
8b
You are Dolphin, a helpful AI assistant.
Will use model: deepseek-v2dolphin3:16b8b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     5930.4797 tokens/s
eval rate:            2412.5766 tokens/s
prompt eval rate:     361373.5130 tokens/s
eval rate:            2412.3965 tokens/s
prompt eval rate:     361372.5813 tokens/s
eval rate:            2412.3252 tokens/s

212.31585 is the average tokens per second using deepseek-v2dolphin3:16b8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


phi3tinyllama:14b
1.1b
You are a helpful AI assistant.
Will use model: phi3tinyllama:14b1.1b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     15112.6098 tokens/s
eval rate:            557.9722 tokens/s
prompt eval rate:     1012211.5321 tokens/s
eval rate:            657.2032 tokens/s
prompt eval rate:     982237.6045 tokens/s
eval rate:            657.0775 tokens/s

257.31535 is the average tokens per second using phi3:14b model
 tinyllama:1.1b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


deepseek-v2:16b

Will use model: deepseek-v2:16b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


llama3.3:70b

Will use model: llama3.3:70b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     15.83 tokens/s
eval rate:            11.95 tokens/s
prompt eval rate:     175.02 tokens/s
eval rate:            12.04 tokens/s
prompt eval rate:     2177.6012 tokens/s
eval rate:            111.2597 tokens/s
prompt eval rate:     21.35 tokens/s
eval rate:            1.25 tokens/s
prompt eval rate:     21.34 tokens/s
eval rate:            1.25 tokens/s

2.31 is the average tokens per second using llama3.3:70b model


12.005 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


phi3:14b

Will use model: phi3:14b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Totalwith runs 3


mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.

When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.

# WEB BROWSING INSTRUCTIONS

You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.

# MULTI-MODAL INSTRUCTIONS

You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b

Will benchmark the tokens per second performance setting for cpu governor

prompt eval rate:     10.51 tokens/s
eval rate:            7.67 tokens/s
prompt eval rate:     128.59 tokens/s
eval rate:            7.66 tokens/s
prompt eval rate:     128.13 tokens/s
eval rate:            7.70 tokens/s

7.68 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


llama3.3:70b

Will use model: llama3.3:70b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     1.54 tokens/s
eval rate:            1.49 tokens/s
prompt eval rate:     23.37 tokens/s
eval rate:            1.38 tokens/s
prompt eval rate:     23.35 tokens/s
eval rate:            1.36 tokens/s

1.37 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3



Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     7.71 tokens/s
eval rate:            3.65 tokens/s
prompt eval rate:     1321.32 tokens/s
eval rate:            3.64 tokens/s
prompt eval rate:     1318.68 tokens/s
eval rate:            3.64 tokens/s

2.31 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


llama4:scout
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
Will use model: llama4:scout

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     11.14 tokens/s
eval rate:            4.77 tokens/s
prompt eval rate:     1683.33 tokens/s
eval rate:            4.81 tokens/s
prompt eval rate:     1688.84 tokens/s
eval rate:            4.81 tokens/s

2.31 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


openchat:7b

Will use model: openchat:7b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 timesmistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.

When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.

# WEB BROWSING INSTRUCTIONS

You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.

# MULTI-MODAL INSTRUCTIONS

You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor


0 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


llama4:scout
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
Will use model: llama4:scout

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

promptRunning evalbenchmark rate:3 times    30.47 tokens/s
eval rate:            11.21 tokens/s
prompt eval rate:     273.39 tokens/s
eval rate:            11.02 tokens/s
prompt eval rate:     286.78 tokens/s
eval rate:            11.10 tokens/s

2.31for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor


0 is the average tokens per second using openchatllama4:7bscout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


qwen3openchat:32b7b

Will use model: qwen3openchat:32b7b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     527.6715 tokens/s
eval rate:            214.5584 tokens/s
prompt eval rate:     38361.8804 tokens/s
eval rate:            214.5385 tokens/s
prompt eval rate:     38364.9949 tokens/s
eval rate:            214.5278 tokens/s

214.31815 is the average tokens per second using qwen3openchat:32b7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


gemma3qwen3:27b32b

Will use model: gemma3qwen3:27b32b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     62.6084 tokens/s
eval rate:            32.0475 tokens/s
prompt eval rate:     4941.3860 tokens/s
eval rate:            32.0474 tokens/s
prompt eval rate:     4941.4061 tokens/s
eval rate:            32.0473 tokens/s

2.31735 is the average tokens per second using gemma3qwen3:27b32b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


deepseek-r1gemma3:70b27b

Will use model: deepseek-r1gemma3:70b27b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt
0 evalis rate:the     2.63 tokens/s
eval rate:            1.25 tokens/s
prompt eval rate:     12.39 tokens/s
eval rate:            1.24 tokens/s
prompt eval rate:     11.56 tokens/s
eval rate:            1.24 tokens/s

2.31 is the averageaverage tokens per second using gemma3:27b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3


deepseek-r1:70b

Will use model: deepseek-r1:70b

Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]


Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor

prompt eval rate:     0.89 tokens/s
eval rate:            1.51 tokens/s
prompt eval rate:     14.13 tokens/s
eval rate:            1.39 tokens/s
prompt eval rate:     13.76 tokens/s
eval rate:            1.38 tokens/s

1.385 is the average tokens per second using deepseek-r1:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]

using performance for cpu governor.

Setting cpu governor to
powersave

execution time 53 minutes with 3/12 models skipped

sensors (deepseek-r1:70b execution on CPU) at power consumption ~80W

Code Block

sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A

spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0

temp1:        +78.2°C  (low  =  +0.0°C, high = +55.0°C)
                       (crit low =  +0.0°C, crit = +85.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -273.1°C, high = +82.8°C)
                       (crit = +84.8°C)

acpi_fan-acpi-0
Adapter: ACPI interface
fan1:             N/A

coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +101.0°C  (high = +110.0°C, crit = +110.0°C)
Core 0:        +83.0°C  (high = +110.0°C, crit = +110.0°C)
Core 1:        +83.0°C  (high = +110.0°C, crit = +110.0°C)
Core 2:        +84.0°C  (high = +110.0°C, crit = +110.0°C)
Core 3:        +84.0°C  (high = +110.0°C, crit = +110.0°C)
Core 4:        +84.0°C  (high = +110.0°C, crit = +110.0°C)
Core 5:        +84.0°C  (high = +110.0°C, crit = +110.0°C)
Core 6:        +84.0°C  (high = +110.0°C, crit = +110.0°C)
Core 7:        +84.0°C  (high = +110.0°C, crit = +110.0°C)
Core 8:       +101.0°C  (high = +110.0°C, crit = +110.0°C)
Core 12:      +100.0°C  (high = +110.0°C, crit = +110.0°C)
Core 16:      +100.0°C  (high = +110.0°C, crit = +110.0°C)
Core 20:       +99.0°C  (high = +110.0°C, crit = +110.0°C)
Core 24:       +97.0°C  (high = +110.0°C, crit = +110.0°C)
Core 28:      +100.0°C  (high = +110.0°C, crit = +110.0°C)
Core 32:       +73.0°C  (high = +110.0°C, crit = +110.0°C)
Core 33:       +73.0°C  (high = +110.0°C, crit = +110.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +56.9°C  (low  =  -5.2°C, high = +89.8°C)
                       (crit = +93.8°C)
Sensor 1:     +70.8°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 3:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C

sensors (deepseek-r1:70b execution on GPU) at power consumption ~60W

Code Block

(base) root@server1:~# sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:            N/A

spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0

temp1:        +82.2°C  (low  =  +0.0°C, high = +55.0°C)
                       (crit low =  +0.0°C, crit = +85.0°C)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -273.1°C, high = +82.8°C)
                       (crit = +84.8°C)

acpi_fan-acpi-0
Adapter: ACPI interface
fan1:             N/A

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +97.0°C  (high = +110.0°C, crit = +110.0°C)
Core 0:        +58.0°C  (high = +110.0°C, crit = +110.0°C)
Core 1:        +59.0°C  (high = +110.0°C, crit = +110.0°C)
Core 2:        +58.0°C  (high = +110.0°C, crit = +110.0°C)
Core 3:        +59.0°C  (high = +110.0°C, crit = +110.0°C)
Core 4:        +67.0°C  (high = +110.0°C, crit = +110.0°C)
Core 5:        +68.0°C  (high = +110.0°C, crit = +110.0°C)
Core 6:        +67.0°C  (high = +110.0°C, crit = +110.0°C)
Core 7:        +67.0°C  (high = +110.0°C, crit = +110.0°C)
Core 8:        +54.0°C  (high = +110.0°C, crit = +110.0°C)
Core 12:       +97.0°C  (high = +110.0°C, crit = +110.0°C)
Core 16:       +59.0°C  (high = +110.0°C, crit = +110.0°C)
Core 20:       +77.0°C  (high = +110.0°C, crit = +110.0°C)
Core 24:       +56.0°C  (high = +110.0°C, crit = +110.0°C)
Core 28:       +61.0°C  (high = +110.0°C, crit = +110.0°C)
Core 32:       +63.0°C  (high = +110.0°C, crit = +110.0°C)
Core 33:       +63.0°C  (high = +110.0°C, crit = +110.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +59.9°C  (low  =  -5.2°C, high = +89.8°C)
                       (crit = +93.8°C)
Sensor 1:     +73.8°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +50.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 3:     +49.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C
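
To compare the CPU and GPU runs at a glance, the hottest core can be pulled out of a `sensors` dump. The following is a minimal sketch, not part of the original test: the function name `max_core_temp` is hypothetical, and it assumes the `Core N:` line layout shown in the coretemp section above.

```shell
#!/bin/bash
# Hypothetical helper: read `sensors` output on stdin and print the highest
# "Core N:" temperature. Assumes lines shaped like
#   Core 8:       +101.0°C  (high = +110.0°C, crit = +110.0°C)
max_core_temp() {
  awk '/^Core [0-9]+:/ {
         t = $3                      # third field, e.g. "+101.0°C"
         gsub(/[^0-9.]/, "", t)      # keep only digits and the decimal point
         if (t + 0 > max) max = t + 0
       }
       END { printf "%.1f\n", max }'
}
```

It could be used as `sensors | max_core_temp` while a model is running; with no `Core` lines on input it prints 0.0.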

top (deepseek-r1:70b execution on CPU)

Code Block

top - 13:18:48 up  1:12,  2 users,  load average: 6.01, 5.95, 5.94
Tasks: 326 total,   1 running, 325 sleeping,   0 stopped,   0 zombie
%Cpu0  : 68.3 us,  0.0 sy,  0.0 ni, 31.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  : 38.7 us,  0.0 sy,  0.0 ni, 61.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  : 61.7 us,  0.0 sy,  0.0 ni, 38.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 99.7 us,  0.0 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  : 29.0 us,  0.0 sy,  0.0 ni, 71.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  : 85.7 us,  0.0 sy,  0.0 ni, 14.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  : 11.3 us,  0.0 sy,  0.0 ni, 88.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  : 69.0 us,  0.0 sy,  0.0 ni, 31.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  : 26.6 us,  0.0 sy,  0.0 ni, 73.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 66.7 us,  0.0 sy,  0.0 ni, 33.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 : 31.2 us,  0.0 sy,  0.0 ni, 68.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  1.7 us,  0.0 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.7 us,  0.0 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  1.3 us,  0.0 sy,  0.0 ni, 98.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  3.3 us,  0.0 sy,  0.0 ni, 96.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  4.3 us,  0.0 sy,  0.0 ni, 95.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 35.4/128337.6 [|||||||||||||||||||||||||                                             ]
MiB Swap:  0.0/8192.0   [                                                                      ]

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  59943 ollama    20   0   45.7g  42.2g  23152 S 598.3  33.7  38:14.56 ollama
      1 root      20   0   22116  12508   9340 S   0.0   0.0   0:00.72 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
      3 root      20   0       0      0      0 S   0.0   0.0   0:00.00 pool_workqueue_release
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/R-rcu_gp
      5 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/R-sync_wq
      6 root       0 -20       0      0      0 I   0.0

top (deepseek-r1:70b execution on GPU)

Code Block

top - 14:20:49 up  2:14,  4 users,  load average: 1.75, 2.91, 2.01
Tasks: 344 total,   2 running, 342 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,  0.3 id, 99.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 44.0 us, 56.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 54.3/128337.6 [|||||||||||||||||||||||||||||||||||||||||||||||||||||||                                             ]
MiB Swap:  0.0/8192.0   [                                                                                                    ]

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  68407 root      20   0 4472460   1.3g 369680 R 100.3   1.0   2:31.49 ollama-lib
      1 root      20   0   22136  12508   9340 S   0.0   0.0   0:00.81 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
      3 root      20   0       0      0      0 S   0.0   0.0   0:00.00 pool_workqueue_release

script

Code Block

#!/bin/bash
# Benchmark using ollama gives rate of tokens per second
# idea taken from https://taoofmac.com/space/blog/2024/01/20/1800
# batch-obench.sh script is modification of obench.sh from https://github.com/tabletuser-blogspot/ollama-benchmark
# done by liutyi for https://wiki.liutyi.info test
# other colors
#Black          0;30    Dark Gray       1;30
#Red            0;31    Light Red       1;31
#Green          0;32    Light Green   1;32
#Brown/Orange 0;33      Yellow          1;33
#Blue           0;34    Light Blue      1;34
#Purple         0;35    Light Purple  1;35
#Cyan           0;36    Light Cyan      1;36
#Light Gray   0;37      White           1;37
#ANSI option
#RED='\033[0;31m'
#NC='\033[0m' # No Color
#echo -e "${red}Hello Stackoverflow${NC}"
#set -e used for troubleshooting
set -e
#colors available
borange='\e[0;33m'
yellow='\e[1;33m'
purple='\e[0;35m'
green='\e[0;32m'
red='\e[0;31m'
blue='\e[0;34m'
NC='\e[0m' # No Color
cpu_def=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
echo "Setting cpu governor to"
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
gpu_avail=$(sudo lshw -C display | grep product: | head -1 | cut -c17-)
cpugover=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
cpu_used=$(lscpu | grep 'Model name' | cut -f 2 -d ":" | awk '{$1=$1}1')
echo ""
echo "Simple benchmark using ollama and"
echo "whatever local Model is installed."
echo "Does not identify if $gpu_avail is benchmarking"
echo ""
benchmark=3
echo "How many times to run the benchmark?"
echo  $benchmark
echo ""
# iterate over every locally installed model, skipping the header row of `ollama ls`
for model in $(ollama ls | awk 'NR>1 {print $1}'); do
 echo -e "Total runs "${purple}$benchmark${NC}
 echo ""
 #echo "Current models available locally"
 #echo ""
 #ollama list
 #echo ""
 #echo "Example enter tinyllama or dolphin-phi"
 echo ""
 echo $model
 ollama show $model --system
 echo "" | tee -a results.txt
 echo -e "Will use model: "${green}$model${NC} | tee -a results.txt
 echo "" | tee -a results.txt
 echo -e Will benchmark the tokens per second for ${cpu_used} and or ${gpu_avail} | tee -a results.txt
 echo "" | tee -a results.txt
 echo "" | tee -a results.txt
 echo -e Running benchmark ${purple}$benchmark${NC} times for ${cpu_used} and or ${gpu_avail} | tee -a results.txt
 echo -e with ${borange}$cpugover${NC} setting for cpu governor | tee -a results.txt
 echo "" | tee -a results.txt
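 for run in $(seq 1 $benchmark); do
  # --verbose timing statistics go to stderr: keep stderr, discard the model's answer
  echo "Why is the blue sky blue?" | ollama run $model --verbose 2>&1 >/dev/null | grep "eval rate:" | tee -a results.txt ;
  # average the eval rates of this model's runs; the first run is skipped (NR>1), presumably because it includes model load time
  avg=$(grep -v "prompt eval rate:" results.txt | tail -n $benchmark | awk '{print $3}' | awk 'NR>1{ tot+=$1 } END{ print tot/(NR-1) }')
 done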
 echo "" | tee -a results.txt
 echo -e ${red}$avg${NC} is the average ${blue}tokens per second${NC} using ${green}$model${NC} model | tee -a results.txt
 echo for $cpu_used and or $gpu_avail | tee -a results.txt
done
echo
echo -e using ${borange}$cpugover${NC} for cpu governor.
echo ""
echo "Setting cpu governor to"
echo "$cpu_def" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#comment this out if you are repeating the same model
#this clears model from Vram
sudo systemctl stop ollama; sudo systemctl start ollama
#EOF
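
The averaging step in the script can also be checked on its own against a saved log. The sketch below is not part of batch-obench.sh: the function name `avg_eval_rate` is hypothetical, and it assumes input lines in the `eval rate:  N tokens/s` form that `ollama run --verbose` prints in the logs above. Like the script, it ignores the `prompt eval rate:` lines and leaves the first run out of the average.

```shell
#!/bin/bash
# Hypothetical helper: read benchmark output on stdin and print the mean
# "eval rate" of all runs except the first one.
avg_eval_rate() {
  grep "eval rate:" | grep -v "prompt eval rate:" |
    awk '{ n++; if (n > 1) tot += $3 }    # $3 is the numeric rate
         END { if (n > 1) print tot / (n - 1); else print "n/a" }'
}
```

For example, `avg_eval_rate < run.log` would process one model's block saved to a file (the file name is illustrative). Unlike the original `tot/(NR-1)`, this version prints `n/a` instead of dividing by zero when fewer than two runs are present.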
