Test system: power limits 95/110, Ubuntu 24.04, Intel Core Ultra 9 185H,
Crucial 2x128GB 5600 MT/s DDR5 SODIMM, WD_BLACK SN850X 8TB
Version: intel-ollama-0.6.2 for GPU SYCL0 (Intel(R) Arc(TM) Graphics) - 120187 MiB free for GPU
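All rates are in tokens/s. A "-" marks a GPU run that reported no rates.

| Name | Params, B | Size, GB | Prompt eval rate, CPU | Prompt eval rate, GPU | Second prompt eval rate, CPU | Second prompt eval rate, GPU | Eval rate, CPU | Eval rate, GPU |
|---|---|---|---|---|---|---|---|---|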
| dolphin-phi:2.7b | 2.7 | 1.6 | 85.67 | 86.81 | 744.07 | 649.43 | 25.42 | 21.73 |
| dolphin3:8b | 8 | 4.9 | 26.04 | 30.97 | 325.85 | 373.30 | 10.76 | 12.58 |
| tinyllama:1.1b | 1.1 | 0.6 | 198.18 | 112.98 | 2595.12 | 2211.21 | 62.99 | 57.53 |
| deepseek-v2:16b | 16 | 8.9 | 59.47 | 15.83 | 361.51 | 175.02 | 24.39 | 12.00 |
| phi3:14b | 14 | 7.9 | 15.60 | 10.51 | 101.53 | 128.59 | 6.07 | 7.67 |
| llama3.3:70b | 70 | 42 | 2.60 | 1.54 | 21.35 | 23.37 | 1.25 | 1.37 |
| mistral-small3.1:24b | 24 | 15 | 7.71 | - | 1321.32 | - | 3.64 | - |
| llama4:scout | 17 | 67 | 11.14 | - | 1683.33 | - | 4.81 | - |
| openchat:7b | 7 | 4.1 | 30.47 | 27.15 | 273.39 | 361.04 | 11.10 | 14.81 |
| qwen3:32b | 32 | 20 | 5.67 | 2.84 | 38.88 | 41.60 | 2.53 | 2.73 |
| gemma3:27b | 27 | 17 | 6.60 | - | 49.38 | - | 3.04 | - |
| deepseek-r1:70b | 70 | 42 | 2.63 | 0.89 | 12.39 | 14.13 | 1.24 | |
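
As a rough way to read the table, the eval-rate columns can be reduced to a GPU/CPU ratio per model. The sketch below hard-codes the eval rates from the rows that have both values; the function and variable names are illustrative and not part of the benchmark script.

```python
# Eval rates in tokens/s, copied from the table: model -> (CPU, GPU).
# Rows without a GPU eval rate are left out.
EVAL_RATES = {
    "dolphin-phi:2.7b": (25.42, 21.73),
    "dolphin3:8b": (10.76, 12.58),
    "tinyllama:1.1b": (62.99, 57.53),
    "deepseek-v2:16b": (24.39, 12.00),
    "phi3:14b": (6.07, 7.67),
    "llama3.3:70b": (1.25, 1.37),
    "openchat:7b": (11.10, 14.81),
    "qwen3:32b": (2.53, 2.73),
}


def gpu_speedup(cpu_rate: float, gpu_rate: float) -> float:
    """Return the GPU/CPU eval-rate ratio; values above 1.0 favor the GPU."""
    return gpu_rate / cpu_rate


def speedups(rates: dict) -> dict:
    """Map each model to its GPU/CPU ratio, rounded to two decimals."""
    return {model: round(gpu_speedup(cpu, gpu), 2) for model, (cpu, gpu) in rates.items()}


if __name__ == "__main__":
    # Print models from the weakest to the strongest GPU result.
    for model, ratio in sorted(speedups(EVAL_RATES).items(), key=lambda kv: kv[1]):
        print(f"{model:20s} {ratio:.2f}x")
```

On these numbers the ratio ranges from about 0.49 (deepseek-v2:16b) to about 1.33 (openchat:7b), so the GPU backend is not uniformly faster on this machine.
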
root@server1:~# ollama list
NAME                    ID              SIZE      MODIFIED
dolphin-phi:2.7b        c5761fc77240    1.6 GB    About an hour ago
dolphin3:8b             d5ab9ae8e1f2    4.9 GB    About an hour ago
tinyllama:1.1b          2644915ede35    637 MB    About an hour ago
deepseek-v2:16b         7c8c332f2df7    8.9 GB    18 hours ago
phi3:14b                cf611a26b048    7.9 GB    20 hours ago
llama3.3:70b            a6eb4748fd29    42 GB     21 hours ago
mistral-small3.1:24b    b9aaf0c2586a    15 GB     21 hours ago
llama4:scout            4f01ed6b6e01    67 GB     21 hours ago
openchat:7b             537a4e03b649    4.1 GB    22 hours ago
qwen3:32b               e1c9f234c6eb    20 GB     23 hours ago
gemma3:27b              a418f5838eaf    17 GB     23 hours ago
deepseek-r1:70b         0c1615a8ca32    42 GB     23 hours ago
Run batch on CPU
root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance
Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking
How many times to run the benchmark?
3
Total runs 3
deepseek-v2:16b
Will use model: deepseek-v2:16b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 56.10 tokens/s
eval rate: 25.88 tokens/s
prompt eval rate: 365.68 tokens/s
eval rate: 24.62 tokens/s
prompt eval rate: 377.67 tokens/s
eval rate: 24.64 tokens/s
25.0467 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
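
The average printed for deepseek-v2:16b is consistent with a plain mean over the three `eval rate` lines. A minimal sketch of that arithmetic follows; the regular expression and helper names are assumptions for illustration, and the six-significant-digit formatting is only a guess at how the script prints the number.

```python
import re

# Matches "eval rate: 25.88 tokens/s" but not "prompt eval rate: ..." lines.
EVAL_RE = re.compile(r"^eval rate:\s+([\d.]+) tokens/s$")

# The six rate lines logged for deepseek-v2:16b in this run.
LOG = """\
prompt eval rate: 56.10 tokens/s
eval rate: 25.88 tokens/s
prompt eval rate: 365.68 tokens/s
eval rate: 24.62 tokens/s
prompt eval rate: 377.67 tokens/s
eval rate: 24.64 tokens/s
"""


def eval_rates(log: str) -> list:
    """Collect the generation (eval) rates from a block of benchmark output."""
    rates = []
    for line in log.splitlines():
        match = EVAL_RE.match(line.strip())
        if match:
            rates.append(float(match.group(1)))
    return rates


def mean(values) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


rates = eval_rates(LOG)
print(format(mean(rates), ".6g"))  # 25.0467
```
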
Total runs 3
phi3:14b
Will use model: phi3:14b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 15.25 tokens/s
eval rate: 6.10 tokens/s
prompt eval rate: 100.20 tokens/s
eval rate: 5.88 tokens/s
prompt eval rate: 102.38 tokens/s
eval rate: 6.00 tokens/s
5.99333 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
llama3.3:70b
Will use model: llama3.3:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 2.56 tokens/s
eval rate: 1.24 tokens/s
prompt eval rate: 21.20 tokens/s
eval rate: 1.19 tokens/s
prompt eval rate: 19.18 tokens/s
eval rate: 1.11 tokens/s
1.18 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
When you're not sure about some information, you say that you don't have th
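If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").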
You are always very attentive to dates, in particular you try to resolve da
You follow these instructions in all languages, and always respond to the u
Next sections describe the capabilities that you have.
# WEB BROWSING INSTRUCTIONS
You cannot perform any web search or access internet to open URLs, links et
# MULTI-MODAL INSTRUCTIONS
You have the ability to read images, but you cannot generate images. You al
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 7.41 tokens/s
eval rate: 3.61 tokens/s
prompt eval rate: 1305.26 tokens/s
eval rate: 3.46 tokens/s
prompt eval rate: 1236.29 tokens/s
eval rate: 3.33 tokens/s
3.46667 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
llama4:scout
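You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.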
Will use model: llama4:scout
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 10.77 tokens/s
eval rate: 4.72 tokens/s
prompt eval rate: 1687.74 tokens/s
eval rate: 4.72 tokens/s
prompt eval rate: 1593.52 tokens/s
eval rate: 4.54 tokens/s
4.66 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
openchat:7b
Will use model: openchat:7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 28.78 tokens/s
eval rate: 10.42 tokens/s
prompt eval rate: 250.61 tokens/s
eval rate: 10.41 tokens/s
prompt eval rate: 256.14 tokens/s
eval rate: 10.34 tokens/s
10.39 is the average tokens per second using openchat:7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
qwen3:32b
Will use model: qwen3:32b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 5.50 tokens/s
eval rate: 2.31 tokens/s
^C(base) root@server1:~/ollama-benchmark#
Broadcast message from root@server1 on pts/3 (Wed 2025-05-21 12:05:33 UTC):
The system will reboot now!
Using username "oliutyi".
Authenticating with public key "oliutyi@server4"
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.11.0-26-generic x86_64)
System information as of Wed May 21 12:07:05 PM UTC 2025
System load: 0.0 Temperature: 72.8 C
Usage of /: 3.9% of 7.22TB Processes: 339
Memory usage: 0% Users logged in: 0
Swap usage: 0% IPv4 address for enp171s0: 10.9.9.108
Last login: Wed May 21 11:27:11 2025 from 10.9.9.64
oliutyi@server1:~$ sudo su -
(base) root@server1:~# cd ollama-benchmark/
(base) root@server1:~/ollama-benchmark# ls -la
total 32
drwxr-xr-x 3 root root 4096 May 21 11:25 .
drwx------ 27 root root 4096 May 21 12:04 ..
-rwxr-xr-x 1 root root 2815 May 21 11:25 batch-obench.sh
drwxr-xr-x 8 root root 4096 May 20 17:47 .git
-rw-r--r-- 1 root root 73 May 21 12:02 'Intel(R) Core(TM) Ultra 9 185H'$Filled By O.E.M. CPU @ 4.4GHz.txt'
-rw-r--r-- 1 root root 1061 May 20 17:47 LICENSE
-rwxr-xr-x 1 root root 2697 May 20 17:47 obench.sh
-rw-r--r-- 1 root root 333 May 20 17:47 README.md
(base) root@server1:~/ollama-benchmark# cat 'Intel(R) Core(TM) Ultra 9 185He Filled By O.E.M. CPU @ 4.4GHz.txt'
prompt eval rate: 5.50 tokens/s
eval rate: 2.31 tokens/s
(base) root@server1:~/ollama-benchmark# vi batch-obench.sh
(base) root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance
Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking
How many times to run the benchmark?
3
Total runs 3
dolphin-phi:2.7b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin-phi:2.7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 85.67 tokens/s
eval rate: 25.11 tokens/s
prompt eval rate: 744.07 tokens/s
eval rate: 25.42 tokens/s
prompt eval rate: 783.71 tokens/s
eval rate: 25.85 tokens/s
2.31 is the average tokens per second using dolphin-phi:2.7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
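
The reported average of 2.31 does not match the three eval rates logged for dolphin-phi:2.7b (25.11, 25.42, 25.85). It does equal the eval rate left in the results file by the interrupted run, as shown by the `cat` output earlier, so a stale value is a plausible cause, though the script source is not shown here. The sketch below recomputes the mean from the log and flags such mismatches; the patterns and function names are hypothetical helpers, not part of `batch-obench.sh`.

```python
import re

MODEL_RE = re.compile(r"^Will use model: (\S+)$")
EVAL_RE = re.compile(r"^eval rate:\s+([\d.]+) tokens/s$")
AVG_RE = re.compile(r"^([\d.]+) is the average tokens per second using (\S+) model$")

# Excerpt of the dolphin-phi:2.7b block from this run.
LOG = """\
Will use model: dolphin-phi:2.7b
with performance setting for cpu governor
prompt eval rate: 85.67 tokens/s
eval rate: 25.11 tokens/s
prompt eval rate: 744.07 tokens/s
eval rate: 25.42 tokens/s
prompt eval rate: 783.71 tokens/s
eval rate: 25.85 tokens/s
2.31 is the average tokens per second using dolphin-phi:2.7b model
"""


def find_mismatches(log: str, tolerance: float = 0.05) -> dict:
    """Return {model: (reported, recomputed)} where the two averages disagree."""
    samples = {}      # model name -> list of eval rates seen so far
    current = None    # model whose block is being read
    mismatches = {}
    for raw in log.splitlines():
        line = raw.strip()
        match = MODEL_RE.match(line)
        if match:
            current = match.group(1)
            samples[current] = []
            continue
        match = EVAL_RE.match(line)
        if match and current is not None:
            samples[current].append(float(match.group(1)))
            continue
        match = AVG_RE.match(line)
        if match and samples.get(match.group(2)):
            reported = float(match.group(1))
            values = samples[match.group(2)]
            recomputed = sum(values) / len(values)
            if abs(recomputed - reported) > tolerance:
                mismatches[match.group(2)] = (reported, round(recomputed, 2))
    return mismatches


print(find_mismatches(LOG))  # {'dolphin-phi:2.7b': (2.31, 25.46)}
```

The same check can be run over the other model blocks in this run, which all report 2.31.
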
Total runs 3
dolphin3:8b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin3:8b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 26.04 tokens/s
eval rate: 10.87 tokens/s
prompt eval rate: 325.85 tokens/s
eval rate: 10.76 tokens/s
prompt eval rate: 323.77 tokens/s
eval rate: 10.75 tokens/s
2.31 is the average tokens per second using dolphin3:8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
tinyllama:1.1b
You are a helpful AI assistant.
Will use model: tinyllama:1.1b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 198.18 tokens/s
eval rate: 63.49 tokens/s
prompt eval rate: 2595.12 tokens/s
eval rate: 62.99 tokens/s
prompt eval rate: 2547.80 tokens/s
eval rate: 62.73 tokens/s
2.31 is the average tokens per second using tinyllama:1.1b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-v2:16b
Will use model: deepseek-v2:16b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 59.47 tokens/s
eval rate: 24.57 tokens/s
prompt eval rate: 361.51 tokens/s
eval rate: 24.39 tokens/s
prompt eval rate: 361.58 tokens/s
eval rate: 24.32 tokens/s
2.31 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
phi3:14b
Will use model: phi3:14b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 15.60 tokens/s
eval rate: 5.97 tokens/s
prompt eval rate: 101.53 tokens/s
eval rate: 6.20 tokens/s
prompt eval rate: 98.60 tokens/s
eval rate: 6.07 tokens/s
2.31 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
llama3.3:70b
Will use model: llama3.3:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 2.60 tokens/s
eval rate: 1.25 tokens/s
prompt eval rate: 21.35 tokens/s
eval rate: 1.25 tokens/s
prompt eval rate: 21.34 tokens/s
eval rate: 1.25 tokens/s
2.31 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.
# WEB BROWSING INSTRUCTIONS
You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.
# MULTI-MODAL INSTRUCTIONS
You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 7.71 tokens/s
eval rate: 3.65 tokens/s
prompt eval rate: 1321.32 tokens/s
eval rate: 3.64 tokens/s
prompt eval rate: 1318.68 tokens/s
eval rate: 3.64 tokens/s
2.31 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
llama4:scout
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
Will use model: llama4:scout
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 11.14 tokens/s
eval rate: 4.77 tokens/s
prompt eval rate: 1683.33 tokens/s
eval rate: 4.81 tokens/s
prompt eval rate: 1688.84 tokens/s
eval rate: 4.81 tokens/s
2.31 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
openchat:7b
Will use model: openchat:7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 30.47 tokens/s
eval rate: 11.21 tokens/s
prompt eval rate: 273.39 tokens/s
eval rate: 11.02 tokens/s
prompt eval rate: 286.78 tokens/s
eval rate: 11.10 tokens/s
2.31 is the average tokens per second using openchat:7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
qwen3:32b
Will use model: qwen3:32b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 5.67 tokens/s
eval rate: 2.55 tokens/s
prompt eval rate: 38.88 tokens/s
eval rate: 2.53 tokens/s
prompt eval rate: 38.99 tokens/s
eval rate: 2.52 tokens/s
2.31 is the average tokens per second using qwen3:32b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
gemma3:27b
Will use model: gemma3:27b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 6.60 tokens/s
eval rate: 3.04 tokens/s
prompt eval rate: 49.38 tokens/s
eval rate: 3.04 tokens/s
prompt eval rate: 49.40 tokens/s
eval rate: 3.04 tokens/s
2.31 is the average tokens per second using gemma3:27b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-r1:70b
Will use model: deepseek-r1:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 2.63 tokens/s
eval rate: 1.25 tokens/s
prompt eval rate: 12.39 tokens/s
eval rate: 1.24 tokens/s
prompt eval rate: 11.56 tokens/s
eval rate: 1.24 tokens/s
2.31 is the average tokens per second using deepseek-r1:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
using performance for cpu governor.
Setting cpu governor to
powersave
Execution time is ~65 min.
Run batch on GPU
root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to
performance
Simple benchmark using ollama and
whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking
How many times to run the benchmark?
3
Total runs 3
dolphin-phi:2.7b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin-phi:2.7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R)
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) U
with performance setting for cpu governor
prompt eval rate: 86.81 tokens/s
eval rate: 21.69 tokens/s
prompt eval rate: 649.43 tokens/s
eval rate: 21.69 tokens/s
prompt eval rate: 659.76 tokens/s
eval rate: 21.77 tokens/s
21.73 is the average tokens per second using dolphin-phi:2.7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
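
In this GPU run the reported 21.73 matches the mean of the second and third eval rates rather than of all three, which suggests the edited script skips the first (cold) run. That is inferred from the numbers only; the script itself is not shown. A small sketch of both averages, with hypothetical helper names:

```python
def mean(values) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def warm_mean(values, skip: int = 1) -> float:
    """Mean of the runs that remain after dropping the first `skip` cold runs."""
    warm = list(values)[skip:]
    if not warm:
        raise ValueError("no runs left after skipping the cold ones")
    return mean(warm)


# Eval rates (tokens/s) logged for dolphin-phi:2.7b on the GPU.
gpu_eval_rates = [21.69, 21.69, 21.77]

print(round(mean(gpu_eval_rates), 4))       # 21.7167, mean of all three runs
print(round(warm_mean(gpu_eval_rates), 4))  # 21.73, matches the reported average
```

Dropping the first run keeps model load and cache warm-up out of the figure, at the cost of averaging over fewer samples.
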
Total runs 3
dolphin3:8b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin3:8b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 30.97 tokens/s
eval rate: 12.66 tokens/s
prompt eval rate: 373.30 tokens/s
eval rate: 12.65 tokens/s
prompt eval rate: 372.13 tokens/s
eval rate: 12.52 tokens/s
12.585 is the average tokens per second using dolphin3:8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
tinyllama:1.1b
You are a helpful AI assistant.
Will use model: tinyllama:1.1b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 112.98 tokens/s
eval rate: 57.22 tokens/s
prompt eval rate: 2211.21 tokens/s
eval rate: 57.32 tokens/s
prompt eval rate: 2237.45 tokens/s
eval rate: 57.75 tokens/s
57.535 is the average tokens per second using tinyllama:1.1b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-v2:16b
Will use model: deepseek-v2:16b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 15.83 tokens/s
eval rate: 11.95 tokens/s
prompt eval rate: 175.02 tokens/s
eval rate: 12.04 tokens/s
prompt eval rate: 177.12 tokens/s
eval rate: 11.97 tokens/s
12.005 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
phi3:14b
Will use model: phi3:14b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 10.51 tokens/s
eval rate: 7.67 tokens/s
prompt eval rate: 128.59 tokens/s
eval rate: 7.66 tokens/s
prompt eval rate: 128.13 tokens/s
eval rate: 7.70 tokens/s
7.68 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
llama3.3:70b
Will use model: llama3.3:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 1.54 tokens/s
eval rate: 1.49 tokens/s
prompt eval rate: 23.37 tokens/s
eval rate: 1.38 tokens/s
prompt eval rate: 23.35 tokens/s
eval rate: 1.36 tokens/s
1.37 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.
# WEB BROWSING INSTRUCTIONS
You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.
# MULTI-MODAL INSTRUCTIONS
You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
0 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
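
No rate lines were logged for mistral-small3.1:24b on the GPU, so the "0" above reads as a missing result rather than a measured speed, and the results table shows it as "-". A minimal sketch of an average that keeps the two cases apart; the function names are illustrative and not taken from the benchmark script:

```python
from typing import Optional, Sequence


def average_or_none(rates: Sequence[float]) -> Optional[float]:
    """Return the mean eval rate, or None when the run produced no samples."""
    if not rates:
        return None
    return sum(rates) / len(rates)


def table_cell(rates: Sequence[float]) -> str:
    """Format a results-table cell, using "-" for a run without any rates."""
    average = average_or_none(rates)
    return "-" if average is None else f"{average:.2f}"


# GPU run of mistral-small3.1:24b: nothing was logged.
print(table_cell([]))                  # -
# CPU run of the same model, eval rates from the log earlier in this document.
print(table_cell([3.65, 3.64, 3.64]))  # 3.64
```
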
Total runs 3
llama4:scout
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
Will use model: llama4:scout
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
0 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
openchat:7b
Will use model: openchat:7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 27.15 tokens/s
eval rate: 14.84 tokens/s
prompt eval rate: 361.04 tokens/s
eval rate: 14.85 tokens/s
prompt eval rate: 364.49 tokens/s
eval rate: 14.78 tokens/s
14.815 is the average tokens per second using openchat:7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
qwen3:32b
Will use model: qwen3:32b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 2.84 tokens/s
eval rate: 2.75 tokens/s
prompt eval rate: 41.60 tokens/s
eval rate: 2.74 tokens/s
prompt eval rate: 41.61 tokens/s
eval rate: 2.73 tokens/s
2.735 is the average tokens per second using qwen3:32b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
gemma3:27b
Will use model: gemma3:27b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
0 is the average tokens per second using gemma3:27b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-r1:70b
Will use model: deepseek-r1:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 0.89 tokens/s
eval rate: 1.51 tokens/s
prompt eval rate: 14.13 tokens/s
eval rate: 1.39 tokens/s
prompt eval rate: 13.76 tokens/s
eval rate: 1.38 tokens/s
1.385 is the average tokens per second using deepseek-r1:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
using performance for cpu governor.
Setting cpu governor to
powersave
Execution time: 53 minutes, with 3 of 12 models skipped.
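The averages in the log come from the script's awk step, which drops the first of the three runs before averaging: for openchat:7b that is (14.85 + 14.78) / 2 = 14.815, not the mean of all three. A skipped model writes no `eval rate:` lines, so the same step prints `0`. As a sketch only, the helper below (`avg_eval_rate` is a hypothetical name, not part of the original script) applies the same rule to timing lines read from stdin and prints `n/a` instead of `0` when there is nothing to average.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# Reads `ollama run --verbose` timing lines on stdin and prints the mean
# generation rate, dropping the first run as the original awk step does.
avg_eval_rate() {
    awk '
        /^[ \t]*eval rate:/ {            # matches "eval rate:", not "prompt eval rate:"
            n++
            if (n > 1) { tot += $3; used++ }
        }
        END {
            if (used == 0) print "n/a"   # no usable runs (model skipped or single run)
            else           print tot / used
        }
    '
}
```

Fed the three openchat:7b runs from the log, it prints 14.815.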
sensors (deepseek-r1:70b execution on CPU) at power consumption ~80W
sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +78.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +99.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +73.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +73.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +56.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +70.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +47.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +46.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
sensors (deepseek-r1:70b execution on GPU) at power consumption ~60W
(base) root@server1:~# sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +82.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +68.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +54.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +77.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +56.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +61.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +63.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +63.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +59.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +73.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +50.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +49.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
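The two dumps are easier to compare once reduced to the readings that differ most. In the CPU run the package sits at 101.0°C and the DIMM sensor (spd5118) at 78.2°C; in the GPU run they read 97.0°C and 82.2°C, with the DIMM above its 55°C `high` threshold in both. As a rough sketch, the helper below pulls just those two values out of `sensors` output. `temps_summary` is a hypothetical name, and the parsing assumes the chip and label names shown in the dumps above.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original post.
# Reads `sensors` output on stdin and prints the CPU package temperature and
# the temp1 reading of any spd5118 (DDR5 DIMM) chip, one "label value" per line.
temps_summary() {
    awk '
        # A chip header is a single word without a colon, e.g. coretemp-isa-0000
        NF == 1 && $0 !~ /:/ { chip = $0 }
        $1 == "Package" && $2 == "id" {
            v = $4; gsub(/[^0-9.-]/, "", v); print "package", v
        }
        chip ~ /^spd5118/ && $1 == "temp1:" {
            v = $2; gsub(/[^0-9.-]/, "", v); print "dimm", v
        }
    '
}
```

A call such as `sensors | temps_summary` during a run gives two short lines that can be appended to a log next to the token rates.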
top (deepseek-r1:70b execution on CPU)
top - 13:18:48 up 1:12, 2 users, load average: 6.01, 5.95, 5.94
Tasks: 326 total, 1 running, 325 sleeping, 0 stopped, 0 zombie
%Cpu0 : 68.3 us, 0.0 sy, 0.0 ni, 31.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 38.7 us, 0.0 sy, 0.0 ni, 61.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 61.7 us, 0.0 sy, 0.0 ni, 38.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 99.7 us, 0.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 29.0 us, 0.0 sy, 0.0 ni, 71.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 85.7 us, 0.0 sy, 0.0 ni, 14.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 11.3 us, 0.0 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 69.0 us, 0.0 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 26.6 us, 0.0 sy, 0.0 ni, 73.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 66.7 us, 0.0 sy, 0.0 ni, 33.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 31.2 us, 0.0 sy, 0.0 ni, 68.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 1.0 us, 0.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 1.7 us, 0.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 1.3 us, 0.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 3.3 us, 0.0 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 4.3 us, 0.0 sy, 0.0 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 35.4/128337.6 [||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
59943 ollama 20 0 45.7g 42.2g 23152 S 598.3 33.7 38:14.56 ollama
1 root 20 0 22116 12508 9340 S 0.0 0.0 0:00.72 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-rcu_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-sync_wq
6 root 0 -20 0 0 0 I 0.0
top - 14:20:49 up 2:14, 4 users, load average: 1.75, 2.91, 2.01
Tasks: 344 total, 2 running, 342 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 0.3 id, 99.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 44.0 us, 56.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 54.3/128337.6 [||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
68407 root 20 0 4472460 1.3g 369680 R 100.3 1.0 2:31.49 ollama-lib
1 root 20 0 22136 12508 9340 S 0.0 0.0 0:00.81 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
script
#!/bin/bash
# Benchmark using ollama gives rate of tokens per second
# idea taken from https://taoofmac.com/space/blog/2024/01/20/1800
# other colors
#Black 0;30 Dark Gray 1;30
#Red 0;31 Light Red 1;31
#Green 0;32 Light Green 1;32
#Brown/Orange 0;33 Yellow 1;33
#Blue 0;34 Light Blue 1;34
#Purple 0;35 Light Purple 1;35
#Cyan 0;36 Light Cyan 1;36
#Light Gray 0;37 White 1;37
#ANSI option
#RED='\033[0;31m'
#NC='\033[0m' # No Color
#echo -e "${red}Hello Stackoverflow${NC}"
#set -e used for troubleshooting
set -e
#colors available
borange='\e[0;33m'
yellow='\e[1;33m'
purple='\e[0;35m'
green='\e[0;32m'
red='\e[0;31m'
blue='\e[0;34m'
NC='\e[0m' # No Color
cpu_def=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
echo "Setting cpu governor to"
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
gpu_avail=$(sudo lshw -C display | grep product: | head -1 | cut -c17-)
cpugover=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
cpu_used=$(lscpu | grep 'Model name' | cut -f 2 -d ":" | awk '{$1=$1}1')
echo ""
echo "Simple benchmark using ollama and"
echo "whatever local Model is installed."
echo "Does not identify if $gpu_avail is benchmarking"
echo ""
benchmark=3
echo "How many times to run the benchmark?"
echo $benchmark
echo ""
for model in $(ollama ls | awk 'NR>1 {print $1}'); do
    echo -e "Total runs "${purple}$benchmark${NC}
    echo ""
    #echo "Current models available locally"
    #echo ""
    #ollama list
    #echo ""
    #echo "Example enter tinyllama or dolphin-phi"
    echo ""
    echo $model
    ollama show "$model" --system
    echo "" | tee -a results.txt
    echo -e "Will use model: "${green}$model${NC} | tee -a results.txt
    echo "" | tee -a results.txt
    echo -e Will benchmark the tokens per second for ${cpu_used} and or ${gpu_avail} | tee -a results.txt
    echo "" | tee -a results.txt
    echo "" | tee -a results.txt
    echo -e Running benchmark ${purple}$benchmark${NC} times for ${cpu_used} and or ${gpu_avail} | tee -a results.txt
    echo -e with ${borange}$cpugover${NC} setting for cpu governor | tee -a results.txt
    echo "" | tee -a results.txt
    for run in $(seq 1 $benchmark); do
        # keep only the timing lines that ollama prints on stderr
        echo "Why is the blue sky blue?" | ollama run "$model" --verbose 2>&1 >/dev/null | grep "eval rate:" | tee -a results.txt
    done
    # average the last $benchmark "eval rate" lines, skipping the first of them
    avg=$(grep -v "prompt eval rate:" results.txt | tail -n $benchmark | awk '{print $3}' | awk 'NR>1{ tot+=$1 } END{ print tot/(NR-1) }')
    echo "" | tee -a results.txt
    echo -e ${red}$avg${NC} is the average ${blue}tokens per second${NC} using ${green}$model${NC} model | tee -a results.txt
    echo for $cpu_used and or $gpu_avail | tee -a results.txt
done
echo
echo -e using ${borange}$cpugover${NC} for cpu governor.
echo ""
echo "Setting cpu governor to"
echo "$cpu_def" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#comment this out if you are repeating the same model
#this clears model from Vram
sudo systemctl stop ollama; sudo systemctl start ollama
#EOF
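Because the script runs under `set -e` and only restores the governor on its last lines, any command that aborts it midway leaves every core on `performance`. One way to guard against that, sketched here under assumptions, is to restore the saved governor from an `EXIT` trap. `GOV_FILES`, `write_governor`, and `run_with_governor` are illustrative names that do not appear in the original script.

```shell
#!/bin/bash
# Sketch only: GOV_FILES, write_governor and run_with_governor are
# illustrative names, not part of the original script.
GOV_FILES=${GOV_FILES:-/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor}

# Write governor $1 to every file matched by GOV_FILES.
write_governor() {
    local f
    for f in $GOV_FILES; do
        [ -e "$f" ] || continue                       # glob matched nothing
        if [ -w "$f" ]; then
            echo "$1" > "$f"
        else
            echo "$1" | sudo tee "$f" > /dev/null     # sysfs normally needs root
        fi
    done
}

# Run a command with governor $1, then put the previous governor back.
# The body is a subshell, so the EXIT trap fires when the command finishes
# or fails, without touching traps in the calling shell.
run_with_governor() (
    gov=$1; shift
    saved=
    for f in $GOV_FILES; do
        [ -r "$f" ] && saved=$(cat "$f")
        break
    done
    [ -n "$saved" ] && trap 'write_governor "$saved"' EXIT
    write_governor "$gov"
    "$@"
)
```

With the model loop moved into its own function or file, a call like `run_with_governor performance <command>` would replace the two separate `tee` lines at the top and bottom of the script.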