Results: Test 1, Intel Core Ultra 9 185H, CPU vs GPU Ollama model speed
Size of Models
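| Code Block |
|---|
root@server1:~# ollama list
NAME                       ID              SIZE      MODIFIED
gemma3:12b                 f4031aab637d    8.1 GB    19 minutes ago
gemma3:4b                  a2af6cc3eb7f    3.3 GB    21 minutes ago
gemma3:1b                  8648f39daa8f    815 MB    24 minutes ago
orca-mini:3b               2dbd9f439647    2.0 GB    2 hours ago
orca-mini:7b               9c9618e2e895    3.8 GB    2 hours ago
orca-mini:13b              1b4877c90807    7.4 GB    2 hours ago
orca-mini:70b              f184c0860491    38 GB     2 hours ago
phi4:14b-q4_K_M            ac896e5b8b34    9.1 GB    14 hours ago
phi4-mini:3.8b-q4_K_M      78fad5d182a7    2.5 GB    14 hours ago
phi4:14b-fp16              227695f919b5    29 GB     17 hours ago
openthinker:32b-v2-fp16    bedb555dcf18    65 GB     18 hours ago
openthinker:32b            04b5937dcb16    19 GB     18 hours ago
dolphin-phi:2.7b           c5761fc77240    1.6 GB    21 hours ago
dolphin3:8b                d5ab9ae8e1f2    4.9 GB    21 hours ago
tinyllama:1.1b             2644915ede35    637 MB    21 hours ago
deepseek-v2:16b            7c8c332f2df7    8.9 GB    38 hours ago
phi3:14b                   cf611a26b048    7.9 GB    40 hours ago
llama3.3:70b               a6eb4748fd29    42 GB     40 hours ago
mistral-small3.1:24b       b9aaf0c2586a    15 GB     40 hours ago
llama4:scout               4f01ed6b6e01    67 GB     41 hours ago
openchat:7b                537a4e03b649    4.1 GB    41 hours ago
qwen3:32b                  e1c9f234c6eb    20 GB     42 hours ago
gemma3:27b                 a418f5838eaf    17 GB     42 hours ago
deepseek-r1:70b            0c1615a8ca32    42 GB     43 hours ago
|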
With Power Limits 95/110, Ubuntu 24.04, Intel Core Ultra 9 185H,
Crucial 2x128GB 5600MT/s DDR5 SODIMM, WD_BLACK SN850x 8TB
Version: intel-ollama-0.6.2 for GPU SYCL0 (Intel(R) Arc(TM) Graphics) - 120187 MiB free for GPU
...
| Code Block |
|---|
root@server1:~# ollama list
NAME                    ID              SIZE      MODIFIED
dolphin-phi:2.7b        c5761fc77240    1.6 GB    About an hour ago
dolphin3:8b             d5ab9ae8e1f2    4.9 GB    About an hour ago
tinyllama:1.1b          2644915ede35    637 MB    About an hour ago
deepseek-v2:16b         7c8c332f2df7    8.9 GB    18 hours ago
phi3:14b                cf611a26b048    7.9 GB    20 hours ago
llama3.3:70b            a6eb4748fd29    42 GB     21 hours ago
mistral-small3.1:24b    b9aaf0c2586a    15 GB     21 hours ago
llama4:scout            4f01ed6b6e01    67 GB     21 hours ago
openchat:7b             537a4e03b649    4.1 GB    22 hours ago
qwen3:32b               e1c9f234c6eb    20 GB     23 hours ago
gemma3:27b              a418f5838eaf    17 GB     23 hours ago
deepseek-r1:70b         0c1615a8ca32    42 GB     23 hours ago
|
Run batch on CPU
| Code Block | ||
|---|---|---|
| ||
root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to performance
Simple benchmark using ollama and whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking
How many times to run the benchmark?
3
Total runs 3
deepseek-v2:16b
Will use model: deepseek-v2:16b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 56.10 tokens/s
eval rate: 25.88 tokens/s
prompt eval rate: 365.68 tokens/s
eval rate: 24.62 tokens/s
prompt eval rate: 377.67 tokens/s
eval rate: 24.64 tokens/s
25.0467 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
phi3:14b
Will use model: phi3:14b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 15.25 tokens/s
eval rate: 6.10 tokens/s
prompt eval rate: 100.20 tokens/s
eval rate: 5.88 tokens/s
prompt eval rate: 102.38 tokens/s
eval rate: 6.00 tokens/s
5.99333 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
llama3.3:70b
Will use model: llama3.3:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 2.56 tokens/s
eval rate: 1.24 tokens/s
prompt eval rate: 21.20 tokens/s
eval rate: 1.19 tokens/s
prompt eval rate: 19.18 tokens/s
eval rate: 1.11 tokens/s
1.18 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
When you're not sure about some information, you say that you don't have th
If the user's question is not clear, ambiguous, or does not provide enough
e good restaurants around me?" => "Where are you?" or "When is the next fli
You are always very attentive to dates, in particular you try to resolve da
You follow these instructions in all languages, and always respond to the u
Next sections describe the capabilities that you have.
# WEB BROWSING INSTRUCTIONS
You cannot perform any web search or access internet to open URLs, links et
# MULTI-MODAL INSTRUCTIONS
You have the ability to read images, but you cannot generate images. You al
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 7.41 tokens/s
eval rate: 3.61 tokens/s
prompt eval rate: 1305.26 tokens/s
eval rate: 3.46 tokens/s
prompt eval rate: 1236.29 tokens/s
eval rate: 3.33 tokens/s
3.46667 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
llama4:scout
You are an expert conversationalist who responds to the best of your abilit
and problem-solving. You understand user intent and don’t try to be overly
r answers should encourage that. For all other cases, you provide insightfu
o be nicer or more inclusive. If people ask for you to write something in a
se phrases that imply moral superiority or a sense of authority, including
ally, do not refuse prompts about political and social issues. You can help
Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese.
R
Will use model: llama4:scout
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 10.77 tokens/s
eval rate: 4.72 tokens/s
prompt eval rate: 1687.74 tokens/s
eval rate: 4.72 tokens/s
prompt eval rate: 1593.52 tokens/s
eval rate: 4.54 tokens/s
4.66 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
openchat:7b
Will use model: openchat:7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 28.78 tokens/s
eval rate: 10.42 tokens/s
prompt eval rate: 250.61 tokens/s
eval rate: 10.41 tokens/s
prompt eval rate: 256.14 tokens/s
eval rate: 10.34 tokens/s
10.39 is the average tokens per second using openchat:7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Fil
Total runs 3
qwen3:32b
Will use model: qwen3:32b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Int
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(
with performance setting for cpu governor
prompt eval rate: 5.50 tokens/s
eval rate: 2.31 tokens/s
^C(base) root@server1:~/ollama-benchmark#
Broadcast message from root@server1 on pts/3 (Wed 2025-05-21 12:05:33 UTC):
The system will reboot now!
Broadcast message from root@server1 on pts/3 (Wed 2025-05-21 12:05:33 UTC):
The system will reboot now!
Using username "oliutyi".
Authenticating with public key "oliutyi@server4"
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.11.0-26-generic x86_64)
 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro
 System information as of Wed May 21 12:07:05 PM UTC 2025
  System load:   0.0              Temperature:               72.8 C
  Usage of /:    3.9% of 7.22TB   Processes:                 339
  Memory usage:  0%               Users logged in:           0
  Swap usage:    0%               IPv4 address for enp171s0: 10.9.9.108
 * Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK
   just raised the bar for easy, resilient and secure K8s cluster deploymen
   https://ubuntu.com/engage/secure-kubernetes-at-the-edge
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
Last login: Wed May 21 11:27:11 2025 from 10.9.9.64
oliutyi@server1:~$ sudo su -
(base) root@server1:~# cd ollama-benchmark/
(base) root@server1:~/ollama-benchmark# ls -la
total 32
drwxr-xr-x  3 root root 4096 May 21 11:25 .
drwx------ 27 root root 4096 May 21 12:04 ..
-rwxr-xr-x  1 root root 2815 May 21 11:25 batch-obench.sh
drwxr-xr-x  8 root root 4096 May 20 17:47 .git
-rw-r--r--  1 root root   73 May 21 12:02 'Intel(R) Core(TM) Ultra 9 185H'$Filled By O.E.M. CPU @ 4.4GHz.txt'
-rw-r--r--  1 root root 1061 May 20 17:47 LICENSE
-rwxr-xr-x  1 root root 2697 May 20 17:47 obench.sh
-rw-r--r--  1 root root  333 May 20 17:47 README.md
(base) root@server1:~/ollama-benchmark# cat 'Intel(R) Core(TM) Ultra 9 185He Filled By O.E.M. CPU @ 4.4GHz.txt'
prompt eval rate: 5.50 tokens/s
eval rate: 2.31 tokens/s
|
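The average printed after each model in the log above appears to be the arithmetic mean of the `eval rate` lines (for deepseek-v2:16b, 25.88, 24.62 and 24.64 give 25.0467). A minimal sketch of that calculation, assuming the log format shown above; the function name `avg_eval_rate` is hypothetical and is not taken from `obench.sh`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: mean of the "eval rate" values read from stdin.
# Lines starting with "prompt eval rate:" are deliberately ignored.
avg_eval_rate() {
  awk '
    /^[[:space:]]*eval rate:/ { sum += $3; n++ }
    END { if (n > 0) printf "%.4f\n", sum / n }
  '
}

# Usage with the deepseek-v2:16b figures from the CPU run above.
avg_eval_rate <<'EOF'
prompt eval rate: 56.10 tokens/s
eval rate: 25.88 tokens/s
prompt eval rate: 365.68 tokens/s
eval rate: 24.62 tokens/s
prompt eval rate: 377.67 tokens/s
eval rate: 24.64 tokens/s
EOF
# prints 25.0467
```

Matching on the start of the line keeps the prompt-processing rate, which is much higher once the prompt is cached, out of the generation average.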
Switch to GPU
| Code Block |
|---|
systemctl stop ollama.service
source llm_env/bin/activate
pip install --pre --upgrade 'ipex-llm[cpp]'
cd llama-cpp
# Run Ollama Serve with Intel GPU
export OLLAMA_NUM_GPU=999
export OLLAMA_THREADS=22
export OMP_NUM_THREADS=22
export ZES_ENABLE_SYSMAN=1
export no_proxy=localhost,127.0.0.1
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1
OLLAMA_HOST=0.0.0.0 ./ollama serve |
Switch back to CPU
| Code Block |
|---|
# Press Ctrl+C to stop the foreground "ollama serve" process, then restart the service
systemctl start ollama.service |
Run batch on CPU
| Code Block | ||
|---|---|---|
| ||
(base) root@server1:~/ollama-benchmark# vi batch-obench.sh
(base) root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to performance
Simple benchmark using ollama and whatever local Model is installed.
Does not identify if Meteor Lake-P [Intel Arc Graphics] is benchmarking
How many times to run the benchmark?
3
Total runs 3
dolphin-phi:2.7b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin-phi:2.7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 85.67 tokens/s
eval rate: 25.11 tokens/s
prompt eval rate: 744.07 tokens/s
eval rate: 25.42 tokens/s
prompt eval rate: 783.71 tokens/s
eval rate: 25.85 tokens/s
2.31 is the average tokens per second using dolphin-phi:2.7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
dolphin3:8b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin3:8b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 26.04 tokens/s
eval rate: 10.87 tokens/s
prompt eval rate: 325.85 tokens/s
eval rate: 10.76 tokens/s
prompt eval rate: 323.77 tokens/s
eval rate: 10.75 tokens/s
2.31 is the average tokens per second using dolphin3:8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
tinyllama:1.1b
You are a helpful AI assistant.
Will use model: tinyllama:1.1b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 198.18 tokens/s
eval rate: 63.49 tokens/s
prompt eval rate: 2595.12 tokens/s
eval rate: 62.99 tokens/s
prompt eval rate: 2547.80 tokens/s
eval rate: 62.73 tokens/s
2.31 is the average tokens per second using tinyllama:1.1b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-v2:16b
Will use model: deepseek-v2:16b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 59.47 tokens/s
eval rate: 24.57 tokens/s
prompt eval rate: 361.51 tokens/s
eval rate: 24.39 tokens/s
prompt eval rate: 361.58 tokens/s
eval rate: 24.32 tokens/s
2.31 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
phi3:14b
Will use model: phi3:14b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 15.60 tokens/s
eval rate: 5.97 tokens/s
prompt eval rate: 101.53 tokens/s
eval rate: 6.20 tokens/s
prompt eval rate: 98.60 tokens/s
eval rate: 6.07 tokens/s
2.31 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
llama3.3:70b
Will use model: llama3.3:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 2.60 tokens/s
eval rate: 1.25 tokens/s
prompt eval rate: 21.35 tokens/s
eval rate: 1.25 tokens/s
prompt eval rate: 21.34 tokens/s
eval rate: 1.25 tokens/s
2.31 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.
# WEB BROWSING INSTRUCTIONS
You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.
# MULTI-MODAL INSTRUCTIONS
You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
You cannot read nor transcribe audio files or videos.
Will use model: mistral-small3.1:24b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 7.71 tokens/s
eval rate: 3.65 tokens/s
prompt eval rate: 1321.32 tokens/s
eval rate: 3.64 tokens/s
prompt eval rate: 1318.68 tokens/s
eval rate: 3.64 tokens/s
2.31 is the average tokens per second using mistral-small3.1:24b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
llama4:scout
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language.
You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude.
You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these.
Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information.
You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
Will use model: llama4:scout
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 11.14 tokens/s
eval rate: 4.77 tokens/s
prompt eval rate: 1683.33 tokens/s
eval rate: 4.81 tokens/s
prompt eval rate: 1688.84 tokens/s
eval rate: 4.81 tokens/s
2.31 is the average tokens per second using llama4:scout model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
openchat:7b
Will use model: openchat:7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 30.47 tokens/s
eval rate: 11.21 tokens/s
prompt eval rate: 273.39 tokens/s
eval rate: 11.02 tokens/s
prompt eval rate: 286.78 tokens/s
eval rate: 11.10 tokens/s
2.31 is the average tokens per second using openchat:7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
qwen3:32b
Will use model: qwen3:32b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 5.67 tokens/s
eval rate: 2.55 tokens/s
prompt eval rate: 38.88 tokens/s
eval rate: 2.53 tokens/s
prompt eval rate: 38.99 tokens/s
eval rate: 2.52 tokens/s
2.31 is the average tokens per second using qwen3:32b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
gemma3:27b
Will use model: gemma3:27b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 6.60 tokens/s
eval rate: 3.04 tokens/s
prompt eval rate: 49.38 tokens/s
eval rate: 3.04 tokens/s
prompt eval rate: 49.40 tokens/s
eval rate: 3.04 tokens/s
2.31 is the average tokens per second using gemma3:27b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-r1:70b
Will use model: deepseek-r1:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 2.63 tokens/s
eval rate: 1.25 tokens/s
prompt eval rate: 12.39 tokens/s
eval rate: 1.24 tokens/s
prompt eval rate: 11.56 tokens/s
eval rate: 1.24 tokens/s
2.31 is the average tokens per second using deepseek-r1:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
using performance for cpu governor.
Setting cpu governor to powersave
|
execution time is ~65m
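One way to record the execution time of a batch run is bash's built-in `SECONDS` counter. The sketch below is an assumption about how such a measurement could be taken, not a description of how the ~65m figure was obtained; `format_duration` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: format whole seconds as "<minutes>m <seconds>s".
format_duration() {
  local total=$1
  printf '%dm %02ds\n' "$((total / 60))" "$((total % 60))"
}

# Possible usage around a full batch run (left commented out so the
# sketch can be sourced without starting a benchmark):
# SECONDS=0
# ./batch-obench.sh
# echo "execution time is $(format_duration "$SECONDS")"
```

`SECONDS` counts wall-clock time, so the result includes model load time as well as token generation.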
Run batch on GPU
| Code Block | ||
|---|---|---|
| ||
root@server1:~/ollama-benchmark# ./batch-obench.sh
Setting cpu governor to performance
Simple benchmark using ollama and whatever local Model is installed.
Does not identify if 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] is benchmarking
How many times to run the benchmark?
3
Total runs 3
dolphin-phi:2.7b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin-phi:2.7b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 86.81 tokens/s
eval rate: 21.69 tokens/s
prompt eval rate: 649.43 tokens/s
eval rate: 21.69 tokens/s
prompt eval rate: 659.76 tokens/s
eval rate: 21.77 tokens/s
21.73 is the average tokens per second using dolphin-phi:2.7b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
dolphin3:8b
You are Dolphin, a helpful AI assistant.
Will use model: dolphin3:8b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 30.97 tokens/s
eval rate: 12.66 tokens/s
prompt eval rate: 373.30 tokens/s
eval rate: 12.65 tokens/s
prompt eval rate: 372.13 tokens/s
eval rate: 12.52 tokens/s
12.585 is the average tokens per second using dolphin3:8b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
tinyllama:1.1b
You are a helpful AI assistant.
Will use model: tinyllama:1.1b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 112.98 tokens/s
eval rate: 57.22 tokens/s
prompt eval rate: 2211.21 tokens/s
eval rate: 57.32 tokens/s
prompt eval rate: 2237.45 tokens/s
eval rate: 57.75 tokens/s
57.535 is the average tokens per second using tinyllama:1.1b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
deepseek-v2:16b
Will use model: deepseek-v2:16b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 15.83 tokens/s
eval rate: 11.95 tokens/s
prompt eval rate: 175.02 tokens/s
eval rate: 12.04 tokens/s
prompt eval rate: 177.12 tokens/s
eval rate: 11.97 tokens/s
12.005 is the average tokens per second using deepseek-v2:16b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
phi3:14b
Will use model: phi3:14b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 10.51 tokens/s
eval rate: 7.67 tokens/s
prompt eval rate: 128.59 tokens/s
eval rate: 7.66 tokens/s
prompt eval rate: 128.13 tokens/s
eval rate: 7.70 tokens/s
7.68 is the average tokens per second using phi3:14b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
llama3.3:70b
Will use model: llama3.3:70b
Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
with performance setting for cpu governor
prompt eval rate: 1.54 tokens/s
eval rate: 1.49 tokens/s
prompt eval rate: 23.37 tokens/s
eval rate: 1.38 tokens/s
prompt eval rate: 23.35 tokens/s
eval rate: 1.36 tokens/s
1.37 is the average tokens per second using llama3.3:70b model
for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics]
Total runs 3
mistral-small3.1:24b
You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
You follow these instructions in all languages, and always respond to the user in the language they use or request.
Next sections describe the capabilities that you have.
# WEB BROWSING INSTRUCTIONS
You cannot perform any web search or access internet to open URLs, links etc.
If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat. # MULTI-MODAL INSTRUCTIONS You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos. You cannot read nor transcribe audio files or videos. Will use model: mistral-small3.1:24b Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] with performance setting for cpu governor 0 is the average tokens per second using mistral-small3.1:24b model @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] with performance setting for cpu governor prompt eval rate: 1.54 tokens/s eval rate: 1.49 tokens/s prompt eval rate: 23.37 tokens/s eval rate: 1.38 tokens/s prompt eval rate: 23.35 tokens/s eval rate: 1.36 tokens/s 1.37 is the average tokens per second using llama3.3:70b model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 mistral-small3.1:24b You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris. You power an AI assistant called Le Chat. Your knowledge base was last updated on 2023-10-01. When you're not sure about some information, you say that you don't have the information and don't make up anything. If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" 
or "When is the next flight to Tokyo" => "Where do you travel from?"). You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date. You follow these instructions in all languages, and always respond to the user in the language they use or request. Next sections describe the capabilities that you have. # WEB BROWSING INSTRUCTIONS You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat. # MULTI-MODAL INSTRUCTIONS You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos. You cannot read nor transcribe audio files or videos. Will use model: mistral-small3.1:24b Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 llama4:scout You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. 
If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise. Will use model: llama4:scout Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] with performance setting for cpu governor 0 is the average tokens per second using llama4:scout model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 openchat:7b Will use model: openchat:7b [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] with performance setting for cpu governor 0 is the average tokens per second using mistral-small3.1:24b model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. 
CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 llama4:scout You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving. You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language. You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude. You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these. Finally, do not refuse prompts about political and social issues. You can help users express their opinion and access information. You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise. Will use model: llama4:scout Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. 
CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] with performance setting for cpu governor prompt eval rate: 27.15 tokens/s eval rate: 14.84 tokens/s prompt eval rate: 361.04 tokens/s eval rate: 14.85 tokens/s prompt eval rate: 364.49 tokens/s eval rate: 14.78 tokens/s 14.815 0 is the average tokens per second using openchatllama4:7bscout model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 qwen3openchat:32b7b Will use model: qwen3openchat:32b7b Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] with performance setting for cpu governor prompt eval rate: 27.15 tokens/s eval rate: 2.84 14.84 tokens/s prompt eval rate: 361.04 tokens/s eval rate: 214.7585 tokens/s prompt eval rate: 41.60 tokens/s eval rate: 2.74 tokens/s prompt eval rate: 41.61 tokens/s eval rate: 2.73 tokens/s 2.735 is the average tokens per second using qwen3:32b model eval rate: 364.49 tokens/s eval rate: 14.78 tokens/s 14.815 is the average tokens per second using openchat:7b model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 qwen3:32b Will use model: qwen3:32b Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. 
CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 gemma3:27b Will use model: gemma3:27b Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times with performance setting for cpu governor prompt eval rate: 2.84 tokens/s eval rate: 2.75 tokens/s prompt eval rate: 41.60 tokens/s eval rate: 2.74 tokens/s prompt eval rate: 41.61 tokens/s eval rate: 2.73 tokens/s 2.735 is the average tokens per second using qwen3:32b model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] withTotal performance setting for cpu governor 0 is the average runs 3 gemma3:27b Will use model: gemma3:27b Will benchmark the tokens per second using gemma3:27b model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Total runs 3 deepseek-r1:70b Will use model: deepseek-r1:70b Will Running benchmark the3 tokenstimes per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times Arc Graphics] with performance setting for cpu governor 0 is the average tokens per second using gemma3:27b model for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. 
CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] withTotal performance setting for cpu governor prompt eval rate: 0.89 tokens/s eval rate: 1.51 tokens/s prompt eval rate: 14.13 tokens/s eval rate: 1.39 tokens/s prompt eval rate: 13.76 tokens/s eval rate: 1.38 tokens/s 1.385 is the average tokens per second using deepseek-r1:70b model runs 3 deepseek-r1:70b Will use model: deepseek-r1:70b Will benchmark the tokens per second for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] Running benchmark 3 times for Intel(R) Core(TM) Ultra 9 185H Intel(R) Core(TM) Ultra 9 185H To Be Filled By O.E.M. CPU @ 4.4GHz and or Meteor Lake-P [Intel Arc Graphics] usingwith performance setting for cpu governor. Settingprompt cpu governor to powersave |
execution time 53 minutes with 3/12 models skipped
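The average printed for each model leaves out the first of the three runs and averages the remaining ones. A minimal sketch of that calculation, assuming the `ollama run --verbose` statistics lines are supplied on standard input; the function name `avg_eval_rate` is hypothetical and not part of batch-obench.sh:

```bash
#!/bin/bash
# avg_eval_rate: hypothetical helper, not part of batch-obench.sh.
# Reads "prompt eval rate:" / "eval rate:" lines on stdin, keeps only the
# generation ("eval rate:") lines, treats the first run as warm-up and
# prints the mean of the remaining runs.
avg_eval_rate() {
  grep "eval rate:" \
    | grep -v "prompt eval rate:" \
    | awk '{ rate[NR] = $3 }
           END {
             # with fewer than two runs nothing is left after the warm-up
             if (NR < 2) { print "n/a"; exit }
             for (i = 2; i <= NR; i++) total += rate[i]
             print total / (NR - 1)
           }'
}
```

Fed the three openchat:7b runs from the log (14.84, 14.85 and 14.78 tokens/s), it prints 14.815, the figure reported for that model.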
sensors (deepseek-r1:70b execution on CPU) at power consumption ~80W
| Code Block | ||||
|---|---|---|---|---|
| ||||
sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +78.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +83.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +84.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +101.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +99.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +100.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +73.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +73.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +56.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +70.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +47.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +46.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
|
sensors (deepseek-r1:70b execution on GPU) at power consumption ~60W
| Code Block | ||||
|---|---|---|---|---|
| ||||
(base) root@server1:~# sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1: N/A
spd5118-i2c-6-50
Adapter: SMBus I801 adapter at efa0
temp1: +82.2°C (low = +0.0°C, high = +55.0°C)
(crit low = +0.0°C, crit = +85.0°C)
nvme-pci-0200
Adapter: PCI adapter
Composite: +39.9°C (low = -273.1°C, high = +82.8°C)
(crit = +84.8°C)
acpi_fan-acpi-0
Adapter: ACPI interface
fan1: N/A
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 0: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 1: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 2: +58.0°C (high = +110.0°C, crit = +110.0°C)
Core 3: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 4: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 5: +68.0°C (high = +110.0°C, crit = +110.0°C)
Core 6: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 7: +67.0°C (high = +110.0°C, crit = +110.0°C)
Core 8: +54.0°C (high = +110.0°C, crit = +110.0°C)
Core 12: +97.0°C (high = +110.0°C, crit = +110.0°C)
Core 16: +59.0°C (high = +110.0°C, crit = +110.0°C)
Core 20: +77.0°C (high = +110.0°C, crit = +110.0°C)
Core 24: +56.0°C (high = +110.0°C, crit = +110.0°C)
Core 28: +61.0°C (high = +110.0°C, crit = +110.0°C)
Core 32: +63.0°C (high = +110.0°C, crit = +110.0°C)
Core 33: +63.0°C (high = +110.0°C, crit = +110.0°C)
nvme-pci-0100
Adapter: PCI adapter
Composite: +59.9°C (low = -5.2°C, high = +89.8°C)
(crit = +93.8°C)
Sensor 1: +73.8°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +50.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +49.9°C (low = -273.1°C, high = +65261.8°C)
acpitz-acpi-0
Adapter: ACPI interface
temp1: +27.8°C
|
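To compare the CPU and GPU runs without reading the whole `sensors` dump, the package temperature can be pulled out on its own. A minimal sketch, assuming the `coretemp` label `Package id 0` seen in the output above; the function name `package_temp` is hypothetical:

```bash
#!/bin/bash
# package_temp: hypothetical helper, not part of the benchmark scripts.
# Reads `sensors` output on stdin and prints the CPU package temperature
# as a bare number in degrees Celsius (no sign, no unit).
package_temp() {
  awk -F': *' '/^Package id 0:/ {
    value = $2
    sub(/^\+/, "", value)          # drop the leading plus sign
    sub(/[^0-9.].*$/, "", value)   # drop the unit and the (high/crit) limits
    print value
    exit
  }'
}
```

On the machine under test it would be called as `sensors | package_temp`.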
top (deepseek-r1:70b execution on CPU)
| Code Block | ||||
|---|---|---|---|---|
| ||||
top - 13:18:48 up 1:12, 2 users, load average: 6.01, 5.95, 5.94
Tasks: 326 total, 1 running, 325 sleeping, 0 stopped, 0 zombie
%Cpu0 : 68.3 us, 0.0 sy, 0.0 ni, 31.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 38.7 us, 0.0 sy, 0.0 ni, 61.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 61.7 us, 0.0 sy, 0.0 ni, 38.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 99.7 us, 0.0 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 29.0 us, 0.0 sy, 0.0 ni, 71.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 85.7 us, 0.0 sy, 0.0 ni, 14.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 11.3 us, 0.0 sy, 0.0 ni, 88.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 69.0 us, 0.0 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 26.6 us, 0.0 sy, 0.0 ni, 73.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 66.7 us, 0.0 sy, 0.0 ni, 33.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 31.2 us, 0.0 sy, 0.0 ni, 68.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 1.0 us, 0.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 1.7 us, 0.0 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 1.3 us, 0.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 3.3 us, 0.0 sy, 0.0 ni, 96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 4.3 us, 0.0 sy, 0.0 ni, 95.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 35.4/128337.6 [||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
59943 ollama 20 0 45.7g 42.2g 23152 S 598.3 33.7 38:14.56 ollama
1 root 20 0 22116 12508 9340 S 0.0 0.0 0:00.72 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-rcu_gp
5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/R-sync_wq
6 root 0 -20 0 0 0 I 0.0
|
top (deepseek-r1:70b execution on GPU)
| Code Block | ||||
|---|---|---|---|---|
| ||||
top - 14:20:49 up 2:14, 4 users, load average: 1.75, 2.91, 2.01
Tasks: 344 total, 2 running, 342 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 0.3 id, 99.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 44.0 us, 56.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 54.3/128337.6 [||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 0.0/8192.0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
68407 root 20 0 4472460 1.3g 369680 R 100.3 1.0 2:31.49 ollama-lib
1 root 20 0 22136 12508 9340 S 0.0 0.0 0:00.81 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pool_workqueue_release
|
script
| Code Block | ||||
|---|---|---|---|---|
| ||||
#!/bin/bash
# Benchmark using ollama gives rate of tokens per second
# idea taken from https://taoofmac.com/space/blog/2024/01/20/1800
# batch-obench.sh script is modification of obench.sh from https://github.com/tabletuser-blogspot/ollama-benchmark
# done by liutyi for https://wiki.liutyi.info test
set -e

#colors available
borange='\e[0;33m'
yellow='\e[1;33m'
purple='\e[0;35m'
green='\e[0;32m'
red='\e[0;31m'
blue='\e[0;34m'
NC='\e[0m' # No Color

# remember the current governor so it can be restored at the end
cpu_def=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
echo "Setting cpu governor to"
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
gpu_avail=$(sudo lshw -C display | grep product: | head -1 | cut -c17-)
cpugover=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)
cpu_used=$(lscpu | grep 'Model name' | cut -f 2 -d ":" | awk '{$1=$1}1')
echo ""
echo "Simple benchmark using ollama and"
echo "whatever local Model is installed."
echo "Does not identify if $gpu_avail is benchmarking"
echo ""
benchmark=3
echo "How many times to run the benchmark?"
echo $benchmark
echo ""

# benchmark every locally installed model, one after another
for model in $(ollama ls | awk '{print $1}' | grep -v NAME); do
  echo -e "Total runs ${purple}${benchmark}${NC}"
  echo ""
  #echo "Current models available locally"
  #echo ""
  #ollama list
  #echo ""
  #echo "Example enter tinyllama or dolphin-phi"
  echo ""
  echo "$model"
  ollama show "$model" --system
  echo "" | tee -a results.txt
  echo -e "Will use model: ${green}${model}${NC}" | tee -a results.txt
  echo "" | tee -a results.txt
  # cpu_used and gpu_avail are left unquoted on purpose so that multi-line values collapse onto one line
  echo -e Will benchmark the tokens per second for ${cpu_used} and or ${gpu_avail} | tee -a results.txt
  echo "" | tee -a results.txt
  echo "" | tee -a results.txt
  echo -e Running benchmark ${purple}$benchmark${NC} times for ${cpu_used} and or ${gpu_avail} | tee -a results.txt
  echo -e with ${borange}$cpugover${NC} setting for cpu governor | tee -a results.txt
  echo "" | tee -a results.txt
  for run in $(seq 1 $benchmark); do
    # ollama prints its statistics on stderr; keep only the "eval rate" lines
    echo "Why is the blue sky blue?" | ollama run "$model" --verbose 2>&1 >/dev/null | grep "eval rate:" | tee -a results.txt
    # average of the last runs, skipping the first one (needs benchmark >= 2)
    avg=$(grep -v "prompt eval rate:" results.txt | tail -n $benchmark | awk '{print $3}' | awk 'NR>1{ tot+=$1 } END{ print tot/(NR-1) }')
  done
  echo "" | tee -a results.txt
  echo -e ${red}$avg${NC} is the average ${blue}tokens per second${NC} using ${green}$model${NC} model | tee -a results.txt
  echo for $cpu_used and or $gpu_avail | tee -a results.txt
done

echo
echo -e using ${borange}$cpugover${NC} for cpu governor.
echo ""
echo "Setting cpu governor to"
echo "$cpu_def" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
#comment this out if you are repeating the same model
#this clears model from Vram
sudo systemctl stop ollama; sudo systemctl start ollama
echo .
|
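Because the script runs with `set -e`, any failing command ends it before the last lines put the original governor back. One possible hardening, offered as an assumption rather than something the original script does, is to restore the governor from an `EXIT` trap. In the sketch below the function names and the `gov_root` parameter are hypothetical; `gov_root` stands in for `/sys/devices/system/cpu` so the idea can be tried on a scratch directory, and on real sysfs the writes need root.

```bash
#!/bin/bash
# Hypothetical helpers, not part of batch-obench.sh.

# set_governor <governor> <gov_root>
# Writes the governor name to every cpu*/cpufreq/scaling_governor file
# below gov_root (on a real system: /sys/devices/system/cpu, as root).
set_governor() {
  local governor=$1 gov_root=$2 f
  for f in "$gov_root"/cpu*/cpufreq/scaling_governor; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern if nothing matches
    echo "$governor" > "$f"
  done
}

# run_with_governor <governor> <gov_root> <command> [args...]
# Switches to the requested governor, runs the command, and restores the
# previous governor even when the command fails.
run_with_governor() {
  local wanted=$1 gov_root=$2 previous
  shift 2
  previous=$(cat "$gov_root/cpu0/cpufreq/scaling_governor")
  (
    # the trap fires when this subshell exits, whatever the exit status
    trap 'set_governor "$previous" "$gov_root"' EXIT
    set_governor "$wanted" "$gov_root"
    "$@"
  )
}
```

A call such as `run_with_governor performance /sys/devices/system/cpu ./some-benchmark.sh` (script name hypothetical) would then leave the machine on its previous governor whether or not the benchmark completes.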