...
https://www.intel.com/content/www/us/en/developer/articles/technical/run-llms-on-gpus-using-llama-cpp.html
...
| Code Block |
|---|
# download and unpack the OpenVINO Model Server binary package
wget https://github.com/openvinotoolkit/model_server/releases/download/v2025.1/ovms_ubuntu24_python_on.tar.gz
tar -xzvf ovms_ubuntu24_python_on.tar.gz
export LD_LIBRARY_PATH=${PWD}/ovms/lib
export PATH=$PATH:${PWD}/ovms/bin
# fetch a sample ResNet-50 model into the models/<name>/<version>/ layout OVMS expects
curl --create-dirs -k https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml -o models/resnet50/1/model.xml
curl --create-dirs -k https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin -o models/resnet50/1/model.bin
chmod -R 755 models
# Python runtime dependencies for the OVMS Python bindings
export PYTHONPATH=${PWD}/ovms/lib/python
sudo apt -y install libpython3.12
pip3 install "Jinja2==3.1.6" "MarkupSafe==3.0.2"
# start the server: gRPC on port 9000, serving the model under the name "resnet"
ovms --port 9000 --model_name resnet --model_path models/resnet50 |
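To confirm the server came up and loaded the model, its status endpoints can be queried. A minimal check, assuming the server is additionally started with a REST port (--rest_port 8000), which the command above does not set (--port 9000 exposes only gRPC):

| Code Block |
|---|
# restart with a REST port in addition to gRPC (assumption: --rest_port 8000)
ovms --port 9000 --rest_port 8000 --model_name resnet --model_path models/resnet50 &
# model version status (TensorFlow Serving-compatible REST API)
curl http://localhost:8000/v1/models/resnet
# input/output names and shapes
curl http://localhost:8000/v1/models/resnet/metadata |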
Ollama + Open WebUI on Intel Arc
Ollama
| Code Block |
|---|
sudo apt update
sudo apt upgrade -y
# Python 3.11 from the deadsnakes PPA for the ipex-llm environment
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.11 -y
sudo apt install python3.11-venv -y
python3.11 -V
python3.11 -m venv llm_env
source llm_env/bin/activate
# ipex-llm with the llama.cpp/Ollama backend for Intel GPUs
pip install --pre --upgrade ipex-llm[cpp]
mkdir llama-cpp
cd llama-cpp
# link the ipex-llm Ollama binary into the current directory
init-ollama
# Run Ollama Serve with Intel GPU
export OLLAMA_NUM_GPU=999            # offload all model layers to the GPU
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1           # enable GPU device/memory monitoring
source /opt/intel/oneapi/setvars.sh
export SYCL_CACHE_PERSISTENT=1       # persist compiled SYCL kernels across runs
# localhost access
# ./ollama serve
# for non-localhost access
OLLAMA_HOST=0.0.0.0 ./ollama serve
|
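With the server running (port 11434 by default), a quick smoke test from another shell can go through the standard Ollama REST API; the model name below is one of the models listed further down and must already be pulled:

| Code Block |
|---|
# list the models known to the server
curl http://localhost:11434/api/tags
# one-shot, non-streaming generation against an already pulled model
curl http://localhost:11434/api/generate -d '{"model": "qwen3:32b", "prompt": "Hello", "stream": false}' |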
| Code Block |
|---|
(base) root@server1:~/llama-cpp# ./ollama list
NAME               ID              SIZE     MODIFIED
qwen3:32b          e1c9f234c6eb    20 GB    28 minutes ago
gemma3:27b         a418f5838eaf    17 GB    37 minutes ago
deepseek-r1:70b    0c1615a8ca32    42 GB    About an hour ago
|
Pull a model
| Code Block |
|---|
(base) root@server1:~/llama-cpp# ./ollama pull openchat:7b
pulling manifest
pulling 1cecc26325a1... 7% ▕███████████
|
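Once the pull finishes, the model can be exercised directly from the same binary before wiring up the Web UI:

| Code Block |
|---|
# interactive chat with the freshly pulled model
./ollama run openchat:7b
# or a single prompt straight from the command line
./ollama run openchat:7b "Summarize this server setup in one sentence." |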
Open WebUI
| Code Block |
|---|
source llm_env/bin/activate
# pip install open-webui==0.2.5    # older pinned version
pip install open-webui             # installs 0.6.10 at the time of writing
open-webui serve |
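By default Open WebUI serves on port 8080 and expects Ollama on http://localhost:11434. A sketch for pointing it at a non-local Ollama endpoint and an explicit port; OLLAMA_BASE_URL and --port are standard Open WebUI options, but exact flag names can vary between releases:

| Code Block |
|---|
source llm_env/bin/activate
# point Open WebUI at the Ollama server started earlier with OLLAMA_HOST=0.0.0.0 (hostname is an example)
export OLLAMA_BASE_URL=http://server1:11434
# serve the UI on an explicit port (8080 is the default)
open-webui serve --port 8080 |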
| Model | Time to load model (sec) | Layers offloaded to GPU |
|---|---|---|
| DeepSeek R1 Distill Llama 70B | 54.25 | 81/81 |
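The layer count in the table can be read from the serve log when a model loads: the llama.cpp loader prints an "offloaded N/N layers to GPU" summary. A rough sketch, assuming the serve output is redirected to a file (exact log wording may differ between builds):

| Code Block |
|---|
# capture the serve log, then load a model and grep for the offload summary
OLLAMA_HOST=0.0.0.0 ./ollama serve > ollama.log 2>&1 &
./ollama run deepseek-r1:70b "hi"
grep -i "offloaded" ollama.log
# expect a line similar to: offloaded 81/81 layers to GPU |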
llama.cpp
https://github.com/ggml-org/llama.cpp
...


