From c81b7fc003f8ba04086c5a9343aca4530d388a01 Mon Sep 17 00:00:00 2001 From: Xin Qiu Date: Wed, 19 Feb 2025 19:13:55 +0800 Subject: [PATCH] Add Portable zip Linux QuickStart (#12849) * linux doc * update * Update ollama_portablze_zip_quickstart.md * Update ollama_portablze_zip_quickstart.md * Update ollama_portablze_zip_quickstart.zh-CN.md * Update ollama_portablze_zip_quickstart.md * meet code review * update * Add tips & troubleshooting sections for both Linux & Windows * Rebase * Fix based on comments * Small fix * Fix img * Update table for linux * Small fix --------- Co-authored-by: Yuwen Hu --- .../ollama_portablze_zip_quickstart.md | 152 ++++++++++++++---- .../ollama_portablze_zip_quickstart.zh-CN.md | 96 ++++++++--- 2 files changed, 194 insertions(+), 54 deletions(-) diff --git a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md index c31abbf9..5bfdcd5d 100644 --- a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md +++ b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md @@ -5,17 +5,22 @@ This guide demonstrates how to use [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU with `ipex-llm` (without the need of manual installations). -> [!NOTE] -> Currently, IPEX-LLM only provides Ollama portable zip on Windows. - ## Table of Contents -- [Prerequisites](#prerequisitesa) -- [Step 1: Download and Unzip](#step-1-download-and-unzip) -- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve) -- [Step 3: Run Ollama](#step-3-run-ollama) -- [Tips & Troubleshooting](#tips--troubleshootings) +- [Windows Quickstart](#windows-quickstart) + - [Prerequisites](#prerequisites) + - [Step 1: Download and Unzip](#step-1-download-and-unzip) + - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve) + - [Step 3: Run Ollama](#step-3-run-ollama) +- [Linux Quickstart](#linux-quickstart) + - [Prerequisites](#prerequisites-1) + - [Step 1: Download and Extract](#step-1-download-and-extract) + - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve-1) + - [Step 3: Run Ollama](#step-3-run-ollama-1) +- [Tips & Troubleshooting](#tips--troubleshooting) -## Prerequisites +## Windows Quickstart + +### Prerequisites Check your GPU driver version, and update it if needed: @@ -23,13 +28,13 @@ Check your GPU driver version, and update it if needed: - For other Intel iGPU/dGPU, we recommend using GPU driver version [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html) -## Step 1: Download and Unzip +### Step 1: Download and Unzip -Download IPEX-LLM Ollama portable zip from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly). +Download IPEX-LLM Ollama portable zip for Windows users from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly). Then, extract the zip file to a folder. -## Step 2: Start Ollama Serve +### Step 2: Start Ollama Serve Double-click `start-ollama.bat` in the extracted folder to start the Ollama service. 
A window will then pop up as shown below:
 
@@ -37,7 +42,7 @@
 
-## Step 3: Run Ollama
+### Step 3: Run Ollama
 
 You could then use Ollama to run LLMs on Intel GPUs as follows:
 
@@ -48,17 +53,70 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
 
+## Linux Quickstart
+
+### Prerequisites
+
+Check your GPU driver version, and update it if needed:
+
+- For client GPUs, such as A-series, B-series and integrated GPUs, we recommend following the [Intel client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.
+
+### Step 1: Download and Extract
+
+Download the IPEX-LLM Ollama portable tgz for Ubuntu users from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
+
+Then open a terminal and extract the tgz file to a folder:
+
+```bash
+cd PATH/TO/DOWNLOADED/TGZ
+tar xvf [Downloaded tgz file]
+```
+
+### Step 2: Start Ollama Serve
+
+Enter the extracted folder, and run `start-ollama.sh` to start the Ollama service:
+
+```bash
+cd PATH/TO/EXTRACTED/FOLDER
+./start-ollama.sh
+```
+
+ +
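+If you would like to double-check that the service came up before moving on, you could query Ollama's version endpoint from another terminal (a minimal check, assuming the service is listening on Ollama's default port `11434`):
+
+```bash
+# Prints the Ollama server version if the service started correctly
+curl http://localhost:11434/api/version
+```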
+
+
+### Step 3: Run Ollama
+
+You could then use Ollama to run LLMs on Intel GPUs as follows:
+
+- Open another terminal, and enter the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
+- Run `./ollama run deepseek-r1:7b` (you may use any other model)
+
+ +
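+
+Besides the interactive shell started by `./ollama run`, the model could also be queried through Ollama's HTTP API. Below is a minimal sketch (assuming the service from Step 2 is still running on the default port `11434` and `deepseek-r1:7b` has already been pulled):
+
+```bash
+# Send a single, non-streaming generation request to the local Ollama service
+curl http://localhost:11434/api/generate -d '{
+  "model": "deepseek-r1:7b",
+  "prompt": "Why is the sky blue?",
+  "stream": false
+}'
+```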
+
+
 ## Tips & Troubleshooting
 
 ### Speed up model download using alternative sources
 
-Ollama by default downloads model from [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before [run Ollama](#step-3-run-ollama), you could switch the source from which the model is downloaded first.
+Ollama by default downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before the **Run Ollama** step, you could switch the source from which the model is downloaded first.
 
 For example, if you would like to run `deepseek-r1:7b` but the download speed from Ollama library is quite slow, you could use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) from [ModelScope](https://www.modelscope.cn/models) instead, through:
 
-- Open "Command Prompt" (cmd), and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
-- Run `ollama run deepseek-r1:7b`
+- For **Windows** users:
+
+  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
+  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `ollama run deepseek-r1:7b`
+
+- For **Linux** users:
+
+  - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
+  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `./ollama run deepseek-r1:7b`
 
 > [!TIP]
 > Model downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show actual model id in `ollama list`, e.g.
@@ -72,26 +130,58 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
 
 By default, Ollama runs model with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
 
-To increase the context length, you could set environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#step-2-start-ollama-serve), as shwon below:
+To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before the **Start Ollama Serve** step, as shown below (if Ollama serve is already running, please make sure to stop it first):
 
-- Open "Command Prompt" (cmd), and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
-- Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
-- Start Ollama serve through `start-ollama.bat`
+- For **Windows** users:
+
+  - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
+  - Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Start Ollama serve through `start-ollama.bat`
+
+- For **Linux** users:
+
+  - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
+  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Start Ollama serve through `./start-ollama.sh`
 
 > [!TIP]
 > `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` settings in a models' `Modelfile`.
 
+### Select specific GPU to run Ollama when multiple ones are available
+
+If your machine has multiple Intel GPUs, Ollama will by default run on all of them.
+
+To specify which Intel GPU you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` before the **Start Ollama Serve** step, as follows (if Ollama serve is already running, please make sure to stop it first):
+
+- Identify the ids (e.g. 0, 1, etc.) of your GPUs. You could find them in the logs of Ollama serve when any model is loaded, e.g.:
+
+ +
+
+- For **Windows** users:
+
+  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
+  - In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with your desired GPU id
+  - Start Ollama serve through `start-ollama.bat`
+
+- For **Linux** users:
+
+  - In a terminal, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
+  - Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with your desired GPU id
+  - Start Ollama serve through `./start-ollama.sh`
+
 ### Additional models supported after Ollama v0.5.4
 
 The currently Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models have also been supported in the Ollama Portable Zip:
 
- | Model | Download | Model Link |
- | - | - | - |
- | DeepSeek-R1 | `ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
- | Openthinker | `ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
- | DeepScaleR | `ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
- | Phi-4 | `ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
- | Dolphin 3.0 | `ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
- | Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
- | Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
- | Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
+ | Model | Download (Windows) | Download (Linux) | Model Link |
+ | - | - | - | - |
+ | DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
+ | Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
+ | DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
+ | Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
+ | Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
+ | Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
+ | Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
+ | Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
\ No newline at end of file
diff --git a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md
index dab796f0..6b3a429f 100644
--- a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md
+++ b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md
@@ -5,17 +5,22 @@
 本指南演示如何使用 [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) 通过 `ipex-llm` 在 Intel GPU 上直接免安装运行 Ollama。
 
-> [!NOTE]
-> 目前,IPEX-LLM 仅在 Windows 上提供 Ollama portable zip。
-
 ## 目录
 
-- [系统环境安装](#系统环境准备)
-- [步骤 1:下载和解压](#步骤-1下载和解压)
-- [步骤 2:启动 Ollama 
Serve](#步骤-2启动-ollama-serve) -- [步骤 3:运行 Ollama](#步骤-3运行-ollama) -- [提示和故障排除](#提示和故障排除) +- [Windows用户指南](#windows用户指南) + - [系统环境安装](#系统环境准备) + - [步骤 1:下载和解压](#步骤-1下载和解压) + - [步骤 2:启动 Ollama Serve](#步骤-2启动-ollama-serve) + - [步骤 3:运行 Ollama](#步骤-3运行-ollama) + - [提示和故障排除](#提示和故障排除) +- [Linux用户指南](#linux用户指南) + - [系统环境安装](#系统环境准备-1) + - [步骤 1:下载和解压](#步骤-1下载和解压-1) + - [步骤 2:启动 Ollama Serve](#步骤-2启动-ollama-serve-1) + - [步骤 3:运行 Ollama](#步骤-3运行-ollama-1) -## 系统环境准备 +## Windows用户指南 + +### 系统环境准备 检查你的 GPU 驱动程序版本,并根据需要进行更新: @@ -23,13 +28,13 @@ - 对于其他的 Intel 核显和独显,我们推荐使用 GPU 驱动版本 [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html) -## 步骤 1:下载和解压 +### 步骤 1:下载和解压 从此[链接](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly)下载 IPEX-LLM Ollama portable zip。 然后,将 zip 文件解压到一个文件夹中。 -## 步骤 2:启动 Ollama Serve +### 步骤 2:启动 Ollama Serve 在解压后的文件夹中双击 `start-ollama.bat` 即可启动 Ollama Serve。随后会弹出一个窗口,如下所示: @@ -37,27 +42,27 @@ -## 步骤 3:运行 Ollama +### 步骤 3:运行 Ollama 在 Intel GPUs 上使用 Ollama 运行 LLMs,如下所示: - 打开命令提示符(cmd),并通过在命令行输入指令 `cd /d PATH\TO\EXTRACTED\FOLDER` 进入解压后的文件夹 -- 在命令提示符中运行 `ollama run deepseek-r1:7(可以将当前模型替换为你需要的模型) +- 在命令提示符中运行 `ollama run deepseek-r1:7b`(可以将当前模型替换为你需要的模型)
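+
+如果需要确认 Ollama 服务是否已经就绪,也可以在另一个命令提示符窗口中访问 Ollama 的版本接口(以下仅为示例,假设服务仍监听默认端口 `11434`,且系统中可以使用 `curl`):
+
+```cmd
+curl http://localhost:11434/api/version
+```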
-## 提示和故障排除 +### 提示和故障排除 -### 通过切换源提升模型下载速度 +#### 通过切换源提升模型下载速度 -Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。在 [运行 Ollama](#步骤-3运行-ollama) 之前设置环境变量 `IPEX_LLM_MODEL_SOURCE` 为 `modelscope/ollama`,你可以切换模型的首选下载源。 +Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。通过在 [运行 Ollama](#步骤-3运行-ollama) 之前设置环境变量 `IPEX_LLM_MODEL_SOURCE` 为 `modelscope/ollama`,你可以切换模型的首选下载源。 -例如,如果你想运行 `deepseek-r1:7b` 但从 Ollama 库的下载速度较慢,可以通过如下方式改用 [ModelScope](https://www.modelscope.cn/models) 的 [模型源](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF): +例如,如果你想运行 `deepseek-r1:7b` 但从 Ollama 库的下载速度较慢,可以通过如下方式改用 [ModelScope](https://www.modelscope.cn/models) 上的 [模型源](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF): -- 打开 “命令提示符”(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 -- 在 “命令提示符” 中运行 `set IPEX_LLM_MODEL_SOURCE=modelscope` +- 打开命令提示符(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 +- 在命令提示符中运行 `set IPEX_LLM_MODEL_SOURCE=modelscope` - 运行 `ollama run deepseek-r1:7b` > [!Tip] @@ -68,20 +73,20 @@ Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。在 [ > ``` > 除了 `ollama run` 和 `ollama pull`,其他操作中模型应通过其实际 ID 进行识别,例如: `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M` -### 在 Ollama 中增加上下文长度 +#### 在 Ollama 中增加上下文长度 默认情况下,Ollama 使用 2048 个 token 的上下文窗口运行模型。也就是说,模型最多能 “记住” 2048 个 token 的上下文。 要增加上下文长度,可以在 [启动 Ollama serve](#步骤-2启动-ollama-serve) 之前设置环境变量 `IPEX_LLM_NUM_CTX`,步骤如下: -- 打开 “命令提示符”(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 -- 在 “命令提示符” 中将 `IPEX_LLM_NUM_CTX` 设置为所需长度,例如:`set IPEX_LLM_NUM_CTX=16384` +- 打开命令提示符(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 +- 在命令提示符中将 `IPEX_LLM_NUM_CTX` 设置为所需长度,例如:`set IPEX_LLM_NUM_CTX=16384` - 通过运行 `start-ollama.bat` 启动 Ollama serve > [!Tip] > `IPEX_LLM_NUM_CTX` 的优先级高于模型 `Modelfile` 中设置的 `num_ctx`。 -### Ollama v0.5.4 之后支持的其他模型 +#### Ollama v0.5.4 之后支持的其他模型 当前的 Ollama Portable Zip 基于 Ollama v0.5.4;此外,以下新模型也已在 Ollama Portable Zip 中得到支持: @@ -95,3 +100,48 @@ Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。在 [ | Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) | | Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) | | Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) | + + +## Linux用户指南 + +### 系统环境准备 + +检查你的 GPU 驱动程序版本,并根据需要进行更新: + +- 对于消费级显卡用户,如A系列,B系列和集成显卡,我们推荐按照[消费级显卡驱动安装指南](https://dgpu-docs.intel.com/driver/client/overview.html)来安装您的显卡驱动。 + + +### 步骤 1:下载和解压 + +从此[链接](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly)下载 IPEX-LLM Ollama portable tgz。 + +然后,开启一个终端,输入如下命令将 tgz 文件解压到一个文件夹中。 +```bash +cd PATH/TO/DOWNLOADED/TGZ +tar xvf [Downloaded tgz file] +``` + +### 步骤 2:启动 Ollama Serve + +进入解压后的文件夹,执行`./start-ollama.sh`启动 Ollama Serve: + +[可选操作] 对于有多块显卡的用户,请编辑解压后文件夹中的 start-ollama.sh,并根据机器配置修改 ONEAPI_DEVICE_SELECTOR。默认情况下,Ollama 会使用所有显卡。 +```bash + cd PATH/TO/EXTRACTED/FOLDER +./start-ollama.sh +``` + +
+ +
+ +### 步骤 3:运行 Ollama + +在 Intel GPUs 上使用 Ollama 运行大语言模型,如下所示: + +- 打开另外一个终端,并输入指令 `cd PATH/TO/EXTRACTED/FOLDER` 进入解压后的文件夹 +- 在终端中运行 `./ollama run deepseek-r1:7b`(可以将当前模型替换为你需要的模型) + +
+ +
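+
+除了交互式的 `./ollama run`,也可以通过 Ollama 的 HTTP API 调用模型(以下仅为示例,假设步骤 2 启动的服务仍在默认端口 `11434` 上运行,且 `deepseek-r1:7b` 已经下载完成):
+
+```bash
+# 向本地 Ollama 服务发送一次非流式生成请求
+curl http://localhost:11434/api/generate -d '{
+  "model": "deepseek-r1:7b",
+  "prompt": "为什么天空是蓝色的?",
+  "stream": false
+}'
+```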