From c81b7fc003f8ba04086c5a9343aca4530d388a01 Mon Sep 17 00:00:00 2001 From: Xin Qiu Date: Wed, 19 Feb 2025 19:13:55 +0800 Subject: [PATCH] Add Portable zip Linux QuickStart (#12849) * linux doc * update * Update ollama_portablze_zip_quickstart.md * Update ollama_portablze_zip_quickstart.md * Update ollama_portablze_zip_quickstart.zh-CN.md * Update ollama_portablze_zip_quickstart.md * meet code review * update * Add tips & troubleshooting sections for both Linux & Windows * Rebase * Fix based on comments * Small fix * Fix img * Update table for linux * Small fix --------- Co-authored-by: Yuwen Hu --- .../ollama_portablze_zip_quickstart.md | 152 ++++++++++++++---- .../ollama_portablze_zip_quickstart.zh-CN.md | 96 ++++++++--- 2 files changed, 194 insertions(+), 54 deletions(-) diff --git a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md index c31abbf9..5bfdcd5d 100644 --- a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md +++ b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md @@ -5,17 +5,22 @@ This guide demonstrates how to use [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU with `ipex-llm` (without the need of manual installations). -> [!NOTE] -> Currently, IPEX-LLM only provides Ollama portable zip on Windows. - ## Table of Contents -- [Prerequisites](#prerequisitesa) -- [Step 1: Download and Unzip](#step-1-download-and-unzip) -- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve) -- [Step 3: Run Ollama](#step-3-run-ollama) -- [Tips & Troubleshooting](#tips--troubleshootings) +- [Windows Quickstart](#windows-quickstart) + - [Prerequisites](#prerequisites) + - [Step 1: Download and Unzip](#step-1-download-and-unzip) + - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve) + - [Step 3: Run Ollama](#step-3-run-ollama) +- [Linux Quickstart](#linux-quickstart) + - [Prerequisites](#prerequisites-1) + - [Step 1: Download and Extract](#step-1-download-and-extract) + - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve-1) + - [Step 3: Run Ollama](#step-3-run-ollama-1) +- [Tips & Troubleshooting](#tips--troubleshooting) -## Prerequisites +## Windows Quickstart + +### Prerequisites Check your GPU driver version, and update it if needed: @@ -23,13 +28,13 @@ Check your GPU driver version, and update it if needed: - For other Intel iGPU/dGPU, we recommend using GPU driver version [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html) -## Step 1: Download and Unzip +### Step 1: Download and Unzip -Download IPEX-LLM Ollama portable zip from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly). +Download IPEX-LLM Ollama portable zip for Windows users from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly). Then, extract the zip file to a folder. -## Step 2: Start Ollama Serve +### Step 2: Start Ollama Serve Double-click `start-ollama.bat` in the extracted folder to start the Ollama service. 
A window will then pop up as shown below:
 
@@ -37,7 +42,7 @@
 
-## Step 3: Run Ollama
+### Step 3: Run Ollama
 
 You could then use Ollama to run LLMs on Intel GPUs as follows:
 
@@ -48,17 +53,70 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
 
+## Linux Quickstart
+
+### Prerequisites
+
+Check your GPU driver version, and update it if needed:
+
+- For client GPUs, such as A-series, B-series and integrated GPUs, we recommend following the [Intel client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.
+
+### Step 1: Download and Extract
+
+Download the IPEX-LLM Ollama portable tgz for Ubuntu users from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
+
+Then open a terminal and extract the tgz file to a folder:
+
+```bash
+cd PATH/TO/DOWNLOADED/TGZ
+tar xvf [Downloaded tgz file]
+```
+
+### Step 2: Start Ollama Serve
+
+Enter the extracted folder, and run `start-ollama.sh` to start the Ollama service:
+
+```bash
+cd PATH/TO/EXTRACTED/FOLDER
+./start-ollama.sh
+```
+
+ +
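+If you would like to double-check that the service came up before moving on, you could query Ollama's version endpoint from another terminal (a minimal check, assuming the service is listening on Ollama's default port `11434`):
+
+```bash
+# Prints the Ollama server version if the service started correctly
+curl http://localhost:11434/api/version
+```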
+
+
+### Step 3: Run Ollama
+
+You could then use Ollama to run LLMs on Intel GPUs as follows:
+
+- Open another terminal, and enter the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
+- Run `./ollama run deepseek-r1:7b` (you may use any other model)
+
+ +
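+
+Besides the interactive shell started by `./ollama run`, the model could also be queried through Ollama's HTTP API. Below is a minimal sketch (assuming the service from Step 2 is still running on the default port `11434` and `deepseek-r1:7b` has already been pulled):
+
+```bash
+# Send a single, non-streaming generation request to the local Ollama service
+curl http://localhost:11434/api/generate -d '{
+  "model": "deepseek-r1:7b",
+  "prompt": "Why is the sky blue?",
+  "stream": false
+}'
+```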
+
+
 ## Tips & Troubleshooting
 
 ### Speed up model download using alternative sources
 
-Ollama by default downloads model from [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before [run Ollama](#step-3-run-ollama), you could switch the source from which the model is downloaded first.
+Ollama by default downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before the **Run Ollama** step, you could switch the source from which the model is downloaded first.
 
 For example, if you would like to run `deepseek-r1:7b` but the download speed from Ollama library is quite slow, you could use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) from [ModelScope](https://www.modelscope.cn/models) instead, through:
 
-- Open "Command Prompt" (cmd), and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
-- Run `ollama run deepseek-r1:7b`
+- For **Windows** users:
+
+  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
+  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `ollama run deepseek-r1:7b`
+
+- For **Linux** users:
+
+  - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
+  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `./ollama run deepseek-r1:7b`
 
 > [!TIP]
 > Model downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show actual model id in `ollama list`, e.g.
@@ -72,26 +130,58 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
 
 By default, Ollama runs model with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
 
-To increase the context length, you could set environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#step-2-start-ollama-serve), as shwon below:
+To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before the **Start Ollama Serve** step, as shown below (if Ollama serve is already running, please make sure to stop it first):
 
-- Open "Command Prompt" (cmd), and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
-- Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
-- Start Ollama serve through `start-ollama.bat`
+- For **Windows** users:
+
+  - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
+  - Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Start Ollama serve through `start-ollama.bat`
+
+- For **Linux** users:
+
+  - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
+  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Start Ollama serve through `./start-ollama.sh`
 
 > [!TIP]
 > `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` settings in a models' `Modelfile`.
 
+### Select specific GPU to run Ollama when multiple ones are available
+
+If your machine has multiple Intel GPUs, Ollama will by default run on all of them.
+
+To specify which Intel GPU you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` before the **Start Ollama Serve** step, as follows (if Ollama serve is already running, please make sure to stop it first):
+
+- Identify the ids (e.g. 0, 1, etc.) of your GPUs. You could find them in the logs of Ollama serve when any model is loaded, e.g.:
+
+ +
+
+- For **Windows** users:
+
+  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
+  - In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with your desired GPU id
+  - Start Ollama serve through `start-ollama.bat`
+
+- For **Linux** users:
+
+  - In a terminal, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
+  - Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with your desired GPU id
+  - Start Ollama serve through `./start-ollama.sh`
+
 ### Additional models supported after Ollama v0.5.4
 
 The currently Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models have also been supported in the Ollama Portable Zip:
 
- | Model | Download | Model Link |
- | - | - | - |
- | DeepSeek-R1 | `ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
- | Openthinker | `ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
- | DeepScaleR | `ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
- | Phi-4 | `ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
- | Dolphin 3.0 | `ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
- | Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
- | Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
- | Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
+ | Model | Download (Windows) | Download (Linux) | Model Link |
+ | - | - | - | - |
+ | DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
+ | Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
+ | DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
+ | Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
+ | Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
+ | Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
+ | Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
+ | Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
\ No newline at end of file
diff --git a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md
index dab796f0..6b3a429f 100644
--- a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md
+++ b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md
@@ -5,17 +5,22 @@
 本指南演示如何使用 [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) 通过 `ipex-llm` 在 Intel GPU 上直接免安装运行 Ollama。
 
-> [!NOTE]
-> 目前,IPEX-LLM 仅在 Windows 上提供 Ollama portable zip。
-
 ## 目录
 
-- [系统环境安装](#系统环境准备)
-- [步骤 1:下载和解压](#步骤-1下载和解压)
-- [步骤 2:启动 Ollama 
Serve](#步骤-2启动-ollama-serve) -- [步骤 3:运行 Ollama](#步骤-3运行-ollama) -- [提示和故障排除](#提示和故障排除) +- [Windows用户指南](#windows用户指南) + - [系统环境安装](#系统环境准备) + - [步骤 1:下载和解压](#步骤-1下载和解压) + - [步骤 2:启动 Ollama Serve](#步骤-2启动-ollama-serve) + - [步骤 3:运行 Ollama](#步骤-3运行-ollama) + - [提示和故障排除](#提示和故障排除) +- [Linux用户指南](#linux用户指南) + - [系统环境安装](#系统环境准备-1) + - [步骤 1:下载和解压](#步骤-1下载和解压-1) + - [步骤 2:启动 Ollama Serve](#步骤-2启动-ollama-serve-1) + - [步骤 3:运行 Ollama](#步骤-3运行-ollama-1) -## 系统环境准备 +## Windows用户指南 + +### 系统环境准备 检查你的 GPU 驱动程序版本,并根据需要进行更新: @@ -23,13 +28,13 @@ - 对于其他的 Intel 核显和独显,我们推荐使用 GPU 驱动版本 [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html) -## 步骤 1:下载和解压 +### 步骤 1:下载和解压 从此[链接](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly)下载 IPEX-LLM Ollama portable zip。 然后,将 zip 文件解压到一个文件夹中。 -## 步骤 2:启动 Ollama Serve +### 步骤 2:启动 Ollama Serve 在解压后的文件夹中双击 `start-ollama.bat` 即可启动 Ollama Serve。随后会弹出一个窗口,如下所示: @@ -37,27 +42,27 @@ -## 步骤 3:运行 Ollama +### 步骤 3:运行 Ollama 在 Intel GPUs 上使用 Ollama 运行 LLMs,如下所示: - 打开命令提示符(cmd),并通过在命令行输入指令 `cd /d PATH\TO\EXTRACTED\FOLDER` 进入解压后的文件夹 -- 在命令提示符中运行 `ollama run deepseek-r1:7(可以将当前模型替换为你需要的模型) +- 在命令提示符中运行 `ollama run deepseek-r1:7b`(可以将当前模型替换为你需要的模型)
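+
+如果需要确认 Ollama 服务是否已经就绪,也可以在另一个命令提示符窗口中访问 Ollama 的版本接口(以下仅为示例,假设服务仍监听默认端口 `11434`,且系统中可以使用 `curl`):
+
+```cmd
+curl http://localhost:11434/api/version
+```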
-## 提示和故障排除 +### 提示和故障排除 -### 通过切换源提升模型下载速度 +#### 通过切换源提升模型下载速度 -Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。在 [运行 Ollama](#步骤-3运行-ollama) 之前设置环境变量 `IPEX_LLM_MODEL_SOURCE` 为 `modelscope/ollama`,你可以切换模型的首选下载源。 +Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。通过在 [运行 Ollama](#步骤-3运行-ollama) 之前设置环境变量 `IPEX_LLM_MODEL_SOURCE` 为 `modelscope/ollama`,你可以切换模型的首选下载源。 -例如,如果你想运行 `deepseek-r1:7b` 但从 Ollama 库的下载速度较慢,可以通过如下方式改用 [ModelScope](https://www.modelscope.cn/models) 的 [模型源](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF): +例如,如果你想运行 `deepseek-r1:7b` 但从 Ollama 库的下载速度较慢,可以通过如下方式改用 [ModelScope](https://www.modelscope.cn/models) 上的 [模型源](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF): -- 打开 “命令提示符”(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 -- 在 “命令提示符” 中运行 `set IPEX_LLM_MODEL_SOURCE=modelscope` +- 打开命令提示符(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 +- 在命令提示符中运行 `set IPEX_LLM_MODEL_SOURCE=modelscope` - 运行 `ollama run deepseek-r1:7b` > [!Tip] @@ -68,20 +73,20 @@ Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。在 [ > ``` > 除了 `ollama run` 和 `ollama pull`,其他操作中模型应通过其实际 ID 进行识别,例如: `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M` -### 在 Ollama 中增加上下文长度 +#### 在 Ollama 中增加上下文长度 默认情况下,Ollama 使用 2048 个 token 的上下文窗口运行模型。也就是说,模型最多能 “记住” 2048 个 token 的上下文。 要增加上下文长度,可以在 [启动 Ollama serve](#步骤-2启动-ollama-serve) 之前设置环境变量 `IPEX_LLM_NUM_CTX`,步骤如下: -- 打开 “命令提示符”(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 -- 在 “命令提示符” 中将 `IPEX_LLM_NUM_CTX` 设置为所需长度,例如:`set IPEX_LLM_NUM_CTX=16384` +- 打开命令提示符(cmd),并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹 +- 在命令提示符中将 `IPEX_LLM_NUM_CTX` 设置为所需长度,例如:`set IPEX_LLM_NUM_CTX=16384` - 通过运行 `start-ollama.bat` 启动 Ollama serve > [!Tip] > `IPEX_LLM_NUM_CTX` 的优先级高于模型 `Modelfile` 中设置的 `num_ctx`。 -### Ollama v0.5.4 之后支持的其他模型 +#### Ollama v0.5.4 之后支持的其他模型 当前的 Ollama Portable Zip 基于 Ollama v0.5.4;此外,以下新模型也已在 Ollama Portable Zip 中得到支持: @@ -95,3 +100,48 @@ Ollama 默认从 [Ollama 库](https://ollama.com/library) 下载模型。在 [ | Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) | | Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) | | Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) | + + +## Linux用户指南 + +### 系统环境准备 + +检查你的 GPU 驱动程序版本,并根据需要进行更新: + +- 对于消费级显卡用户,如A系列,B系列和集成显卡,我们推荐按照[消费级显卡驱动安装指南](https://dgpu-docs.intel.com/driver/client/overview.html)来安装您的显卡驱动。 + + +### 步骤 1:下载和解压 + +从此[链接](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly)下载 IPEX-LLM Ollama portable tgz。 + +然后,开启一个终端,输入如下命令将 tgz 文件解压到一个文件夹中。 +```bash +cd PATH/TO/DOWNLOADED/TGZ +tar xvf [Downloaded tgz file] +``` + +### 步骤 2:启动 Ollama Serve + +进入解压后的文件夹,执行`./start-ollama.sh`启动 Ollama Serve: + +[可选操作] 对于有多块显卡的用户,请编辑解压后文件夹中的 start-ollama.sh,并根据机器配置修改 ONEAPI_DEVICE_SELECTOR。默认情况下,Ollama 会使用所有显卡。 +```bash + cd PATH/TO/EXTRACTED/FOLDER +./start-ollama.sh +``` + +
+ +
+ +### 步骤 3:运行 Ollama + +在 Intel GPUs 上使用 Ollama 运行大语言模型,如下所示: + +- 打开另外一个终端,并输入指令 `cd PATH/TO/EXTRACTED/FOLDER` 进入解压后的文件夹 +- 在终端中运行 `./ollama run deepseek-r1:7b`(可以将当前模型替换为你需要的模型) + +
+ +
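+
+除了交互式的 `./ollama run`,也可以通过 Ollama 的 HTTP API 调用模型(以下仅为示例,假设步骤 2 启动的服务仍在默认端口 `11434` 上运行,且 `deepseek-r1:7b` 已经下载完成):
+
+```bash
+# 向本地 Ollama 服务发送一次非流式生成请求
+curl http://localhost:11434/api/generate -d '{
+  "model": "deepseek-r1:7b",
+  "prompt": "为什么天空是蓝色的?",
+  "stream": false
+}'
+```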