Update Readme (#12855)

Jason Dai 2025-02-19 19:55:29 +08:00 committed by GitHub
parent 4eed0c7d99
commit 38a682adb1
6 changed files with 76 additions and 64 deletions

View file

@@ -5,11 +5,11 @@
**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].
> [!NOTE]
> - *`IPEX-LLM` provides seamless integration with [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - ***70+ models** have been optimized/verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#verified-models).*
## Latest Update 🔥
- [2025/02] We added support for [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU on both [Windows](docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md#linux-quickstart) (***without the need for manual installation***).
- [2025/02] We added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](docs/mddocs/Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](docs/mddocs/Quickstart/ollama_quickstart.md) on Intel GPU.

View file

@@ -5,11 +5,11 @@
**`ipex-llm`** is an XPU acceleration library for running large language models efficiently on Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., PCs with integrated GPUs, Arc discrete GPUs, and Flex and Max data-center GPUs)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU[^1].
> [!NOTE]
> - *`ipex-llm` integrates seamlessly with [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md), [Ollama](docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - ***70+ models** have been optimized/verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#模型验证).*
## Latest Update 🔥
- [2025/02] We added support for [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU ***without the need for manual installation***, on both [Windows](docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.zh-CN.md#linux用户指南).
- [2025/02] We added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](docs/mddocs/Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md) on Intel GPU.

View file

@@ -17,6 +17,11 @@ This guide demonstrates how to use [Ollama portable zip](https://github.com/inte
- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve-1)
- [Step 3: Run Ollama](#step-3-run-ollama-1)
- [Tips & Troubleshooting](#tips--troubleshooting)
- [Speed up model download using alternative sources](#speed-up-model-download-using-alternative-sources)
- [Increase context length in Ollama](#increase-context-length-in-ollama)
- [Select specific GPU to run Ollama when multiple ones are available](#select-specific-gpu-to-run-ollama-when-multiple-ones-are-available)
- [Additional models supported after Ollama v0.5.4](#additional-models-supported-after-ollama-v054)
- [More details](ollama_quickstart.md)
## Windows Quickstart
@@ -57,9 +62,7 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
### Prerequisites
Check your GPU driver version, and update it if needed; we recommend following the [Intel client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.
### Step 1: Download and Extract
@@ -102,9 +105,9 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
### Speed up model download using alternative sources
Ollama downloads models from the [Ollama library](https://ollama.com/library) by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from [ModelScope](https://www.modelscope.cn/models) instead, as follows:
- For **Windows** users:
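  As a minimal sketch of the whole flow in a Command Prompt (the extracted-folder path is a placeholder; the individual commands are the ones documented in this quickstart):

  ```cmd
  rem Navigate to the extracted Ollama Portable Zip folder
  cd /d PATH\TO\EXTRACTED\FOLDER

  rem Switch the model download source to ModelScope before running Ollama
  set IPEX_LLM_MODEL_SOURCE=modelscope

  rem Pull and run the model; it is now downloaded from ModelScope
  ollama run deepseek-r1:7b
  ```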
@@ -184,4 +187,4 @@ The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in Ollama Portable Zip:
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |

View file

@@ -11,12 +11,16 @@
- [Step 1: Download and Extract](#步骤-1下载和解压)
- [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve)
- [Step 3: Run Ollama](#步骤-3运行-ollama)
- [Tips & Troubleshooting](#提示和故障排除)
- [Linux User Guide](#linux用户指南)
- [Prerequisites](#系统环境准备-1)
- [Step 1: Download and Extract](#步骤-1下载和解压-1)
- [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve-1)
- [Step 3: Run Ollama](#步骤-3运行-ollama-1)
- [Tips & Troubleshooting](#提示和故障排除)
- [Speed up model download using alternative sources](#通过切换源提升模型下载速度)
- [Increase context length in Ollama](#在-ollama-中增加上下文长度)
- [Additional models supported after Ollama v0.5.4](#ollama-v054-之后新增模型支持)
- [More details](ollama_quickstart.zh-CN.md)
## Windows User Guide
@@ -53,55 +57,6 @@
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama.png" width=80%/>
</div>
## Linux User Guide
### Prerequisites
@@ -145,3 +100,51 @@ tar xvf [Downloaded tgz file]
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama_ubuntu.png" width=80%/>
</div>
## Tips & Troubleshooting
### Speed up model download using alternative sources
By default, Ollama downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` before running Ollama, you can switch the source from which models are downloaded.
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you can use its model source on ModelScope instead, as follows:
- Open a Command Prompt (cmd) and navigate to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in the Command Prompt
- Run `ollama run deepseek-r1:7b`
> [!Tip]
> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show their actual model ID in `ollama list`, e.g.
> ```
> NAME ID SIZE MODIFIED
> modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M f482d5af6aec 4.7 GB About a minute ago
> ```
> Except for `ollama run` and `ollama pull`, the model should be identified by its actual ID in all other operations, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`
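For Linux users of the portable zip, the equivalent would presumably look like the following (a sketch only, assuming the commands are run from the extracted folder and that `export` replaces `set`):

```bash
# Enter the extracted Ollama Portable Zip folder (placeholder path)
cd PATH/TO/EXTRACTED/FOLDER

# Switch the model download source to ModelScope before running Ollama
export IPEX_LLM_MODEL_SOURCE=modelscope

# Pull and run the model; it is now downloaded from ModelScope
./ollama run deepseek-r1:7b
```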
### Increase context length in Ollama
By default, Ollama runs models with a context window of 2048 tokens; that is, the model can "remember" at most 2048 tokens of context.
To increase the context length, set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#步骤-2启动-ollama-serve), as follows:
- Open a Command Prompt (cmd) and navigate to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in the Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve by running `start-ollama.bat`
> [!Tip]
> `IPEX_LLM_NUM_CTX` takes precedence over the `num_ctx` setting in a model's `Modelfile`.
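Putting the three steps above together, a minimal Command Prompt sketch (the folder path is a placeholder):

```cmd
rem Navigate to the extracted Ollama Portable Zip folder
cd /d PATH\TO\EXTRACTED\FOLDER

rem Request a 16K-token context window before starting the server
set IPEX_LLM_NUM_CTX=16384

rem Start Ollama serve; IPEX_LLM_NUM_CTX overrides num_ctx from the Modelfile
start-ollama.bat
```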
### Additional models supported after Ollama v0.5.4
The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in Ollama Portable Zip:
| Model | Download | Model Link |
| - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |

View file

@@ -6,14 +6,16 @@
**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](Quickstart/npu_quickstart.md) and CPU [^1].
## Latest Update 🔥
- [2025/02] We added support for [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU on both [Windows](Quickstart/ollama_portablze_zip_quickstart.md#windows-quickstart) and [Linux](Quickstart/ollama_portablze_zip_quickstart.md#linux-quickstart) (***without the need for manual installation***).
- [2025/02] We added support for running [vLLM 0.6.6](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](Quickstart/ollama_quickstart.md) on Intel GPU.
- [2024/12] We added both ***Python*** and ***C++*** support for Intel Core Ultra [NPU](Quickstart/npu_quickstart.md) (including 100H, 200V and 200K series).
<details><summary>More updates</summary>
<br/>
- [2024/11] We added support for running [vLLM 0.6.2](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2024/07] We added support for running Microsoft's **GraphRAG** using local LLM on Intel GPU; see the quickstart guide [here](Quickstart/graphrag_quickstart.md).
- [2024/07] We added extensive support for Large Multimodal Models, including [StableDiffusion](../../python/llm/example/GPU/HuggingFace/Multimodal/StableDiffusion), [Phi-3-Vision](../../python/llm/example/GPU/HuggingFace/Multimodal/phi-3-vision), [Qwen-VL](../../python/llm/example/GPU/HuggingFace/Multimodal/qwen-vl), and [more](../../python/llm/example/GPU/HuggingFace/Multimodal).
- [2024/07] We added **FP6** support on Intel [GPU](../../python/llm/example/GPU/HuggingFace/More-Data-Types).
@@ -50,6 +52,7 @@
## `ipex-llm` Quickstart
### Use
- [Ollama Portable Zip](Quickstart/ollama_portablze_zip_quickstart.md): running **Ollama** on Intel GPU ***without the need for manual installation***
- [Arc B580](Quickstart/bmg_quickstart.md): running `ipex-llm` on Intel Arc **B580** GPU for Ollama, llama.cpp, PyTorch, HuggingFace, etc.
- [NPU](Quickstart/npu_quickstart.md): running `ipex-llm` on Intel **NPU** in both Python and C++
- [llama.cpp](Quickstart/llama_cpp_quickstart.md): running **llama.cpp** (*using C++ interface of `ipex-llm`*) on Intel GPU

View file

@@ -4,14 +4,16 @@
</p>
## Latest Update 🔥
- [2025/02] We added support for [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU ***without the need for manual installation***, on both [Windows](Quickstart/ollama_portablze_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](Quickstart/ollama_portablze_zip_quickstart.zh-CN.md#linux用户指南).
- [2025/02] We added support for running [vLLM 0.6.6](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](Quickstart/ollama_quickstart.zh-CN.md) on Intel GPU.
- [2024/12] We added both **Python** and **C++** support for Intel Core Ultra [NPU](Quickstart/npu_quickstart.md) (including 100H, 200V and 200K series).
<details><summary>More updates</summary>
<br/>
- [2024/11] We added support for running [vLLM 0.6.2](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2024/07] We added support for Microsoft's **GraphRAG** using local LLMs on Intel GPU; see the [quickstart guide](Quickstart/graphrag_quickstart.md) for details.
- [2024/07] We added extensive support for Large Multimodal Models, including [StableDiffusion](../../python/llm/example/GPU/HuggingFace/Multimodal/StableDiffusion), [Phi-3-Vision](../../python/llm/example/GPU/HuggingFace/Multimodal/phi-3-vision), [Qwen-VL](../../python/llm/example/GPU/HuggingFace/Multimodal/qwen-vl), and [more](../../python/llm/example/GPU/HuggingFace/Multimodal).
- [2024/07] We added **FP6** support on Intel [GPU](../../python/llm/example/GPU/HuggingFace/More-Data-Types).
@@ -48,6 +50,7 @@
## `ipex-llm` Quickstart
### Use
- [Ollama Portable Zip](Quickstart/ollama_portablze_zip_quickstart.zh-CN.md): running **Ollama** on Intel GPU ***without the need for manual installation***
- [Arc B580](Quickstart/bmg_quickstart.md): running `ipex-llm` on Intel Arc **B580** GPU for Ollama, llama.cpp, PyTorch, HuggingFace, etc.
- [NPU](Quickstart/npu_quickstart.md): running `ipex-llm` on Intel **NPU** in both Python and C++
- [llama.cpp](Quickstart/llama_cpp_quickstart.zh-CN.md): running **llama.cpp** (*using the C++ interface of `ipex-llm`*) on Intel GPU