diff --git a/README.md b/README.md
index 58d12945..e4a32293 100644
--- a/README.md
+++ b/README.md
@@ -5,12 +5,12 @@
**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].
> [!NOTE]
-> - *`IPEX-LLM` provides seamless integration with [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModeScope](python/llm/example/GPU/ModelScope-Models), etc.*
+> - *`IPEX-LLM` provides seamless integration with [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - ***70+ models** have been optimized/verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#verified-models).*
## Latest Update 🔥
-- [2025/02] We added support of [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for Intel [GPU](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md) and [NPU](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md).
-- [2025/02] We added support of [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU for both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need of manual installations***).
+- [2025/02] We added support for [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for Intel **GPU** (both [Windows](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md) only).
+- [2025/02] We added support for [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to run Ollama directly on Intel **GPU** on both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need for manual installation***).
- [2025/02] We added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](docs/mddocs/Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](docs/mddocs/Quickstart/ollama_quickstart.md) on Intel GPU.
@@ -88,7 +88,7 @@ See demos of running local LLMs *on Intel Core Ultra iGPU, Intel Core Ultra NPU,
[HTML demo table (markup stripped in extraction): video demos of Ollama (Mistral-7B, Q4_K), HuggingFace (Llama3.2-3B, SYM_INT4), TextGeneration-WebUI (Llama3-8B, FP8) and llama.cpp (DeepSeek-R1-Distill-Qwen-32B, Q4_K).]
@@ -178,11 +178,10 @@ Please see the **Perplexity** result below (tested on Wikitext dataset using the
## `ipex-llm` Quickstart
### Use
-- [Ollama Portable Zip](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md): running **Ollama** on Intel GPU ***without the need of manual installations***
+- [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md): running **Ollama** on Intel GPU ***without the need for manual installation***
+- [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md): running **llama.cpp** on Intel GPU ***without the need for manual installation***
- [Arc B580](docs/mddocs/Quickstart/bmg_quickstart.md): running `ipex-llm` on Intel Arc **B580** GPU for Ollama, llama.cpp, PyTorch, HuggingFace, etc.
-- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): running `ipex-llm` on Intel **NPU** in both Python and C++
-- [Ollama](docs/mddocs/Quickstart/ollama_quickstart.md): running **ollama** (*using C++ interface of `ipex-llm`*) on Intel GPU
-- [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.md): running **llama.cpp** (*using C++ interface of `ipex-llm`*) on Intel GPU
+- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): running `ipex-llm` on Intel **NPU** using Python, C++, or the [llama.cpp](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md) API
- [PyTorch/HuggingFace](docs/mddocs/Quickstart/install_windows_gpu.md): running **PyTorch**, **HuggingFace**, **LangChain**, **LlamaIndex**, etc. (*using Python interface of `ipex-llm`*) on Intel GPU for [Windows](docs/mddocs/Quickstart/install_windows_gpu.md) and [Linux](docs/mddocs/Quickstart/install_linux_gpu.md)
- [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md): running `ipex-llm` in **vLLM** on both Intel [GPU](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) and [CPU](docs/mddocs/DockerGuides/vllm_cpu_docker_quickstart.md)
- [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md): running `ipex-llm` in **FastChat** serving on both Intel GPU and CPU
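
The two portable-zip entries above are the fastest path to a first run. The sketch below shows the general shape of that workflow on Linux; the archive and model file names are illustrative placeholders (take the real asset name from the release page), while `llama-cli` and its `-m`/`-p`/`-n`/`-ngl` flags are standard llama.cpp:

```bash
# Download and extract the llama.cpp portable zip (asset name is a placeholder;
# use the actual one from the v2.2.0-nightly release page).
tar -xzf llama-cpp-ipex-llm-linux.tgz
cd llama-cpp-ipex-llm-linux

# Standard llama.cpp CLI: load a GGUF model, offload all layers to the Intel
# GPU (-ngl 99), and generate 64 tokens.
./llama-cli -m ./mistral-7b-instruct-Q4_K_M.gguf -p "What is an iGPU?" -n 64 -ngl 99
```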
diff --git a/README.zh-CN.md b/README.zh-CN.md
index ae7a637a..a568b456 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -5,12 +5,12 @@
**`ipex-llm`** is an XPU acceleration library for running large language models efficiently on Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., local PCs with iGPU, Arc discrete GPUs, and Flex and Max data center GPUs)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].
> [!NOTE]
-> - *`ipex-llm` integrates seamlessly with [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModeScope](python/llm/example/GPU/ModelScope-Models), etc.*
+> - *`ipex-llm` integrates seamlessly with [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - ***70+ models** have been optimized and verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#模型验证).*
## Latest Update 🔥
-- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run llama.cpp without installation** directly on Intel [GPU](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md) and [NPU](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md).
-- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama without installation** directly on Intel GPU (both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
+- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run llama.cpp without installation** directly on Intel **GPU** (both [Windows](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) only).
+- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama without installation** directly on Intel **GPU** (both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
- [2025/02] Added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] Added a guide for running `ipex-llm` on the Intel Arc [B580](docs/mddocs/Quickstart/bmg_quickstart.md) GPU.
- [2025/01] Added support for running [Ollama 0.5.4](docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md) on Intel GPU.
[HTML demo table (markup stripped in extraction): video demos of Ollama (Mistral-7B, Q4_K), HuggingFace (Llama3.2-3B, SYM_INT4), TextGeneration-WebUI (Llama3-8B, FP8) and llama.cpp (DeepSeek-R1-Distill-Qwen-32B, Q4_K).]
@@ -178,11 +178,10 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
## `ipex-llm` Quickstart
### Use
-- [Ollama Portable Zip](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md): **run Ollama without installation** directly on Intel GPU.
+- [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md): **run Ollama without installation** directly on Intel GPU
+- [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md): **run llama.cpp without installation** directly on Intel GPU
- [Arc B580](docs/mddocs/Quickstart/bmg_quickstart.md): run `ipex-llm` on the Intel Arc **B580** GPU (Ollama, llama.cpp, PyTorch, HuggingFace, etc.)
-- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): run `ipex-llm` on Intel **NPU** (Python and C++ supported)
-- [Ollama](docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md): run **ollama** on Intel GPU (*using the C++ interface of `ipex-llm`*)
-- [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md): run **llama.cpp** on Intel GPU (*using the C++ interface of `ipex-llm`*)
+- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): run `ipex-llm` on Intel **NPU** using Python, C++, or the [llama.cpp](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) API
- [PyTorch/HuggingFace](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md): run **PyTorch**, **HuggingFace**, **LangChain**, **LlamaIndex**, etc. on Intel GPU for [Windows](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md) and [Linux](docs/mddocs/Quickstart/install_linux_gpu.zh-CN.md) (*using the Python interface of `ipex-llm`*)
- [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md): run **vLLM** with `ipex-llm` on Intel [GPU](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) and [CPU](docs/mddocs/DockerGuides/vllm_cpu_docker_quickstart.md)
- [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md): run **FastChat** serving with `ipex-llm` on Intel GPU and CPU
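
For the PyTorch/HuggingFace route, the Python interface of `ipex-llm` acts as a drop-in replacement for the `transformers` loader. Below is a minimal sketch, assuming `ipex-llm[xpu]` is already installed per the Windows/Linux install guides linked above and that `model_path` points to an accessible HuggingFace checkpoint (the name here is a placeholder); `load_in_4bit=True` and the `"xpu"` device follow the patterns used throughout the `ipex-llm` examples:

```bash
# Assumes ipex-llm[xpu] is installed per the install guides linked above.
python - <<'EOF'
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # ipex-llm drop-in loader

model_path = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder checkpoint

# Load with INT4 weight-only quantization, then move the model to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)

ids = tokenizer("What is an iGPU?", return_tensors="pt").input_ids.to("xpu")
with torch.inference_mode():
    out = model.generate(ids, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
EOF
```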
diff --git a/docs/mddocs/Quickstart/llama_cpp_quickstart.md b/docs/mddocs/Quickstart/llama_cpp_quickstart.md
index 111e2f0b..68c0fdfd 100644
--- a/docs/mddocs/Quickstart/llama_cpp_quickstart.md
+++ b/docs/mddocs/Quickstart/llama_cpp_quickstart.md
@@ -5,6 +5,9 @@
[ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) provides fast LLM inference in pure C++ across a variety of hardware; you can now use the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend for `llama.cpp` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.
+> [!Important]
+> You may use [llama.cpp Portable Zip](./llamacpp_portable_zip_gpu_quickstart.md) to directly run llama.cpp on Intel GPU with ipex-llm (***without the need for manual installation***).
+
> [!NOTE]
> For installation on Intel Arc B-Series GPU (such as **B580**), please refer to this [guide](./bmg_quickstart.md).
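
Once the backend is set up per this guide, invocation looks like stock llama.cpp. A minimal sketch, assuming the `ipex-llm`-provided `llama-cli` binary and a downloaded GGUF model (the model file name is a placeholder); `-m`, `-p`, `-n`, `-c` and `-ngl` are standard llama.cpp flags, and `ONEAPI_DEVICE_SELECTOR` is the standard oneAPI variable for pinning a SYCL device:

```bash
# Offload all layers to the Intel GPU (-ngl 99) and generate 128 tokens.
./llama-cli -m ./mistral-7b-instruct-v0.2.Q4_K_M.gguf \
            -p "Once upon a time" -n 128 -c 4096 -ngl 99

# With more than one GPU present, pin the run to the first Level Zero device.
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-cli \
            -m ./mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "Once upon a time" -n 128 -ngl 99
```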
diff --git a/docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md b/docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md
index 369c4031..0c9960b1 100644
--- a/docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md
+++ b/docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md
@@ -5,6 +5,9 @@
[ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) is an efficient LLM inference library implemented in pure C++ that supports a wide range of hardware platforms. With the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as its accelerated backend, you can easily deploy and run `llama.cpp` on Intel **GPU** *(e.g., local PCs with iGPU, or discrete GPUs such as Arc, Flex and Max)*.
+> [!Important]
+> You may now use [llama.cpp Portable Zip](./llamacpp_portable_zip_gpu_quickstart.md) to directly run llama.cpp on Intel GPU (***without the need for manual installation***).
+
> [!NOTE]
> For installation on Intel Arc B-Series GPUs (e.g., **B580**), please refer to this [guide](./bmg_quickstart.md).
diff --git a/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md b/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md
index 82e60add..0619a589 100644
--- a/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md
+++ b/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md
@@ -25,6 +25,7 @@ This guide demonstrates how to use [llama.cpp portable zip](https://github.com/i
- [Error: Detected different sycl devices](#error-detected-different-sycl-devices)
- [Multi-GPUs usage](#multi-gpus-usage)
- [Performance Environment](#performance-environment)
+- [More Details](llama_cpp_quickstart.md)
## Windows Quickstart
@@ -68,10 +69,8 @@ Part of outputs:
```
Found 1 SYCL devices:
-| | | | |Max | |Max |Global |
- |
-| | | | |compute|Max work|sub |mem |
- |
+| | | | |Max | |Max |Global | |
+| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 13578M| 1.3.27504|
@@ -149,10 +148,8 @@ Part of outputs:
```bash
Found 1 SYCL devices:
-| | | | |Max | |Max |Global |
- |
-| | | | |compute|Max work|sub |mem |
- |
+| | | | |Max | |Max |Global | |
+| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 13578M| 1.3.27504|
@@ -185,7 +182,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXX [end of text]
```
-### FlashMoe for DeeSeek V3/R1
+### FlashMoE for DeepSeek V3/R1
FlashMoE is a command-line tool built on llama.cpp and optimized for mixture-of-experts (MoE) models such as DeepSeek V3/R1. It is currently available for Linux only.
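
A sketch of what such a run typically looks like is below. The `flash-moe` launcher name and the model file are assumptions for illustration (check the Linux portable zip contents for the actual binary); the flags follow the llama.cpp conventions the tool is built on:

```bash
# Linux only. Binary and model file names are illustrative placeholders.
# Flags follow standard llama.cpp conventions (-m model, -p prompt, -n tokens).
./flash-moe -m ./DeepSeek-R1-Q4_K_M.gguf -p "Why is the sky blue?" -n 128
```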
diff --git a/docs/mddocs/Quickstart/ollama_quickstart.md b/docs/mddocs/Quickstart/ollama_quickstart.md
index 0e04de32..920769ff 100644
--- a/docs/mddocs/Quickstart/ollama_quickstart.md
+++ b/docs/mddocs/Quickstart/ollama_quickstart.md
@@ -6,7 +6,7 @@
[ollama/ollama](https://github.com/ollama/ollama) is a popular framework designed to build and run language models on a local machine; you can now use the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend for `ollama` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.
> [!Important]
-> You may use [Ollama portable zip](./ollama_portable_zip_quickstart.md) to directly run Ollama on Intel GPU with ipex-llm (***without the need of manual installations***).
+> You may use [Ollama Portable Zip](./ollama_portable_zip_quickstart.md) to directly run Ollama on Intel GPU with ipex-llm (***without the need for manual installation***).
> [!NOTE]
> For installation on Intel Arc B-Series GPU (such as **B580**), please refer to this [guide](./bmg_quickstart.md).
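
For reference, a typical session with the `ipex-llm`-enabled Ollama looks like the sketch below. `ollama serve` and `ollama run` are standard Ollama CLI; the environment variables mirror the GPU settings used in the `ipex-llm` guides and should be treated as assumptions to confirm against this page:

```bash
# Terminal 1: start the Ollama server with GPU offload settings (values taken
# from the ipex-llm GPU guides; confirm the exact set on this page).
export OLLAMA_NUM_GPU=999       # offload all model layers to the Intel GPU
export SYCL_CACHE_PERSISTENT=1  # cache JIT-compiled SYCL kernels across runs
./ollama serve

# Terminal 2: pull and chat with a model (standard Ollama CLI).
./ollama run mistral:7b
```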
diff --git a/docs/mddocs/README.md b/docs/mddocs/README.md
index 0107a091..5f1d1abf 100644
--- a/docs/mddocs/README.md
+++ b/docs/mddocs/README.md
@@ -6,7 +6,8 @@
**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](Quickstart/npu_quickstart.md) and CPU [^1].
## Latest Update 🔥
-- [2025/02] We added support of [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU for both [Windows](Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need of manual installations***).
+- [2025/02] We added support for [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for Intel **GPU** (both [Windows](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](Quickstart/llama_cpp_npu_portable_zip_quickstart.md) only).
+- [2025/02] We added support for [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to run Ollama directly on Intel **GPU** on both [Windows](Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need for manual installation***).
- [2025/02] We added support for running [vLLM 0.6.6](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](Quickstart/ollama_quickstart.md) on Intel GPU.
diff --git a/docs/mddocs/README.zh-CN.md b/docs/mddocs/README.zh-CN.md
index 44c76db6..18c2f652 100644
--- a/docs/mddocs/README.zh-CN.md
+++ b/docs/mddocs/README.zh-CN.md
@@ -4,7 +4,8 @@
## Latest Update 🔥
-- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama without installation** directly on Intel GPU (both [Windows](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
+- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run llama.cpp without installation** directly on Intel **GPU** (both [Windows](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) only).
+- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama without installation** directly on Intel **GPU** (both [Windows](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
- [2025/02] Added support for running [vLLM 0.6.6](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] Added a guide for running `ipex-llm` on the Intel Arc [B580](Quickstart/bmg_quickstart.md) GPU.
- [2025/01] Added support for running [Ollama 0.5.4](Quickstart/ollama_quickstart.zh-CN.md) on Intel GPU.