Update quickstart (#12927)
This commit is contained in: parent 0b5079833c, commit 69edc8b6f6
8 changed files with 36 additions and 33 deletions
README.md
@@ -5,12 +5,12 @@
**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].
> [!NOTE]
> - *`IPEX-LLM` provides seamless integration with [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModeScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - *`IPEX-LLM` provides seamless integration with [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - ***70+ models** have been optimized/verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#verified-models).*
## Latest Update 🔥
- [2025/02] We added support of [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for Intel [GPU](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md) and [NPU](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md).
- [2025/02] We added support of [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU for both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need of manual installations***).
- [2025/02] We added support of [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for Intel **GPU** (both [Windows](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md) only).
- [2025/02] We added support of [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel **GPU** for both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need of manual installations***).
- [2025/02] We added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](docs/mddocs/Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](docs/mddocs/Quickstart/ollama_quickstart.md) on Intel GPU.
@@ -88,7 +88,7 @@ See demos of running local LLMs *on Intel Core Ultra iGPU, Intel Core Ultra NPU,
</tr>
<tr>
<td align="center" width="25%">
<a href="docs/mddocs/Quickstart/ollama_quickstart.md">Ollama <br> (Mistral-7B, Q4_K) </a>
<a href="docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md">Ollama <br> (Mistral-7B, Q4_K) </a>
</td>
<td align="center" width="25%">
<a href="docs/mddocs/Quickstart/npu_quickstart.md">HuggingFace <br> (Llama3.2-3B, SYM_INT4)</a>
@@ -97,12 +97,12 @@ See demos of running local LLMs *on Intel Core Ultra iGPU, Intel Core Ultra NPU,
<a href="docs/mddocs/Quickstart/webui_quickstart.md">TextGeneration-WebUI <br> (Llama3-8B, FP8) </a>
</td>
<td align="center" width="25%">
<a href="docs/mddocs/Quickstart/llama_cpp_quickstart.md">llama.cpp <br> (DeepSeek-R1-Distill-Qwen-32B, Q4_K)</a>
<a href="docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md">llama.cpp <br> (DeepSeek-R1-Distill-Qwen-32B, Q4_K)</a>
</td> </tr>
</table>
<!--
See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html), [*local RAG using LangChain-Chatchat*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html), [*llama.cpp*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html) and [*Ollama*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html) *(on either Intel Core Ultra laptop or Arc GPU)* with `ipex-llm` below.
See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html), [*local RAG using LangChain-Chatchat*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html), [*llama.cpp*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llamacpp_portable_zip_gpu_quickstart.md) and [*Ollama*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html) *(on either Intel Core Ultra laptop or Arc GPU)* with `ipex-llm` below.
<table width="100%">
<tr>
@@ -131,10 +131,10 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
<a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html">Local RAG using LangChain-Chatchat</a>
</td>
<td align="center" width="25%">
<a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html">llama.cpp</a>
<a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llamacpp_portable_zip_gpu_quickstart.md">llama.cpp</a>
</td>
<td align="center" width="25%">
<a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html">Ollama</a>
<a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_portable_zip_quickstart.md">Ollama</a>
</td> </tr>
</table>
-->
@@ -178,11 +178,10 @@ Please see the **Perplexity** result below (tested on Wikitext dataset using the
## `ipex-llm` Quickstart
### Use
- [Ollama Portable Zip](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md): running **Ollama** on Intel GPU ***without the need of manual installations***
- [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md): running **Ollama** on Intel GPU ***without the need of manual installations***
- [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md): running **llama.cpp** on Intel GPU ***without the need of manual installations***
- [Arc B580](docs/mddocs/Quickstart/bmg_quickstart.md): running `ipex-llm` on Intel Arc **B580** GPU for Ollama, llama.cpp, PyTorch, HuggingFace, etc.
- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): running `ipex-llm` on Intel **NPU** in both Python and C++
- [Ollama](docs/mddocs/Quickstart/ollama_quickstart.md): running **ollama** (*using C++ interface of `ipex-llm`*) on Intel GPU
- [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.md): running **llama.cpp** (*using C++ interface of `ipex-llm`*) on Intel GPU
- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): running `ipex-llm` on Intel **NPU** in both Python and C++, or through the [llama.cpp](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md) API.
- [PyTorch/HuggingFace](docs/mddocs/Quickstart/install_windows_gpu.md): running **PyTorch**, **HuggingFace**, **LangChain**, **LlamaIndex**, etc. (*using Python interface of `ipex-llm`*) on Intel GPU for [Windows](docs/mddocs/Quickstart/install_windows_gpu.md) and [Linux](docs/mddocs/Quickstart/install_linux_gpu.md) (a minimal illustrative sketch follows this list)
- [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md): running `ipex-llm` in **vLLM** on both Intel [GPU](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) and [CPU](docs/mddocs/DockerGuides/vllm_cpu_docker_quickstart.md)
- [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md): running `ipex-llm` in **FastChat** serving on both Intel GPU and CPU
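To make the PyTorch/HuggingFace entry above concrete, here is a minimal, illustrative sketch of the `ipex-llm` Python interface on an Intel GPU. The model ID, prompt, and generation settings are placeholder assumptions and are not part of this change; the linked Windows/Linux guides remain the authoritative setup steps.

```python
# Illustrative sketch only: load a HuggingFace model with ipex-llm low-bit (INT4)
# optimization and run a short generation on an Intel GPU ("xpu" device).
# The model ID and prompt below are placeholders, not part of this PR.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any verified model should work

# load_in_4bit=True applies ipex-llm's low-bit optimization while the weights load
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # move the optimized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
inputs = tokenizer("What is an LLM?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output_ids = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```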
@@ -5,12 +5,12 @@
**`ipex-llm`** is an XPU acceleration library for running large language models efficiently on Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., local PCs with iGPU, Arc discrete GPUs, and Flex and Max data center GPUs)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].
> [!NOTE]
> - *`ipex-llm` integrates seamlessly with [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - *`ipex-llm` integrates seamlessly with [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
> - ***70+ models** have been optimized and verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see [here](#模型验证) for more model information.*
## Latest Update 🔥
- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run llama.cpp directly, without installation**, on Intel [GPU](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md) and [NPU](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md).
- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama directly, without installation**, on Intel GPU (both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run llama.cpp directly, without installation**, on Intel **GPU** (both [Windows](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) only).
- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama directly, without installation**, on Intel **GPU** (both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
- [2025/02] Added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] Added a guide for running `ipex-llm` on Intel Arc [B580](docs/mddocs/Quickstart/bmg_quickstart.md) GPU.
- [2025/01] Added support for running [Ollama 0.5.4](docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md) on Intel GPU.
@@ -88,7 +88,7 @@
</tr>
<tr>
<td align="center" width="25%">
<a href="docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md">Ollama <br> (Mistral-7B, Q4_K) </a>
<a href="docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md">Ollama <br> (Mistral-7B, Q4_K) </a>
</td>
<td align="center" width="25%">
<a href="docs/mddocs/Quickstart/npu_quickstart.md">HuggingFace <br> (Llama3.2-3B, SYM_INT4)</a>
@@ -97,7 +97,7 @@
<a href="docs/mddocs/Quickstart/webui_quickstart.md">TextGeneration-WebUI <br> (Llama3-8B, FP8) </a>
</td>
<td align="center" width="25%">
<a href="docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md">llama.cpp <br> (DeepSeek-R1-Distill-Qwen-32B, Q4_K)</a>
<a href="docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md">llama.cpp <br> (DeepSeek-R1-Distill-Qwen-32B, Q4_K)</a>
</td> </tr>
</table>
@@ -178,11 +178,10 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
## `ipex-llm` Quickstart
### Use
- [Ollama Portable Zip](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md): **run Ollama directly, without installation**, on Intel GPU.
- [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md): **run Ollama directly, without installation**, on Intel GPU
- [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md): **run llama.cpp directly, without installation**, on Intel GPU
- [Arc B580](docs/mddocs/Quickstart/bmg_quickstart.md): run `ipex-llm` on Intel Arc **B580** GPU (for Ollama, llama.cpp, PyTorch, HuggingFace, etc.)
- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): run `ipex-llm` on Intel **NPU** (both Python and C++ supported)
- [Ollama](docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md): run **ollama** on Intel GPU (*using the C++ interface of `ipex-llm`*)
- [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md): run **llama.cpp** on Intel GPU (*using the C++ interface of `ipex-llm`*)
- [NPU](docs/mddocs/Quickstart/npu_quickstart.md): run `ipex-llm` on Intel **NPU** (Python/C++ as well as the [llama.cpp](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) API supported)
- [PyTorch/HuggingFace](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md): run **PyTorch**, **HuggingFace**, **LangChain**, **LlamaIndex**, etc. on Intel GPU on [Windows](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md) and [Linux](docs/mddocs/Quickstart/install_linux_gpu.zh-CN.md) (*using the Python interface of `ipex-llm`*)
- [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md): run **vLLM** with `ipex-llm` on Intel [GPU](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) and [CPU](docs/mddocs/DockerGuides/vllm_cpu_docker_quickstart.md)
- [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md): run **FastChat** serving with `ipex-llm` on Intel GPU and CPU
@@ -5,6 +5,9 @@
[ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) provides fast LLM inference in pure C++ across a variety of hardware; you can now use the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend for `llama.cpp` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.
> [!Important]
> You may use [llama.cpp Portable Zip](./llamacpp_portable_zip_gpu_quickstart.md) to directly run llama.cpp on Intel GPU with ipex-llm (***without the need of manual installations***).
> [!NOTE]
> For installation on Intel Arc B-Series GPU (such as **B580**), please refer to this [guide](./bmg_quickstart.md).
@@ -5,6 +5,9 @@
[ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) is an efficient LLM inference library implemented in pure C++ that supports a wide range of hardware platforms. Using the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as its accelerated backend, you can now easily deploy and run `llama.cpp` on Intel **GPU** *(e.g., local PCs with iGPU, or discrete GPUs such as Arc, Flex and Max)*.
> [!Important]
> You can now use [llama.cpp Portable Zip](./llamacpp_portable_zip_gpu_quickstart.md) to ***run llama.cpp directly, without installation***, on Intel GPU.
> [!NOTE]
> For installation on Intel Arc B-Series GPUs (e.g., **B580**), please refer to this [guide](./bmg_quickstart.md).
@@ -25,6 +25,7 @@ This guide demonstrates how to use [llama.cpp portable zip](https://github.com/i
- [Error: Detected different sycl devices](#error-detected-different-sycl-devices)
- [Multi-GPUs usage](#multi-gpus-usage)
- [Performance Environment](#performance-environment)
- [More Details](llama_cpp_quickstart.md)
## Windows Quickstart
@@ -68,10 +69,8 @@ Part of outputs:
```
Found 1 SYCL devices:
| | | | |Max | |Max |Global |
| | | | |compute|Max work|sub |mem |
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 13578M| 1.3.27504|
@@ -149,10 +148,8 @@ Part of outputs:
```bash
Found 1 SYCL devices:
| | | | |Max | |Max |Global |
| | | | |compute|Max work|sub |mem |
| | | | |Max | |Max |Global | |
| | | | |compute|Max work|sub |mem | |
|ID| Device Type| Name|Version|units |group |group|size | Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]| Intel Arc Graphics| 12.71| 128| 1024| 32| 13578M| 1.3.27504|
@@ -185,7 +182,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
<answer>XXXX</answer> [end of text]
```
### FlashMoe for DeeSeek V3/R1
### FlashMoE for DeepSeek V3/R1
FlashMoE is a command-line tool built on llama.cpp and optimized for mixture-of-experts (MoE) models such as DeepSeek V3/R1. It is currently available on Linux.
@@ -6,7 +6,7 @@
[ollama/ollama](https://github.com/ollama/ollama) is a popular framework designed to build and run language models on a local machine; you can now use the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend for `ollama` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.
> [!Important]
> You may use [Ollama portable zip](./ollama_portable_zip_quickstart.md) to directly run Ollama on Intel GPU with ipex-llm (***without the need of manual installations***).
> You may use [Ollama Portable Zip](./ollama_portable_zip_quickstart.md) to directly run Ollama on Intel GPU with ipex-llm (***without the need of manual installations***).
> [!NOTE]
> For installation on Intel Arc B-Series GPU (such as **B580**), please refer to this [guide](./bmg_quickstart.md).
@@ -6,7 +6,8 @@
**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](Quickstart/npu_quickstart.md) and CPU [^1].
## Latest Update 🔥
- [2025/02] We added support of [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU for both [Windows](Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need of manual installations***).
- [2025/02] We added support of [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for Intel **GPU** (both [Windows](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](Quickstart/llama_cpp_npu_portable_zip_quickstart.md) only).
- [2025/02] We added support of [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel **GPU** for both [Windows](Quickstart/ollama_portable_zip_quickstart.md#windows-quickstart) and [Linux](Quickstart/ollama_portable_zip_quickstart.md#linux-quickstart) (***without the need of manual installations***).
- [2025/02] We added support for running [vLLM 0.6.6](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] We added the guide for running `ipex-llm` on Intel Arc [B580](Quickstart/bmg_quickstart.md) GPU.
- [2025/01] We added support for running [Ollama 0.5.4](Quickstart/ollama_quickstart.md) on Intel GPU.
@@ -4,7 +4,8 @@
</p>
## Latest Update 🔥
- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama directly, without installation**, on Intel GPU (both [Windows](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run llama.cpp directly, without installation**, on Intel **GPU** (both [Windows](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#windows-quickstart) and [Linux](Quickstart/llamacpp_portable_zip_gpu_quickstart.md#linux-quickstart)) and **NPU** ([Windows](Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) only).
- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to **run Ollama directly, without installation**, on Intel **GPU** (both [Windows](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)).
- [2025/02] Added support for running [vLLM 0.6.6](DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
- [2025/01] Added a guide for running `ipex-llm` on Intel Arc [B580](Quickstart/bmg_quickstart.md) GPU.
- [2025/01] Added support for running [Ollama 0.5.4](Quickstart/ollama_quickstart.zh-CN.md) on Intel GPU.