Update README (#12956)
parent 5ee09b4b28
commit 2a8f624f4b

6 changed files with 17 additions and 16 deletions
@@ -6,7 +6,7 @@

**`IPEX-LLM`** is an LLM acceleration library for Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].

> [!NOTE]
> - *`IPEX-LLM` provides seamless integration with [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
-> - ***70+ models** have been optimized/verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#verified-models).*
+> - ***70+ models** have been optimized/verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#verified-models).*
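All of the integrations above are driven by the `ipex-llm` Python package. As a minimal, hedged install sketch (not part of this diff; the `[xpu]` extra and the index URL follow the GPU quickstart and may vary by platform and release):

```bash
# Hypothetical install for Intel GPU on Linux; verify against the GPU
# install quickstart linked above before running.
pip install --pre --upgrade "ipex-llm[xpu]" \
  --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```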

## Latest Update 🔥

- [2025/03] We can now run **DeepSeek-R1-671B-Q4_K_M** with 1 or 2 Arc A770 on Xeon using the latest [llama.cpp Portable Zip](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#flashmoe-for-deepseek-v3r1).
@@ -6,11 +6,10 @@

**`ipex-llm`** is an XPU acceleration library for running large language models efficiently on Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., PCs with integrated GPUs, Arc discrete GPUs, and Flex and Max data-center GPUs)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].

> [!NOTE]
> - *`ipex-llm` integrates seamlessly with [llama.cpp](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md), [Ollama](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md), [HuggingFace transformers](python/llm/example/GPU/HuggingFace), [LangChain](python/llm/example/GPU/LangChain), [LlamaIndex](python/llm/example/GPU/LlamaIndex), [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md), [Text-Generation-WebUI](docs/mddocs/Quickstart/webui_quickstart.md), [DeepSpeed-AutoTP](python/llm/example/GPU/Deepspeed-AutoTP), [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md), [Axolotl](docs/mddocs/Quickstart/axolotl_quickstart.md), [HuggingFace PEFT](python/llm/example/GPU/LLM-Finetuning), [HuggingFace TRL](python/llm/example/GPU/LLM-Finetuning/DPO), [AutoGen](python/llm/example/CPU/Applications/autogen), [ModelScope](python/llm/example/GPU/ModelScope-Models), etc.*
-> - ***70+** models have been optimized and verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, Whisper, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#模型验证).*
+> - ***70+** models have been optimized and verified on `ipex-llm` (e.g., Llama, Phi, Mistral, Mixtral, DeepSeek, Qwen, ChatGLM, MiniCPM, Qwen-VL, MiniCPM-V and more), with state-of-the-art **LLM optimizations**, **XPU acceleration** and **low-bit (FP8/FP6/FP4/INT4) support**; see the complete list [here](#模型验证).*

## Latest Update 🔥

- [2025/03] With the latest [llama.cpp Portable Zip](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md#flashmoe-运行-deepseek-v3r1), **DeepSeek-R1-671B-Q4_K_M** can now run on Xeon with 1 or 2 Arc A770 GPUs.
-- We can now run DeepSeek-R1-671B-Q4_K_M with 1 or 2 Arc A770 on Xeon using the latest [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md#flashmoe-for-deepseek-v3r1).
- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for running llama.cpp **directly, with no installation**, on Intel **GPU** (both [Windows](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md#windows-用户指南) and [Linux](docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md#linux-用户指南)) and **NPU** ([Windows](docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) only).
- [2025/02] Added [Ollama Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for running Ollama **directly, with no installation**, on Intel **GPU** (both [Windows](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#windows用户指南) and [Linux](docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md#linux用户指南)); see the hedged usage sketch after this list.
- [2025/02] Added support for running [vLLM 0.6.6](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) on Intel Arc GPUs.
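The sketch referenced in the Ollama item above, for Linux (script and binary names follow the quickstart and should be treated as assumptions; Windows uses the matching `.bat` launcher):

```bash
# Hypothetical Ollama Portable Zip usage on Linux: extract, start the
# service, then run a model from a second terminal.
tar -xvf ollama-ipex-llm-*.tgz && cd ollama-ipex-llm-*/
./start-ollama.sh            # starts the Ollama service on the Intel GPU
./ollama run deepseek-r1     # in another terminal: pull and chat
```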
@@ -190,7 +190,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

### FlashMoE for DeepSeek V3/R1

-FlashMoE is a command-line tool built on llama.cpp, optimized for mixture-of-experts (MoE) models such as DeepSeek V3/R1. Now, it's available for Linux platforms.
+FlashMoE is a command-line tool built on top of `llama.cpp`, optimized for mixture-of-experts (MoE) models such as DeepSeek V3/R1. It is currently available for Linux platforms.

Tested MoE GGUF models (other MoE GGUF models are also supported; a hedged run sketch follows the list):
- [DeepSeek-V3-Q4_K_M](https://huggingface.co/unsloth/DeepSeek-V3-GGUF/tree/main/DeepSeek-V3-Q4_K_M)
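The run sketch referenced above, assuming the Linux portable zip exposes a `flash-moe` launcher as described in the quickstart; the GGUF path and shard count are placeholders (pointing `-m` at the first shard of a split download is the usual llama.cpp convention):

```bash
# Hypothetical FlashMoE invocation on Linux; point -m at the first shard
# of the downloaded multi-part GGUF model.
./flash-moe -m /models/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-XXXXX.gguf \
  --prompt "What's AI?"
```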
@@ -77,10 +77,8 @@ llama-cli.exe -m PATH\TO\DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -p "A conversat

```
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                     Intel Arc Graphics|  12.71|    128|    1024|   32| 13578M|            1.3.27504|
```
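For context, the device listing above is what llama.cpp prints at startup. A hedged Linux counterpart of the `llama-cli.exe` command in the hunk header (the `-p` text there is truncated in the source, so the prompt below is a placeholder):

```bash
# Hypothetical Linux equivalent of the Windows command in the hunk header;
# -n limits generated tokens, -c sets context size, -ngl offloads layers
# to the GPU.
./llama-cli -m PATH/TO/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf \
  -p "A conversation between a user and an assistant." \
  -n 512 -c 4096 -ngl 99
```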
@@ -160,10 +158,8 @@ llama_perf_context_print: total time = xxxxx.xx ms / 1385 tokens

```bash
Found 1 SYCL devices:
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|                     Intel Arc Graphics|  12.71|    128|    1024|   32| 13578M|            1.3.27504|
```
@@ -198,7 +194,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

### FlashMoE for DeepSeek V3/R1

-FlashMoE is a command-line tool built on llama.cpp, optimized for mixture-of-experts (MoE) models such as DeepSeek V3/R1. It is now available for Linux platforms.
+FlashMoE is a command-line tool built on top of `llama.cpp`, optimized for mixture-of-experts (MoE) models such as DeepSeek V3/R1. It is currently available for Linux platforms.

Tested MoE GGUF models (other MoE GGUF models are also supported):
- [DeepSeek-V3-Q4_K_M](https://huggingface.co/unsloth/DeepSeek-V3-GGUF/tree/main/DeepSeek-V3-Q4_K_M)
@@ -129,8 +129,10 @@

## FAQ

- [FAQ & Troubleshooting](Overview/FAQ/faq.md)

+[^1]: Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

## Verified Models

-Over 70 models have been optimized/verified on `ipex-llm`, including *LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM* and more; see the list below.
+Over 70 models have been optimized/verified on `ipex-llm`, including *LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, DeepSeek, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM* and more; see the list below.

| Model | CPU Example | GPU Example | NPU Example |
|------------|----------------------------------------------|----------------------------------------------|----------------------------------------------|
@@ -3,6 +3,8 @@

< <a href='./README.md'>English</a> | <b>中文 ></b>
</p>

+**`ipex-llm`** is an XPU acceleration library for running large language models efficiently on Intel [GPU](docs/mddocs/Quickstart/install_windows_gpu.md) *(e.g., PCs with integrated GPUs, Arc discrete GPUs, and Flex and Max data-center GPUs)*, [NPU](docs/mddocs/Quickstart/npu_quickstart.md) and CPU [^1].

## Latest Update 🔥

- [2025/03] With the latest [llama.cpp Portable Zip](Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md#flashmoe-运行-deepseek-v3r1), **DeepSeek-R1-671B-Q4_K_M** can now run on Xeon with 1 or 2 Arc A770 GPUs.
- [2025/02] Added [llama.cpp Portable Zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) for running llama.cpp **directly, with no installation**, on Intel **GPU** (both [Windows](Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md#windows-用户指南) and [Linux](Quickstart/llamacpp_portable_zip_gpu_quickstart.zh-CN.md#linux-用户指南)) and **NPU** ([Windows](Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md) only).

@@ -127,8 +129,10 @@

## FAQ

- [FAQ & Troubleshooting](Overview/FAQ/faq.md)

+[^1]: Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

## Verified Models

-50+ models have been optimized and verified on `ipex-llm`, including *LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM* and more; see the table below.
+70+ models have been optimized and verified on `ipex-llm`, including *LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, DeepSeek, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM* and more; see the table below.

| Model | CPU Example | GPU Example | NPU Example |
|----------- |------------------------------------------|-------------------------------------------|-------------------------------------------|