💫 Intel® LLM Library for PyTorch*


`ipex-llm` Quickstart

Use

Ollama Portable Zip: Run Ollama on Intel GPU directly, with no installation required.
Arc B580: Run ipex-llm on Intel Arc B580 GPU (for Ollama, llama.cpp, PyTorch, HuggingFace, etc.)
NPU: Run ipex-llm on Intel NPU (both Python and C++ are supported)
llama.cpp: Run llama.cpp on Intel GPU (using the C++ interface of ipex-llm)
Ollama: Run Ollama on Intel GPU (using the C++ interface of ipex-llm)
PyTorch/HuggingFace: Run PyTorch, HuggingFace, LangChain, LlamaIndex, etc. on Intel GPU under Windows and Linux (using the Python interface of ipex-llm)
vLLM: Run vLLM with ipex-llm on Intel GPU and CPU
FastChat: Run FastChat serving with ipex-llm on Intel GPU and CPU
Serving on multiple Intel GPUs: Run ipex-llm inference serving on multiple Intel GPUs using DeepSpeed AutoTP and FastAPI
Text-Generation-WebUI: Run oobabooga WebUI with ipex-llm
Axolotl: Fine-tune LLMs with Axolotl and ipex-llm
Benchmarking: Run performance benchmarks (latency and throughput) on Intel GPU and CPU

Docker

GPU Inference in C++: Run llama.cpp, Ollama, etc. with ipex-llm on Intel GPU
GPU Inference in Python: Run HuggingFace transformers, LangChain, LlamaIndex, ModelScope, etc. with ipex-llm on Intel GPU
vLLM on GPU: Run vLLM inference serving with ipex-llm on Intel GPU
vLLM on CPU: Run vLLM inference serving with ipex-llm on Intel CPU
FastChat on GPU: Run FastChat inference serving with ipex-llm on Intel GPU
VSCode on GPU: Develop and run Python-based ipex-llm applications in VSCode on Intel GPU

Applications

GraphRAG: Run Microsoft's GraphRAG with a local LLM using ipex-llm
RAGFlow: Run RAGFlow (an open-source RAG engine) with ipex-llm
LangChain-Chatchat: Run LangChain-Chatchat (a knowledge-base QA application built on a RAG pipeline) with ipex-llm
Coding copilot: Run Continue (a coding copilot in VSCode) with ipex-llm
Open WebUI: Run Open WebUI with ipex-llm
PrivateGPT: Run PrivateGPT with ipex-llm to interact with documents
Dify platform: Use ipex-llm in Dify (an open-source LLM application development platform) to accelerate local LLMs

Install

Windows GPU: Install ipex-llm on Windows with Intel GPU
Linux GPU: Install ipex-llm on Linux with Intel GPU
For more details, please refer to the full installation guide

Code Examples

Low-bit inference
- INT4 inference: INT4 LLM inference on Intel GPU and CPU
- FP8/FP6/FP4 inference: FP8, FP6 and FP4 LLM inference on Intel GPU
- INT8 inference: INT8 LLM inference on Intel GPU and CPU
- INT2 inference: INT2 LLM inference on Intel GPU (based on the llama.cpp IQ2 mechanism)
FP16/BF16 inference
- FP16 LLM inference on Intel GPU (with self-speculative decoding optimization)
- BF16 LLM inference on Intel CPU (with self-speculative decoding optimization)
Distributed inference
- Pipeline-parallel inference on Intel GPU
- DeepSpeed AutoTP inference on Intel GPU
Save and load
- Low-bit models: Save and load ipex-llm low-bit models (INT4/FP4/FP6/INT8/FP8/FP16/etc.)
- GGUF: Load GGUF models directly into ipex-llm
- AWQ: Load AWQ models directly into ipex-llm
- GPTQ: Load GPTQ models directly into ipex-llm
Fine-tuning
- LLM fine-tuning on Intel GPU, including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA
- QLoRA fine-tuning on Intel CPU
Integration with community libraries
Tutorials
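
As a rough illustration of the low-bit inference and save/load items above, the sketch below loads a HuggingFace model in a low-bit format through the Python interface of ipex-llm and caches the converted weights for later runs. It is a minimal sketch under stated assumptions, not the project's reference example: the helper name `load_or_convert`, the `SUPPORTED_LOW_BIT` set and the "directory exists" cache check are hypothetical, and the format strings are the values `load_in_low_bit` is believed to accept for the formats listed above; please check the API documentation for the authoritative list.

```python
from pathlib import Path

# Assumed `load_in_low_bit` values for the formats named in this README
# (INT4, FP4, FP6, INT8, FP8, FP16); verify against the API documentation.
SUPPORTED_LOW_BIT = {"sym_int4", "fp4", "fp6", "sym_int8", "fp8", "fp16"}


def load_or_convert(model_id, low_bit_dir, low_bit="sym_int4"):
    """Return a low-bit model, converting and caching it on first use.

    `model_id` is a HuggingFace model id or local path; `low_bit_dir` is a
    (hypothetical) directory used to cache the converted weights.
    """
    if low_bit not in SUPPORTED_LOW_BIT:
        raise ValueError(f"unsupported low-bit format: {low_bit!r}")

    # Imported lazily so this file can be imported without ipex-llm installed.
    from ipex_llm.transformers import AutoModelForCausalLM

    if Path(low_bit_dir).is_dir():
        # Assumption: an existing directory holds a previously saved low-bit model.
        return AutoModelForCausalLM.load_low_bit(low_bit_dir, trust_remote_code=True)

    # Convert the original weights to the requested low-bit format while loading.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, load_in_low_bit=low_bit, trust_remote_code=True
    )
    # Cache the converted weights so later runs can skip the conversion.
    model.save_low_bit(low_bit_dir)
    return model
```

On an Intel GPU the returned model would typically be moved to the device with `model.to("xpu")` before calling `generate`; on CPU it can be used as returned.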

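API Doc

FAQ

Frequently asked questions

Verified Models

Over 50 models have been optimized and verified on ipex-llm, including LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5 and InternLM; see the table below for more.

Model	CPU Example	GPU Example	NPU Example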
LLaMA	link1, link2	link
LLaMA 2	link1, link2	link	Python link, C++ link
LLaMA 3	link	link	Python link, C++ link
LLaMA 3.1	link	link
LLaMA 3.2		link	Python link, C++ link
LLaMA 3.2-Vision		link
ChatGLM	link
ChatGLM2	link	link
ChatGLM3	link	link
GLM-4	link	link
GLM-4V	link	link
GLM-Edge		link	Python link
GLM-Edge-V		link
Mistral	link	link
Mixtral	link	link
Falcon	link	link
MPT	link	link
Dolly-v1	link	link
Dolly-v2	link	link
Replit Code	link	link
RedPajama	link1, link2
Phoenix	link1, link2
StarCoder	link1, link2	link
Baichuan	link	link
Baichuan2	link	link	Python link
InternLM	link	link
InternVL2		link
Qwen	link	link
Qwen1.5	link	link
Qwen2	link	link	Python link, C++ link
Qwen2.5		link	Python link, C++ link
Qwen-VL	link	link
Qwen2-VL		link
Qwen2-Audio		link
Aquila	link	link
Aquila2	link	link
MOSS	link
Whisper	link	link
Phi-1_5	link	link
Flan-t5	link	link
LLaVA	link	link
CodeLlama	link	link
Skywork	link
InternLM-XComposer	link
WizardCoder-Python	link
CodeShell	link
Fuyu	link
Distil-Whisper	link	link
Yi	link	link
BlueLM	link	link
Mamba	link	link
SOLAR	link	link
Phixtral	link	link
InternLM2	link	link
RWKV4		link
RWKV5		link
Bark	link	link
SpeechT5		link
DeepSeek-MoE	link
Ziya-Coding-34B-v1.0	link
Phi-2	link	link
Phi-3	link	link
Phi-3-vision	link	link
Yuan2	link	link
Gemma	link	link
Gemma2		link
DeciLM-7B	link	link
Deepseek	link	link
StableLM	link	link
CodeGemma	link	link
Command-R/cohere	link	link
CodeGeeX2	link	link
MiniCPM	link	link	Python link, C++ link
MiniCPM3		link
MiniCPM-V		link
MiniCPM-V-2	link	link
MiniCPM-Llama3-V-2_5		link	Python link
MiniCPM-V-2_6	link	link	Python link
StableDiffusion		link
Bce-Embedding-Base-V1			Python link
Speech_Paraformer-Large			Python link
