Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md (#12409)

* Add install_linux_gpu.zh-CN.md
* Add install_windows_gpu.zh-CN.md
* Update llama_cpp_quickstart.zh-CN.md: point the related links to the zh-CN versions.
* Update install_linux_gpu.zh-CN.md: add a link to the English version.
* Update install_windows_gpu.zh-CN.md: add a link to the English version.
* Update install_windows_gpu.md: add a link to the CN version.
* Update install_linux_gpu.md: add a link to the CN version.
* Update README.zh-CN.md: point the related links to the zh-CN versions.
This commit is contained in:

- parent: d6057f6dd2
- commit: a9cb70a71c

6 changed files with 807 additions and 11 deletions
README.zh-CN.md:

```diff
@@ -27,7 +27,7 @@
 <br/>
 
 - [2024/05] You can easily run `ipex-llm` inference, serving and finetuning using **Docker** [images](#docker).
-- [2024/05] You can install `ipex-llm` on Windows using just "*[one command](docs/mddocs/Quickstart/install_windows_gpu.md#install-ipex-llm)*".
+- [2024/05] You can install `ipex-llm` on Windows using just "*[one command](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md#安装-ipex-llm)*".
 - [2024/04] You can now run **Open WebUI** on Intel GPU using `ipex-llm`; see the [quickstart guide](docs/mddocs/Quickstart/open_webui_with_ollama_quickstart.md) for more details.
 - [2024/04] You can now run **Llama 3** on Intel GPU using `ipex-llm` with `llama.cpp` and `ollama`; see the [quickstart guide](docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md) for more details.
 - [2024/04] `ipex-llm` now supports **Llama 3** on both Intel [GPU](python/llm/example/GPU/HuggingFace/LLM/llama3) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama3).
@@ -187,7 +187,7 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
 ### Usage
 - [llama.cpp](docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md): running **llama.cpp** on Intel GPU (*using the C++ interface of `ipex-llm`*)
 - [Ollama](docs/mddocs/Quickstart/ollama_quickstart.zh-CN.md): running **ollama** on Intel GPU (*using the C++ interface of `ipex-llm`*)
-- [PyTorch/HuggingFace](docs/mddocs/Quickstart/install_windows_gpu.md): running **PyTorch**, **HuggingFace**, **LangChain**, **LlamaIndex**, etc. on Intel GPU for [Windows](docs/mddocs/Quickstart/install_windows_gpu.md) and [Linux](docs/mddocs/Quickstart/install_linux_gpu.md) (*using the Python interface of `ipex-llm`*)
+- [PyTorch/HuggingFace](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md): running **PyTorch**, **HuggingFace**, **LangChain**, **LlamaIndex**, etc. on Intel GPU for [Windows](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md) and [Linux](docs/mddocs/Quickstart/install_linux_gpu.zh-CN.md) (*using the Python interface of `ipex-llm`*)
 - [vLLM](docs/mddocs/Quickstart/vLLM_quickstart.md): running `ipex-llm` in **vLLM** on both Intel [GPU](docs/mddocs/DockerGuides/vllm_docker_quickstart.md) and [CPU](docs/mddocs/DockerGuides/vllm_cpu_docker_quickstart.md)
 - [FastChat](docs/mddocs/Quickstart/fastchat_quickstart.md): running **FastChat** serving with `ipex-llm` on Intel GPU and CPU
 - [Serving on multiple Intel GPUs](docs/mddocs/Quickstart/deepspeed_autotp_fastapi_quickstart.md): running `ipex-llm` inference serving on **multiple Intel GPUs** by leveraging DeepSpeed AutoTP and FastAPI
@@ -205,8 +205,8 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
 - [Dify platform](docs/mddocs/Quickstart/dify_quickstart.md): integrating `ipex-llm` into `Dify` (*an open-source LLM app development platform*) to accelerate local LLMs
 
 ### Install
-- [Windows GPU](docs/mddocs/Quickstart/install_windows_gpu.md): installing `ipex-llm` on Windows with Intel GPU
-- [Linux GPU](docs/mddocs/Quickstart/install_linux_gpu.md): installing `ipex-llm` on Linux with Intel GPU
+- [Windows GPU](docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md): installing `ipex-llm` on Windows with Intel GPU
+- [Linux GPU](docs/mddocs/Quickstart/install_linux_gpu.zh-CN.md): installing `ipex-llm` on Linux with Intel GPU
 - *For more details, please refer to the [full installation guide](docs/mddocs/Overview/install.md)*
 
 ### Code Examples
```
docs/mddocs/Quickstart/install_linux_gpu.md:

```diff
@@ -1,4 +1,7 @@
 # Install IPEX-LLM on Linux with Intel GPU
+<p>
+  < <b>English</b> | <a href='./install_linux_gpu.zh-CN.md'>中文</a> >
+</p>
 
 This guide demonstrates how to install IPEX-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex Series and Max Series, as well as Intel Arc Series GPU and Intel iGPU.
```
docs/mddocs/Quickstart/install_linux_gpu.zh-CN.md: 463 lines, new file

@@ -0,0 +1,463 @@
# Install IPEX-LLM on Linux with Intel GPU
<p>
  < <a href='./install_linux_gpu.md'>English</a> | <b>中文</b> >
</p>

This guide walks you through installing IPEX-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex and Max Series, as well as Intel Arc Series GPUs and Intel iGPUs.

We recommend using IPEX-LLM on Ubuntu 22.04 with Linux kernel 6.2 or 6.5. This page demonstrates how to use IPEX-LLM with PyTorch 2.1; see the [full installation page](../Overview/install_gpu.md#linux) for more details.

## Table of Contents
- [Install Prerequisites](./install_linux_gpu.zh-CN.md#系统环境安装)
  - [Install GPU Driver](./install_linux_gpu.zh-CN.md#安装-GPU-驱动程序)
    - [For 1st-generation Intel Core™ Ultra processors with processor numbers 1xxH/U/HL/UL (code name Meteor Lake)](./install_linux_gpu.zh-CN.md#适用于处理器编号为-1xxhuhlul-的第一代-intel-core-ultra-processers代号-meteor-lake)
    - [For other Intel iGPUs and dGPUs](./install_linux_gpu.zh-CN.md#适用于其他-intel-igpu-和-dgpu)
  - [Install oneAPI](./install_linux_gpu.zh-CN.md#安装-oneapi)
  - [Set up Python Environment](./install_linux_gpu.zh-CN.md#设置-python-环境)
- [Install ipex-llm](./install_linux_gpu.zh-CN.md#安装-ipex-llm)
- [Verify Installation](./install_linux_gpu.zh-CN.md#验证安装)
- [Runtime Configurations](./install_linux_gpu.zh-CN.md#运行时配置)
- [A Quick Example](./install_linux_gpu.zh-CN.md#快速示例)
- [Tips & Troubleshooting](./install_linux_gpu.zh-CN.md#故障排除和提示)

## Install Prerequisites

### Install GPU Driver
#### For 1st-generation Intel Core™ Ultra processors with processor numbers 1xxH/U/HL/UL (code name Meteor Lake)

> [!NOTE]
> We have so far verified running IPEX-LLM on Meteor Lake iGPUs on Ubuntu 22.04 with kernel `6.5.0-35-generic`.

##### 1. Check the current kernel version

You can check the current kernel version with:

```bash
uname -r
```
If the reported version is not `6.5.0-35-generic`, you can downgrade or upgrade the kernel to the recommended version as follows.

##### 2. (Optional) Downgrade / upgrade to kernel 6.5.0-35

If the current kernel version is not `6.5.0-35-generic`, you can downgrade or upgrade it as follows:

```bash
export VERSION="6.5.0-35"
sudo apt-get install -y linux-headers-$VERSION-generic linux-image-$VERSION-generic linux-modules-extra-$VERSION-generic

sudo sed -i "s/GRUB_DEFAULT=.*/GRUB_DEFAULT=\"1> $(echo $(($(awk -F\' '/menuentry / {print $2}' /boot/grub/grub.cfg \
| grep -no $VERSION | sed 's/:/\n/g' | head -n 1)-2)))\"/" /etc/default/grub

sudo update-grub
```

Then reboot the machine:

```bash
sudo reboot
```

After rebooting, check with `uname -r` again; the kernel version should now be `6.5.0-35-generic`.

##### 3. Enable GPU driver support via the `force_probe` flag

Next, enable GPU driver support on kernel `6.5.0-35-generic` by setting the `force_probe` parameter:

```bash
export FORCE_PROBE_VALUE=$(sudo dmesg  | grep i915 | grep -o 'i915\.force_probe=[a-zA-Z0-9]\{4\}')

sudo sed -i "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\"\(.*\)\"/\"\1 $FORCE_PROBE_VALUE\"/" /etc/default/grub
```

> [!TIP]
> Instead of the commands above, you can also look up the value of the `force_probe` flag manually:
>
> ```bash
> sudo dmesg  | grep i915
> ```
>
> You may get output similar to `Your graphics device 7d55 is not properly supported by i915 in this kernel version. To force driver probe anyway, use i915.force_probe=7d55`, where `7d55` is the PCI ID, which depends on your GPU model.
>
> Then edit the `/etc/default/grub` file directly, making sure `i915.force_probe=xxxx` is added to the value of `GRUB_CMDLINE_LINUX_DEFAULT`. For example, if `/etc/default/grub` previously contained `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"`, change it to `GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.force_probe=7d55"`.

Then update grub:

```bash
sudo update-grub
```

A reboot is required for the configuration to take effect:

```bash
sudo reboot
```

##### 4. Install compute packages

Install the required compute packages for Intel GPUs on Ubuntu 22.04 with the following commands:

```bash
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
  sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg

echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
  sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list

sudo apt update

sudo apt-get install -y libze1 intel-level-zero-gpu intel-opencl-icd clinfo
```

##### 5. Configure permissions and verify the GPU driver setup

To complete the GPU driver setup, make sure your user is in the render group:

```bash
sudo gpasswd -a ${USER} render
newgrp render
```

You can then verify that the GPU driver is working with the following command:

```bash
clinfo | grep "Device Name"
```

Depending on your GPU model, the output of the command above should contain `Intel(R) Arc(TM) Graphics` or `Intel(R) Graphics`.

> [!TIP]
> Refer to the [official Intel installation guide for client GPU drivers](https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop-22-04-lts) for more details.

#### For other Intel iGPUs and dGPUs

##### Linux kernel 6.2

* Choose one of the following options for setup according to your CPU type:

  1. **Option 1**: For `Intel Core CPU` machines with multiple A770 Arc GPUs, use the following repository:
      ```bash
      sudo apt-get install -y gpg-agent wget
      wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
      sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
      echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
      sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
      ```

  2. **Option 2**: For `Intel Xeon-W/SP CPU` machines with multiple A770 Arc GPUs, the following repository gives better performance:
      ```bash
      wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
      sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
      echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" | \
      sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
      sudo apt update
      ```

    <img src="https://llm-assets.readthedocs.io/en/latest/_images/wget.png" width=100%; />

* Install the drivers

    ```bash
    sudo apt-get update

    # Install out-of-tree driver
    sudo apt-get -y install \
        gawk \
        dkms \
        linux-headers-$(uname -r) \
        libc6-dev
    sudo apt install intel-i915-dkms intel-fw-gpu

    # Install Compute Runtime
    sudo apt-get install -y udev \
        intel-opencl-icd intel-level-zero-gpu level-zero \
        intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
        libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
        libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
        mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo

    sudo reboot
    ```

    <img src="https://llm-assets.readthedocs.io/en/latest/_images/i915.png" width=100%; />

    <img src="https://llm-assets.readthedocs.io/en/latest/_images/gawk.png" width=100%; />

* Configure permissions
    ```bash
    sudo gpasswd -a ${USER} render
    newgrp render

    # Verify the device is working with i915 driver
    sudo apt-get install -y hwinfo
    hwinfo --display
    ```

##### Linux kernel 6.5

* Choose one of the following options for installation according to your CPU type:

  1. **Option 1**: For `Intel Core CPU` machines with multiple A770 Arc GPUs, use the following repository:
      ```bash
      sudo apt-get install -y gpg-agent wget
      wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
      sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
      echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
      sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
      ```

  2. **Option 2**: For `Intel Xeon-W/SP CPU` machines with multiple A770 Arc GPUs, the following repository gives better performance:
      ```bash
      wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
      sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
      echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" | \
      sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
      sudo apt update
      ```

    <img src="https://llm-assets.readthedocs.io/en/latest/_images/wget.png" width=100%; />

* Install the drivers

    ```bash
    sudo apt-get update

    # Install out-of-tree driver
    sudo apt-get -y install \
        gawk \
        dkms \
        linux-headers-$(uname -r) \
        libc6-dev
    sudo apt install -y intel-i915-dkms intel-fw-gpu

    # Install Compute Runtime
    sudo apt-get install -y udev \
        intel-opencl-icd intel-level-zero-gpu level-zero \
        intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
        libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
        libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
        mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo

    sudo reboot
    ```

    <img src="https://llm-assets.readthedocs.io/en/latest/_images/gawk.png" width=100%; />

* Configure permissions
    ```bash
    sudo gpasswd -a ${USER} render
    newgrp render

    # Verify the device is working with i915 driver
    sudo apt-get install -y hwinfo
    hwinfo --display
    ```

### Install oneAPI
IPEX-LLM requires oneAPI 2024.0 for Intel GPUs on Linux.

  ```bash
  wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null

  echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list

  sudo apt update

  sudo apt install intel-oneapi-common-vars=2024.0.0-49406 \
    intel-oneapi-common-oneapi-vars=2024.0.0-49406 \
    intel-oneapi-diagnostics-utility=2024.0.0-49093 \
    intel-oneapi-compiler-dpcpp-cpp=2024.0.2-49895 \
    intel-oneapi-dpcpp-ct=2024.0.0-49381 \
    intel-oneapi-mkl=2024.0.0-49656 \
    intel-oneapi-mkl-devel=2024.0.0-49656 \
    intel-oneapi-mpi=2021.11.0-49493 \
    intel-oneapi-mpi-devel=2021.11.0-49493 \
    intel-oneapi-dal=2024.0.1-25 \
    intel-oneapi-dal-devel=2024.0.1-25 \
    intel-oneapi-ippcp=2021.9.1-5 \
    intel-oneapi-ippcp-devel=2021.9.1-5 \
    intel-oneapi-ipp=2021.10.1-13 \
    intel-oneapi-ipp-devel=2021.10.1-13 \
    intel-oneapi-tlt=2024.0.0-352 \
    intel-oneapi-ccl=2021.11.2-5 \
    intel-oneapi-ccl-devel=2021.11.2-5 \
    intel-oneapi-dnnl-devel=2024.0.0-49521 \
    intel-oneapi-dnnl=2024.0.0-49521 \
    intel-oneapi-tcm-1.0=1.0.0-435
  ```
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/oneapi.png" alt="image-20240221102252565" width=100%; />

  <img src="https://llm-assets.readthedocs.io/en/latest/_images/basekit.png" alt="image-20240221102252565" width=100%; />

> [!IMPORTANT]
> Be sure to reboot the machine after the GPU driver and oneAPI installation are complete:
>
> ```bash
> sudo reboot
> ```

### Set up Python Environment

If conda is not installed on your machine, download and install Miniforge as follows:

  ```bash
  wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
  bash Miniforge3-Linux-x86_64.sh
  source ~/.bashrc
  ```

You can use `conda --version` to confirm that conda was installed successfully.

After conda is installed, create a new Python environment `llm`:
```bash
conda create -n llm python=3.11
```
Activate the newly created `llm` environment:
```bash
conda activate llm
```

## Install `ipex-llm`

With the `llm` environment active, use `pip` to install `ipex-llm` for GPU. Choose either the US or CN website for `extra-index-url` depending on your region (a quick sanity check follows the two options):

- **US**:

  ```bash
  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
  ```

- **CN**:

  ```bash
  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
  ```
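After either command finishes, a quick sanity check (a minimal sketch; the exact version string shown depends on the pre-release wheel that was pulled) is to ask pip what it installed into the active `llm` environment:

```bash
# Confirm the package is visible in the active environment;
# the Version field should show the ipex-llm pre-release wheel just installed.
pip show ipex-llm

# Also list the XPU-related torch packages that were pulled in alongside it.
pip list | grep -i -E "torch|ipex"
```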
> [!NOTE]
> If you encounter network issues while installing IPEX, refer to [this guide](../Overview/install_gpu.md#install-ipex-llm-from-wheel-1) for troubleshooting advice.

## Verify Installation
- You can verify whether `ipex-llm` is installed successfully by importing a few classes from the library. For example, execute the following import commands in a terminal:

  ```bash
  source /opt/intel/oneapi/setvars.sh

  python

  > from ipex_llm.transformers import AutoModel, AutoModelForCausalLM
  ```

## Runtime Configurations

To use GPU acceleration on Linux, several environment variables are required or recommended. Choose the configuration that matches your GPU device:

- **For Intel Arc™ A-Series and Intel Data Center GPU Flex Series**:

  For Intel Arc™ A-Series and Intel Data Center GPU Flex Series, we recommend:

  ```bash
  # Configure oneAPI environment variables. 
  source /opt/intel/oneapi/setvars.sh

  # Recommended Environment Variables for optimal performance
  export USE_XETLA=OFF
  export SYCL_CACHE_PERSISTENT=1
  # [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  ```

- **For Intel Data Center GPU Max Series**:

  We recommend the following environment variables:

  ```bash
  # Configure oneAPI environment variables. 
  source /opt/intel/oneapi/setvars.sh

  # Recommended Environment Variables for optimal performance
  export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
  export SYCL_CACHE_PERSISTENT=1
  export ENABLE_SDP_FUSION=1
  # [optional] under most circumstances, the following environment variable may improve performance, but sometimes this may also cause performance degradation
  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  ```

  Please note that `libtcmalloc.so` can be installed with `conda install -c conda-forge -y gperftools=2.10`.

- **For Intel iGPU**:

  ```bash
  # Configure oneAPI environment variables. 
  source /opt/intel/oneapi/setvars.sh

  export SYCL_CACHE_PERSISTENT=1
  export BIGDL_LLM_XMX_DISABLED=1
  ```

> [!NOTE]
> For more details about runtime configurations, refer to [this guide](../Overview/install_gpu.md#runtime-configuration-1).

> [!NOTE]
> The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` controls whether immediate command lists are used to submit tasks to the GPU. Enabling it usually improves performance, but there are exceptions; we therefore recommend testing with this variable both enabled and disabled to find the setting that performs best. More details are documented [here](https://www.intel.com/content/www/us/en/developer/articles/guide/level-zero-immediate-command-lists.html).
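Since the note above recommends testing both settings, one minimal way to compare them is to time the same workload twice (a sketch; wall-clock `time` is only a rough proxy, and `demo.py` stands in for whatever script you actually run, such as the quick example below):

```bash
# Run once with immediate command lists enabled...
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
time python demo.py

# ...and once with it disabled, then keep whichever setting is faster for your workload.
unset SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS
time python demo.py
```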
## A Quick Example

Now let's try a real large language model (LLM). This example uses the [phi-1.5](https://huggingface.co/microsoft/phi-1_5) model, an LLM with 1.3 billion parameters. Follow the steps below to set up and run the model, and observe how it responds to the prompt "What is AI?".

- Step 1: Activate the `llm` Python environment created earlier:

   ```bash
   conda activate llm
   ```

- Step 2: Prepare the runtime environment following the [Runtime Configurations](#运行时配置) section above.

- Step 3: Create a new file named `demo.py` and copy the following code into it:

   ```python
   # Copy/Paste the contents to a new file demo.py
   import torch
   from ipex_llm.transformers import AutoModelForCausalLM
   from transformers import AutoTokenizer, GenerationConfig
   generation_config = GenerationConfig(use_cache = True)

   tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
   # Load Model using ipex-llm and load it to GPU
   model = AutoModelForCausalLM.from_pretrained(
       "microsoft/phi-1_5", load_in_4bit=True, cpu_embedding=True, trust_remote_code=True)
   model = model.to('xpu')

   # Format the prompt
   question = "What is AI?"
   prompt = " Question:{prompt}\n\n Answer:".format(prompt=question)
   # Generate predicted tokens
   with torch.inference_mode():
       input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
       # warm up one more time before the actual generation task for the first run, see details in `Tips & Troubleshooting`
       # output = model.generate(input_ids, do_sample=False, max_new_tokens=32, generation_config = generation_config)
       output = model.generate(input_ids, do_sample=False, max_new_tokens=32, generation_config = generation_config).cpu()
       output_str = tokenizer.decode(output[0], skip_special_tokens=True)
       print(output_str)
   ```

   > **Tip**:
   >
   > When running LLMs on Intel iGPUs with limited memory, we recommend setting `cpu_embedding=True` in the `from_pretrained` function. This places the memory-intensive embedding layer on the CPU instead of the GPU.

- Step 4: Run `demo.py` within the activated Python environment:

  ```bash
  python demo.py
  ```

### Example output

The following is example output on a system with an 11th Gen Intel Core i7 CPU and Iris Xe Graphics iGPU:
```
Question:What is AI?
Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines.
```

## Tips & Troubleshooting

### Warm-up for optimal performance on first run
When running an LLM on a GPU for the first time, you might notice lower-than-expected performance, with a delay of up to several minutes before the first token is generated. This happens because the GPU kernels need to be compiled and initialized, which varies across GPU types. To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks. If you're developing an application, you can incorporate this warm-up step into the start-up or loading routine to enhance the user experience.
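To make the warm-up recommendation concrete, here is a minimal sketch (assuming `model`, `tokenizer`, and `input_ids` have been prepared as in the quick example above): the first `generate` call absorbs the one-time kernel compilation cost, so only the second call runs at steady-state speed.

```python
import time
import torch

with torch.inference_mode():
    # Warm-up: triggers GPU kernel compilation and initialization; output is discarded.
    model.generate(input_ids, do_sample=False, max_new_tokens=32)

    # Actual generation: now runs (and can be measured) at steady-state speed.
    start = time.time()
    output = model.generate(input_ids, do_sample=False, max_new_tokens=32).cpu()
    print(f"Generation took {time.time() - start:.2f}s")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```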
docs/mddocs/Quickstart/install_windows_gpu.md:

```diff
@@ -1,4 +1,7 @@
 # Install IPEX-LLM on Windows with Intel GPU
+<p>
+  < <b>English</b> | <a href='./install_windows_gpu.zh-CN.md'>中文</a> >
+</p>
 
 This guide demonstrates how to install IPEX-LLM on Windows with Intel GPUs. 
```
docs/mddocs/Quickstart/install_windows_gpu.zh-CN.md: 327 lines, new file

@@ -0,0 +1,327 @@
# Install IPEX-LLM on Windows with Intel GPU
<p>
  < <a href='./install_windows_gpu.md'>English</a> | <b>中文</b> >
</p>

This guide walks you through installing IPEX-LLM on Windows with Intel GPUs.

It applies to Intel Core Ultra and 11th through 14th Gen Intel Core integrated GPUs (iGPUs), as well as Intel Arc Series GPUs.

## Table of Contents
- [Install Prerequisites](./install_windows_gpu.zh-CN.md#系统环境安装)
- [Install ipex-llm](./install_windows_gpu.zh-CN.md#安装-ipex-llm)
- [Verify Installation](./install_windows_gpu.zh-CN.md#验证安装)
- [Monitor GPU Status](./install_windows_gpu.zh-CN.md#监控-gpu-状态)
- [A Quick Example](./install_windows_gpu.zh-CN.md#快速示例)
- [Tips & Troubleshooting](./install_windows_gpu.zh-CN.md#故障排除和提示)

## Install Prerequisites

### (Optional) Update GPU Driver

> [!IMPORTANT]
> If your driver version is lower than `31.0.101.5122`, update your GPU driver. Refer to [here](../Overview/install_gpu.md#prerequisites) for more information.

You can download and install the latest GPU driver from the [official Intel download page](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). A reboot is required after the update to complete the installation.

> [!NOTE]
> The process may take around 10 minutes. After rebooting, check the **Intel Arc Control** application to verify that the driver has been installed correctly. If the installation succeeded, you should see an **Arc Control** interface similar to the figure below.

<img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_3.png" width=100%; />

### Set up Python Environment

Visit the [Miniforge installation page](https://conda-forge.org/download/), download the **Miniforge installer for Windows**, and follow the instructions to complete the installation.

<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_miniforge_download.png"  width=80%/>
</div>

After installation, open **Miniforge Prompt** and create a new Python environment `llm`:

```cmd
conda create -n llm python=3.11 libuv
```
Activate the newly created environment `llm`:

```cmd
conda activate llm
```

## Install `ipex-llm`

With the `llm` environment active, use `pip` to install `ipex-llm` for GPU.
- **For 2nd-generation Intel Core™ Ultra processors with processor numbers 2xxV (code name Lunar Lake)**:

  Choose either the US or CN website for `extra-index-url` depending on your region:

  - **US**:

      ```cmd
      conda create -n llm python=3.11 libuv
      conda activate llm

      pip install --pre --upgrade ipex-llm[xpu_lnl] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/lnl/us/
      ```
  - **CN**:

      ```cmd
      conda create -n llm python=3.11 libuv
      conda activate llm

      pip install --pre --upgrade ipex-llm[xpu_lnl] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/lnl/cn/
      ```
- For **other Intel iGPUs and dGPUs**:

   Choose either the US or CN website for `extra-index-url` depending on your region:

   - **US**:

      ```cmd
      conda create -n llm python=3.11 libuv
      conda activate llm

      pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
      ```

   - **CN**:

      ```cmd
      conda create -n llm python=3.11 libuv
      conda activate llm

      pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
      ```

> [!NOTE]
> If you encounter network issues while installing IPEX, refer to [this guide](../Overview/install_gpu.md#install-ipex-llm-from-wheel) for troubleshooting advice.

## Verify Installation
You can verify whether `ipex-llm` is installed successfully through the following steps.

### Step 1: Runtime Configurations
- Open **Miniforge Prompt** and activate the Python environment `llm` you created:

   ```cmd
   conda activate llm
   ```

- Set the following environment variables according to your device (a note on making them persistent follows this list):

  - **For Intel iGPU**:

    ```cmd
    set SYCL_CACHE_PERSISTENT=1
    set BIGDL_LLM_XMX_DISABLED=1
    ```

  - **For Intel Arc™ A770**:

    ```cmd
    set SYCL_CACHE_PERSISTENT=1
    ```
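Note that `set` only affects the current Miniforge Prompt session. If you prefer the variable to persist across sessions (an optional convenience, not a required step), Windows' built-in `setx` can write it to the user environment; it takes effect in newly opened windows:

```cmd
:: Persist for future sessions; the current window still needs the plain `set` above.
setx SYCL_CACHE_PERSISTENT 1
```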
> [!TIP]
> For other Intel dGPU series, refer to [this guide](../Overview/install_gpu.md#runtime-configuration) for more details about runtime configurations.

### Step 2: Run Python Code

- Launch the Python interactive console by typing `python` in the Miniforge Prompt window and pressing Enter.

- **Copy the following code into the Miniforge Prompt line by line**, pressing Enter **after copying each line**.

  ```python
  import torch 
  from ipex_llm.transformers import AutoModel,AutoModelForCausalLM    
  tensor_1 = torch.randn(1, 1, 40, 128).to('xpu') 
  tensor_2 = torch.randn(1, 1, 128, 40).to('xpu') 
  print(torch.matmul(tensor_1, tensor_2).size()) 
  ```

  It should finally print the following:

  ```
  torch.Size([1, 1, 40, 40])
  ```

  > **Tip**:
  >
  > If you encounter any problem, refer to [here](../Overview/install_gpu.md#troubleshooting) for help.

- To exit the Python interactive console, simply press Ctrl+Z and then Enter (or type `exit()` and press Enter).

## Monitor GPU Status
To monitor GPU performance and status (e.g. memory consumption, utilization, etc.), you can use either the **`Performance` tab of the Windows Task Manager** (see the left side of the figure below) or the **Arc Control** application (see the right side of the figure below).

<img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_4.png"  width=100%; />

## A Quick Example

Now let's actually run a large language model (LLM). This example uses the [Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) model, an LLM with 1.5 billion parameters. Follow the steps below to set up and run the model, and observe how it responds to the prompt "What is AI?".

- Step 1: Prepare the runtime environment following the [Runtime Configurations](#步骤-1-运行时配置) section above.

- Step 2: Create the code file. IPEX-LLM supports loading models from either Hugging Face or ModelScope; choose according to your needs.

  - **Load the model from Hugging Face**:

    Create a new file named `demo.py` and copy the following code into it to run the [Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) model optimized with IPEX-LLM.

      ```python
      # Copy/Paste the contents to a new file demo.py
      import torch
      from ipex_llm.transformers import AutoModelForCausalLM
      from transformers import AutoTokenizer, GenerationConfig
      generation_config = GenerationConfig(use_cache=True)

      print('Now start loading Tokenizer and optimizing Model...')
      tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct",
                                                trust_remote_code=True)

      # Load Model using ipex-llm and load it to GPU
      model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct",
                                                   load_in_4bit=True,
                                                   cpu_embedding=True,
                                                   trust_remote_code=True)
      model = model.to('xpu')
      print('Successfully loaded Tokenizer and optimized Model!')

      # Format the prompt
      # you could tune the prompt based on your own model,
      # here the prompt tuning refers to https://huggingface.co/Qwen/Qwen2-1.5B-Instruct#quickstart
      question = "What is AI?"
      messages = [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": question}
      ]
      text = tokenizer.apply_chat_template(
          messages,
          tokenize=False,
          add_generation_prompt=True
      )

      # Generate predicted tokens
      with torch.inference_mode():
         input_ids = tokenizer.encode(text, return_tensors="pt").to('xpu')

         print('--------------------------------------Note-----------------------------------------')
         print('| For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or |')
         print('| Pro A60, it may take several minutes for GPU kernels to compile and initialize. |')
         print('| Please be patient until it finishes warm-up...                                  |')
         print('-----------------------------------------------------------------------------------')

         # To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks.
         # If you're developing an application, you can incorporate this warm-up step into start-up or loading routine to enhance the user experience.
         output = model.generate(input_ids,
                                 do_sample=False,
                                 max_new_tokens=32,
                                 generation_config=generation_config) # warm-up

         print('Successfully finished warm-up, now start generation...')

         output = model.generate(input_ids,
                                 do_sample=False,
                                 max_new_tokens=32,
                                 generation_config=generation_config).cpu()
         output_str = tokenizer.decode(output[0], skip_special_tokens=False)
         print(output_str)
      ```
  - **Load the model from ModelScope**:

    Run the following command in the Miniforge Prompt to install ModelScope:

    ```cmd
    pip install modelscope==1.11.0
    ```

    Create a new file named `demo.py` and copy the following code into it to run the [Qwen2-1.5B-Instruct](https://www.modelscope.cn/models/qwen/Qwen2-1.5B-Instruct/summary) model optimized with IPEX-LLM.

      ```python
      # Copy/Paste the contents to a new file demo.py
      import torch
      from ipex_llm.transformers import AutoModelForCausalLM
      from transformers import GenerationConfig
      from modelscope import AutoTokenizer
      generation_config = GenerationConfig(use_cache=True)

      print('Now start loading Tokenizer and optimizing Model...')
      tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct",
                                                trust_remote_code=True)

      # Load Model using ipex-llm and load it to GPU
      model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct",
                                                   load_in_4bit=True,
                                                   cpu_embedding=True,
                                                   trust_remote_code=True,
                                                   model_hub='modelscope')
      model = model.to('xpu')
      print('Successfully loaded Tokenizer and optimized Model!')

      # Format the prompt
      # you could tune the prompt based on your own model,
      # here the prompt tuning refers to https://huggingface.co/Qwen/Qwen2-1.5B-Instruct#quickstart
      question = "What is AI?"
      messages = [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": question}
      ]
      text = tokenizer.apply_chat_template(
          messages,
          tokenize=False,
          add_generation_prompt=True
      )

      # Generate predicted tokens
      with torch.inference_mode():
         input_ids = tokenizer.encode(text, return_tensors="pt").to('xpu')
         print('--------------------------------------Note-----------------------------------------')
         print('| For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or |')
         print('| Pro A60, it may take several minutes for GPU kernels to compile and initialize. |')
         print('| Please be patient until it finishes warm-up...                                  |')
         print('-----------------------------------------------------------------------------------')

         # To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks.
         # If you're developing an application, you can incorporate this warm-up step into start-up or loading routine to enhance the user experience.
         output = model.generate(input_ids,
                                 do_sample=False,
                                 max_new_tokens=32,
                                 generation_config=generation_config) # warm-up

         print('Successfully finished warm-up, now start generation...')

         output = model.generate(input_ids,
                                 do_sample=False,
                                 max_new_tokens=32,
                                 generation_config=generation_config).cpu()
         output_str = tokenizer.decode(output[0], skip_special_tokens=False)
         print(output_str)
      ```
      > **Tip**:
      >
      > Note that the repo id of some models on ModelScope may differ from the one on Hugging Face.

> [!NOTE]
> When running LLMs on Intel iGPUs with limited memory, we recommend setting `cpu_embedding=True` in the `from_pretrained` function. This places the memory-intensive embedding layer on the CPU instead of the GPU.

- Step 3: Run `demo.py` within the activated Python environment `llm`:

  ```cmd
  python demo.py
  ```

### Example output

The following is example output on a system with an Intel Core Ultra 5 125H CPU and Intel Arc Graphics iGPU:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is AI?<|im_end|>
<|im_start|>assistant
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and act like humans. It involves the development of algorithms,
```

## Tips & Troubleshooting

### Warm-up for optimal performance on first run
When running an LLM on a GPU for the first time, you might notice lower-than-expected performance, with a delay of up to several minutes before the first token is generated. This happens because the GPU kernels need to be compiled and initialized, which varies across GPU types. To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks. If you're developing an application, you can incorporate this warm-up step into the start-up or loading routine to enhance the user experience.
docs/mddocs/Quickstart/llama_cpp_quickstart.zh-CN.md:

```diff
@@ -42,11 +42,11 @@ IPEX-LLM now supports running `llama.cpp` on both Linux and Windows.
 #### Linux
 For Linux, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
 
-Please read [Install IPEX-LLM on Linux with Intel GPU](./install_linux_gpu.md) carefully: first install the Intel GPU driver following the [Install GPU Driver](./install_linux_gpu.md#install-gpu-driver) steps, then install Intel® oneAPI Base Toolkit 2024.0 following the [Install oneAPI](./install_linux_gpu.md#install-oneapi) steps.
+Please read [Install IPEX-LLM on Linux with Intel GPU](./install_linux_gpu.zh-CN.md) carefully: first install the Intel GPU driver following the [Install GPU Driver](./install_linux_gpu.zh-CN.md#安装-gpu-驱动程序) steps, then install Intel® oneAPI Base Toolkit 2024.0 following the [Install oneAPI](./install_linux_gpu.zh-CN.md#安装-oneapi) steps.
 
 #### Windows (Optional)
 
-Make sure your GPU driver version is no lower than `31.0.101.5522`. If it is, refer to the [GPU driver update guide](./install_windows_gpu.md#optional-update-gpu-driver) to upgrade it; otherwise you may encounter garbled output.
+Make sure your GPU driver version is no lower than `31.0.101.5522`. If it is, refer to the [GPU driver update guide](./install_windows_gpu.zh-CN.md#可选-更新-gpu-驱动程序) to upgrade it; otherwise you may encounter garbled output.
 
 ### 1. Install IPEX-LLM for llama.cpp
@@ -324,7 +324,7 @@ Log end
 If an error like `main: prompt is too long (xxx tokens, max xxx)` occurs, set the `-c` parameter to a larger value to support longer context.
 
 #### 4. `gemm: cannot allocate memory on host` error / `could not create an engine` error
-If you hit a `oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host` or `could not create an engine` error on Linux, it may be because you installed the oneAPI dependencies with pip (e.g. `pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0`). We recommend installing the oneAPI dependencies with `apt` instead to avoid this issue; see [this guide](./install_linux_gpu.md) for more details.
+If you hit a `oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host` or `could not create an engine` error on Linux, it may be because you installed the oneAPI dependencies with pip (e.g. `pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0`). We recommend installing the oneAPI dependencies with `apt` instead to avoid this issue; see [this guide](./install_linux_gpu.zh-CN.md) for more details.
 
 #### 5. Failed to quantize the model
 If you encounter `main: failed to quantize model from xxx`, make sure the corresponding output directory has been created.
@@ -354,7 +354,7 @@ Log end
 2. Linux: whether `source /opt/intel/oneapi/setvars.sh` was executed before running the llama.cpp command. This source command only takes effect in the current session.
 
 #### 11. If you get garbled output, check the driver first
-If you get garbled output, check whether your GPU driver version is >= [31.0.101.5522](https://www.intel.cn/content/www/cn/zh/download/785597/823163/intel-arc-iris-xe-graphics-windows.html). If not, follow the instructions [here](./install_linux_gpu.md#install-gpu-driver) to update your GPU driver.
+If you get garbled output, check whether your GPU driver version is >= [31.0.101.5522](https://www.intel.cn/content/www/cn/zh/download/785597/823163/intel-arc-iris-xe-graphics-windows.html). If not, follow the instructions [here](./install_linux_gpu.zh-CN.md#安装-GPU-驱动程序) to update your GPU driver.
 
 #### 12. Why can't my program find a sycl device
 If you encounter the `GGML_ASSERT: C:/Users/Administrator/actions-runner/cpp-release/_work/llm.cpp/llm.cpp/llama-cpp-bigdl/ggml-sycl.cpp:18283: main_gpu_id<g_all_sycl_device_count` error or similar, and `ls-sycl-device` produces no output, it is because llama.cpp cannot find a sycl device. On some laptops, installing the Arc driver may cause Microsoft to force-install the `OpenCL, OpenGL, and Vulkan Compatibility Pack`, which inadvertently prevents the system from locating sycl devices. This can be resolved by manually uninstalling that package from the Microsoft Store.
```