diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md b/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md
index 6feb4606..69142a30 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md
@@ -86,6 +86,8 @@ IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and sup
     libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
     mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo
+  sudo apt install -y intel-i915-dkms intel-fw-gpu
+  sudo reboot
   ```
diff --git a/python/llm/example/GPU/Deepspeed-AutoTP/README.md b/python/llm/example/GPU/Deepspeed-AutoTP/README.md
index 6b362db6..8c5f5365 100644
--- a/python/llm/example/GPU/Deepspeed-AutoTP/README.md
+++ b/python/llm/example/GPU/Deepspeed-AutoTP/README.md
@@ -8,6 +8,10 @@ To run this example with IPEX-LLM on Intel GPUs, we have some recommended requir
 
 ## Example:
 
+### 0. Prerequisites
+
+Please refer to [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), and follow [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install the GPU driver and Intel® oneAPI Base Toolkit 2024.0.
+
 ### 1. Install
 
 ```bash
@@ -15,6 +19,7 @@ conda create -n llm python=3.11
 conda activate llm
 # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+pip install transformers==4.37.0
 pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 # configures OneAPI environment variables
 source /opt/intel/oneapi/setvars.sh
@@ -49,6 +54,14 @@ bash run_vicuna_33b_arc_2_card.sh
 
 > **Note**: You could change `NUM_GPUS` to the number of GPUs you have on your machine. And you could also specify other low bit optimizations through `--low-bit`.
 
+- Run Qwen1.5-14B-Chat on two Intel Arc A770
+
+```bash
+bash run_qwen_14b_arc_2_card.sh
+```
+
+> **Note**: You could change `NUM_GPUS` to the number of GPUs you have on your machine. And you could also specify other low bit optimizations through `--low-bit`.
+
 - Run Mistral-7B-Instruct on two cards of Intel Data Center GPU Flex
 
 ```bash
@@ -69,7 +82,7 @@ bash run_mistral_7b_instruct_flex_2_card.sh
 [0] One day, she decided to go on a journey to find the legendary
 ```
 
-**Important**: The first token latency is much larger than rest token latency, you could use [our benchmark tool](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/README.md) to obtain more details about first and rest token latency.
+**Important**: For detailed performance data, please use [our benchmark tool](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/dev/benchmark/all-in-one). Set `test_api` to `"deepspeed_optimize_model_gpu"` and adjust the other settings in `config.yaml` to your requirements, then run `run-deepspeed-arc.sh` or `run-deepspeed-pvc.sh`, depending on your device, to collect the performance data.
 
 ### Known Issue
 
diff --git a/python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh b/python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh
new file mode 100644
index 00000000..0b45569b
--- /dev/null
+++ b/python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh
@@ -0,0 +1,38 @@
+#
+# Copyright 2016 The BigDL Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+## Validated BKC for Qwen1.5-14B-Chat on 2 ARC with
+## Ubuntu 22.04.4, kernel 6.5.0-27-generic, level-zero 1.14.0, NEO (compute runtime) 24.09.28717.12
+
+export MASTER_ADDR=127.0.0.1
+export FI_PROVIDER=tcp                # use the TCP provider for oneCCL over OFI
+export CCL_ATL_TRANSPORT=ofi
+export CCL_ZE_IPC_EXCHANGE=sockets
+
+export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
+basekit_root=/opt/intel/oneapi
+source $basekit_root/setvars.sh --force
+source $basekit_root/ccl/latest/env/vars.sh --force
+
+NUM_GPUS=2 # number of GPUs to use
+export USE_XETLA=OFF
+if grep -q "Core" /proc/cpuinfo; then
+  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2   # enable immediate command lists on client (Core) CPUs
+fi
+export TORCH_LLM_ALLREDUCE=0 # different from PVC
+
+mpirun -np $NUM_GPUS --prepend-rank \
+  python deepspeed_autotp.py --repo-id-or-model-path 'Qwen/Qwen1.5-14B-Chat' --low-bit 'sym_int4'
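
As a rough usage sketch for the new script above (assumptions: it is launched from `python/llm/example/GPU/Deepspeed-AutoTP` in an environment prepared per the Install step, and `fp8` is only an illustrative alternative `--low-bit` value, not confirmed by this patch):

```bash
# Hypothetical invocation of the new Qwen1.5-14B-Chat example on two Arc A770 cards.
cd python/llm/example/GPU/Deepspeed-AutoTP
bash run_qwen_14b_arc_2_card.sh

# To use a different GPU count or low-bit format, edit the corresponding lines
# in the script before running it, e.g.:
#   NUM_GPUS=4
#   ... --low-bit 'fp8'   # illustrative value; check the example's supported options
```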
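
To collect the performance data mentioned in the **Important** note, a possible workflow with the all-in-one benchmark tool is sketched below; the directory is taken from the linked URL, and the exact `config.yaml` fields should be checked against the benchmark tool's own README.

```bash
# Sketch of benchmarking DeepSpeed AutoTP with the all-in-one tool (paths assumed from the link above).
cd python/llm/dev/benchmark/all-in-one
# Edit config.yaml: set `test_api` to "deepspeed_optimize_model_gpu" and adjust
# the model list and other options as needed.
bash run-deepspeed-arc.sh   # on Intel Arc; use run-deepspeed-pvc.sh on Data Center GPU Max (PVC)
```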