diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/bigdl_llm_migration.md b/docs/readthedocs/source/doc/LLM/Quickstart/bigdl_llm_migration.md
index 4f32cc87..afc79586 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/bigdl_llm_migration.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/bigdl_llm_migration.md
@@ -1,11 +1,36 @@
 # `bigdl-llm` Migration Guide
+
+This guide helps you migrate your `bigdl-llm` application to use `ipex-llm`.
+
 ## Upgrade `bigdl-llm` package to `ipex-llm`
-First uninstall `bigdl-llm` and install `ipex-llm`.
+
+```eval_rst
+.. note::
+   This step assumes you have already installed ``bigdl-llm``.
+```
+You need to uninstall `bigdl-llm` and install `ipex-llm`. With your `bigdl-llm` conda environment activated, execute the following command according to your device type and location:
+
+### For CPU
+
 ```bash
 pip uninstall -y bigdl-llm
 pip install --pre --upgrade ipex-llm[all] # for cpu
-pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # for xpu
 ```
+
+### For GPU
+```eval_rst
+.. tabs::
+   .. tab:: US
+
+      .. code-block:: cmd
+
+         pip uninstall -y bigdl-llm
+         pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+
+   .. tab:: CN
+
+      .. code-block:: cmd
+
+         pip uninstall -y bigdl-llm
+         pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
+```
+
+
 ## Migrate `bigdl-llm` code to `ipex-llm`
 There are two options to migrate `bigdl-llm` code to `ipex-llm`.
@@ -13,8 +38,8 @@
 To upgrade `bigdl-llm` code to `ipex-llm`, simply replace all `bigdl.llm` with `ipex_llm`:
 
 ```python
-#from bigdl.llm.transformers import AutoModelForCausalLM
-from ipex_llm.transformers import AutoModelForCausalLM
+#from bigdl.llm.transformers import AutoModelForCausalLM  # Original line
+from ipex_llm.transformers import AutoModelForCausalLM  # Updated line
 model = AutoModelForCausalLM.from_pretrained(model_path,
                                              load_in_4bit=True,
                                              trust_remote_code=True)
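+
+# From here on, standard transformers usage applies unchanged. Below is a
+# minimal, optional generation sketch (it assumes `model_path` is defined
+# above and that AutoTokenizer from transformers suits your model):
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+inputs = tokenizer("What is AI?", return_tensors="pt")
+print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))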
@@ -24,8 +49,7 @@
 To run in the compatible mode, simply add `import ipex_llm` at the beginning of the existing `bigdl-llm` code:
 
 ```python
-# need to add the below line before "import bigdl.llm"
-import ipex_llm
+import ipex_llm  # Add this line before any bigdl.llm imports
 from bigdl.llm.transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained(model_path,
                                              load_in_4bit=True,
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md b/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md
index a800f127..357e35d8 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/install_linux_gpu.md
@@ -4,10 +4,11 @@ This guide demonstrates how to install IPEX-LLM on Linux with Intel GPUs. It app
 IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates IPEX-LLM with PyTorch 2.1. Check the [Installation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux) page for more details.
 
+## Install Prerequisites
 
-## Install Intel GPU Driver
+### Install GPU Driver
 
-### For Linux kernel 6.2
+#### For Linux kernel 6.2
 
 * Install arc driver
   ```bash
@@ -24,15 +25,12 @@
   ```bash
   sudo apt-get update
-
   sudo apt-get -y install \
     gawk \
     dkms \
     linux-headers-$(uname -r) \
     libc6-dev
-
   sudo apt install intel-i915-dkms intel-fw-gpu
-
   sudo apt-get install -y gawk libc6-dev udev\
     intel-opencl-icd intel-level-zero-gpu level-zero \
     intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
@@ -59,24 +57,7 @@
   ```
 
-## Setup Python Environment
-
-Install the Miniconda as follows if you don't have conda installed on your machine:
-  ```bash
-  wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
-
-  bash Miniconda3-latest-Linux-x86_64.sh
-
-  source ~/.bashrc
-
-  # Verify the installation
-  conda --version
-  # rm Miniconda3-latest-Linux-x86_64.sh # if you don't need this file any longer
-  ```
-  > image-20240221102252569
-
-
-## Install oneAPI
+### Install oneAPI
 ```
 wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
@@ -90,23 +71,36 @@ Install the Miniconda as follows if you don't have conda installed on your machi
-   > image-20240221102252565
 
+### Setup Python Environment
+
+Download and install Miniconda as follows if you don't have conda installed on your machine:
+  ```bash
+  wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
+  bash Miniconda3-latest-Linux-x86_64.sh
+  source ~/.bashrc
+  ```
+
+You can use `conda --version` to verify your conda installation.
+
+After installation, create a new Python environment `llm`:
+```bash
+conda create -n llm python=3.9 libuv
+```
+Activate the newly created environment `llm`:
+```bash
+conda activate llm
+```
+
 ## Install `ipex-llm`
 
 * With the `llm` environment active, use `pip` to install `ipex-llm` for GPU:
   ```
-  conda create -n llm python=3.9
-  conda activate llm
-
   pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
   ```
 
-  > image-20240221102252564
-
-  > image-20240221102252564
-
-
-* You can verify if ipex-llm is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal:
+## Verify Installation
+* You can verify if `ipex-llm` is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal:
   ```bash
   source /opt/intel/oneapi/setvars.sh
 
   python
 
@@ -115,38 +109,44 @@ Install the Miniconda as follows if you don't have conda installed on your machi
   > from ipex_llm.transformers import AutoModel, AutoModelForCausalLM
   ```
 
-  > image-20240221102252562
-
-
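+  Optionally, you can also check that PyTorch can see your GPU. The following is a minimal sketch; it assumes `intel_extension_for_pytorch`, which is installed together with `ipex-llm[xpu]`:
+
+  ```python
+  import torch
+  import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch
+
+  # An empty device range here usually means the driver or oneAPI setup needs revisiting
+  for i in range(torch.xpu.device_count()):
+      print(f"[{i}]", torch.xpu.get_device_properties(i))
+  ```
+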
 ## Runtime Configurations
 
 To use GPU acceleration on Linux, several environment variables are required or recommended before running a GPU example.
 
-* For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:
-  ```bash
-  # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
-  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
-  source /opt/intel/oneapi/setvars.sh
-
-  # Recommended Environment Variables for optimal performance
-  export USE_XETLA=OFF
-  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-  ```
-
-* For Intel Data Center GPU Max Series, we recommend:
-  ```bash
-  # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
-  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
-  source /opt/intel/oneapi/setvars.sh
-
-  # Recommended Environment Variables for optimal performance
-  export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
-  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-  export ENABLE_SDP_FUSION=1
-  ```
-  Please note that `libtcmalloc.so` can be installed by ```conda install -c conda-forge -y gperftools=2.10```.
+```eval_rst
+.. tabs::
+   .. tab:: Intel Arc™ A-Series and Intel Data Center GPU Flex
+
+      For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:
+
+      .. code-block:: bash
+
+         # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
+         # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
+         source /opt/intel/oneapi/setvars.sh
+
+         # Recommended Environment Variables for optimal performance
+         export USE_XETLA=OFF
+         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+
+   .. tab:: Intel Data Center GPU Max
+
+      For Intel Data Center GPU Max Series, we recommend:
+
+      .. code-block:: bash
+
+         # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
+         # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
+         source /opt/intel/oneapi/setvars.sh
+
+         # Recommended Environment Variables for optimal performance
+         export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
+         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+         export ENABLE_SDP_FUSION=1
+
+      Please note that ``libtcmalloc.so`` can be installed by ``conda install -c conda-forge -y gperftools=2.10``.
+
+```
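+
+Before launching a workload, you can optionally confirm that the variables took effect in the current shell. This is a minimal convenience sketch, not a required step:
+
+```python
+import os
+
+# "<unset>" indicates the corresponding export above was skipped in this shell
+for name in ("USE_XETLA", "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS", "LD_PRELOAD"):
+    print(name, "=", os.environ.get(name, "<unset>"))
+```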
 
 ## A Quick Example
 
@@ -156,16 +156,7 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface
   ```bash
   conda activate llm
   ```
-* Step 2: If you're running on iGPU, set some environment variables by running below commands:
-  > For more details about runtime configurations, refer to [this guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration):
-  ```bash
-  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
-  source /opt/intel/oneapi/setvars.sh
-
-  # Recommended Environment Variables for optimal performance
-  export USE_XETLA=OFF
-  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
-  ```
+* Step 2: Follow [Runtime Configurations Section](#runtime-configurations) above to prepare your runtime environment.
 * Step 3: Create a new file named `demo.py` and insert the code snippet below.
   ```python
   # Copy/Paste the contents to a new file demo.py
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md b/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md
index 7997d740..eceb34e6 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/install_windows_gpu.md
@@ -93,18 +93,18 @@ Choose either US or CN website for `extra-index-url`:
 ```
 
 ## Verify Installation
-You can verify if `ipex-llm` is successfully installed by simply running a few lines of code:
+You can verify if `ipex-llm` is successfully installed by following the steps below.
 
-* Step 1: Open the **Anaconda Prompt** and activate the Python environment `llm` you previously created:
-  ```cmd
-  conda activate llm
-  ```
-* Step 2: Configure oneAPI variables by running the following command:
-  ```cmd
-  call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-  ```
-* Step 3:
-  Please also set the following environment variable according to your device:
+### Step 1: Runtime Configurations
+* Open the **Anaconda Prompt** and activate the Python environment `llm` you previously created:
+  ```cmd
+  conda activate llm
+  ```
+* Configure oneAPI variables by running the following command:
+  ```cmd
+  call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+  ```
+* Set the following environment variables according to your device:
 
   ```eval_rst
   .. tabs::
@@ -125,9 +125,12 @@
      For other Intel dGPU Series, please refer to `this guide <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration>`_ for more details regarding runtime configuration.
   ```
-* Step 4: Launch the Python interactive shell by typing `python` in the Anaconda prompt window and then press Enter.
-* Step 5: Copy following code to Anaconda prompt **line by line** and press Enter **after copying each line**.
+### Step 2: Run Python Code
+
+* Launch the Python interactive shell by typing `python` in the Anaconda prompt window and then press Enter.
+
+* Copy the following code into the Anaconda prompt **line by line** and press Enter **after copying each line**.
   ```python
   import torch
   from ipex_llm.transformers import AutoModel,AutoModelForCausalLM
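+  # If the two imports above complete without raising an error, the
+  # `ipex-llm` installation itself is working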
@@ -156,41 +159,12 @@ To monitor your GPU's performance and status (e.g. memory consumption, utilizati
 
 Now let's play with a real LLM. We'll be using the [Qwen-1.8B-Chat](https://huggingface.co/Qwen/Qwen-1_8B-Chat) model, a 1.8 billion parameter LLM for this demonstration. Follow the steps below to setup and run the model, and observe how it responds to a prompt "What is AI?".
 
-* Step 1: Open the **Anaconda Prompt** and activate the Python environment `llm` you previously created:
-  ```cmd
-  conda activate llm
-  ```
-* Step 2: Configure oneAPI variables by running the following command:
-  ```cmd
-  call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-  ```
-* Step 3:
-  Please also set the following environment variable according to your device:
-
-  ```eval_rst
-  .. tabs::
-     .. tab:: Intel iGPU
-
-        .. code-block:: cmd
-
-           set SYCL_CACHE_PERSISTENT=1
-           set BIGDL_LLM_XMX_DISABLED=1
-
-     .. tab:: Intel Arc™ A770
-
-        There is no need to set further environment variables.
-  ```
-
-  ```eval_rst
-  .. seealso::
-
-     For other Intel dGPU Series, please refer to `this guide <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration>`_ for more details regarding runtime configuration.
-  ```
-* Step 4: Install additional package required for Qwen-1.8B-Chat to conduct:
+* Step 1: Follow [Runtime Configurations Section](#step-1-runtime-configurations) above to prepare your runtime environment.
+* Step 2: Install the additional packages required by Qwen-1.8B-Chat:
   ```cmd
   pip install tiktoken transformers_stream_generator einops
   ```
-* Step 5: Create code file. IPEX-LLM supports loading model from Hugging Face or ModelScope. Please choose according to your requirements.
+* Step 3: Create the code file. IPEX-LLM supports loading models from Hugging Face or ModelScope. Please choose according to your requirements.
 ```eval_rst
 .. tabs::
    .. tab:: Hugging Face
@@ -322,7 +296,7 @@
 
    This will allow the memory-intensive embedding layer to utilize the CPU instead of GPU.
 ```
-* Step 5. Run `demo.py` within the activated Python environment using the following command:
+* Step 4: Run `demo.py` within the activated Python environment using the following command:
   ```cmd
   python demo.py
   ```