diff --git a/docs/mddocs/Quickstart/axolotl_quickstart.md b/docs/mddocs/Quickstart/axolotl_quickstart.md
index 4a2cbb3a..f0a2dfa1 100644
--- a/docs/mddocs/Quickstart/axolotl_quickstart.md
+++ b/docs/mddocs/Quickstart/axolotl_quickstart.md
@@ -4,7 +4,7 @@
 
 See the demo of finetuning LLaMA2-7B on Intel Arc GPU below.
 
-
+[Demo video](https://llm-assets.readthedocs.io/en/latest/_images/axolotl-qlora-linux-arc.mp4)
 
 ## Quickstart
 
@@ -12,13 +12,13 @@ See the demo of finetuning LLaMA2-7B on Intel Arc GPU below.
 
 IPEX-LLM's support for [Axolotl v0.4.0](https://github.com/OpenAccess-AI-Collective/axolotl/tree/v0.4.0) is only available for Linux systems. We recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
 
-Visit the [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), follow [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install GPU driver and Intel® oneAPI Base Toolkit 2024.0.
+Visit the [Install IPEX-LLM on Linux with Intel GPU](./install_linux_gpu.md), follow [Install Intel GPU Driver](./install_linux_gpu.md#install-gpu-driver) and [Install oneAPI](./install_linux_gpu.md#install-oneapi) to install the GPU driver and Intel® oneAPI Base Toolkit 2024.0.
 
 ### 1. Install IPEX-LLM for Axolotl
 
 Create a new conda env, and install `ipex-llm[xpu]`.
 
-```cmd
+```bash
 conda create -n axolotl python=3.11
 conda activate axolotl
 # install ipex-llm
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
@@ -27,7 +27,7 @@
 
 Install [axolotl v0.4.0](https://github.com/OpenAccess-AI-Collective/axolotl/tree/v0.4.0) from git.
 
-```cmd
+```bash
 # install axolotl v0.4.0
 git clone -b v0.4.0 https://github.com/OpenAccess-AI-Collective/axolotl
 cd axolotl
@@ -62,46 +62,37 @@ For more technical details, please refer to [Llama 2](https://arxiv.org/abs/2307
 
 By default, Axolotl will automatically download models and datasets from Huggingface. Please ensure you have logged in to Huggingface.
 
-```cmd
+```bash
 huggingface-cli login
 ```
 
 If you prefer offline models and datasets, please download [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b) and [alpaca_2k_test](https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test). Then, set `HF_HUB_OFFLINE=1` to avoid connecting to Huggingface.
 
-```cmd
+```bash
 export HF_HUB_OFFLINE=1
 ```
 
 #### 2.2 Set Environment Variables
 
-```eval_rst
-.. note::
-
-   This is a required step on for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
-```
+> [!NOTE]
+> This is a required step for APT or offline-installed oneAPI. Skip this step for PIP-installed oneAPI.
 
 Configure oneAPI variables by running the following command:
 
-```eval_rst
-.. tabs::
-   .. tab:: Linux
-
-      .. code-block:: bash
-
-         source /opt/intel/oneapi/setvars.sh
-
-```
+```bash
+source /opt/intel/oneapi/setvars.sh
+```
 
 Configure accelerate to avoid training with CPU. You can download a default `default_config.yaml` with `use_cpu: false`.
 
-```cmd
+```bash
 mkdir -p ~/.cache/huggingface/accelerate/
 wget -O ~/.cache/huggingface/accelerate/default_config.yaml https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/default_config.yaml
 ```
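+
+To confirm that the downloaded config indeed disables CPU training, you can inspect it (an illustrative check; the path assumes accelerate's default cache location):
+
+```bash
+# should print "use_cpu: false"
+grep use_cpu ~/.cache/huggingface/accelerate/default_config.yaml
+```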
 
 As an alternative, you can configure accelerate based on your requirements.
 
-```cmd
+```bash
 accelerate config
 ```
 
@@ -113,7 +104,7 @@ After finishing accelerate config, check if `use_cpu` is disabled (i.e., `use_cp
 
 Prepare `lora.yml` for Axolotl LoRA finetune. You can download a template from github.
 
-```cmd
+```bash
 wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/lora.yml
 ```
 
@@ -143,13 +134,13 @@ lora_fan_in_fan_out:
 
 Launch LoRA training with the following command.
 
-```cmd
+```bash
 accelerate launch finetune.py lora.yml
 ```
 
 In Axolotl v0.4.0, you can use `train.py` instead of `-m axolotl.cli.train` or `finetune.py`.
 
-```cmd
+```bash
 accelerate launch train.py lora.yml
 ```
 
@@ -157,7 +148,7 @@ accelerate launch train.py lora.yml
 
 Prepare `qlora.yml` for QLoRA finetune. You can download a template from github.
 
-```cmd
+```bash
 wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/qlora.yml
 ```
 
@@ -188,13 +179,13 @@ lora_fan_in_fan_out:
 
 Launch QLoRA training with the following command.
 
-```cmd
+```bash
 accelerate launch finetune.py qlora.yml
 ```
 
 In Axolotl v0.4.0, you can use `train.py` instead of `-m axolotl.cli.train` or `finetune.py`.
 
-```cmd
+```bash
 accelerate launch train.py qlora.yml
 ```
 
@@ -206,7 +197,7 @@ Warning: this section will install axolotl main ([796a085](https://github.com/Op
 
 Axolotl main has lots of new dependencies. Please set up a new conda env for this version.
 
-```cmd
+```bash
 conda create -n llm python=3.11
 conda activate llm
 # install axolotl main
@@ -229,7 +220,7 @@ Based on [axolotl Llama-3 QLoRA example](https://github.com/OpenAccess-AI-Collec
 
 Prepare `llama3-qlora.yml` for QLoRA finetune. You can download a template from github.
 
-```cmd
+```bash
 wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/llama3-qlora.yml
 ```
 
@@ -262,19 +253,19 @@ lora_target_linear: true
 lora_fan_in_fan_out:
 ```
 
-```cmd
+```bash
 accelerate launch finetune.py llama3-qlora.yml
 ```
 
 You can also use `train.py` instead of `-m axolotl.cli.train` or `finetune.py`.
 
-```cmd
+```bash
 accelerate launch train.py llama3-qlora.yml
 ```
 
 Expected output
 
-```cmd
+```bash
 {'loss': 0.237, 'learning_rate': 1.2254711850265387e-06, 'epoch': 3.77}
 {'loss': 0.6068, 'learning_rate': 1.1692453482951115e-06, 'epoch': 3.77}
 {'loss': 0.2926, 'learning_rate': 1.1143322458989303e-06, 'epoch': 3.78}
@@ -291,24 +282,24 @@ Expected output
 
 ## Troubleshooting
 
-#### TypeError: PosixPath
+### TypeError: PosixPath
 
 Error message: `TypeError: argument of type 'PosixPath' is not iterable`
 
 This issue is related to [axolotl #1544](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544). It can be fixed by downgrading datasets to 2.15.0.
 
-```cmd
+```bash
 pip install datasets==2.15.0
 ```
 
-#### RuntimeError: out of device memory
+### RuntimeError: out of device memory
 
 Error message: `RuntimeError: Allocation is out of device memory on current platform.`
 
 This issue is caused by running out of GPU memory. Please reduce `lora_r` or `micro_batch_size` in `qlora.yml` or `lora.yml`, or reduce the data used in training.
 
-#### OSError: libmkl_intel_lp64.so.2
+### OSError: libmkl_intel_lp64.so.2
 
 Error message: `OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory`
 
-oneAPI environment is not correctly set. Please refer to [Set Environment Variables](#set-environment-variables).
+The oneAPI environment is not correctly set. Please refer to [Set Environment Variables](#22-set-environment-variables).
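+
+A quick way to verify the fix (an illustrative check, assuming the default oneAPI install prefix) is to source `setvars.sh` and confirm that the MKL runtime is now on the loader path:
+
+```bash
+source /opt/intel/oneapi/setvars.sh
+# list the MKL entries visible to the dynamic loader
+echo $LD_LIBRARY_PATH | tr ':' '\n' | grep -i mkl
+```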
diff --git a/docs/mddocs/Quickstart/benchmark_quickstart.md b/docs/mddocs/Quickstart/benchmark_quickstart.md
index ba26b770..a677398e 100644
--- a/docs/mddocs/Quickstart/benchmark_quickstart.md
+++ b/docs/mddocs/Quickstart/benchmark_quickstart.md
@@ -4,7 +4,7 @@ We can perform benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchm
 
 ## Prepare The Environment
 
-You can refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install IPEX-LLM in your environment. The following dependencies are also needed to run the benchmark scripts.
+You can refer to [here](../Overview/install.md) to install IPEX-LLM in your environment. The following dependencies are also needed to run the benchmark scripts.
 
 ```
 pip install pandas
@@ -65,109 +65,98 @@ Some parameters in the yaml file that you can configure:
 
 - `task`: There are three tasks: `continuation`, `QA` and `summarize`. `continuation` refers to writing additional content based on prompt. `QA` refers to answering questions based on prompt. `summarize` refers to summarizing the prompt.
 
-```eval_rst
-.. note::
-
-   If you want to benchmark the performance without warmup, you can set ``warm_up: 0`` and ``num_trials: 1`` in ``config.yaml``, and run each single model and in_out_pair separately.
-```
+> [!NOTE]
+> If you want to benchmark the performance without warmup, you can set `warm_up: 0` and `num_trials: 1` in `config.yaml`, and run each single model and in_out_pair separately.
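+
+For instance, a no-warmup run could be configured as follows (a minimal sketch — `warm_up`, `num_trials`, `in_out_pairs` and `task` are the fields discussed above; the exact field set and value formats follow the template `config.yaml` shipped with the benchmark scripts and may differ in your version):
+
+```yaml
+warm_up: 0           # no warmup iterations before measuring
+num_trials: 1        # a single measured run
+in_out_pairs:        # benchmark one input/output length pair at a time
+  - '32-32'
+task: 'continuation'
+```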
 
 ## Run on Windows
 
-Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure oneAPI environment variables.
+Please refer to [here](../Overview/install_gpu.md#runtime-configuration) to configure oneAPI environment variables.
 
 Choose corresponding commands based on your device.
 
-```eval_rst
-.. tabs::
-   .. tab:: Intel iGPU
-
-      .. code-block:: bash
-
-         set SYCL_CACHE_PERSISTENT=1
-         set BIGDL_LLM_XMX_DISABLED=1
-
-         python run.py
-
-   .. tab:: Intel Arc™ A300-Series or Pro A60
-
-      .. code-block:: bash
-
-         set SYCL_CACHE_PERSISTENT=1
-         python run.py
-
-   .. tab:: Other Intel dGPU Series
-
-      .. code-block:: bash
-
-         # e.g. Arc™ A770
-         python run.py
-
-```
+- For **Intel iGPU**:
+
+  ```bash
+  set SYCL_CACHE_PERSISTENT=1
+  set BIGDL_LLM_XMX_DISABLED=1
+
+  python run.py
+  ```
+
+- For **Intel Arc™ A300-Series or Pro A60**:
+
+  ```bash
+  set SYCL_CACHE_PERSISTENT=1
+  python run.py
+  ```
+
+- For **Other Intel dGPU Series**:
+
+  ```bash
+  # e.g. Arc™ A770
+  python run.py
+  ```
 
 ## Run on Linux
 
-```eval_rst
-.. tabs::
-   .. tab:: Intel Arc™ A-Series and Intel Data Center GPU Flex
-
-      For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:
-
-      .. code-block:: bash
-
-         ./run-arc.sh
-
-   .. tab:: Intel iGPU
-
-      For Intel iGPU, we recommend:
-
-      .. code-block:: bash
-
-         ./run-igpu.sh
-
-   .. tab:: Intel Data Center GPU Max
-
-      Please note that you need to run ``conda install -c conda-forge -y gperftools=2.10`` before running the benchmark script on Intel Data Center GPU Max Series.
-
-      .. code-block:: bash
-
-         ./run-max-gpu.sh
-
-   .. tab:: Intel SPR
-
-      For Intel SPR machine, we recommend:
-
-      .. code-block:: bash
-
-         ./run-spr.sh
-
-      The scipt uses a default numactl strategy. If you want to customize it, please use ``lscpu`` or ``numactl -H`` to check how cpu indexs are assigned to numa node, and make sure the run command is binded to only one socket.
-
-   .. tab:: Intel HBM
-
-      For Intel HBM machine, we recommend:
-
-      .. code-block:: bash
-
-         ./run-hbm.sh
-
-      The scipt uses a default numactl strategy. If you want to customize it, please use ``numactl -H`` to check how the index of hbm node and cpu are assigned.
-
-      For example:
-
-      .. code-block:: bash
-
-         node   0   1   2   3
-           0:  10  21  13  23
-           1:  21  10  23  13
-           2:  13  23  10  23
-           3:  23  13  23  10
-
-      here hbm node is the node whose distance from the checked node is 13, node 2 is node 0's hbm node.
-
-      And make sure the run command is binded to only one socket.
-
-```
+Please choose corresponding commands based on your device.
+
+- For **Intel Arc™ A-Series and Intel Data Center GPU Flex**:
+
+  For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:
+
+  ```bash
+  ./run-arc.sh
+  ```
+
+- For **Intel iGPU**:
+
+  For Intel iGPU, we recommend:
+
+  ```bash
+  ./run-igpu.sh
+  ```
+
+- For **Intel Data Center GPU Max**:
+
+  Please note that you need to run `conda install -c conda-forge -y gperftools=2.10` before running the benchmark script on Intel Data Center GPU Max Series.
+
+  ```bash
+  ./run-max-gpu.sh
+  ```
+
+- For **Intel SPR**:
+
+  For an Intel SPR machine, we recommend:
+
+  ```bash
+  ./run-spr.sh
+  ```
+
+  The script uses a default numactl strategy. If you want to customize it, please use `lscpu` or `numactl -H` to check how CPU indices are assigned to NUMA nodes, and make sure the run command is bound to only one socket.
+
+- For **Intel HBM**:
+
+  For an Intel HBM machine, we recommend:
+
+  ```bash
+  ./run-hbm.sh
+  ```
+
+  The script uses a default numactl strategy. If you want to customize it, please use `numactl -H` to check how the indices of the HBM nodes and CPUs are assigned. For example:
+
+  ```
+  node   0   1   2   3
+    0:  10  21  13  23
+    1:  21  10  23  13
+    2:  13  23  10  23
+    3:  23  13  23  10
+  ```
+
+  Here the HBM node is the node whose distance from the checked node is 13; for example, node 2 is node 0's HBM node. Make sure the run command is bound to only one socket.
 
 ## Result
 
diff --git a/docs/mddocs/Quickstart/bigdl_llm_migration.md b/docs/mddocs/Quickstart/bigdl_llm_migration.md
index a1ef5051..0b7643e1 100644
--- a/docs/mddocs/Quickstart/bigdl_llm_migration.md
+++ b/docs/mddocs/Quickstart/bigdl_llm_migration.md
@@ -4,10 +4,9 @@ This guide helps you migrate your `bigdl-llm` application to use `ipex-llm`.
 
 ## Upgrade `bigdl-llm` package to `ipex-llm`
 
-```eval_rst
-.. note::
-   This step assumes you have already installed `bigdl-llm`.
-```
+> [!NOTE]
+> This step assumes you have already installed `bigdl-llm`.
+
 You need to uninstall `bigdl-llm` and install `ipex-llm`. With your `bigdl-llm` conda environment activated, execute the following command according to your device type and location:
 
 ### For CPU
@@ -19,23 +18,20 @@
 pip install --pre --upgrade ipex-llm[all] # for cpu
 ```
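+
+You can sanity-check the swap afterwards (an illustrative check, not an official step; `ipex_llm` is the Python import name of the `ipex-llm` package):
+
+```bash
+pip list | grep -iE 'bigdl-llm|ipex-llm'   # bigdl-llm should no longer be listed
+python -c "import ipex_llm"                # should import without error
+```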
 
 ### For GPU
 
 Choose either US or CN website for `extra-index-url`:
 
-```eval_rst
-.. tabs::
-   .. tab:: US
-
-      .. code-block:: cmd
-
-         pip uninstall -y bigdl-llm
-         pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
-
-   .. tab:: CN
-
-      .. code-block:: cmd
-
-         pip uninstall -y bigdl-llm
-         pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
-```
+- For **US**:
+
+  ```bash
+  pip uninstall -y bigdl-llm
+  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+  ```
+
+- For **CN**:
+
+  ```bash
+  pip uninstall -y bigdl-llm
+  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
+  ```
 
 ## Migrate `bigdl-llm` code to `ipex-llm`
 
 There are two options to migrate `bigdl-llm` code to `ipex-llm`.
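+
+For example, one of the options is simply changing the import prefix from `bigdl.llm` to `ipex_llm` (a minimal sketch; the exact classes you import depend on your application):
+
+```python
+# before (bigdl-llm)
+# from bigdl.llm.transformers import AutoModelForCausalLM
+
+# after (ipex-llm)
+from ipex_llm.transformers import AutoModelForCausalLM
+```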
diff --git a/docs/mddocs/Quickstart/chatchat_quickstart.md b/docs/mddocs/Quickstart/chatchat_quickstart.md
index e482751a..48c09de8 100644
--- a/docs/mddocs/Quickstart/chatchat_quickstart.md
+++ b/docs/mddocs/Quickstart/chatchat_quickstart.md
@@ -10,12 +10,13 @@
