diff --git a/docs/readthedocs/source/_templates/sidebar_quicklinks.html b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
index 9465e770..3afcebb7 100644
--- a/docs/readthedocs/source/_templates/sidebar_quicklinks.html
+++ b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -55,6 +55,9 @@
                 <li>
                     <a href="doc/LLM/Quickstart/fastchat_quickstart.html">Run IPEX-LLM Serving with FastChat</a>
                 </li>
+                <li>
+                    <a href="doc/LLM/Quickstart/axolotl_quickstart.html">Finetune LLM with Axolotl on Intel GPU without coding</a>
+                </li>
diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 5076ea0b..482210e6 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -32,6 +32,7 @@ subtrees:
               - file: doc/LLM/Quickstart/ollama_quickstart
               - file: doc/LLM/Quickstart/llama3_llamacpp_ollama_quickstart
               - file: doc/LLM/Quickstart/fastchat_quickstart
+              - file: doc/LLM/Quickstart/axolotl_quickstart
             - file: doc/LLM/Overview/KeyFeatures/index
               title: "Key Features"
               subtrees:
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/axolotl_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/axolotl_quickstart.md
new file mode 100644
index 00000000..cd008d99
--- /dev/null
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/axolotl_quickstart.md
@@ -0,0 +1,214 @@
# Finetune LLM with Axolotl on Intel GPU without coding

[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) is a popular tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures. You can now use [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend for `Axolotl` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.

See the demo of finetuning LLaMA2-7B on an Intel Arc GPU below.

## Quickstart

### 0. Prerequisites

IPEX-LLM's support for [Axolotl v0.4.0](https://github.com/OpenAccess-AI-Collective/axolotl/tree/v0.4.0) is only available on Linux systems. We recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).

Visit the [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html) guide, and follow the [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) sections to install the GPU driver and Intel® oneAPI Base Toolkit 2024.0.

### 1. Install IPEX-LLM for Axolotl

Create a new conda environment and install `ipex-llm[xpu]`.

```bash
conda create -n axolotl python=3.11
conda activate axolotl
# install ipex-llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

Install [axolotl v0.4.0](https://github.com/OpenAccess-AI-Collective/axolotl/tree/v0.4.0) from git.

```bash
# install axolotl v0.4.0
git clone -b v0.4.0 https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
# replace requirements.txt
rm requirements.txt
wget -O requirements.txt https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/requirements-xpu.txt
pip install -e .
pip install transformers==4.36.0
# to avoid https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544
pip install datasets==2.15.0
# prepare axolotl entrypoints
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/finetune.py
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/train.py
```

**After installation, you should have a conda environment (named `axolotl` in this example) for running `Axolotl` commands with IPEX-LLM.**
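Before moving on, you can optionally verify that PyTorch can see the Intel GPU from this environment. Below is a minimal sanity check, assuming `torch` and `intel_extension_for_pytorch` were pulled in by `ipex-llm[xpu]`:

```bash
# Optional sanity check: prints True if an XPU device is visible.
# Assumes torch and intel_extension_for_pytorch were installed by ipex-llm[xpu].
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.is_available())"
```

If this fails with a missing `libmkl` error, the oneAPI environment is not set yet; see [Set Environment Variables](#22-set-environment-variables) below.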
### 2. Example: Finetune Llama-2-7B with Axolotl

The following example walks through finetuning [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b) on the [alpaca_2k_test](https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test) dataset using LoRA and QLoRA.

Note that you don't need to write any code in this example.

| Model | Dataset | Finetune method |
|-------|-------|-------|
| Llama-2-7B | alpaca_2k_test | LoRA (Low-Rank Adaptation) |
| Llama-2-7B | alpaca_2k_test | QLoRA (Quantized Low-Rank Adaptation) |

For more technical details, please refer to [Llama 2](https://arxiv.org/abs/2307.09288), [LoRA](https://arxiv.org/abs/2106.09685) and [QLoRA](https://arxiv.org/abs/2305.14314).

#### 2.1 Download Llama-2-7B and alpaca_2k_test

By default, Axolotl automatically downloads models and datasets from Hugging Face. Please make sure you are logged in to Hugging Face.

```bash
huggingface-cli login
```

If you prefer offline models and datasets, download [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b) and [alpaca_2k_test](https://huggingface.co/datasets/mhenrichsen/alpaca_2k_test) in advance. Then set `HF_HUB_OFFLINE=1` to avoid connecting to Hugging Face.

```bash
export HF_HUB_OFFLINE=1
```

#### 2.2 Set Environment Variables

```eval_rst
.. note::

   This step is required only if oneAPI was installed via APT or the offline installer. Skip this step for pip-installed oneAPI.
```

Configure oneAPI variables by running the following command:

```eval_rst
.. tabs::
   .. tab:: Linux

      .. code-block:: bash

         source /opt/intel/oneapi/setvars.sh

```

Configure `accelerate` to avoid training on CPU:

```bash
accelerate config
```

Answer `NO` when asked `Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:`.

After finishing `accelerate config`, check that `use_cpu` is disabled (i.e., `use_cpu: false`) in the accelerate config file (`~/.cache/huggingface/accelerate/default_config.yaml`).

#### 2.3 LoRA finetune

Prepare `lora.yml` for the Axolotl LoRA finetune. You can download a template from GitHub.

```bash
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/lora.yml
```

**If you are using an offline model and dataset in a local environment**, modify the model path and dataset path in `lora.yml`. Otherwise, keep them unchanged.

```yaml
# Please change to local path if model is offline, e.g., /path/to/model/Llama-2-7b-hf
base_model: NousResearch/Llama-2-7b-hf
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
```

Modify LoRA parameters such as `lora_r` and `lora_alpha` as needed.

```yaml
adapter: lora
lora_model_dir:

lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
```

Launch LoRA training with the following command:

```bash
accelerate launch finetune.py lora.yml
```

In Axolotl v0.4.0, you can use `train.py` instead of `-m axolotl.cli.train` or `finetune.py`:

```bash
accelerate launch train.py lora.yml
```

#### 2.4 QLoRA finetune

Prepare `qlora.yml` for the QLoRA finetune. You can download a template from GitHub.

```bash
wget https://raw.githubusercontent.com/intel-analytics/ipex-llm/main/python/llm/example/GPU/LLM-Finetuning/axolotl/qlora.yml
```
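Compared with the LoRA template, the QLoRA config mainly switches the adapter type and loads the base model in 4-bit to reduce memory use. The sketch below shows the key fields to look for; note that `load_in_4bit` is an assumption based on common Axolotl QLoRA configs, not taken from this template:

```yaml
# Sketch of QLoRA-specific fields; load_in_4bit is assumed from common
# Axolotl QLoRA configs and may differ in the downloaded template.
adapter: qlora       # quantized LoRA instead of plain LoRA
load_in_4bit: true   # keep base weights in 4-bit to cut GPU memory
```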
**If you are using an offline model and dataset in a local environment**, modify the model path and dataset path in `qlora.yml`. Otherwise, keep them unchanged.

```yaml
# Please change to local path if model is offline, e.g., /path/to/model/Llama-2-7b-hf
base_model: NousResearch/Llama-2-7b-hf
datasets:
  # Please change to local path if dataset is offline, e.g., /path/to/dataset/alpaca_2k_test
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
```

Modify QLoRA parameters such as `lora_r` and `lora_alpha` as needed.

```yaml
adapter: qlora
lora_model_dir:

lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
```

Launch QLoRA training with the following command:

```bash
accelerate launch finetune.py qlora.yml
```

In Axolotl v0.4.0, you can use `train.py` instead of `-m axolotl.cli.train` or `finetune.py`:

```bash
accelerate launch train.py qlora.yml
```

## Troubleshooting

#### TypeError: PosixPath

Error message: `TypeError: argument of type 'PosixPath' is not iterable`

This issue is related to [axolotl #1544](https://github.com/OpenAccess-AI-Collective/axolotl/issues/1544). It can be fixed by downgrading `datasets` to 2.15.0.

```bash
pip install datasets==2.15.0
```

#### RuntimeError: out of device memory

Error message: `RuntimeError: Allocation is out of device memory on current platform.`

This issue is caused by running out of GPU memory. Please reduce `lora_r` or `micro_batch_size` in `qlora.yml` or `lora.yml`, or reduce the amount of data used in training.

#### OSError: libmkl_intel_lp64.so.2

Error message: `OSError: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory`

The oneAPI environment is not set correctly. Please refer to [Set Environment Variables](#22-set-environment-variables).
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
index fc6d3121..4dbf6643 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
@@ -21,6 +21,7 @@ This section includes efficient guide to show you how to:
 * `Run Ollama with IPEX-LLM on Intel GPU <./ollama_quickstart.html>`_
 * `Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM <./llama3_llamacpp_ollama_quickstart.html>`_
 * `Run IPEX-LLM Serving with FastChat <./fastchat_quickstart.html>`_
+* `Finetune LLM with Axolotl on Intel GPU without coding <./axolotl_quickstart.html>`_

 .. |bigdl_llm_migration_guide| replace:: ``bigdl-llm`` Migration Guide
 .. _bigdl_llm_migration_guide: bigdl_llm_migration.html