LLM: reorganize GPU finetuning examples (#9952)
parent 175027c90f
commit 171fb2d185

60 changed files with 1895 additions and 378 deletions
@@ -13,13 +13,13 @@
 ### Latest update 🔥
 - [2024/01] 🔔🔔🔔 ***Starting from 2024/01/08, the default `bigdl-llm` GPU Linux installation switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. (See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.)***
-- [2023/12] `bigdl-llm` now supports [ReLoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#relora) (see *["ReLoRA: High-Rank Training Through Low-Rank Updates"](https://arxiv.org/abs/2307.05695)*)
+- [2023/12] `bigdl-llm` now supports [ReLoRA](python/llm/example/GPU/LLM-Finetuning/ReLora) (see *["ReLoRA: High-Rank Training Through Low-Rank Updates"](https://arxiv.org/abs/2307.05695)*)
 - [2023/12] `bigdl-llm` now supports [Mixtral-8x7B](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) on both Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral).
-- [2023/12] `bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*)
+- [2023/12] `bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/LLM-Finetuning/QA-LoRA) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*)
 - [2023/12] `bigdl-llm` now supports [FP8 and FP4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) on Intel ***GPU***.
 - [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models into `bigdl-llm` is available.
 - [2023/11] `bigdl-llm` now supports [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving).
-- [2023/10] `bigdl-llm` now supports [QLoRA finetuning](python/llm/example/GPU/QLoRA-FineTuning) on both Intel [GPU](python/llm/example/GPU/QLoRA-FineTuning) and [CPU](python/llm/example/CPU/QLoRA-FineTuning).
+- [2023/10] `bigdl-llm` now supports [QLoRA finetuning](python/llm/example/GPU/LLM-Finetuning/QLoRA) on both Intel [GPU](python/llm/example/GPU/LLM-Finetuning/QLoRA) and [CPU](python/llm/example/CPU/QLoRA-FineTuning).
 - [2023/10] `bigdl-llm` now supports [FastChat serving](python/llm/src/bigdl/llm/serving) on both Intel CPU and GPU.
 - [2023/09] `bigdl-llm` now supports [Intel GPU](python/llm/example/GPU) (including Arc, Flex and MAX)
 - [2023/09] `bigdl-llm` [tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) is released.
@@ -109,7 +109,7 @@ TrainOutput(global_step=200, training_loss=1.5072882556915284, metrics={'train_r
 ### 4. Merge the adapter into the original model
 
-Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py) to merge.
+Using the [export_merged_model.py](../../../../../../python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge.
 
 ```
 python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
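The repository's own `export_merged_model.py` (added later in this commit) drives this step through a shared `merge_adapter` helper. Purely as a rough illustration of what the merge does, here is a minimal sketch using the stock PEFT API; the model id and checkpoint paths are placeholders, and this is not the script's actual implementation.

```python
# Illustrative sketch only -- not the repo's export_merged_model.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # REPO_ID_OR_MODEL_PATH (placeholder)
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "./outputs/checkpoint-200")   # adapter_path
merged = model.merge_and_unload()        # fold the LoRA weights back into the base model
merged.save_pretrained("./outputs/checkpoint-200-merged")             # output_path
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained(
    "./outputs/checkpoint-200-merged"
)
```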
@@ -33,6 +33,6 @@ RUN curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-P
 # install huggingface dependencies
 pip install git+https://github.com/huggingface/transformers.git@${TRANSFORMERS_COMMIT_ID} && \
 pip install peft==0.5.0 datasets && \
-wget https://raw.githubusercontent.com/intel-analytics/BigDL/main/python/llm/example/GPU/QLoRA-FineTuning/qlora_finetuning.py
+wget https://raw.githubusercontent.com/intel-analytics/BigDL/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/qlora_finetuning.py
 
 COPY ./start-qlora-finetuning-on-xpu.sh /start-qlora-finetuning-on-xpu.sh
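For orientation, this Dockerfile change only swaps the URL the QLoRA example is fetched from. A hypothetical build-and-run invocation is sketched below; the image tag, the `/dev/dri` device mapping and the mounted model path are assumptions for illustration, not part of the commit.

```bash
# Hypothetical usage of the finetuning image; tag and mount paths are illustrative.
docker build -t bigdl-llm-finetune-qlora-xpu .

# Expose the Intel GPU to the container and mount a local directory for models/outputs.
docker run -it --rm \
    --device=/dev/dri \
    -v /path/to/models:/models \
    bigdl-llm-finetune-qlora-xpu \
    bash /start-qlora-finetuning-on-xpu.sh
```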
@@ -25,13 +25,13 @@ BigDL-LLM: low-Bit LLM library
 Latest update 🔥
 ============================================
 - [2024/01] 🔔🔔🔔 **Starting from 2024/01/08, the default** ``bigdl-llm`` **GPU Linux installation switched from PyTorch 2.0 to PyTorch 2.1, which requires new oneAPI and GPU driver versions. (See the** `GPU installation guide <https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html>`_ **for more details.)**
-- [2023/12] ``bigdl-llm`` now supports `ReLoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#relora>`_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" <https://arxiv.org/abs/2307.05695>`_)
+- [2023/12] ``bigdl-llm`` now supports `ReLoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/ReLora>`_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" <https://arxiv.org/abs/2307.05695>`_)
 - [2023/12] ``bigdl-llm`` now supports `Mixtral-8x7B <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral>`_.
-- [2023/12] ``bigdl-llm`` now supports `QA-LoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora>`_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" <https://arxiv.org/abs/2309.14717>`_).
+- [2023/12] ``bigdl-llm`` now supports `QA-LoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QA-LoRA>`_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" <https://arxiv.org/abs/2309.14717>`_).
 - [2023/12] ``bigdl-llm`` now supports `FP8 and FP4 inference <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types>`_ on Intel **GPU**.
 - [2023/11] Initial support for directly loading `GGUF <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF>`_, `AWQ <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ>`_ and `GPTQ <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ>`_ models into ``bigdl-llm`` is available.
 - [2023/11] ``bigdl-llm`` now supports `vLLM continuous batching <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/vLLM-Serving>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/vLLM-Serving>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/vLLM-Serving>`_.
-- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning>`_.
+- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QLoRA>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QLoRA>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning>`_.
 - [2023/10] ``bigdl-llm`` now supports `FastChat serving <https://github.com/intel-analytics/BigDL/tree/main/python/llm/src/bigdl/llm/serving>`_ on both Intel CPU and GPU.
 - [2023/09] ``bigdl-llm`` now supports `Intel GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU>`_ (including Arc, Flex and MAX)
 - [2023/09] ``bigdl-llm`` `tutorial <https://github.com/intel-analytics/bigdl-llm-tutorial>`_ is released.
@@ -54,7 +54,7 @@ TrainOutput(global_step=200, training_loss=1.3923714351654053, metrics={'train_r
 ```
 
 ### 3. Merge the adapter into the original model
-Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py) to merge.
+Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge.
 ```
 python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
 ```
@@ -143,7 +143,7 @@ lora_target_modules: List[str] = ["W_pack"]
 5. (Only for baichuan) According to this [issue](https://github.com/baichuan-inc/Baichuan2/issues/204#issuecomment-1774372008),
 you need to modify [tokenization_baichuan.py](https://huggingface.co/baichuan-inc/Baichuan-7B/blob/main/tokenization_baichuan.py#L74) to fix the issue.
 6. finetune as normal
-7. Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py) to merge. You also need to update the tokenizer and model to ensure the weights merge successfully.
+7. Using the [export_merged_model.py](https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/export_merged_model.py) to merge. You also need to update the tokenizer and model to ensure the weights merge successfully.
 
 ```bash
 from transformers import AutoTokenizer  # noqa: F402
python/llm/example/GPU/LLM-Finetuning/LoRA/README.md (new file, 90 lines)

@@ -0,0 +1,90 @@
# LoRA Finetuning with BigDL-LLM

This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using the [LoRA](https://arxiv.org/abs/2106.09685) algorithm) on [Intel GPU](../../README.md).

### 0. Requirements
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../README.md#requirements) for more information.

### 1. Install

```bash
conda create -n llm python=3.9
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers==4.34.0 datasets
pip install fire peft==0.5.0
pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning
pip install accelerate==0.23.0
pip install bitsandbytes scipy
```

### 2. Configure OneAPI environment variables
```bash
source /opt/intel/oneapi/setvars.sh
```
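As a quick sanity check before a long run (not part of the original README), you can verify that PyTorch can see the Intel GPU; `intel_extension_for_pytorch` exposes the device through `torch.xpu`:

```bash
# Optional check that intel_extension_for_pytorch and the XPU device are usable
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.is_available(), torch.xpu.device_count())"
```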
### 3. LoRA Finetune

Here, we provide example usages on different hardware. Please refer to the appropriate script based on your device:

##### Finetuning LLaMA2-7B on a single Arc A770

```bash
bash lora_finetune_llama2_7b_arc_1_card.sh
```

##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1100

```bash
bash lora_finetune_llama2_7b_pvc_1100_1_card.sh
```

##### Finetuning LLaMA2-7B on a single tile of Intel Data Center GPU Max 1550

```bash
bash lora_finetune_llama2_7b_pvc_1550_1_tile.sh
```

##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550

```bash
bash lora_finetune_llama2_7b_pvc_1550_4_card.sh
```

### 4. (Optional) Resume Training
**If you fail to complete the whole finetuning process, it is suggested to resume training from a previously saved checkpoint by specifying `resume_from_checkpoint` to the local checkpoint folder as follows:**
```bash
python ./alpaca_lora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-qlora-alpaca" \
    --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100"
```

### 5. Sample Output
```log
{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}
  1%|█         | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
```

### 6. Merge the adapter into the original model
```
python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
```

Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformers model to do inference.
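As a rough illustration of that inference step (assumed usage, not included in the README), the merged checkpoint could be loaded with BigDL-LLM's `AutoModelForCausalLM` and run on the XPU; the paths and the prompt below are placeholders.

```python
# Illustrative sketch of inference with the merged model; paths and prompt are placeholders.
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

model_path = "./outputs/checkpoint-200-merged"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
# load_in_4bit applies BigDL-LLM low-bit optimization; plain transformers loading also works
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True).to("xpu")

inputs = tokenizer("### Instruction:\nWhat is LoRA?\n\n### Response:\n",
                   return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```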
### 7. Troubleshooting
- If you fail to finetune on multiple cards because of the following error message:
```bash
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
```
Please try `sudo apt install level-zero-dev` to fix it.
@@ -0,0 +1,267 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Some parts of this file is adapted from
# https://github.com/tloen/alpaca-lora/blob/main/finetune.py
#
# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import List

import fire
import torch
import transformers
from datasets import load_dataset
import accelerate

from transformers import LlamaTokenizer
from peft import (
    get_peft_model_state_dict,
    set_peft_model_state_dict,
)

current_dir = os.path.dirname(os.path.realpath(__file__))
common_util_path = os.path.join(current_dir, '..')
import sys
sys.path.append(common_util_path)
from common.utils import Prompter, get_int_from_env, wandb_check, get_train_val_data

from transformers import BitsAndBytesConfig
from bigdl.llm.transformers import AutoModelForCausalLM
# import them from bigdl.llm.transformers.qlora to get a BigDL-LLM compatible Peft model
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training,\
    LoraConfig
from bigdl.llm.utils.common import invalidInputError

local_rank = get_int_from_env(["LOCAL_RANK","MPI_LOCALRANKID"], "0")
world_size = get_int_from_env(["WORLD_SIZE","PMI_SIZE"], "1")
port = get_int_from_env(["MASTER_PORT"], 29500)
os.environ["LOCAL_RANK"] = str(local_rank)
os.environ["WORLD_SIZE"] = str(world_size)
os.environ["RANK"] = str(local_rank)
os.environ["MASTER_PORT"] = str(port)


def train(
    # model/data params
    base_model: str = "meta-llama/Llama-2-7b-hf",  # the only required argument, default to be "meta-llama/Llama-2-7b-hf"
    saved_low_bit_model: str = None,  # optional, the path to the saved model with bigdl-llm low-bit optimization
    data_path: str = "yahma/alpaca-cleaned",
    output_dir: str = "./bigdl-qlora-alpaca",
    # training hyperparams
    bf16: bool = True,  # default to bf16
    batch_size: int = 128,
    micro_batch_size: int = 2,  # default to be 2, limited by GPU memory
    num_epochs: int = 3,
    learning_rate: float = 3e-5,  # default to be 3e-5 to avoid divergence
    cutoff_len: int = 256,
    val_set_size: int = 2000,
    # lora hyperparams
    lora_r: int = 8,
    lora_alpha: int = 16,
    lora_dropout: float = 0.05,
    lora_target_modules: List[str] = [
        "q_proj",
        "v_proj",
        "k_proj",
        "o_proj",
        "up_proj",
        "down_proj",
        "gate_proj"
    ],
    # llm hyperparams
    train_on_inputs: bool = True,  # if False, masks out inputs in loss
    add_eos_token: bool = False,
    group_by_length: bool = False,  # faster, but produces an odd training loss curve
    # wandb params
    wandb_project: str = "",
    wandb_run_name: str = "",
    wandb_watch: str = "",  # options: false | gradients | all
    wandb_log_model: str = "",  # options: false | true
    resume_from_checkpoint: str = None,  # either training checkpoint or final adapter
    prompt_template_name: str = "alpaca",  # The prompt template to use, will default to alpaca.
    gradient_checkpointing: bool = False,
    deepspeed: str = None,
    training_mode: str = "lora",
):
    invalidInputError(training_mode == "lora",
                      f"This example is for lora training mode, but got training_mode={training_mode}.")
    if int(os.environ.get("LOCAL_RANK", 0)) == 0:
        print(
            f"Training Alpaca-LoRA model with params:\n"
            f"base_model: {base_model}\n"
            f"data_path: {data_path}\n"
            f"output_dir: {output_dir}\n"
            f"batch_size: {batch_size}\n"
            f"micro_batch_size: {micro_batch_size}\n"
            f"num_epochs: {num_epochs}\n"
            f"learning_rate: {learning_rate}\n"
            f"cutoff_len: {cutoff_len}\n"
            f"val_set_size: {val_set_size}\n"
            f"lora_r: {lora_r}\n"
            f"lora_alpha: {lora_alpha}\n"
            f"lora_dropout: {lora_dropout}\n"
            f"lora_target_modules: {lora_target_modules}\n"
            f"train_on_inputs: {train_on_inputs}\n"
            f"add_eos_token: {add_eos_token}\n"
            f"group_by_length: {group_by_length}\n"
            f"wandb_project: {wandb_project}\n"
            f"wandb_run_name: {wandb_run_name}\n"
            f"wandb_watch: {wandb_watch}\n"
            f"wandb_log_model: {wandb_log_model}\n"
            f"resume_from_checkpoint: {resume_from_checkpoint or False}\n"
            f"prompt template: {prompt_template_name}\n"
            f"training_mode: {training_mode}\n"
        )
    assert (
        base_model
    ), "Please specify a --base_model, e.g. --base_model='huggyllama/llama-7b'"
    gradient_accumulation_steps = batch_size // micro_batch_size

    prompter = Prompter(prompt_template_name)

    device_map = "auto"
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    ddp = world_size != 1
    if ddp:
        device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
        gradient_accumulation_steps = gradient_accumulation_steps // world_size

    # Check if parameter passed or if set within environ
    use_wandb = wandb_check(wandb_project, wandb_watch, wandb_log_model)

    if saved_low_bit_model is not None:
        # Load the low bit optimized model if provide the saved path
        model = AutoModelForCausalLM.load_low_bit(
            saved_low_bit_model,
            optimize_model=False,
            torch_dtype=torch.bfloat16,
            modules_to_not_convert=["lm_head"],
        )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            base_model,
            load_in_low_bit="bf16",
            optimize_model=False,
            torch_dtype=torch.bfloat16,
            modules_to_not_convert=["lm_head"],
        )

    print(f"Model loaded on rank {os.environ.get('LOCAL_RANK')}")
    model = model.to(f'xpu:{os.environ.get("LOCAL_RANK", 0)}')
    print(f"Model moved to rank {os.environ.get('LOCAL_RANK')}")

    tokenizer = LlamaTokenizer.from_pretrained(base_model)
    print(f"Tokenizer loaded on rank {os.environ.get('LOCAL_RANK')}")

    tokenizer.pad_token_id = (
        0  # unk. we want this to be different from the eos token
    )
    tokenizer.padding_side = "left"  # Allow batched inference

    print(model)

    # Prepare a BigDL-LLM compatible Peft model
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=gradient_checkpointing)

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=lora_target_modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
        training_mode=training_mode,
    )
    print(f"Lora Config: {config}")
    model = get_peft_model(model, config)

    if data_path.endswith(".json") or data_path.endswith(".jsonl"):
        data = load_dataset("json", data_files=data_path)
    else:
        data = load_dataset(data_path)

    model.print_trainable_parameters()  # Be more transparent about the % of trainable params.

    train_data, val_data = get_train_val_data(data, tokenizer, prompter, train_on_inputs,
                                              add_eos_token, cutoff_len, val_set_size, seed=42)

    # Unused
    # if not ddp and torch.cuda.device_count() > 1:
    #     # keeps Trainer from trying its own DataParallelism when more than 1 gpu is available
    #     model.is_parallelizable = True
    #     model.model_parallel = True

    trainer = transformers.Trainer(
        model=model,
        train_dataset=train_data,
        eval_dataset=val_data,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=micro_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            # warmup_ratio=0.03,
            # warmup_steps=100,
            max_grad_norm=0.3,
            num_train_epochs=num_epochs,
            learning_rate=learning_rate,
            lr_scheduler_type="cosine",
            bf16=True,  # ensure training more stable
            logging_steps=1,
            optim="adamw_torch",
            evaluation_strategy="steps" if val_set_size > 0 else "no",
            save_strategy="steps",
            eval_steps=100 if val_set_size > 0 else None,
            save_steps=100,
            output_dir=output_dir,
            save_total_limit=100,
            load_best_model_at_end=True if val_set_size > 0 else False,
            ddp_find_unused_parameters=False if ddp else None,
            group_by_length=group_by_length,
            report_to="wandb" if use_wandb else None,
            run_name=wandb_run_name if use_wandb else None,
            gradient_checkpointing=gradient_checkpointing,
            ddp_backend="ccl",
            deepspeed=deepspeed,
            save_safetensors=False,
        ),
        data_collator=transformers.DataCollatorForSeq2Seq(
            tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
        ),
    )
    model.config.use_cache = False

    trainer.train(resume_from_checkpoint=resume_from_checkpoint)

    model.save_pretrained(output_dir)

    print(
        "\n If there's a warning about missing keys above, please disregard :)"
    )


if __name__ == "__main__":
    fire.Fire(train)
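One argument in the script above that may not be obvious is `saved_low_bit_model`. A minimal sketch of producing such a checkpoint beforehand with BigDL-LLM's `save_low_bit`/`load_low_bit` pair is shown below; the output path is an assumption, and this pre-conversion step is illustrative rather than part of the commit.

```python
# Assumed illustration: pre-convert the base model once and reuse it via --saved_low_bit_model.
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_low_bit="bf16",              # match the dtype the finetuning script expects
    optimize_model=False,
    modules_to_not_convert=["lm_head"],
)
model.save_low_bit("./llama-2-7b-bf16-low-bit")   # pass this path as saved_low_bit_model
```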
@@ -0,0 +1,44 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import os

import torch
from transformers import LlamaTokenizer  # noqa: F402
import argparse

current_dir = os.path.dirname(os.path.realpath(__file__))
common_util_path = os.path.join(current_dir, '..')
import sys
sys.path.append(common_util_path)
from common.utils import merge_adapter

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description='Merge the adapter into the original model for Llama2 model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
                        help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--adapter_path', type=str,)
    parser.add_argument('--output_path', type=str,)

    args = parser.parse_args()
    base_model = model_path = args.repo_id_or_model_path
    adapter_path = args.adapter_path
    output_path = args.output_path

    tokenizer = LlamaTokenizer.from_pretrained(base_model)
    merge_adapter(base_model, tokenizer, adapter_path, output_path)
    print(f'Finish to merge the adapter into the original model and you could find the merged model in {output_path}.')
@@ -15,12 +15,11 @@
 #
 
 # You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
-python ./alpaca_qlora_finetuning.py \
+python ./alpaca_lora_finetuning.py \
     --micro_batch_size 8 \
     --batch_size 128 \
     --base_model "meta-llama/Llama-2-7b-hf" \
     --data_path "yahma/alpaca-cleaned" \
     --output_dir "./bigdl-lora-alpaca" \
     --gradient_checkpointing True \
-    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj']" \
-    --training_mode "lora"
+    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj']"
@@ -20,12 +20,11 @@ export FI_PROVIDER=tcp
 export CCL_ATL_TRANSPORT=ofi
 
 mpirun -n 4 \
-    python -u ./alpaca_qlora_finetuning.py \
+    python -u ./alpaca_lora_finetuning.py \
     --micro_batch_size 8 \
     --batch_size 128 \
     --base_model "meta-llama/Llama-2-7b-hf" \
     --data_path "yahma/alpaca-cleaned" \
     --output_dir "./bigdl-lora-alpaca" \
     --gradient_checkpointing True \
-    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']" \
-    --training_mode "lora"
+    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']"
@@ -15,12 +15,11 @@
 #
 
 # You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
-python ./alpaca_qlora_finetuning.py \
+python ./alpaca_lora_finetuning.py \
     --micro_batch_size 8 \
     --batch_size 128 \
     --base_model "meta-llama/Llama-2-7b-hf" \
     --data_path "yahma/alpaca-cleaned" \
     --output_dir "./bigdl-lora-alpaca" \
     --gradient_checkpointing True \
-    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']" \
-    --training_mode "lora"
+    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']"
@@ -15,17 +15,16 @@
 #
 
 export MASTER_ADDR=127.0.0.1
-export OMP_NUM_THREADS=7
+export OMP_NUM_THREADS=56
 export FI_PROVIDER=tcp
 export CCL_ATL_TRANSPORT=ofi
 
 mpirun -n 8 \
-    python -u ./alpaca_qlora_finetuning.py \
+    python -u ./alpaca_lora_finetuning.py \
     --micro_batch_size 8 \
     --batch_size 128 \
     --base_model "meta-llama/Llama-2-7b-hf" \
     --data_path "yahma/alpaca-cleaned" \
     --output_dir "./bigdl-lora-alpaca" \
     --gradient_checkpointing False \
-    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']" \
-    --training_mode "lora"
+    --lora_target_modules "['k_proj', 'q_proj', 'o_proj', 'v_proj', 'up_proj', 'down_proj', 'gate_proj']"
python/llm/example/GPU/LLM-Finetuning/QA-LoRA/README.md (new file, 84 lines)

@@ -0,0 +1,84 @@
# QA-LoRA Finetuning with BigDL-LLM

This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using the [QA-LoRA](https://arxiv.org/abs/2309.14717) algorithm) on [Intel GPU](../../README.md).

### 0. Requirements
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../README.md#requirements) for more information.

### 1. Install

```bash
conda create -n llm python=3.9
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install transformers==4.34.0 datasets
pip install fire peft==0.5.0
pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning
pip install accelerate==0.23.0
pip install bitsandbytes scipy
```

### 2. Configure OneAPI environment variables
```bash
source /opt/intel/oneapi/setvars.sh
```

### 3. QA-LoRA Finetune

Here, we provide example usages on different hardware. Please refer to the appropriate script based on your device:

##### Finetuning LLaMA2-7B on a single Arc A770

```bash
bash qalora_finetune_llama2_7b_arc_1_card.sh
```

##### Finetuning LLaMA2-7B on two Arc A770

```bash
bash qalora_finetune_llama2_7b_arc_2_card.sh
```

##### Finetuning LLaMA2-7B on a single tile of Intel Data Center GPU Max 1550

```bash
bash qalora_finetune_llama2_7b_pvc_1550_1_tile.sh
```

### 4. (Optional) Resume Training
**If you fail to complete the whole finetuning process, it is suggested to resume training from a previously saved checkpoint by specifying `resume_from_checkpoint` to the local checkpoint folder as follows:**
```bash
python ./alpaca_qalora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-qlora-alpaca" \
    --resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100"
```

### 5. Sample Output
```log
{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}
  1%|█         | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
```

### 6. Merge the adapter into the original model
```
python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
```

Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformers model to do inference.
### 7. Troubleshooting
- If you fail to finetune on multiple cards because of the following error message:
```bash
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
```
Please try `sudo apt install level-zero-dev` to fix it.
@@ -0,0 +1,279 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Some parts of this file is adapted from
# https://github.com/tloen/alpaca-lora/blob/main/finetune.py
#
# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import List

import fire
import torch
import transformers
from datasets import load_dataset
import accelerate

from transformers import LlamaTokenizer
from peft import (
    get_peft_model_state_dict,
    set_peft_model_state_dict,
)

current_dir = os.path.dirname(os.path.realpath(__file__))
common_util_path = os.path.join(current_dir, '..')
import sys
sys.path.append(common_util_path)
from common.utils import Prompter, get_int_from_env, wandb_check, get_train_val_data

from transformers import BitsAndBytesConfig
from bigdl.llm.transformers import AutoModelForCausalLM
# import them from bigdl.llm.transformers.qlora to get a BigDL-LLM compatible Peft model
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training,\
    LoraConfig
from bigdl.llm.utils.common import invalidInputError

local_rank = get_int_from_env(["LOCAL_RANK","MPI_LOCALRANKID"], "0")
world_size = get_int_from_env(["WORLD_SIZE","PMI_SIZE"], "1")
port = get_int_from_env(["MASTER_PORT"], 29500)
os.environ["LOCAL_RANK"] = str(local_rank)
os.environ["WORLD_SIZE"] = str(world_size)
os.environ["RANK"] = str(local_rank)
os.environ["MASTER_PORT"] = str(port)


def train(
    # model/data params
    base_model: str = "meta-llama/Llama-2-7b-hf",  # the only required argument, default to be "meta-llama/Llama-2-7b-hf"
    saved_low_bit_model: str = None,  # optional, the path to the saved model with bigdl-llm low-bit optimization
    data_path: str = "yahma/alpaca-cleaned",
    output_dir: str = "./bigdl-qlora-alpaca",
    # training hyperparams
    bf16: bool = True,  # default to bf16
    batch_size: int = 128,
    micro_batch_size: int = 2,  # default to be 2, limited by GPU memory
    num_epochs: int = 3,
    learning_rate: float = 3e-5,  # default to be 3e-5 to avoid divergence
    cutoff_len: int = 256,
    val_set_size: int = 2000,
    # lora hyperparams
    lora_r: int = 8,
    lora_alpha: int = 16,
    lora_dropout: float = 0.05,
    lora_target_modules: List[str] = [
        "q_proj",
        "v_proj",
        "k_proj",
        "o_proj",
        "up_proj",
        "down_proj",
        "gate_proj"
    ],
    # llm hyperparams
    train_on_inputs: bool = True,  # if False, masks out inputs in loss
    add_eos_token: bool = False,
    group_by_length: bool = False,  # faster, but produces an odd training loss curve
    # wandb params
    wandb_project: str = "",
    wandb_run_name: str = "",
    wandb_watch: str = "",  # options: false | gradients | all
    wandb_log_model: str = "",  # options: false | true
    resume_from_checkpoint: str = None,  # either training checkpoint or final adapter
    prompt_template_name: str = "alpaca",  # The prompt template to use, will default to alpaca.
    gradient_checkpointing: bool = False,
    deepspeed: str = None,
    training_mode: str = "qalora",
):
    invalidInputError(training_mode == "qalora",
                      f"This example is for qalora training mode, but got training_mode={training_mode}.")
    if int(os.environ.get("LOCAL_RANK", 0)) == 0:
        print(
            f"Training Alpaca-LoRA model with params:\n"
            f"base_model: {base_model}\n"
            f"data_path: {data_path}\n"
            f"output_dir: {output_dir}\n"
            f"batch_size: {batch_size}\n"
            f"micro_batch_size: {micro_batch_size}\n"
            f"num_epochs: {num_epochs}\n"
            f"learning_rate: {learning_rate}\n"
            f"cutoff_len: {cutoff_len}\n"
            f"val_set_size: {val_set_size}\n"
            f"lora_r: {lora_r}\n"
            f"lora_alpha: {lora_alpha}\n"
            f"lora_dropout: {lora_dropout}\n"
            f"lora_target_modules: {lora_target_modules}\n"
            f"train_on_inputs: {train_on_inputs}\n"
            f"add_eos_token: {add_eos_token}\n"
            f"group_by_length: {group_by_length}\n"
            f"wandb_project: {wandb_project}\n"
            f"wandb_run_name: {wandb_run_name}\n"
            f"wandb_watch: {wandb_watch}\n"
            f"wandb_log_model: {wandb_log_model}\n"
            f"resume_from_checkpoint: {resume_from_checkpoint or False}\n"
            f"prompt template: {prompt_template_name}\n"
            f"training_mode: {training_mode}\n"
        )
    assert (
        base_model
    ), "Please specify a --base_model, e.g. --base_model='huggyllama/llama-7b'"
    gradient_accumulation_steps = batch_size // micro_batch_size

    prompter = Prompter(prompt_template_name)

    device_map = "auto"
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    ddp = world_size != 1
    if ddp:
        device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
        gradient_accumulation_steps = gradient_accumulation_steps // world_size

    # Check if parameter passed or if set within environ
    use_wandb = wandb_check(wandb_project, wandb_watch, wandb_log_model)

    if saved_low_bit_model is not None:
        # Load the low bit optimized model if provide the saved path
        model = AutoModelForCausalLM.load_low_bit(
            saved_low_bit_model,
            optimize_model=False,
            torch_dtype=torch.bfloat16,
            modules_to_not_convert=["lm_head"],
        )
    else:
        # Default 4-bit format for qa-lora is sym_int4
        # use bnb_config for qalora, which use 4bit for base model
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=False,
            bnb_4bit_quant_type="int4",
            bnb_4bit_compute_dtype=torch.bfloat16
        )
        model = AutoModelForCausalLM.from_pretrained(base_model,
                                                     quantization_config=bnb_config, )
        # below is also supported
        # Load the base model from a directory or the HF Hub to 4-bit format
        # model = AutoModelForCausalLM.from_pretrained(
        #     base_model,
        #     load_in_low_bit="sym_int4",
        #     optimize_model=False,
        #     torch_dtype=torch.bfloat16,
        #     # device_map=device_map,
        #     modules_to_not_convert=["lm_head"],
        # )
    print(f"Model loaded on rank {os.environ.get('LOCAL_RANK')}")
    model = model.to(f'xpu:{os.environ.get("LOCAL_RANK", 0)}')
    print(f"Model moved to rank {os.environ.get('LOCAL_RANK')}")

    tokenizer = LlamaTokenizer.from_pretrained(base_model)
    print(f"Tokenizer loaded on rank {os.environ.get('LOCAL_RANK')}")

    tokenizer.pad_token_id = (
        0  # unk. we want this to be different from the eos token
    )
    tokenizer.padding_side = "left"  # Allow batched inference

    print(model)

    # Prepare a BigDL-LLM compatible Peft model
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=gradient_checkpointing)

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=lora_target_modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
        training_mode=training_mode,
    )
    print(f"Lora Config: {config}")
    model = get_peft_model(model, config)

    if data_path.endswith(".json") or data_path.endswith(".jsonl"):
        data = load_dataset("json", data_files=data_path)
    else:
        data = load_dataset(data_path)

    model.print_trainable_parameters()  # Be more transparent about the % of trainable params.

    train_data, val_data = get_train_val_data(data, tokenizer, prompter, train_on_inputs,
                                              add_eos_token, cutoff_len, val_set_size, seed=42)

    # Unused
    # if not ddp and torch.cuda.device_count() > 1:
    #     # keeps Trainer from trying its own DataParallelism when more than 1 gpu is available
    #     model.is_parallelizable = True
    #     model.model_parallel = True

    trainer = transformers.Trainer(
        model=model,
        train_dataset=train_data,
        eval_dataset=val_data,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=micro_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            # warmup_ratio=0.03,
            # warmup_steps=100,
            max_grad_norm=0.3,
            num_train_epochs=num_epochs,
            learning_rate=learning_rate,
            lr_scheduler_type="constant",
            bf16=True,  # ensure training more stable
            logging_steps=1,
            optim="adamw_torch",
            evaluation_strategy="steps" if val_set_size > 0 else "no",
            save_strategy="steps",
            eval_steps=100 if val_set_size > 0 else None,
            save_steps=100,
            output_dir=output_dir,
            save_total_limit=100,
            load_best_model_at_end=True if val_set_size > 0 else False,
            ddp_find_unused_parameters=False if ddp else None,
            group_by_length=group_by_length,
            report_to="wandb" if use_wandb else None,
            run_name=wandb_run_name if use_wandb else None,
            gradient_checkpointing=gradient_checkpointing,
            ddp_backend="ccl",
            deepspeed=deepspeed,
            save_safetensors=False,
        ),
        data_collator=transformers.DataCollatorForSeq2Seq(
            tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
        ),
    )
    model.config.use_cache = False

    trainer.train(resume_from_checkpoint=resume_from_checkpoint)

    model.save_pretrained(output_dir)

    print(
        "\n If there's a warning about missing keys above, please disregard :)"
    )


if __name__ == "__main__":
    fire.Fire(train)
@ -0,0 +1,44 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
import os
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from transformers import LlamaTokenizer # noqa: F402
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
current_dir = os.path.dirname(os.path.realpath(__file__))
|
||||||
|
common_util_path = os.path.join(current_dir, '..')
|
||||||
|
import sys
|
||||||
|
sys.path.append(common_util_path)
|
||||||
|
from common.utils import merge_adapter
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser(description='Merge the adapter into the original model for Llama2 model')
|
||||||
|
parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
|
||||||
|
help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
|
||||||
|
', or the path to the huggingface checkpoint folder')
|
||||||
|
parser.add_argument('--adapter_path', type=str,)
|
||||||
|
parser.add_argument('--output_path', type=str,)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
base_model = model_path = args.repo_id_or_model_path
|
||||||
|
adapter_path = args.adapter_path
|
||||||
|
output_path = args.output_path
|
||||||
|
|
||||||
|
tokenizer = LlamaTokenizer.from_pretrained(base_model)
|
||||||
|
merge_adapter(base_model, tokenizer, adapter_path, output_path)
|
||||||
|
print(f'Finished merging the adapter into the original model. You can find the merged model in {output_path}.')
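# Usage sketch (the finetuning READMEs in this commit invoke this kind of merge script
# as below; the paths are illustrative only):
#   python ./export_merged_model.py \
#       --repo-id-or-model-path "meta-llama/Llama-2-7b-hf" \
#       --adapter_path ./outputs/checkpoint-200 \
#       --output_path ./outputs/checkpoint-200-merged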
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
|
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
|
||||||
python ./alpaca_qlora_finetuning.py \
|
python ./alpaca_qalora_finetuning.py \
|
||||||
--base_model "meta-llama/Llama-2-7b-hf" \
|
--base_model "meta-llama/Llama-2-7b-hf" \
|
||||||
--data_path "yahma/alpaca-cleaned" \
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
--output_dir "./bigdl-qlora-alpaca" \
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
|
@ -25,5 +25,4 @@ python ./alpaca_qlora_finetuning.py \
|
||||||
--lora_r 8 \
|
--lora_r 8 \
|
||||||
--lora_alpha 16 \
|
--lora_alpha 16 \
|
||||||
--lora_dropout 0.05 \
|
--lora_dropout 0.05 \
|
||||||
--val_set_size 2000 \
|
--val_set_size 2000
|
||||||
--training_mode "qalora"
|
|
||||||
|
|
@ -15,12 +15,12 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=6 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=6
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
mpirun -n 2 \
|
mpirun -n 2 \
|
||||||
python -u ./alpaca_qlora_finetuning.py \
|
python -u ./alpaca_qalora_finetuning.py \
|
||||||
--base_model "meta-llama/Llama-2-7b-hf" \
|
--base_model "meta-llama/Llama-2-7b-hf" \
|
||||||
--data_path "yahma/alpaca-cleaned" \
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
--output_dir "./bigdl-qlora-alpaca" \
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
|
@ -30,5 +30,4 @@ mpirun -n 2 \
|
||||||
--lora_r 8 \
|
--lora_r 8 \
|
||||||
--lora_alpha 16 \
|
--lora_alpha 16 \
|
||||||
--lora_dropout 0.05 \
|
--lora_dropout 0.05 \
|
||||||
--val_set_size 2000 \
|
--val_set_size 2000 > training.log
|
||||||
--training_mode "qalora" > training.log
|
|
||||||
|
|
@ -15,20 +15,19 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=28 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=56
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
mpirun -n 2 \
|
mpirun -n 2 \
|
||||||
python -u ./alpaca_qlora_finetuning.py \
|
python -u ./alpaca_qalora_finetuning.py \
|
||||||
--base_model "meta-llama/Llama-2-7b-hf" \
|
--base_model "meta-llama/Llama-2-7b-hf" \
|
||||||
--data_path "yahma/alpaca-cleaned" \
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
--output_dir "./bigdl-qlora-alpaca" \
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
--training_mode "qalora" \
|
|
||||||
--learning_rate 9e-5 \
|
--learning_rate 9e-5 \
|
||||||
--micro_batch_size 8 \
|
--micro_batch_size 8 \
|
||||||
--batch_size 128 \
|
--batch_size 128 \
|
||||||
--lora_r 8 \
|
--lora_r 8 \
|
||||||
--lora_alpha 16 \
|
--lora_alpha 16 \
|
||||||
--lora_dropout 0.05 \
|
--lora_dropout 0.05 \
|
||||||
--val_set_size 2000 > training.log
|
--val_set_size 2000 > training.log
|
||||||
|
|
@ -16,7 +16,7 @@
|
||||||
|
|
||||||
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
|
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
|
||||||
|
|
||||||
python ./alpaca_qlora_finetuning.py \
|
python ./alpaca_qalora_finetuning.py \
|
||||||
--base_model "meta-llama/Llama-2-7b-hf" \
|
--base_model "meta-llama/Llama-2-7b-hf" \
|
||||||
--data_path "yahma/alpaca-cleaned" \
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
--output_dir "./bigdl-qlora-alpaca" \
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
|
@ -27,5 +27,4 @@ python ./alpaca_qlora_finetuning.py \
|
||||||
--lora_r 8 \
|
--lora_r 8 \
|
||||||
--lora_alpha 16 \
|
--lora_alpha 16 \
|
||||||
--lora_dropout 0.05 \
|
--lora_dropout 0.05 \
|
||||||
--val_set_size 2000 \
|
--val_set_size 2000
|
||||||
--training_mode "qalora"
|
|
||||||
5
python/llm/example/GPU/LLM-Finetuning/QLoRA/README.md
Normal file
|
|
@ -0,0 +1,5 @@
|
||||||
|
# QLoRA Finetuning with BigDL-LLM

We provide [Alpaca-QLoRA example](./alpaca-qlora/), which ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [QLoRA](https://arxiv.org/abs/2305.14314) algorithm) on [Intel GPU](../../README.md).

Meanwhile, we also provide a [simple example](./simple-example/) to help you get started with QLoRA Finetuning using BigDL-LLM.
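For orientation, below is a condensed sketch of the core QLoRA setup these examples use, assembled from the `alpaca_qlora_finetuning.py` script added later in this commit; the model id and LoRA hyperparameters are illustrative only.

```python
# Minimal QLoRA sketch with BigDL-LLM on an Intel GPU (condensed from
# alpaca_qlora_finetuning.py in this commit; values are illustrative).
import torch
from transformers import BitsAndBytesConfig, LlamaTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training, LoraConfig

# Load the base model in NF4 (the QLoRA paper suggests nf4 over plain int4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf",
                                             quantization_config=bnb_config)
model = model.to("xpu")  # move the 4-bit base model onto the Intel GPU

# Wrap it as a BigDL-LLM compatible PEFT model and attach LoRA adapters
model = prepare_model_for_kbit_training(model)
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["q_proj", "k_proj", "v_proj"],
                    bias="none", task_type="CAUSAL_LM", training_mode="qlora")
model = get_peft_model(model, config)

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# ...then tokenize a dataset and train with transformers.Trainer as usual.
```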
|
||||||
|
|
@ -1,9 +1,11 @@
|
||||||
# Alpaca Finetuning with BigDL-LLM
|
# QLoRA Finetuning with BigDL-LLM
|
||||||
|
|
||||||
This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using either [QLoRA](https://arxiv.org/abs/2305.14314) / [QA-LoRA](https://arxiv.org/abs/2309.14717) / [LoRA](https://arxiv.org/abs/2106.09685) or [ReLoRA](https://arxiv.org/abs/2307.05695) algorithm) on [Intel GPU](../../README.md).
|
This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [QLoRA](https://arxiv.org/abs/2305.14314) algorithm) on [Intel GPU](../../../README.md).
|
||||||
|
|
||||||
|
> Note: You could also refer to the [simple QLoRA example](../simple-example/) for a quick start on related usage.
|
||||||
|
|
||||||
### 0. Requirements
|
### 0. Requirements
|
||||||
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information.
|
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
|
||||||
|
|
||||||
### 1. Install
|
### 1. Install
|
||||||
|
|
||||||
|
|
@ -17,6 +19,10 @@ pip install fire peft==0.5.0
|
||||||
pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning
|
pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning
|
||||||
pip install accelerate==0.23.0
|
pip install accelerate==0.23.0
|
||||||
pip install bitsandbytes scipy
|
pip install bitsandbytes scipy
|
||||||
|
# configures OneAPI environment variables
|
||||||
|
source /opt/intel/oneapi/setvars.sh # necessary to run before installing deepspeed
|
||||||
|
pip install git+https://github.com/microsoft/DeepSpeed.git@78c518e
|
||||||
|
pip install git+https://github.com/intel/intel-extension-for-deepspeed.git@ec33277
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Configures OneAPI environment variables
|
### 2. Configures OneAPI environment variables
|
||||||
|
|
@ -24,131 +30,104 @@ pip install bitsandbytes scipy
|
||||||
source /opt/intel/oneapi/setvars.sh
|
source /opt/intel/oneapi/setvars.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3. Finetune
|
### 3. QLoRA Finetune
|
||||||
|
|
||||||
Now we support four training modes ([QLoRA](https://arxiv.org/abs/2305.14314) / [QA-LoRA](https://arxiv.org/abs/2309.14717) / [LoRA](https://arxiv.org/abs/2106.09685) / [ReLoRA](https://arxiv.org/abs/2307.05695)), to run different mode, just change `training_mode` to `qlora` / `qalora` / `lora` / `relora` in below script.
|
Here, we provide example usages on different hardware. Please refer to the appropriate script based on your device and model:
|
||||||
|
|
||||||
Here, we provide example usages on different hardware. Please refer to the appropriate script based on your device:
|
<details>
|
||||||
|
<summary> Show LLaMA2-7B examples </summary>
|
||||||
#### QLoRA
|
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Arc A770
|
##### Finetuning LLaMA2-7B on single Arc A770
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_arc_1_card.sh
|
bash qlora_finetune_llama2_7b_arc_1_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on two Arc A770
|
##### Finetuning LLaMA2-7B on two Arc A770
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_arc_2_card.sh
|
bash qlora_finetune_llama2_7b_arc_2_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Data Center GPU Flex 170
|
##### Finetuning LLaMA2-7B on single Data Center GPU Flex 170
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_flex_170_1_card.sh
|
bash qlora_finetune_llama2_7b_flex_170_1_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on three Data Center GPU Flex 170
|
##### Finetuning LLaMA2-7B on three Data Center GPU Flex 170
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_flex_170_3_card.sh
|
bash qlora_finetune_llama2_7b_flex_170_3_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1100
|
##### Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1100
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_pvc_1100_1_card.sh
|
bash qlora_finetune_llama2_7b_pvc_1100_1_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1100
|
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1100
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_pvc_1100_4_card.sh
|
bash qlora_finetune_llama2_7b_pvc_1100_4_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1550
|
##### Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_pvc_1550_1_card.sh
|
bash qlora_finetune_llama2_7b_pvc_1550_1_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550
|
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash finetune_llama2_7b_pvc_1550_4_card.sh
|
bash qlora_finetune_llama2_7b_pvc_1550_4_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
#### QA-LoRA
|
</details>
|
||||||
##### Finetuning LLaMA2-7B on single Arc A770
|
|
||||||
|
<details>
|
||||||
|
<summary> Show LLaMA2-13B examples </summary>
|
||||||
|
|
||||||
|
##### Finetuning LLaMA2-13B on single tile of Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash qalora_finetune_llama2_7b_arc_1_card.sh
|
bash qlora_finetune_llama2_13b_pvc_1550_1_tile.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on two Arc A770
|
##### Finetuning LLaMA2-13B on single Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash qalora_finetune_llama2_7b_arc_2_card.sh
|
bash qlora_finetune_llama2_13b_pvc_1550_1_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Tile Intel Data Center GPU Max 1550
|
##### Finetuning LLaMA2-13B on four Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash qalora_finetune_llama2_7b_pvc_1550_1_tile.sh
|
bash qlora_finetune_llama2_13b_pvc_1550_4_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
#### LoRA
|
</details>
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Arc A770
|
<details>
|
||||||
|
<summary> Show LLaMA2-70B examples </summary>
|
||||||
|
|
||||||
|
Different from `LLaMA2-7B` and `LLaMA2-13B`, it is recommended to save the model with bigdl-llm low-bit optimization first to avoid excessive CPU memory usage. DeepSpeed ZeRO-2 is used during finetuning.
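A condensed sketch of that low-bit save step is shown below; it mirrors the `save_low_bit_70b_model.py` script added later in this commit, and the output path is illustrative. The launch scripts then pass the saved folder via `--saved_low_bit_model` and the ZeRO-2 config via `--deepspeed ./deepspeed_zero2.json`.

```python
# Save Llama-2-70b-hf with BigDL-LLM NF4 low-bit optimization once, so that each
# finetuning process can load the compact checkpoint instead of the full FP16 weights.
import torch
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    load_in_low_bit="nf4",
    optimize_model=False,
    torch_dtype=torch.bfloat16,
    modules_to_not_convert=["lm_head"],
)
model.save_low_bit("./llama-2-70b-hf-nf4")  # later consumed via --saved_low_bit_model
```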
|
||||||
|
|
||||||
|
##### Finetuning LLaMA2-70B on one Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash lora_finetune_llama2_7b_arc_1_card.sh
|
bash qlora_finetune_llama2_70b_pvc_1550_1_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1100
|
##### Finetuning LLaMA2-70B on four Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
bash lora_finetune_llama2_7b_pvc_1100_1_card.sh
|
bash qlora_finetune_llama2_70b_pvc_1550_4_card.sh
|
||||||
```
|
```
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Tile Intel Data Center GPU Max 1550
|
</details>
|
||||||
|
|
||||||
```bash
|
|
||||||
bash lora_finetune_llama2_7b_pvc_1550_1_tile.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash lora_finetune_llama2_7b_pvc_1550_4_card.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
#### ReLoRA
|
|
||||||
##### Finetuning LLaMA2-7B on single Arc A770
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash relora_finetune_llama2_7b_arc_1_card.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on two Arc A770
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash relora_finetune_llama2_7b_arc_2_card.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1550
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash relora_finetune_llama2_7b_pvc_1550_1_card.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550
|
|
||||||
|
|
||||||
```bash
|
|
||||||
bash relora_finetune_llama2_7b_pvc_1550_4_card.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
### 4. (Optional) Resume Training
|
### 4. (Optional) Resume Training
|
||||||
**If you fail to complete the whole finetuning process, it is suggested to resume training from a previously saved checkpoint by setting `resume_from_checkpoint` to the local checkpoint folder, as follows:**
|
**If you fail to complete the whole finetuning process, it is suggested to resume training from a previously saved checkpoint by setting `resume_from_checkpoint` to the local checkpoint folder, as follows:**
|
||||||
|
|
@ -173,14 +152,14 @@ python ./alpaca_qlora_finetuning.py \
|
||||||
1%|█ | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
|
1%|█ | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
|
||||||
```
|
```
|
||||||
|
|
||||||
### 4. Merge the adapter into the original model
|
### 6. Merge the adapter into the original model
|
||||||
```
|
```
|
||||||
python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
|
python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
|
||||||
```
|
```
|
||||||
|
|
||||||
Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference.
|
Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference.
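For example, a minimal inference sketch on the merged checkpoint (standard Hugging Face APIs; the prompt is illustrative):

```python
# Load the merged checkpoint like any Hugging Face causal LM and generate a reply.
from transformers import AutoModelForCausalLM, LlamaTokenizer

merged_path = "./outputs/checkpoint-200-merged"
tokenizer = LlamaTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(merged_path)

inputs = tokenizer("What is AI?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```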
|
||||||
|
|
||||||
### 5. Troubleshooting
|
### 7. Troubleshooting
|
||||||
- If you fail to finetune on multi cards because of following error message:
|
- If you fail to finetune on multi cards because of following error message:
|
||||||
```bash
|
```bash
|
||||||
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
|
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
|
||||||
|
|
@ -0,0 +1,279 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
# Some parts of this file is adapted from
|
||||||
|
# https://github.com/tloen/alpaca-lora/blob/main/finetune.py
|
||||||
|
#
|
||||||
|
# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
|
||||||
|
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
import os
|
||||||
|
from typing import List
|
||||||
|
|
||||||
|
import fire
|
||||||
|
import torch
|
||||||
|
import transformers
|
||||||
|
from datasets import load_dataset
|
||||||
|
import accelerate
|
||||||
|
|
||||||
|
from transformers import LlamaTokenizer
|
||||||
|
from peft import (
|
||||||
|
get_peft_model_state_dict,
|
||||||
|
set_peft_model_state_dict,
|
||||||
|
)
|
||||||
|
|
||||||
|
current_dir = os.path.dirname(os.path.realpath(__file__))
|
||||||
|
common_util_path = os.path.join(current_dir, '..', '..')
|
||||||
|
import sys
|
||||||
|
sys.path.append(common_util_path)
|
||||||
|
from common.utils import Prompter, get_int_from_env, wandb_check, get_train_val_data
|
||||||
|
|
||||||
|
from transformers import BitsAndBytesConfig
|
||||||
|
from bigdl.llm.transformers import AutoModelForCausalLM
|
||||||
|
# import them from bigdl.llm.transformers.qlora to get a BigDL-LLM compatible Peft model
|
||||||
|
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training,\
|
||||||
|
LoraConfig
|
||||||
|
from bigdl.llm.utils.common import invalidInputError
|
||||||
|
|
||||||
|
local_rank = get_int_from_env(["LOCAL_RANK","MPI_LOCALRANKID"], "0")
|
||||||
|
world_size = get_int_from_env(["WORLD_SIZE","PMI_SIZE"], "1")
|
||||||
|
port = get_int_from_env(["MASTER_PORT"], 29500)
|
||||||
|
os.environ["LOCAL_RANK"] = str(local_rank)
|
||||||
|
os.environ["WORLD_SIZE"] = str(world_size)
|
||||||
|
os.environ["RANK"] = str(local_rank)
|
||||||
|
os.environ["MASTER_PORT"] = str(port)
|
||||||
|
|
||||||
|
def train(
|
||||||
|
# model/data params
|
||||||
|
base_model: str = "meta-llama/Llama-2-7b-hf", # the base model to finetune; defaults to "meta-llama/Llama-2-7b-hf"
|
||||||
|
saved_low_bit_model: str = None, # optional, the path to the saved model with bigdl-llm low-bit optimization
|
||||||
|
data_path: str = "yahma/alpaca-cleaned",
|
||||||
|
output_dir: str = "./bigdl-qlora-alpaca",
|
||||||
|
# training hyperparams
|
||||||
|
bf16: bool = True, # default to bf16
|
||||||
|
batch_size: int = 128,
|
||||||
|
micro_batch_size: int = 2, # default to be 2, limited by GPU memory
|
||||||
|
num_epochs: int = 3,
|
||||||
|
learning_rate: float = 3e-5, # default to be 3e-5 to avoid divergence
|
||||||
|
cutoff_len: int = 256,
|
||||||
|
val_set_size: int = 2000,
|
||||||
|
# lora hyperparams
|
||||||
|
lora_r: int = 8,
|
||||||
|
lora_alpha: int = 16,
|
||||||
|
lora_dropout: float = 0.05,
|
||||||
|
lora_target_modules: List[str] = [
|
||||||
|
"q_proj",
|
||||||
|
"v_proj",
|
||||||
|
"k_proj",
|
||||||
|
"o_proj",
|
||||||
|
"up_proj",
|
||||||
|
"down_proj",
|
||||||
|
"gate_proj"
|
||||||
|
], # according to the QLoRA paper (https://arxiv.org/pdf/2305.14314.pdf), it's suggested to fine tune all linear layers
|
||||||
|
# llm hyperparams
|
||||||
|
train_on_inputs: bool = True, # if False, masks out inputs in loss
|
||||||
|
add_eos_token: bool = False,
|
||||||
|
group_by_length: bool = False, # faster, but produces an odd training loss curve
|
||||||
|
# wandb params
|
||||||
|
wandb_project: str = "",
|
||||||
|
wandb_run_name: str = "",
|
||||||
|
wandb_watch: str = "", # options: false | gradients | all
|
||||||
|
wandb_log_model: str = "", # options: false | true
|
||||||
|
resume_from_checkpoint: str = None, # either training checkpoint or final adapter
|
||||||
|
prompt_template_name: str = "alpaca", # The prompt template to use, will default to alpaca.
|
||||||
|
gradient_checkpointing: bool = False,
|
||||||
|
deepspeed: str = None,
|
||||||
|
training_mode: str = "qlora",
|
||||||
|
):
|
||||||
|
invalidInputError(training_mode == "qlora",
|
||||||
|
f"This example is for qlora training mode, but got training_mode={training_mode}.")
|
||||||
|
if int(os.environ.get("LOCAL_RANK", 0)) == 0:
|
||||||
|
print(
|
||||||
|
f"Training Alpaca-LoRA model with params:\n"
|
||||||
|
f"base_model: {base_model}\n"
|
||||||
|
f"data_path: {data_path}\n"
|
||||||
|
f"output_dir: {output_dir}\n"
|
||||||
|
f"batch_size: {batch_size}\n"
|
||||||
|
f"micro_batch_size: {micro_batch_size}\n"
|
||||||
|
f"num_epochs: {num_epochs}\n"
|
||||||
|
f"learning_rate: {learning_rate}\n"
|
||||||
|
f"cutoff_len: {cutoff_len}\n"
|
||||||
|
f"val_set_size: {val_set_size}\n"
|
||||||
|
f"lora_r: {lora_r}\n"
|
||||||
|
f"lora_alpha: {lora_alpha}\n"
|
||||||
|
f"lora_dropout: {lora_dropout}\n"
|
||||||
|
f"lora_target_modules: {lora_target_modules}\n"
|
||||||
|
f"train_on_inputs: {train_on_inputs}\n"
|
||||||
|
f"add_eos_token: {add_eos_token}\n"
|
||||||
|
f"group_by_length: {group_by_length}\n"
|
||||||
|
f"wandb_project: {wandb_project}\n"
|
||||||
|
f"wandb_run_name: {wandb_run_name}\n"
|
||||||
|
f"wandb_watch: {wandb_watch}\n"
|
||||||
|
f"wandb_log_model: {wandb_log_model}\n"
|
||||||
|
f"resume_from_checkpoint: {resume_from_checkpoint or False}\n"
|
||||||
|
f"prompt template: {prompt_template_name}\n"
|
||||||
|
f"training_mode: {training_mode}\n"
|
||||||
|
)
|
||||||
|
assert (
|
||||||
|
base_model
|
||||||
|
), "Please specify a --base_model, e.g. --base_model='huggyllama/llama-7b'"
|
||||||
|
gradient_accumulation_steps = batch_size // micro_batch_size
|
||||||
|
|
||||||
|
prompter = Prompter(prompt_template_name)
|
||||||
|
|
||||||
|
device_map = "auto"
|
||||||
|
world_size = int(os.environ.get("WORLD_SIZE", 1))
|
||||||
|
ddp = world_size != 1
|
||||||
|
if ddp:
|
||||||
|
device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
|
||||||
|
gradient_accumulation_steps = gradient_accumulation_steps // world_size
|
||||||
|
|
||||||
|
# Check if parameter passed or if set within environ
|
||||||
|
use_wandb = wandb_check(wandb_project, wandb_watch, wandb_log_model)
|
||||||
|
|
||||||
|
if saved_low_bit_model is not None:
|
||||||
|
# Load the low-bit optimized model if a saved path is provided
|
||||||
|
model = AutoModelForCausalLM.load_low_bit(
|
||||||
|
saved_low_bit_model,
|
||||||
|
optimize_model=False,
|
||||||
|
torch_dtype=torch.bfloat16,
|
||||||
|
modules_to_not_convert=["lm_head"],
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# According to the QLoRA paper, using "nf4" could yield better model quality than "int4"
|
||||||
|
# use bnb_config for qlora/qalora/relora, which use 4bit for base model
|
||||||
|
bnb_config = BitsAndBytesConfig(
|
||||||
|
load_in_4bit=True,
|
||||||
|
bnb_4bit_use_double_quant=False,
|
||||||
|
bnb_4bit_quant_type="nf4",
|
||||||
|
bnb_4bit_compute_dtype=torch.bfloat16
|
||||||
|
)
|
||||||
|
model = AutoModelForCausalLM.from_pretrained(base_model,
|
||||||
|
quantization_config=bnb_config, )
|
||||||
|
# below is also supported
|
||||||
|
# Load the base model from a directory or the HF Hub to 4-bit format
|
||||||
|
# model = AutoModelForCausalLM.from_pretrained(
|
||||||
|
# base_model,
|
||||||
|
# load_in_low_bit="nf4",
|
||||||
|
# optimize_model=False,
|
||||||
|
# torch_dtype=torch.bfloat16,
|
||||||
|
# # device_map=device_map,
|
||||||
|
# modules_to_not_convert=["lm_head"],
|
||||||
|
# )
|
||||||
|
print(f"Model loaded on rank {os.environ.get('LOCAL_RANK')}")
|
||||||
|
model = model.to(f'xpu:{os.environ.get("LOCAL_RANK", 0)}')
|
||||||
|
print(f"Model moved to rank {os.environ.get('LOCAL_RANK')}")
|
||||||
|
|
||||||
|
tokenizer = LlamaTokenizer.from_pretrained(base_model)
|
||||||
|
print(f"Tokenizer loaded on rank {os.environ.get('LOCAL_RANK')}")
|
||||||
|
|
||||||
|
tokenizer.pad_token_id = (
|
||||||
|
0 # unk. we want this to be different from the eos token
|
||||||
|
)
|
||||||
|
tokenizer.padding_side = "left" # Allow batched inference
|
||||||
|
|
||||||
|
print(model)
|
||||||
|
|
||||||
|
# Prepare a BigDL-LLM compatible Peft model
|
||||||
|
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=gradient_checkpointing)
|
||||||
|
|
||||||
|
config = LoraConfig(
|
||||||
|
r=lora_r,
|
||||||
|
lora_alpha=lora_alpha,
|
||||||
|
target_modules=lora_target_modules,
|
||||||
|
lora_dropout=lora_dropout,
|
||||||
|
bias="none",
|
||||||
|
task_type="CAUSAL_LM",
|
||||||
|
training_mode=training_mode,
|
||||||
|
)
|
||||||
|
print(f"Lora Config: {config}")
|
||||||
|
model = get_peft_model(model, config)
|
||||||
|
|
||||||
|
if data_path.endswith(".json") or data_path.endswith(".jsonl"):
|
||||||
|
data = load_dataset("json", data_files=data_path)
|
||||||
|
else:
|
||||||
|
data = load_dataset(data_path)
|
||||||
|
|
||||||
|
model.print_trainable_parameters() # Be more transparent about the % of trainable params.
|
||||||
|
|
||||||
|
train_data, val_data = get_train_val_data(data, tokenizer, prompter, train_on_inputs,
|
||||||
|
add_eos_token, cutoff_len, val_set_size, seed=42)
|
||||||
|
|
||||||
|
# Unused
|
||||||
|
# if not ddp and torch.cuda.device_count() > 1:
|
||||||
|
# # keeps Trainer from trying its own DataParallelism when more than 1 gpu is available
|
||||||
|
# model.is_parallelizable = True
|
||||||
|
# model.model_parallel = True
|
||||||
|
|
||||||
|
trainer = transformers.Trainer(
|
||||||
|
model=model,
|
||||||
|
train_dataset=train_data,
|
||||||
|
eval_dataset=val_data,
|
||||||
|
args=transformers.TrainingArguments(
|
||||||
|
per_device_train_batch_size=micro_batch_size,
|
||||||
|
gradient_accumulation_steps=gradient_accumulation_steps,
|
||||||
|
# warmup_ratio=0.03,
|
||||||
|
# warmup_steps=100,
|
||||||
|
max_grad_norm=0.3,
|
||||||
|
num_train_epochs=num_epochs,
|
||||||
|
learning_rate=learning_rate,
|
||||||
|
lr_scheduler_type="cosine",
|
||||||
|
bf16=True, # ensure training more stable
|
||||||
|
logging_steps=1,
|
||||||
|
optim="adamw_torch",
|
||||||
|
evaluation_strategy="steps" if val_set_size > 0 else "no",
|
||||||
|
save_strategy="steps",
|
||||||
|
eval_steps=100 if val_set_size > 0 else None,
|
||||||
|
save_steps=100,
|
||||||
|
output_dir=output_dir,
|
||||||
|
save_total_limit=100,
|
||||||
|
load_best_model_at_end=True if val_set_size > 0 else False,
|
||||||
|
ddp_find_unused_parameters=False if ddp else None,
|
||||||
|
group_by_length=group_by_length,
|
||||||
|
report_to="wandb" if use_wandb else None,
|
||||||
|
run_name=wandb_run_name if use_wandb else None,
|
||||||
|
gradient_checkpointing=gradient_checkpointing,
|
||||||
|
ddp_backend="ccl",
|
||||||
|
deepspeed=deepspeed,
|
||||||
|
save_safetensors=False,
|
||||||
|
),
|
||||||
|
data_collator=transformers.DataCollatorForSeq2Seq(
|
||||||
|
tokenizer, pad_to_multiple_of=8, return_tensors="pt", padding=True
|
||||||
|
),
|
||||||
|
)
|
||||||
|
model.config.use_cache = False
|
||||||
|
|
||||||
|
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
|
||||||
|
|
||||||
|
model.save_pretrained(output_dir)
|
||||||
|
|
||||||
|
print(
|
||||||
|
"\n If there's a warning about missing keys above, please disregard :)"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
fire.Fire(train)
|
||||||
|
|
@ -0,0 +1,16 @@
|
||||||
|
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    },
    "contiguous_gradients": true,
    "overlap_comm": true
  },
  "bf16": {
    "enabled": true
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
|
||||||
|
|
||||||
|
|
@ -0,0 +1,44 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
import os
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from transformers import LlamaTokenizer # noqa: F402
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
current_dir = os.path.dirname(os.path.realpath(__file__))
|
||||||
|
common_util_path = os.path.join(current_dir, '..', '..')
|
||||||
|
import sys
|
||||||
|
sys.path.append(common_util_path)
|
||||||
|
from common.utils import merge_adapter
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser(description='Merge the adapter into the original model for Llama2 model')
|
||||||
|
parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
|
||||||
|
help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
|
||||||
|
', or the path to the huggingface checkpoint folder')
|
||||||
|
parser.add_argument('--adapter_path', type=str,)
|
||||||
|
parser.add_argument('--output_path', type=str,)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
base_model = model_path = args.repo_id_or_model_path
|
||||||
|
adapter_path = args.adapter_path
|
||||||
|
output_path = args.output_path
|
||||||
|
|
||||||
|
tokenizer = LlamaTokenizer.from_pretrained(base_model)
|
||||||
|
merge_adapter(base_model, tokenizer, adapter_path, output_path)
|
||||||
|
print(f'Finished merging the adapter into the original model. You can find the merged model in {output_path}.')
|
||||||
|
|
@ -0,0 +1,28 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
|
||||||
|
export MASTER_ADDR=127.0.0.1
|
||||||
|
export OMP_NUM_THREADS=56
|
||||||
|
export FI_PROVIDER=tcp
|
||||||
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
mpirun -n 2 \
|
||||||
|
python -u ./alpaca_qlora_finetuning.py \
|
||||||
|
--base_model "meta-llama/Llama-2-13b-hf" \
|
||||||
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
--micro_batch_size 8 \
|
||||||
|
--batch_size 128 > training.log
|
||||||
|
|
@ -0,0 +1,23 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
|
||||||
|
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
|
||||||
|
python ./alpaca_qlora_finetuning.py \
|
||||||
|
--base_model "meta-llama/Llama-2-13b-hf" \
|
||||||
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
--micro_batch_size 8 \
|
||||||
|
--batch_size 128
|
||||||
|
|
@ -0,0 +1,28 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
|
||||||
|
export MASTER_ADDR=127.0.0.1
|
||||||
|
export OMP_NUM_THREADS=56
|
||||||
|
export FI_PROVIDER=tcp
|
||||||
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
mpirun -n 8 \
|
||||||
|
python -u ./alpaca_qlora_finetuning.py \
|
||||||
|
--base_model "meta-llama/Llama-2-13b-hf" \
|
||||||
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
--micro_batch_size 8 \
|
||||||
|
--batch_size 128 > training.log
|
||||||
|
|
@ -0,0 +1,36 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
|
||||||
|
# save Llama-2-70b-hf model with bigdl-llm low-bit optimization first
|
||||||
|
python save_low_bit_70b_model.py --output_path "./llama-2-70b-hf-nf4"
|
||||||
|
|
||||||
|
export MASTER_ADDR=127.0.0.1
|
||||||
|
export OMP_NUM_THREADS=56
|
||||||
|
export FI_PROVIDER=tcp
|
||||||
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
export CCL_ZE_IPC_EXCHANGE=sockets
|
||||||
|
|
||||||
|
mpirun -n 2 \
|
||||||
|
python -u ./alpaca_qlora_finetuning.py \
|
||||||
|
--base_model "meta-llama/Llama-2-70b-hf" \
|
||||||
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
--gradient_checkpointing True \
|
||||||
|
--micro_batch_size 8 \
|
||||||
|
--batch_size 128 \
|
||||||
|
--deepspeed ./deepspeed_zero2.json \
|
||||||
|
--saved_low_bit_model ./llama-2-70b-hf-nf4 > training.log
|
||||||
|
|
||||||
|
|
@ -0,0 +1,36 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
|
||||||
|
# save Llama-2-70b-hf model with bigdl-llm low-bit optimization first
|
||||||
|
python save_low_bit_70b_model.py --output_path "./llama-2-70b-hf-nf4"
|
||||||
|
|
||||||
|
export MASTER_ADDR=127.0.0.1
|
||||||
|
export OMP_NUM_THREADS=56
|
||||||
|
export FI_PROVIDER=tcp
|
||||||
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
export CCL_ZE_IPC_EXCHANGE=sockets
|
||||||
|
|
||||||
|
mpirun -n 8 \
|
||||||
|
python -u ./alpaca_qlora_finetuning.py \
|
||||||
|
--base_model "meta-llama/Llama-2-70b-hf" \
|
||||||
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
--gradient_checkpointing True \
|
||||||
|
--micro_batch_size 8 \
|
||||||
|
--batch_size 128 \
|
||||||
|
--deepspeed ./deepspeed_zero2.json \
|
||||||
|
--saved_low_bit_model ./llama-2-70b-hf-nf4 > training.log
|
||||||
|
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=6 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=6
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=12 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=12
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=28 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=28
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=28 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=56
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
#
|
#
|
||||||
|
|
||||||
export MASTER_ADDR=127.0.0.1
|
export MASTER_ADDR=127.0.0.1
|
||||||
export OMP_NUM_THREADS=28 # adjust this to 1/4 of total physical cores
|
export OMP_NUM_THREADS=56
|
||||||
export FI_PROVIDER=tcp
|
export FI_PROVIDER=tcp
|
||||||
export CCL_ATL_TRANSPORT=ofi
|
export CCL_ATL_TRANSPORT=ofi
|
||||||
|
|
||||||
|
|
@ -0,0 +1,45 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
|
||||||
|
from transformers import LlamaTokenizer
|
||||||
|
from bigdl.llm.transformers import AutoModelForCausalLM
|
||||||
|
import torch
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser(description='Save model with bigdl-llm low-bit optimization')
|
||||||
|
parser.add_argument('--base_model', type=str, default="meta-llama/Llama-2-70b-hf",
|
||||||
|
help='The huggingface repo id for the Llama2-70B model to be downloaded'
|
||||||
|
', or the path to the huggingface checkpoint folder')
|
||||||
|
parser.add_argument('--output_path', type=str, default="./llama-2-70b-hf-nf4",
|
||||||
|
help='The path to the saved model.')
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
base_model = args.base_model
|
||||||
|
output_path = args.output_path
|
||||||
|
|
||||||
|
model = AutoModelForCausalLM.from_pretrained(
|
||||||
|
base_model,
|
||||||
|
load_in_low_bit="nf4",
|
||||||
|
# load_in_4bit=True,
|
||||||
|
optimize_model=False,
|
||||||
|
torch_dtype=torch.bfloat16,
|
||||||
|
# device_map=device_map,
|
||||||
|
modules_to_not_convert=["lm_head"],
|
||||||
|
)
|
||||||
|
|
||||||
|
model.save_low_bit(output_path)
|
||||||
|
print(f'Model with bigdl-llm low-bit optimization is saved to {output_path}.')
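# Usage (run once before launching the 70B finetuning jobs, as the launch scripts
# in this commit do):
#   python save_low_bit_70b_model.py --output_path "./llama-2-70b-hf-nf4"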
|
||||||
|
|
@ -1,9 +1,10 @@
|
||||||
# Finetuning LLAMA Using Q-Lora (experimental support)
|
# Simple Example of QLoRA Finetuning with BigDL-LLM
|
||||||
|
|
||||||
This example demonstrates how to finetune a llama2-7b model use Big-LLM 4bit optimizations using [Intel GPUs](../README.md).
|
This simple example demonstrates how to finetune a llama2-7b model using BigDL-LLM 4-bit optimizations on [Intel GPUs](../../../README.md).
|
||||||
|
Note that this example is only for illustrating the related usage; it does not guarantee training convergence.
|
||||||
|
|
||||||
## 0. Requirements
|
## 0. Requirements
|
||||||
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
|
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
|
||||||
|
|
||||||
## Example: Finetune llama2-7b using qlora
|
## Example: Finetune llama2-7b using qlora
|
||||||
|
|
||||||
|
|
@ -0,0 +1,44 @@
|
||||||
|
#
|
||||||
|
# Copyright 2016 The BigDL Authors.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
#
|
||||||
|
import os
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from transformers import LlamaTokenizer # noqa: F402
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
current_dir = os.path.dirname(os.path.realpath(__file__))
|
||||||
|
common_util_path = os.path.join(current_dir, '..', '..')
|
||||||
|
import sys
|
||||||
|
sys.path.append(common_util_path)
|
||||||
|
from common.utils import merge_adapter
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser(description='Merge the adapter into the original model for Llama2 model')
|
||||||
|
parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
|
||||||
|
help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
|
||||||
|
', or the path to the huggingface checkpoint folder')
|
||||||
|
parser.add_argument('--adapter_path', type=str,)
|
||||||
|
parser.add_argument('--output_path', type=str,)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
base_model = model_path = args.repo_id_or_model_path
|
||||||
|
adapter_path = args.adapter_path
|
||||||
|
output_path = args.output_path
|
||||||
|
|
||||||
|
tokenizer = LlamaTokenizer.from_pretrained(base_model)
|
||||||
|
merge_adapter(base_model, tokenizer, adapter_path, output_path)
|
||||||
|
print(f'Finished merging the adapter into the original model. You can find the merged model in {output_path}.')
|
||||||
|
|
@ -28,7 +28,7 @@ import argparse
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|
||||||
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Llama2 model')
|
parser = argparse.ArgumentParser(description='Simple example of how to qlora finetune llama2 model using bigdl-llm')
|
||||||
parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
|
parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
|
||||||
help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
|
help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
|
||||||
', or the path to the huggingface checkpoint folder')
|
', or the path to the huggingface checkpoint folder')
|
||||||
9
python/llm/example/GPU/LLM-Finetuning/README.md
Normal file
|
|
@ -0,0 +1,9 @@
|
||||||
|
# Running LLM Finetuning using BigDL-LLM on Intel GPU

This folder contains examples of running different training modes with BigDL-LLM on Intel GPU:

- [LoRA](LoRA): examples of running LoRA finetuning
- [QLoRA](QLoRA): examples of running QLoRA finetuning
- [QA-LoRA](QA-LoRA): examples of running QA-LoRA finetuning
- [ReLora](ReLora): examples of running ReLoRA finetuning
- [common](common): common templates and utility classes in finetuning examples
|
||||||
90
python/llm/example/GPU/LLM-Finetuning/ReLora/README.md
Normal file
|
|
@ -0,0 +1,90 @@
|
||||||
|
# ReLoRA Finetuning with BigDL-LLM
|
||||||
|
|
||||||
|
This example ports [Alpaca-LoRA](https://github.com/tloen/alpaca-lora/tree/main) to BigDL-LLM (using [ReLoRA](https://arxiv.org/abs/2307.05695) algorithm) on [Intel GPU](../../README.md).
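The ReLoRA-specific knobs (`relora_steps`, `relora_warmup_steps`, `relora_cpu_offload`) are exposed as command-line flags by `alpaca_relora_finetuning.py`, whose changes appear later in this commit. A hypothetical direct invocation, with illustrative values, could look like:

```bash
# Hypothetical single-process run; flag names follow alpaca_relora_finetuning.py,
# the numeric values here are illustrative only.
python ./alpaca_relora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-qlora-alpaca" \
    --training_mode "relora" \
    --relora_steps 300 \
    --relora_warmup_steps 10 \
    --relora_cpu_offload True
```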
|
||||||
|
|
||||||
|
### 0. Requirements
|
||||||
|
To run this example with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../README.md#requirements) for more information.
|
||||||
|
|
||||||
|
### 1. Install
|
||||||
|
|
||||||
|
```bash
|
||||||
|
conda create -n llm python=3.9
|
||||||
|
conda activate llm
|
||||||
|
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
|
||||||
|
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
|
||||||
|
pip install transformers==4.34.0 datasets
|
||||||
|
pip install fire peft==0.5.0
|
||||||
|
pip install oneccl_bind_pt==2.1.100 -f https://developer.intel.com/ipex-whl-stable-xpu # necessary to run distributed finetuning
|
||||||
|
pip install accelerate==0.23.0
|
||||||
|
pip install bitsandbytes scipy
|
||||||
|
```
|
||||||
|
|
||||||
|
### 2. Configures OneAPI environment variables
|
||||||
|
```bash
|
||||||
|
source /opt/intel/oneapi/setvars.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. ReLoRA Finetune
|
||||||
|
|
||||||
|
Here, we provide example usages on different hardware. Please refer to the appropriate script based on your device:
|
||||||
|
|
||||||
|
##### Finetuning LLaMA2-7B on single Arc A770
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash relora_finetune_llama2_7b_arc_1_card.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Finetuning LLaMA2-7B on two Arc A770
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash relora_finetune_llama2_7b_arc_2_card.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Finetuning LLaMA2-7B on single Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash relora_finetune_llama2_7b_pvc_1550_1_card.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
##### Finetuning LLaMA2-7B on four Intel Data Center GPU Max 1550
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash relora_finetune_llama2_7b_pvc_1550_4_card.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. (Optional) Resume Training
|
||||||
|
**If you fail to complete the whole finetuning process, it is suggested to resume training from a previously saved checkpoint by setting `resume_from_checkpoint` to the local checkpoint folder, as follows:**
|
||||||
|
```bash
|
||||||
|
python ./alpaca_relora_finetuning.py \
|
||||||
|
--base_model "meta-llama/Llama-2-7b-hf" \
|
||||||
|
--data_path "yahma/alpaca-cleaned" \
|
||||||
|
--output_dir "./bigdl-qlora-alpaca" \
|
||||||
|
--resume_from_checkpoint "./bigdl-qlora-alpaca/checkpoint-1100"
|
||||||
|
```
|
||||||
|
|
||||||
|
### 5. Sample Output
|
||||||
|
```log
|
||||||
|
{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}
|
||||||
|
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}
|
||||||
|
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}
|
||||||
|
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}
|
||||||
|
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}
|
||||||
|
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}
|
||||||
|
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}
|
||||||
|
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}
|
||||||
|
1%|█ | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
|
||||||
|
```
|
||||||
|
|
||||||
|
### 6. Merge the adapter into the original model
|
||||||
|
```
|
||||||
|
python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged
|
||||||
|
```
|
||||||
|
|
||||||
|
Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference.
|
||||||
|
|
||||||
|
### 7. Troubleshooting
|
||||||
|
- If you fail to finetune on multi cards because of following error message:
|
||||||
|
```bash
|
||||||
|
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
|
||||||
|
```
|
||||||
|
Please try `sudo apt install level-zero-dev` to fix it.
|
||||||
|
|
@ -44,29 +44,20 @@ from peft import (
|
||||||
get_peft_model_state_dict,
|
get_peft_model_state_dict,
|
||||||
set_peft_model_state_dict,
|
set_peft_model_state_dict,
|
||||||
)
|
)
|
||||||
from utils.prompter import Prompter
|
|
||||||
|
current_dir = os.path.dirname(os.path.realpath(__file__))
|
||||||
|
common_util_path = os.path.join(current_dir, '..')
|
||||||
|
import sys
|
||||||
|
sys.path.append(common_util_path)
|
||||||
|
from common.utils import Prompter, get_int_from_env, wandb_check, get_train_val_data
|
||||||
|
|
||||||
from transformers import BitsAndBytesConfig
|
from transformers import BitsAndBytesConfig
|
||||||
from bigdl.llm.transformers import AutoModelForCausalLM
|
from bigdl.llm.transformers import AutoModelForCausalLM
|
||||||
|
from bigdl.llm.transformers.relora import ReLoRATrainer
|
||||||
# import them from bigdl.llm.transformers.qlora to get a BigDL-LLM compatible Peft model
|
# import them from bigdl.llm.transformers.qlora to get a BigDL-LLM compatible Peft model
|
||||||
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training,\
|
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training,\
|
||||||
LoraConfig
|
LoraConfig
|
||||||
from bigdl.llm.utils.common import invalidInputError
|
from bigdl.llm.utils.common import invalidInputError
|
||||||
|
|
||||||
|
|
||||||
def get_int_from_env(env_keys, default):
|
|
||||||
"""Returns the first positive env value found in the `env_keys` list or the default."""
|
|
||||||
for e in env_keys:
|
|
||||||
val = int(os.environ.get(e, -1))
|
|
||||||
if val >= 0:
|
|
||||||
return val
|
|
||||||
return int(default)
|
|
||||||
|
|
||||||
def _get_trainer_cls(training_mode):
|
|
||||||
if training_mode == "relora":
|
|
||||||
from bigdl.llm.transformers.relora import ReLoRATrainer
|
|
||||||
return ReLoRATrainer
|
|
||||||
return transformers.Trainer
|
|
||||||
|
|
||||||
local_rank = get_int_from_env(["LOCAL_RANK","MPI_LOCALRANKID"], "0")
|
local_rank = get_int_from_env(["LOCAL_RANK","MPI_LOCALRANKID"], "0")
|
||||||
world_size = get_int_from_env(["WORLD_SIZE","PMI_SIZE"], "1")
|
world_size = get_int_from_env(["WORLD_SIZE","PMI_SIZE"], "1")
|
||||||
|
|
@ -102,7 +93,7 @@ def train(
|
||||||
"up_proj",
|
"up_proj",
|
||||||
"down_proj",
|
"down_proj",
|
||||||
"gate_proj"
|
"gate_proj"
|
||||||
], # according to the QLoRA paper (https://arxiv.org/pdf/2305.14314.pdf), it's suggested to fine tune all linear layers
|
],
|
||||||
# llm hyperparams
|
# llm hyperparams
|
||||||
train_on_inputs: bool = True, # if False, masks out inputs in loss
|
train_on_inputs: bool = True, # if False, masks out inputs in loss
|
||||||
add_eos_token: bool = False,
|
add_eos_token: bool = False,
|
||||||
|
|
@ -116,7 +107,7 @@ def train(
|
||||||
prompt_template_name: str = "alpaca", # The prompt template to use, will default to alpaca.
|
prompt_template_name: str = "alpaca", # The prompt template to use, will default to alpaca.
|
||||||
gradient_checkpointing: bool = False,
|
gradient_checkpointing: bool = False,
|
||||||
deepspeed: str = None,
|
deepspeed: str = None,
|
||||||
training_mode: str = "qlora",
|
training_mode: str = "relora",
|
||||||
# relora params, relora_steps should > 0 if the training mode is `relora`,
|
# relora params, relora_steps should > 0 if the training mode is `relora`,
|
||||||
# Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695,
|
# Implements the ReLoRA training procedure from https://arxiv.org/abs/2307.05695,
|
||||||
# minus the initial full fine-tune.
|
# minus the initial full fine-tune.
|
||||||
|
|
@@ -124,8 +115,8 @@ def train(
    relora_warmup_steps: int = 10,  # Number of per-restart warmup steps
    relora_cpu_offload: bool = True,  # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
):
-    invalidInputError(training_mode in ["qlora", "qalora", "lora", "relora"],
-                      "Only qlora / qalora / lora / relora are supported for training_mode now.")
+    invalidInputError(training_mode == "relora",
+                      f"This example is for relora training mode, but got training_mode={training_mode}.")
    if int(os.environ.get("LOCAL_RANK", 0)) == 0:
        print(
            f"Training Alpaca-LoRA model with params:\n"
@@ -174,16 +165,7 @@ def train(
    gradient_accumulation_steps = gradient_accumulation_steps // world_size

    # Check if parameter passed or if set within environ
-    use_wandb = len(wandb_project) > 0 or (
-        "WANDB_PROJECT" in os.environ and len(os.environ["WANDB_PROJECT"]) > 0
-    )
-    # Only overwrite environ if wandb param passed
-    if len(wandb_project) > 0:
-        os.environ["WANDB_PROJECT"] = wandb_project
-    if len(wandb_watch) > 0:
-        os.environ["WANDB_WATCH"] = wandb_watch
-    if len(wandb_log_model) > 0:
-        os.environ["WANDB_LOG_MODEL"] = wandb_log_model
+    use_wandb = wandb_check(wandb_project, wandb_watch, wandb_log_model)

    if saved_low_bit_model is not None:
        # Load the low-bit optimized model if the saved path is provided
@@ -194,42 +176,20 @@ def train(
            modules_to_not_convert=["lm_head"],
        )
    else:
-        # According to the QLoRA paper, using "nf4" could yield better model quality than "int4"
-        # Default 4-bit format for qa-lora is sym_int4
-        if training_mode == "lora":
-            model = AutoModelForCausalLM.from_pretrained(
-                base_model,
-                load_in_low_bit="bf16",
-                optimize_model=False,
-                torch_dtype=torch.bfloat16,
-                modules_to_not_convert=["lm_head"],
-            )
-        else:
-            # use bnb_config for qlora/qalora/relora, which use 4bit for base model
-            if training_mode == "qalora":
-                low_bit_format = "int4"
-            else:
-                low_bit_format = "nf4"
-            bnb_config = BitsAndBytesConfig(
-                load_in_4bit=True,
-                bnb_4bit_use_double_quant=False,
-                bnb_4bit_quant_type=low_bit_format,
-                bnb_4bit_compute_dtype=torch.bfloat16
-            )
-            model = AutoModelForCausalLM.from_pretrained(base_model,
-                                                         quantization_config=bnb_config, )
+        # use bnb_config for qlora/qalora/relora, which use 4bit for base model
+        bnb_config = BitsAndBytesConfig(
+            load_in_4bit=True,
+            bnb_4bit_use_double_quant=False,
+            bnb_4bit_quant_type="nf4",
+            bnb_4bit_compute_dtype=torch.bfloat16
+        )
+        model = AutoModelForCausalLM.from_pretrained(base_model,
+                                                     quantization_config=bnb_config, )

        # below is also supported
        # Load the base model from a directory or the HF Hub to 4-bit format
-        # if training_mode == "qalora":
-        #     low_bit_format = "sym_int4"
-        # elif training_mode == "lora":
-        #     low_bit_format = "bf16"
-        # else:
-        #     low_bit_format = "nf4"
        # model = AutoModelForCausalLM.from_pretrained(
        #     base_model,
-        #     load_in_low_bit=low_bit_format,
+        #     load_in_low_bit="nf4",
        #     optimize_model=False,
        #     torch_dtype=torch.bfloat16,
        #     # device_map=device_map,
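Note: the new 4-bit loading path above can be exercised on its own. The snippet below is a minimal sketch assembled from this hunk; it assumes `BitsAndBytesConfig` comes from `transformers` and `AutoModelForCausalLM` from `bigdl.llm.transformers`, and the model id is only an example, not something fixed by this commit.

```python
# Minimal sketch of the NF4 4-bit base-model loading used for relora above.
# The imports and the example model id are assumptions, not part of this diff.
import torch
from transformers import BitsAndBytesConfig
from bigdl.llm.transformers import AutoModelForCausalLM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",             # 4-bit NF4 format for the base weights
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # example model id
    quantization_config=bnb_config,
)
```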
@@ -249,54 +209,6 @@ def train(

    print(model)

-    def tokenize(prompt, add_eos_token=True):
-        # there's probably a way to do this with the tokenizer settings
-        # but again, gotta move fast
-        result = tokenizer(
-            prompt,
-            truncation=True,
-            max_length=cutoff_len,
-            padding=False,
-            return_tensors=None,
-        )
-        if (
-            result["input_ids"][-1] != tokenizer.eos_token_id
-            and len(result["input_ids"]) < cutoff_len
-            and add_eos_token
-        ):
-            result["input_ids"].append(tokenizer.eos_token_id)
-            result["attention_mask"].append(1)
-
-        result["labels"] = result["input_ids"].copy()
-
-        return result
-
-    def generate_and_tokenize_prompt(data_point):
-        full_prompt = prompter.generate_prompt(
-            data_point["instruction"],
-            data_point["input"],
-            data_point["output"],
-        )
-        tokenized_full_prompt = tokenize(full_prompt)
-        if not train_on_inputs:
-            user_prompt = prompter.generate_prompt(
-                data_point["instruction"], data_point["input"]
-            )
-            tokenized_user_prompt = tokenize(
-                user_prompt, add_eos_token=add_eos_token
-            )
-            user_prompt_len = len(tokenized_user_prompt["input_ids"])
-
-            if add_eos_token:
-                user_prompt_len -= 1
-
-            tokenized_full_prompt["labels"] = [
-                -100
-            ] * user_prompt_len + tokenized_full_prompt["labels"][
-                user_prompt_len:
-            ]  # could be sped up, probably
-        return tokenized_full_prompt

    # Prepare a BigDL-LLM compatible Peft model
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=gradient_checkpointing)
@@ -319,19 +231,8 @@ def train(

    model.print_trainable_parameters()  # Be more transparent about the % of trainable params.

-    if val_set_size > 0:
-        train_val = data["train"].train_test_split(
-            test_size=val_set_size, shuffle=True, seed=42
-        )
-        train_data = (
-            train_val["train"].shuffle().map(generate_and_tokenize_prompt)
-        )
-        val_data = (
-            train_val["test"].shuffle().map(generate_and_tokenize_prompt)
-        )
-    else:
-        train_data = data["train"].shuffle().map(generate_and_tokenize_prompt)
-        val_data = None
+    train_data, val_data = get_train_val_data(data, tokenizer, prompter, train_on_inputs,
+                                              add_eos_token, cutoff_len, val_set_size, seed=42)

    # Unused
    # if not ddp and torch.cuda.device_count() > 1:
@@ -339,7 +240,6 @@ def train(
    #     model.is_parallelizable = True
    #     model.model_parallel = True

-    trainer_cls = _get_trainer_cls(training_mode=training_mode)
    extra_args = {}
    if training_mode == "relora":
        extra_args["base_model"] = base_model
@@ -348,7 +248,7 @@ def train(
        extra_args["relora_cpu_offload"] = relora_cpu_offload
        extra_args["resume_from_checkpoint"] = resume_from_checkpoint

-    trainer = trainer_cls(
+    trainer = ReLoRATrainer(
        model=model,
        train_dataset=train_data,
        eval_dataset=val_data,
@@ -361,7 +261,7 @@ def train(
            max_grad_norm=0.3,
            num_train_epochs=num_epochs,
            learning_rate=learning_rate,
-            lr_scheduler_type="constant" if training_mode == "qalora" else "cosine",
+            lr_scheduler_type="cosine",
            bf16=True,  # ensure training more stable
            logging_steps=1,
            optim="adamw_torch",
@@ -370,7 +270,7 @@ def train(
            eval_steps=100 if val_set_size > 0 else None,
            save_steps=100,
            output_dir=output_dir,
-            save_total_limit=100 if training_mode != "relora" else 4,  # relora will save the whole model, here we use 4 to save the disk space.
+            save_total_limit=4,  # relora will save the whole model, here we use 4 to save the disk space.
            load_best_model_at_end=True if val_set_size > 0 else False,
            ddp_find_unused_parameters=False if ddp else None,
            group_by_length=group_by_length,
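Pulling the trainer hunks together, the construction ends up roughly as sketched below. This is a schematic reconstruction, not the file itself: the grouping of the training keywords under `transformers.TrainingArguments`, the `evaluation_strategy` line, and the `relora_steps`/`relora_warmup_steps` keywords are assumptions; only the individual values shown in the hunks above are taken from this diff.

```python
# Schematic sketch (assumptions noted above) of how the relora trainer is built.
import transformers
from bigdl.llm.transformers.relora import ReLoRATrainer


def build_relora_trainer(model, train_data, val_data, base_model, output_dir,
                         num_epochs, learning_rate, val_set_size, ddp, group_by_length,
                         relora_steps, relora_warmup_steps, relora_cpu_offload,
                         resume_from_checkpoint):
    return ReLoRATrainer(
        model=model,
        train_dataset=train_data,
        eval_dataset=val_data,
        args=transformers.TrainingArguments(          # assumed wrapper for the kwargs below
            max_grad_norm=0.3,
            num_train_epochs=num_epochs,
            learning_rate=learning_rate,
            lr_scheduler_type="cosine",
            bf16=True,
            logging_steps=1,
            optim="adamw_torch",
            evaluation_strategy="steps" if val_set_size > 0 else "no",  # assumed
            eval_steps=100 if val_set_size > 0 else None,
            save_steps=100,
            output_dir=output_dir,
            save_total_limit=4,                        # relora saves the whole model; cap disk usage
            load_best_model_at_end=True if val_set_size > 0 else False,
            ddp_find_unused_parameters=False if ddp else None,
            group_by_length=group_by_length,
        ),
        # relora-specific extras collected in `extra_args` above
        base_model=base_model,
        relora_steps=relora_steps,                     # assumed keyword
        relora_warmup_steps=relora_warmup_steps,       # assumed keyword
        relora_cpu_offload=relora_cpu_offload,
        resume_from_checkpoint=resume_from_checkpoint,
    )
```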
@@ -0,0 +1,44 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os

import torch
from transformers import LlamaTokenizer  # noqa: F402
import argparse

current_dir = os.path.dirname(os.path.realpath(__file__))
common_util_path = os.path.join(current_dir, '..')
import sys
sys.path.append(common_util_path)
from common.utils import merge_adapter

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description='Merge the adapter into the original model for Llama2 model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
                        help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--adapter_path', type=str,)
    parser.add_argument('--output_path', type=str,)

    args = parser.parse_args()
    base_model = model_path = args.repo_id_or_model_path
    adapter_path = args.adapter_path
    output_path = args.output_path

    tokenizer = LlamaTokenizer.from_pretrained(base_model)
    merge_adapter(base_model, tokenizer, adapter_path, output_path)
    print(f'Finished merging the adapter into the original model; you can find the merged model in {output_path}.')
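After running the merge script above, the exported folder can be loaded like any regular checkpoint. The snippet below is a hypothetical quick check, not part of this commit: the path, the `load_in_4bit` flag, the `xpu` device and the prompt are all just examples.

```python
# Hypothetical smoke test of the merged checkpoint written by export_merged_model.py.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (enables the "xpu" device)
from transformers import LlamaTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

merged_path = "./outputs/merged-model"  # whatever was passed as --output_path

tokenizer = LlamaTokenizer.from_pretrained(merged_path)
model = AutoModelForCausalLM.from_pretrained(merged_path, load_in_4bit=True).to("xpu")

prompt = "### Instruction:\nWhat is ReLoRA?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```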
@@ -15,10 +15,9 @@
#

# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
-python ./alpaca_qlora_finetuning.py \
+python ./alpaca_relora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-relora-alpaca" \
    --relora_steps 300 \
-    --relora_warmup_steps 10 \
-    --training_mode "relora"
+    --relora_warmup_steps 10
@@ -15,15 +15,14 @@
#

export MASTER_ADDR=127.0.0.1
-export OMP_NUM_THREADS=6 # adjust this to 1/4 of total physical cores
+export OMP_NUM_THREADS=6
export FI_PROVIDER=tcp
export CCL_ATL_TRANSPORT=ofi

mpirun -n 2 \
-    python -u ./alpaca_qlora_finetuning.py \
+    python -u ./alpaca_relora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-relora-alpaca" \
    --relora_steps 300 \
-    --relora_warmup_steps 10 \
-    --training_mode "relora" > training.log
+    --relora_warmup_steps 10 > training.log
@@ -15,17 +15,16 @@
#

export MASTER_ADDR=127.0.0.1
-export OMP_NUM_THREADS=28 # adjust this to 1/4 of total physical cores
+export OMP_NUM_THREADS=56
export FI_PROVIDER=tcp
export CCL_ATL_TRANSPORT=ofi

mpirun -n 2 \
-    python -u ./alpaca_qlora_finetuning.py \
+    python -u ./alpaca_relora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-relora-alpaca" \
    --micro_batch_size 8 \
    --relora_steps 300 \
    --relora_warmup_steps 10 \
-    --batch_size 128 \
-    --training_mode "relora" > relora_training.log
+    --batch_size 128 > relora_training.log
@@ -15,17 +15,16 @@
#

export MASTER_ADDR=127.0.0.1
-export OMP_NUM_THREADS=28 # adjust this to 1/4 of total physical cores
+export OMP_NUM_THREADS=56
export FI_PROVIDER=tcp
export CCL_ATL_TRANSPORT=ofi

mpirun -n 8 \
-    python -u ./alpaca_qlora_finetuning.py \
+    python -u ./alpaca_relora_finetuning.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-relora-alpaca" \
    --micro_batch_size 8 \
    --relora_steps 300 \
    --relora_warmup_steps 10 \
-    --batch_size 128 \
-    --training_mode "relora" > relora_training.log
+    --batch_size 128 > relora_training.log
@@ -0,0 +1,18 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

from .prompter import Prompter
from .util import *
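For context, the per-technique example scripts consume this package through a small `sys.path` shim, as the merge script earlier in this diff shows. A condensed, hypothetical version of that pattern:

```python
# Hypothetical condensed import pattern used by the example scripts: add the folder
# that contains `common/` to sys.path, then import the shared helpers defined above.
import os
import sys

current_dir = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(current_dir, '..'))

from common.utils import Prompter, get_train_val_data, wandb_check, merge_adapter  # noqa: E402
```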
@@ -45,7 +45,9 @@ class Prompter(object):
        if not template_name:
            # Enforce the default here, so the constructor can be called with '' and will not break.
            template_name = "alpaca"
-        file_name = osp.join("templates", f"{template_name}.json")
+        current_dir = osp.dirname(osp.realpath(__file__))
+        common_util_path = osp.join(current_dir, '..')
+        file_name = osp.join(common_util_path, "templates", f"{template_name}.json")
        if not osp.exists(file_name):
            invalidInputError(False, f"Can't read {file_name}")
        with open(file_name) as fp:
213 python/llm/example/GPU/LLM-Finetuning/common/utils/util.py Normal file
@@ -0,0 +1,213 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Some parts of this file are adapted from
# https://github.com/tloen/alpaca-lora/blob/main/finetune.py
#
# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Some parts of this file are adapted from https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py
#
# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

import os
import transformers


def get_int_from_env(env_keys, default):
    """Returns the first positive env value found in the `env_keys` list or the default."""
    for e in env_keys:
        val = int(os.environ.get(e, -1))
        if val >= 0:
            return val
    return int(default)


def wandb_check(wandb_project, wandb_watch, wandb_log_model):
    """Check if wandb related parameter passed or if set within environ"""
    use_wandb = len(wandb_project) > 0 or (
        "WANDB_PROJECT" in os.environ and len(os.environ["WANDB_PROJECT"]) > 0
    )
    # Only overwrite environ if wandb param passed
    if len(wandb_project) > 0:
        os.environ["WANDB_PROJECT"] = wandb_project
    if len(wandb_watch) > 0:
        os.environ["WANDB_WATCH"] = wandb_watch
    if len(wandb_log_model) > 0:
        os.environ["WANDB_LOG_MODEL"] = wandb_log_model
    return use_wandb


def get_train_val_data(data, tokenizer, prompter, train_on_inputs,
                       add_eos_token, cutoff_len, val_set_size, seed=42):
    """Data processing to get train data and val data"""
    def tokenize(prompt, add_eos_token=True):
        # there's probably a way to do this with the tokenizer settings
        # but again, gotta move fast
        result = tokenizer(
            prompt,
            truncation=True,
            max_length=cutoff_len,
            padding=False,
            return_tensors=None,
        )
        if (
            result["input_ids"][-1] != tokenizer.eos_token_id
            and len(result["input_ids"]) < cutoff_len
            and add_eos_token
        ):
            result["input_ids"].append(tokenizer.eos_token_id)
            result["attention_mask"].append(1)
        result["labels"] = result["input_ids"].copy()
        return result

    def generate_and_tokenize_prompt(data_point):
        full_prompt = prompter.generate_prompt(
            data_point["instruction"],
            data_point["input"],
            data_point["output"],
        )
        tokenized_full_prompt = tokenize(full_prompt)
        if not train_on_inputs:
            user_prompt = prompter.generate_prompt(
                data_point["instruction"], data_point["input"]
            )
            tokenized_user_prompt = tokenize(
                user_prompt, add_eos_token=add_eos_token
            )
            user_prompt_len = len(tokenized_user_prompt["input_ids"])
            if add_eos_token:
                user_prompt_len -= 1
            tokenized_full_prompt["labels"] = [
                -100
            ] * user_prompt_len + tokenized_full_prompt["labels"][
                user_prompt_len:
            ]  # could be sped up, probably
        return tokenized_full_prompt

    if val_set_size > 0:
        train_val = data["train"].train_test_split(
            test_size=val_set_size, shuffle=True, seed=seed
        )
        train_data = (
            train_val["train"].shuffle().map(generate_and_tokenize_prompt)
        )
        val_data = (
            train_val["test"].shuffle().map(generate_and_tokenize_prompt)
        )
    else:
        train_data = data["train"].shuffle().map(generate_and_tokenize_prompt)
        val_data = None
    return train_data, val_data


def merge_adapter(base_model, tokenizer, adapter_path, output_path):
    """Merge the adapter into the original model and save"""
    import torch
    from bigdl.llm.transformers.qlora import PeftModel, LoraConfig
    from bigdl.llm.transformers import AutoModelForCausalLM
    from bigdl.llm.transformers.low_bit_linear import get_block_size
    import tempfile
    import shutil

    lora_config = LoraConfig.from_json_file(os.path.join(adapter_path, "adapter_config.json"))
    training_mode = lora_config.get("training_mode", "qlora")
    qa_lora = training_mode == "qalora"

    temp_dir = None
    if qa_lora:
        # Convert the qa-lora adapter to the correct shapes
        # The default 4-bit format for qa_lora is sym_int4
        block_size = get_block_size("sym_int4")
        temp_dir = tempfile.TemporaryDirectory()
        tmpdirname = os.path.join(temp_dir.name, "adapter")
        try:
            shutil.copytree(adapter_path, tmpdirname)
        except Exception as e:
            print(f"Failed to copy adapter dir, error: {e}")
        mid_lora_path = os.path.join(tmpdirname, "adapter_model.bin")

        adapter_path = os.path.join(adapter_path, "adapter_model.bin")

        lora = torch.load(adapter_path, map_location='cpu')
        # Get lora_a names
        tmp_keys = [key for key in lora.keys() if 'lora_A' in key]

        for tmp_key in tmp_keys:
            lora_a = lora[tmp_key] / block_size
            lora[tmp_key] = torch.repeat_interleave(lora_a, block_size, dim=1)

        torch.save(lora, mid_lora_path)
        adapter_path = tmpdirname

    try:
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model,
            # load_in_low_bit="nf4",  # should load the original model
            torch_dtype=torch.float16,
            device_map={"": "cpu"},
        )

        lora_model = PeftModel.from_pretrained(
            base_model,
            adapter_path,
            device_map={"": "cpu"},
            torch_dtype=torch.float16,
        )

        # merge weights - new merging method from peft
        lora_model = lora_model.merge_and_unload()

        lora_model.train(False)

        lora_model_sd = lora_model.state_dict()
        deloreanized_sd = {
            k.replace("base_model.model.", ""): v
            for k, v in lora_model_sd.items()
            if "lora" not in k
        }

        base_model.save_pretrained(output_path, state_dict=deloreanized_sd)
        tokenizer.save_pretrained(output_path)
    except Exception as e:
        print(f"Failed to merge the adapter, error: {e}.")
    finally:
        if qa_lora and temp_dir:
            temp_dir.cleanup()

@@ -1,119 +0,0 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This file is adapted from https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py
#
# Copyright 2023 Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import torch
from transformers import LlamaTokenizer  # noqa: F402
from bigdl.llm.transformers.qlora import PeftModel, LoraConfig
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.low_bit_linear import get_block_size
import argparse
import tempfile
import shutil

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Llama2 model')
    parser.add_argument('--repo-id-or-model-path', type=str, default="meta-llama/Llama-2-7b-hf",
                        help='The huggingface repo id for the Llama2 (e.g. `meta-llama/Llama-2-7b-hf` and `meta-llama/Llama-2-13b-chat-hf`) to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    parser.add_argument('--adapter_path', type=str,)
    parser.add_argument('--output_path', type=str,)

    args = parser.parse_args()
    base_model = model_path = args.repo_id_or_model_path
    adapter_path = args.adapter_path
    tokenizer = LlamaTokenizer.from_pretrained(base_model)

    lora_config = LoraConfig.from_json_file(os.path.join(adapter_path, "adapter_config.json"))
    training_mode = lora_config.get("training_mode", "qlora")
    qa_lora = training_mode == "qalora"

    temp_dir = None
    if qa_lora:
        # Convert the qa-lora adapter to the correct shapes
        # The default 4-bit format for qa_lora is sym_int4
        block_size = get_block_size("sym_int4")
        temp_dir = tempfile.TemporaryDirectory()
        tmpdirname = os.path.join(temp_dir.name, "adapter")
        try:
            shutil.copytree(adapter_path, tmpdirname)
        except Exception as e:
            print(f"Failed to copy adapter dir, error: {e}")
        mid_lora_path = os.path.join(tmpdirname, "adapter_model.bin")

        adapter_path = os.path.join(adapter_path, "adapter_model.bin")

        lora = torch.load(adapter_path, map_location='cpu')
        # Get lora_a names
        tmp_keys = [key for key in lora.keys() if 'lora_A' in key]

        for tmp_key in tmp_keys:
            lora_a = lora[tmp_key] / block_size
            lora[tmp_key] = torch.repeat_interleave(lora_a, block_size, dim=1)

        torch.save(lora, mid_lora_path)
        adapter_path = tmpdirname

    try:
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model,
            # load_in_low_bit="nf4",  # should load the original model
            torch_dtype=torch.float16,
            device_map={"": "cpu"},
        )

        lora_model = PeftModel.from_pretrained(
            base_model,
            adapter_path,
            device_map={"": "cpu"},
            torch_dtype=torch.float16,
        )

        # merge weights - new merging method from peft
        lora_model = lora_model.merge_and_unload()

        lora_model.train(False)

        lora_model_sd = lora_model.state_dict()
        deloreanized_sd = {
            k.replace("base_model.model.", ""): v
            for k, v in lora_model_sd.items()
            if "lora" not in k
        }

        base_model.save_pretrained(args.output_path, state_dict=deloreanized_sd)
        tokenizer.save_pretrained(args.output_path)
    except Exception as e:
        print(f"Failed to merge the adapter, error: {e}.")
    finally:
        if qa_lora and temp_dir:
            temp_dir.cleanup()
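With the old standalone export script above removed, the reusable pieces now live in `common/utils/util.py`. As a usage illustration of those shared helpers, a hypothetical call sequence might look like the following; the tokenizer setup, dataset and hyperparameter values are placeholders that mirror the finetuning scripts in this diff, and the `common.utils` import assumes the `sys.path` shim shown earlier.

```python
# Hypothetical usage sketch for get_train_val_data (all values are placeholders).
from datasets import load_dataset
from transformers import LlamaTokenizer
from common.utils import Prompter, get_train_val_data  # requires the sys.path shim

tokenizer = LlamaTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token_id = 0        # assumed: alpaca-lora style padding setup
prompter = Prompter("alpaca")

data = load_dataset("yahma/alpaca-cleaned")
train_data, val_data = get_train_val_data(
    data, tokenizer, prompter,
    train_on_inputs=True,         # if False, prompt tokens are masked out of the loss with -100
    add_eos_token=False,
    cutoff_len=256,
    val_set_size=2000,
    seed=42,
)
```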
@@ -3,7 +3,7 @@
This folder contains examples of running BigDL-LLM on Intel GPU:

- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any ***Hugging Face Transformers*** model on BigDL-LLM (using the standard AutoModel APIs)
-- [QLoRA-FineTuning](QLoRA-FineTuning): running ***QLoRA finetuning*** using BigDL-LLM on Intel GPUs
+- [LLM-Finetuning](LLM-Finetuning): running ***finetuning*** (such as LoRA, QLoRA, QA-LoRA, etc) using BigDL-LLM on Intel GPUs
- [vLLM-Serving](vLLM-Serving): running ***vLLM*** serving framework on intel GPUs (with BigDL-LLM low-bit optimized models)
- [Deepspeed-AutoTP](Deepspeed-AutoTP): running distributed inference using ***DeepSpeed AutoTP*** (with BigDL-LLM low-bit optimized models) on Intel GPUs
- [PyTorch-Models](PyTorch-Models): running any PyTorch model on BigDL-LLM (with "one-line code change")
@@ -8,13 +8,13 @@ echo "# Start testing qlora fine-tuning"
start=$(date "+%s")

sed -i 's/max_steps=200/max_steps=2/; s/save_steps=100/save_steps=2/; s/logging_steps=20/logging_steps=1/' \
-  ${ANALYTICS_ZOO_ROOT}/python/llm/example/GPU/QLoRA-FineTuning/qlora_finetuning.py
+  ${ANALYTICS_ZOO_ROOT}/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/qlora_finetuning.py

-python ${ANALYTICS_ZOO_ROOT}/python/llm/example/GPU/QLoRA-FineTuning/qlora_finetuning.py \
+python ${ANALYTICS_ZOO_ROOT}/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/qlora_finetuning.py \
  --repo-id-or-model-path ${LLAMA2_7B_ORIGIN_PATH} \
  --dataset ${ABIRATE_ENGLISH_QUOTES_PATH}

-python ${ANALYTICS_ZOO_ROOT}/python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py \
+python ${ANALYTICS_ZOO_ROOT}/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/export_merged_model.py \
  --repo-id-or-model-path ${LLAMA2_7B_ORIGIN_PATH} \
  --adapter_path ${PWD}/outputs/checkpoint-2 \
  --output_path ${PWD}/outputs/checkpoint-2-merged