Yang Wang c88f6ec457 Experiment XPU QLora Finetuning (#8937 )

* Support xpu finetuning

* support xpu finetuning

* fix style

* fix style

* fix style

* refine example

* add readme

* refine readme

* refine api

* fix fp16

* fix example

* refactor

* fix style

* fix compute type

* add qlora

* refine training args

* fix example

* fix style

* fast path for inference

* address comments

* refine readme

* revert lint

2023-09-19 10:15:44 -07:00

3.2 KiB


QLoRA (experimental support)

This example demonstrates how to finetune a llama2-7b model using BigDL-LLM 4-bit optimizations on Intel GPUs.

0. Requirements

To run this example with BigDL-LLM on Intel GPUs, your machine should meet some recommended requirements; please refer to here for more information.

Example: Finetune llama2-7b using QLoRA

This example is ported from bnb-4bit-training.

1. Install

conda create -n llm python=3.9
conda activate llm
# the command below installs intel_extension_for_pytorch==2.0.110+xpu by default
# you can install a specific ipex/torch version if needed
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
pip install git+https://github.com/huggingface/transformers.git@95fe0f5
pip install peft==0.5.0
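
After the install commands finish, it can help to confirm what actually landed in the `llm` environment. The following is a minimal sketch using only the standard library; the distribution names are taken from the pip commands above, and the helper names (`installed_version`, `check_environment`) are hypothetical. A version mismatch is reported rather than treated as an error, since the README allows installing a different ipex/torch version.

```python
from importlib import metadata

# Distribution names come from the pip commands above. A value of None means
# the README states no specific version for that package.
EXPECTED = {
    "bigdl-llm": None,
    "intel_extension_for_pytorch": "2.0.110+xpu",  # the stated default
    "transformers": None,  # installed from a git commit, so no release pin
    "peft": "0.5.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package name to (installed_version, matches_expectation)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        ok = found is not None and (wanted is None or found == wanted)
        report[name] = (found, ok)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "ok" if ok else "check"
        print(f"{name:32s} {found or 'not installed':20s} {status}")
```

A `check` status only means the installed version differs from the one named in this README, or that the package is missing from the active environment.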

2. Configure OneAPI environment variables

source /opt/intel/oneapi/setvars.sh

3. Run

python ./qlora_finetuning.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
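
The `--repo-id-or-model-path` flag accepts either a Hugging Face repo id or a local checkpoint directory. The script's own argument handling is not shown in this README, so the following is only a sketch of how such a flag could be parsed with `argparse`; the default value and the help text are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="QLoRA finetuning on Intel GPUs (argument-parsing sketch)"
    )
    parser.add_argument(
        "--repo-id-or-model-path",
        type=str,
        # Assumed default for illustration; not confirmed by this README.
        default="meta-llama/Llama-2-7b-hf",
        help="Hugging Face repo id or local directory of the llama2-7b checkpoint",
    )
    return parser.parse_args(argv)


# argparse turns the dashes in the flag into underscores on the namespace.
args = parse_args(["--repo-id-or-model-path", "REPO_ID_OR_MODEL_PATH"])
print(f"Loading base model from: {args.repo_id_or_model_path}")
```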

Sample Output

{'loss': 1.6134, 'learning_rate': 0.0002, 'epoch': 0.03}                                                                                 
{'loss': 1.3038, 'learning_rate': 0.00017777777777777779, 'epoch': 0.06}                                                                 
{'loss': 1.2634, 'learning_rate': 0.00015555555555555556, 'epoch': 0.1}                                                                  
{'loss': 1.2389, 'learning_rate': 0.00013333333333333334, 'epoch': 0.13}                                                                 
{'loss': 1.0399, 'learning_rate': 0.00011111111111111112, 'epoch': 0.16}                                                                 
{'loss': 1.0406, 'learning_rate': 8.888888888888889e-05, 'epoch': 0.19}                                                                  
{'loss': 1.3114, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.22}                                                                  
{'loss': 0.9876, 'learning_rate': 4.4444444444444447e-05, 'epoch': 0.26}                                                                 
{'loss': 1.1406, 'learning_rate': 2.2222222222222223e-05, 'epoch': 0.29}                                                                 
{'loss': 1.1728, 'learning_rate': 0.0, 'epoch': 0.32}                                                                                    
{'train_runtime': 225.8005, 'train_samples_per_second': 3.543, 'train_steps_per_second': 0.886, 'train_loss': 1.211241865158081, 'epoch': 0.32}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [03:45<00:00,  1.13s/it]
TrainOutput(global_step=200, training_loss=1.211241865158081, metrics={'train_runtime': 225.8005, 'train_samples_per_second': 3.543, 'train_steps_per_second': 0.886, 'train_loss': 1.211241865158081, 'epoch': 0.32})
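
The numbers in the sample output can be cross-checked with a little arithmetic. The sketch below assumes a linear learning-rate schedule with 20 warmup steps, logging every 20 steps, and an effective batch size of 4; none of these settings are stated in this README, they are inferred from the logged values (10 log lines over 200 steps, and `train_samples_per_second` being four times `train_steps_per_second`).

```python
PEAK_LR = 2e-4            # first logged learning_rate in the sample output
TOTAL_STEPS = 200         # global_step in the TrainOutput line
TRAIN_RUNTIME = 225.8005  # train_runtime in seconds, from the sample output
LOG_EVERY = 20            # assumption: 10 log lines over 200 steps
WARMUP_STEPS = 20         # assumption: inferred from the logged learning rates
BATCH_SIZE = 4            # assumption: samples/s divided by steps/s


def linear_lr(step):
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


# Learning rate at each logging step (20, 40, ..., 200).
schedule = [linear_lr(s) for s in range(LOG_EVERY, TOTAL_STEPS + 1, LOG_EVERY)]

steps_per_second = TOTAL_STEPS / TRAIN_RUNTIME
samples_per_second = TOTAL_STEPS * BATCH_SIZE / TRAIN_RUNTIME

for step, lr in zip(range(LOG_EVERY, TOTAL_STEPS + 1, LOG_EVERY), schedule):
    print(f"step {step:3d}  learning_rate {lr:.6g}")
print(f"steps/s   {steps_per_second:.3f}")
print(f"samples/s {samples_per_second:.3f}")
```

Under these assumptions the computed schedule and throughput line up with the logged `learning_rate`, `train_steps_per_second`, and `train_samples_per_second` values, which is a quick way to sanity-check a run on different hardware.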
