# GaLore Finetuning with IPEX-LLM
This is an example of IPEX-LLM GaLore fine-tuning on Intel GPU. It follows the Hugging Face GaLore blog, changing the model to `openlm-research/open_llama_3b_v2` and the dataset to `HuggingFaceH4/helpful_instructions`.
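To give a sense of what the script does, below is a minimal, hypothetical sketch of the core GaLore setup, loosely following the Hugging Face GaLore blog recipe (GaLore is enabled through `optim="galore_adamw"` and `optim_target_modules` in `transformers` >= 4.39, backed by the `galore-torch` package). The bundled `galore_finetuning.py` may differ in details such as the dataset text field and the training hyperparameters:

```python
# Minimal sketch of GaLore fine-tuning, assuming the blog's Trainer-based recipe;
# the actual galore_finetuning.py in this example may differ.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_path = "openlm-research/open_llama_3b_v2"  # or a local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)

dataset = load_dataset("HuggingFaceH4/helpful_instructions", split="train")

training_args = TrainingArguments(
    output_dir="./ipex-llm-galore",
    per_device_train_batch_size=1,
    learning_rate=1e-3,   # matches the lr shown in the sample output below
    max_steps=1500,       # matches the 1500 steps shown in the sample output below
    logging_steps=10,
    optim="galore_adamw",                  # enables the GaLore optimizer
    optim_target_modules=["attn", "mlp"],  # layers whose gradients GaLore projects
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="prompt",  # hypothetical; check the dataset schema
    max_seq_length=512,
)
trainer.train()
```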
## 0. Requirements
To run this example with IPEX-LLM on Intel GPUs, there are some recommended requirements for your machine. Please refer to here for more information.
## 1. Install
```bash
conda create -n llm python=3.11
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install galore-torch
pip install accelerate==0.28.0
pip install bitsandbytes==0.43.0
pip install datasets==2.18.0
pip install transformers==4.39.1
pip install trl==0.8.1
```
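After installation, a quick way to confirm that PyTorch can see the Intel GPU is the generic sanity check below (run inside the `llm` environment, after sourcing the oneAPI variables as shown in the next step); it is not part of the example itself:

```python
# Sanity check: verify the XPU backend is visible to PyTorch.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device

print("torch:", torch.__version__, "| ipex:", ipex.__version__)
print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device:", torch.xpu.get_device_name(0))
```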
## 2. GaLore Finetune
Currently, GaLore only supports local fine-tuning. Here is how to fine-tune `openlm-research/open_llama_3b_v2` on an Intel Max GPU server:
```bash
# Configures OneAPI environment variables
source /opt/intel/oneapi/setvars.sh

python galore_finetuning.py # optional parameters as below
```
Optional parameters for `galore_finetuning.py` (see the sketch after this list):

- `--repo-id-or-model-path`: defaults to `openlm-research/open_llama_3b_v2`; you can also specify a local model path.
- `--data-path`: defaults to `HuggingFaceH4/helpful_instructions`; you can also specify a local data path. Note that switching to another dataset will require some code changes on your side.
- `--output-dir`: defaults to `./ipex-llm-galore`, where the fine-tuned model is saved; change it if needed.
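For illustration, these flags could be declared with `argparse` roughly as follows; this is a hypothetical sketch whose defaults simply mirror the values documented above, and the actual `galore_finetuning.py` may differ:

```python
# Hypothetical argparse declaration of the optional parameters above.
import argparse

parser = argparse.ArgumentParser(description="GaLore fine-tuning with IPEX-LLM")
parser.add_argument("--repo-id-or-model-path", type=str,
                    default="openlm-research/open_llama_3b_v2",
                    help="Hugging Face repo id or local model path")
parser.add_argument("--data-path", type=str,
                    default="HuggingFaceH4/helpful_instructions",
                    help="Hugging Face dataset id or local data path")
parser.add_argument("--output-dir", type=str,
                    default="./ipex-llm-galore",
                    help="directory where the fine-tuned model is saved")
args = parser.parse_args()
```

For example: `python galore_finetuning.py --output-dir ./my-galore-output`.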
## 3. Sample Output
```
......
{'loss': 2.0989, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}
{'loss': 1.9064, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}
{'loss': 1.7483, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.01}
{'loss': 1.9551, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.01}
{'loss': 1.783, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.01}
{'loss': 1.3328, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.01}
{'loss': 1.4622, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.01}
{'loss': 1.9094, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.02}
  5%|████▏                                                                                      | 70/1500 [xx:xx<x:xx:xx, xx.xxs/it]
......
```