ipex-llm/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora/README.md
Wang, Jian4 ed0dc57c6e LLM: Add cpu qlora support other models guide (#9567)
2023-12-01 11:18:04 +08:00


Alpaca QLoRA Finetuning (experimental support)

This example ports Alpaca-LoRA to BigDL-LLM QLoRA on Intel CPUs.

1. Install

conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[all]
pip install datasets transformers==4.34.0
pip install fire peft==0.5.0
pip install accelerate==0.23.0

2. Configure environment variables

source bigdl-llm-init -t

3. Finetune LLaMA-2-7B on a single node

Example usage:

python ./alpaca_qlora_finetuning_cpu.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "yahma/alpaca-cleaned" \
    --output_dir "./bigdl-qlora-alpaca"

Note: You can also set --base_model to the local path of a Hugging Face model checkpoint folder and --data_path to the local path of a dataset JSON file.
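As a rough illustration of what a local dataset JSON file might look like, the sketch below writes and reloads a tiny file. It assumes the file is a list of records with the instruction, input (optional) and output fields that the default prompter expects; the file name my_dataset.json and the two records are made up for this example.

```python
import json
import os
import tempfile

# Two hypothetical records in the instruction / input / output layout.
# "input" may be an empty string when the instruction needs no extra context.
records = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
]

# Write the records to a local JSON file (hypothetical file name).
data_path = os.path.join(tempfile.mkdtemp(), "my_dataset.json")
with open(data_path, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)

# Read the file back to check that it is valid JSON with the expected fields.
with open(data_path, encoding="utf-8") as f:
    loaded = json.load(f)

print(data_path, len(loaded))
```

The resulting path is what would be passed as --data_path in the command above.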

Sample Output

{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}                                                                                                                            
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}                                                                                                                           
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}                                                                                                                           
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}                                                                                                                            
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}                                                                                                                           
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}                                                                                                                           
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}                                                                                                                             
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}                                                                                                                           
  1%|█                                                                                                                                                         | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
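The log lines above are printed as Python dict literals, so they can be collected for a quick look at the loss trend. The following is a minimal sketch, not part of the example scripts: the helper name parse_loss_lines is hypothetical, and it assumes the training output was captured as plain text.

```python
import ast


def parse_loss_lines(log_text):
    """Return the logged dicts, skipping progress bars and blank lines."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        # Only lines that look like a dict literal are candidates.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue  # not a valid literal, ignore it
        if isinstance(record, dict) and "loss" in record:
            records.append(record)
    return records


sample = """
{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}
  1%|#         | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
"""

records = parse_loss_lines(sample)
losses = [r["loss"] for r in records]
print(len(records), losses)
```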

Guide to finetuning QLoRA on one node with multiple sockets

  1. Install the extra library
# Run the stand-alone Alpaca example above first.
# oneccl_bind_pt is needed for launching with mpirun.
pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable
  2. Modify the configuration in finetune_one_node_two_sockets.sh and run it
source ${conda_env}/lib/python3.9/site-packages/oneccl_bindings_for_pytorch/env/setvars.sh
bash finetune_one_node_two_sockets.sh

Guide to using different prompts or different datasets

The current prompter targets datasets with instruction, input (optional) and output fields. To use a different dataset, add a template file xxx.json under templates, then update the generate_prompt method in utils/prompter.py and the generate_and_tokenize_prompt method to fit the dataset. For example, to train llama2-7b with english_quotes, just like this example:

  1. Add the template english_quotes.json
{
    "prompt": "{quote} ->: {tags}"
}
  2. Update prompter.py and add a new generate_quote_prompt method
def generate_quote_prompt(self, quote: str, tags: Union[None, list] = None) -> str:
    # Render the tag list as text so it can be substituted into the template.
    tags = str(tags)
    res = self.template["prompt"].format(
        quote=quote, tags=tags
    )
    if self._verbose:
        print(res)
    return res
  3. Update the generate_and_tokenize_prompt method
def generate_and_tokenize_prompt(data_point):
    full_prompt = prompter.generate_quote_prompt(
        data_point["quote"], data_point["tags"]
    )
    user_prompt = prompter.generate_quote_prompt(
        data_point["quote"], data_point["tags"]
    )
  4. Choose the english_quotes prompt template for training
python ./quotes_qlora_finetuning_cpu.py \
    --base_model "meta-llama/Llama-2-7b-hf" \
    --data_path "./english_quotes" \
    --output_dir "./bigdl-qlora-alpaca" \
    --prompt_template_name "english_quotes"
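To see how the template and the new method fit together, here is a self-contained sketch. QuotePrompter is a hypothetical, stripped-down stand-in for the example's prompter class (the real one lives in prompter.py and may differ), and the quote and tags are made-up data.

```python
import json
import os
import tempfile
from typing import Optional


class QuotePrompter:
    """Hypothetical stand-in that only loads a template and formats quotes."""

    def __init__(self, template_name: str, template_dir: str, verbose: bool = False):
        path = os.path.join(template_dir, f"{template_name}.json")
        with open(path, encoding="utf-8") as f:
            self.template = json.load(f)
        self._verbose = verbose

    def generate_quote_prompt(self, quote: str, tags: Optional[list] = None) -> str:
        # The tag list is rendered with str(), as in the method shown above.
        res = self.template["prompt"].format(quote=quote, tags=str(tags))
        if self._verbose:
            print(res)
        return res


# Write the template from step 1 into a temporary directory and use it.
with tempfile.TemporaryDirectory() as template_dir:
    template_path = os.path.join(template_dir, "english_quotes.json")
    with open(template_path, "w", encoding="utf-8") as f:
        json.dump({"prompt": "{quote} ->: {tags}"}, f)

    prompter = QuotePrompter("english_quotes", template_dir)
    prompt = prompter.generate_quote_prompt("Be yourself.", ["inspirational", "life"])

print(prompt)
```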

Guide to finetuning QLoRA using different models

Make sure you understand the whole finetuning process and that you are using the latest version of the model. Using Baichuan-7B as an example:

  1. Update the tokenizer first, because the base example is written for LLaMA models.
from transformers import AutoTokenizer  # instead of LlamaTokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
  2. Some models need trust_remote_code=True in from_pretrained for both the model and the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_model, xxxxx, trust_remote_code=True)
  3. Modify target_modules to match the model you want to train. You can refer to here, or search for the recommended target modules for that model.
lora_target_modules: List[str] = ["W_pack"]
  4. Some models need a different padding token, for example tokenizer.pad_token_id = tokenizer.eod_id (Qwen)
  5. (Baichuan only) According to this issue, tokenization_baichuan.py needs to be modified to fix the problem.
  6. Finetune as normal
  7. Use export_merged_model.py to merge the weights. The tokenizer and model loading there must also be updated so that the merge succeeds.
from transformers import AutoTokenizer  # noqa: F402
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
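For the target_modules step above, one possible way to find candidate names is to list the leaf names of the linear layers in the loaded model. The sketch below is an assumption-laden illustration rather than part of the example: candidate_target_modules is a hypothetical helper, it matches layers by class name only, and the module list is fake data standing in for the (name, module) pairs that model.named_modules() yields on a PyTorch model.

```python
from collections import Counter


def candidate_target_modules(named_modules, layer_type_name="Linear"):
    """Count leaf names of layers whose class name contains layer_type_name."""
    counts = Counter()
    for full_name, module in named_modules:
        if layer_type_name in type(module).__name__:
            # "model.layers.0.self_attn.W_pack" -> "W_pack"
            counts[full_name.split(".")[-1]] += 1
    return counts


# Stand-in classes so the sketch runs without PyTorch installed.
class Linear:
    pass


class LayerNorm:
    pass


# Illustrative module names only; inspect your own model for the real ones.
fake_modules = [
    ("model.layers.0.self_attn.W_pack", Linear()),
    ("model.layers.0.self_attn.o_proj", Linear()),
    ("model.layers.0.input_layernorm", LayerNorm()),
    ("model.layers.1.self_attn.W_pack", Linear()),
    ("model.layers.1.self_attn.o_proj", Linear()),
    ("lm_head", Linear()),
]

counts = candidate_target_modules(fake_modules)
print(sorted(counts.items()))
```

With a real model, the call would be candidate_target_modules(model.named_modules()). Names that repeat once per layer are the usual candidates, while a single output head such as lm_head is normally left out.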