Alpaca QLoRA Finetuning (experimental support)
This example ports Alpaca-LoRA to IPEX-LLM QLoRA on Intel CPUs.
1. Install
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade ipex-llm[all]
pip install datasets transformers==4.35.0
pip install fire peft==0.5.0
pip install accelerate==0.23.0
pip install bitsandbytes scipy
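Because several of the packages above are pinned to exact versions, it can help to confirm what actually ended up in the environment before starting a long run. The following is a minimal sketch using only the Python standard library; the pins are copied from the commands above, and the helper name check_pins is hypothetical, not part of this example's scripts.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
PINNED = {
    "transformers": "4.35.0",
    "peft": "0.5.0",
    "accelerate": "0.23.0",
}


def check_pins(pins=PINNED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # Package is not installed in the active environment.
            installed = None
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Run it inside the activated llm environment; a MISMATCH line points at the pip command worth re-running.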
2. Configure environment variables
source ipex-llm-init -t
3. Finetuning LLaMA-2-7B on a node:
Example usage:
python ./alpaca_qlora_finetuning_cpu.py \
--base_model "meta-llama/Llama-2-7b-hf" \
--data_path "yahma/alpaca-cleaned" \
--output_dir "./ipex-qlora-alpaca"
Note: You can also set --base_model to the local path of the Hugging Face model checkpoint folder and --data_path to the local path of the dataset JSON file.
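As a rough illustration of a local dataset file, the sketch below writes two records with the instruction, input (optional) and output fields that this example's prompter works with. It assumes the file is a single JSON array of objects, as in the Alpaca data; the records, the file name my_alpaca_data.json and the helper write_dataset are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("instruction", "input", "output")

# Illustrative records only; "input" is optional context and may be empty.
RECORDS = [
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
]


def write_dataset(path, records):
    """Validate field names, then write the records as one JSON array."""
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return Path(path)


if __name__ == "__main__":
    # Hypothetical file name; pass the resulting path as --data_path.
    out = write_dataset(Path(tempfile.mkdtemp()) / "my_alpaca_data.json", RECORDS)
    print("wrote", out)
```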
Sample Output
{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}
1%|█ | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
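Each step line in the output above is a Python dict literal, so a saved log can be summarized with the standard library alone. This is a sketch under the assumption that the log keeps the format shown here; parse_steps is a hypothetical helper, and the progress-bar line is simply skipped.

```python
import ast

LOG_TEXT = """\
{'loss': 1.9231, 'learning_rate': 2.9999945367033285e-05, 'epoch': 0.0}
{'loss': 1.8622, 'learning_rate': 2.9999781468531096e-05, 'epoch': 0.01}
{'loss': 1.9043, 'learning_rate': 2.9999508305687345e-05, 'epoch': 0.01}
{'loss': 1.8967, 'learning_rate': 2.999912588049185e-05, 'epoch': 0.01}
{'loss': 1.9658, 'learning_rate': 2.9998634195730358e-05, 'epoch': 0.01}
{'loss': 1.8386, 'learning_rate': 2.9998033254984483e-05, 'epoch': 0.02}
{'loss': 1.809, 'learning_rate': 2.999732306263172e-05, 'epoch': 0.02}
{'loss': 1.8552, 'learning_rate': 2.9996503623845395e-05, 'epoch': 0.02}
  1%|█ | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
"""


def parse_steps(text):
    """Collect the per-step dicts, ignoring progress bars and other lines."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            steps.append(ast.literal_eval(line))
    return steps


steps = parse_steps(LOG_TEXT)
losses = [step["loss"] for step in steps]
mean_loss = sum(losses) / len(losses)
print(f"{len(steps)} steps, mean loss {mean_loss:.4f}, last loss {losses[-1]:.4f}")
```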
Guide to finetuning QLoRA on one node with multiple sockets
- install extra lib
# need to run the alpaca stand-alone version first
# for using mpirun
pip install oneccl_bind_pt --extra-index-url https://developer.intel.com/ipex-whl-stable
- modify conf in finetune_one_node_two_sockets.sh and run
source ${conda_env}/lib/python3.9/site-packages/oneccl_bindings_for_pytorch/env/setvars.sh
bash finetune_one_node_two_sockets.sh
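The source command above hard-codes python3.9 and relies on ${conda_env} being set. As a hedged alternative, the sketch below derives the same setvars.sh location from the active interpreter. It assumes the oneCCL bindings are installed into that interpreter's pure-Python site-packages directory; the function names are hypothetical.

```python
import importlib.util
import sysconfig
from pathlib import Path

# Directory name taken from the setvars.sh path used above.
PACKAGE = "oneccl_bindings_for_pytorch"


def setvars_path():
    """Where setvars.sh would sit in the active environment's site-packages."""
    site_packages = Path(sysconfig.get_paths()["purelib"])
    return site_packages / PACKAGE / "env" / "setvars.sh"


def oneccl_installed():
    """True if the bindings package can be found by the import system."""
    return importlib.util.find_spec(PACKAGE) is not None


if __name__ == "__main__":
    path = setvars_path()
    print("oneCCL bindings importable:", oneccl_installed())
    print("setvars.sh expected at:", path, "(exists)" if path.exists() else "(missing)")
```

If the file is reported missing, the pip install step for oneccl_bind_pt probably did not target the environment that is currently active.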
Guide to using different prompts or different datasets
The prompter currently targets datasets with instruction, input (optional), and output fields. To use a different dataset,
add a template file xxx.json in templates, then update the generate_prompt method in utils.prompter.py and the generate_and_tokenize_prompt method to fit the dataset.
For example, to train llama2-7b on english_quotes, just like this example:
- add template english_quotes.json
{
"prompt": "{quote} ->: {tags}"
}
- update prompter.py and add new generate_prompt method
def generate_quote_prompt(self, quote: str, tags: Union[None, list] = None) -> str:
    tags = str(tags)
    res = self.template["prompt"].format(
        quote=quote, tags=tags
    )
    if self._verbose:
        print(res)
    return res
- update generate_and_tokenize_prompt method
def generate_and_tokenize_prompt(data_point):
    full_prompt = prompter.generate_quote_prompt(
        data_point["quote"], data_point["tags"]
    )
    user_prompt = prompter.generate_quote_prompt(
        data_point["quote"], data_point["tags"]
    )
- choose prompt english_quotes to train
python ./quotes_qlora_finetuning_cpu.py \
--base_model "meta-llama/Llama-2-7b-hf" \
--data_path "./english_quotes" \
--output_dir "./ipex-qlora-alpaca" \
--prompt_template_name "english_quotes"
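To see what the english_quotes template produces before training, the steps above can be sketched in isolation. QuotePrompter below is a hypothetical stand-in that keeps only the template lookup and the generate_quote_prompt logic shown above; it is not the Prompter class from this example, and the sample record is made up for illustration.

```python
import json
from typing import Optional

# Same content as the english_quotes.json template above.
TEMPLATE_JSON = '{"prompt": "{quote} ->: {tags}"}'


class QuotePrompter:
    """Hypothetical stand-in for the example's Prompter, quote template only."""

    def __init__(self, template: dict, verbose: bool = False):
        self.template = template
        self._verbose = verbose

    def generate_quote_prompt(self, quote: str, tags: Optional[list] = None) -> str:
        # The tag list is rendered with str(), so it keeps its Python list syntax.
        res = self.template["prompt"].format(quote=quote, tags=str(tags))
        if self._verbose:
            print(res)
        return res


prompter = QuotePrompter(json.loads(TEMPLATE_JSON))

# Illustrative record with the two fields the steps above read.
data_point = {
    "quote": "Be yourself; everyone else is already taken.",
    "tags": ["be-yourself", "honesty"],
}
prompt = prompter.generate_quote_prompt(data_point["quote"], data_point["tags"])
print(prompt)
# Be yourself; everyone else is already taken. ->: ['be-yourself', 'honesty']
```

Note that the tags end up in the prompt as a stringified list, brackets and quotes included, because the method calls str(tags) before formatting.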
Guide to finetuning QLoRA using different models
Make sure you fully understand the entire finetuning process and that the model is the latest version. Using Baichuan-7B as an example:
- Update the tokenizer first, because the base example is written for LLaMA models.
# replaces the LlamaTokenizer used in the base example
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)
- Some models also need trust_remote_code=True in from_pretrained for both the model and the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base_model, xxxxx, trust_remote_code=True)
- Modify the target_modules according to the model you need to train; you can refer to here, or search for the recommended training target modules.
lora_target_modules: List[str] = ["W_pack"]
- You may need to set tokenizer.pad_token_id = tokenizer.eod_id (Qwen)
- (Only for Baichuan) According to this issue, you need to modify tokenization_baichuan.py to fix the issue.
- finetune as normal
- Use export_merged_model.py to merge. The tokenizer and model loading there also need to be updated so that the weights merge successfully.
from transformers import AutoModelForCausalLM, AutoTokenizer  # noqa: F402
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
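For the target_modules step above, one way to find candidate names is to list the leaf names of a model's linear layers. The sketch below works on plain (module name, class name) pairs so it runs without loading a model; the listing is hypothetical, and the helper linear_leaf_names is not part of this example. With a loaded PyTorch model, such pairs could be built from model.named_modules().

```python
def linear_leaf_names(modules):
    """Return sorted unique leaf names of modules whose class name ends in 'Linear'."""
    leaves = set()
    for name, class_name in modules:
        if class_name.endswith("Linear"):
            # "model.layers.0.self_attn.W_pack" -> "W_pack"
            leaves.add(name.rsplit(".", 1)[-1])
    return sorted(leaves)


# Hypothetical listing for illustration; with a loaded model it could be built as
# [(n, type(m).__name__) for n, m in model.named_modules()].
example_modules = [
    ("model.layers.0.self_attn", "Attention"),
    ("model.layers.0.self_attn.W_pack", "Linear"),
    ("model.layers.0.self_attn.o_proj", "Linear"),
    ("model.layers.1.self_attn.W_pack", "Linear"),
    ("model.layers.1.self_attn.o_proj", "Linear"),
    ("lm_head", "Linear"),
]

print(linear_leaf_names(example_modules))
# ['W_pack', 'lm_head', 'o_proj']
```

Which of the listed names to train is still a judgment call; the Baichuan setting above uses only ["W_pack"], and an output head such as lm_head is usually left out of LoRA targets.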
4. Finetuning in docker and multiple nodes (k8s)
If you want to run multi-process fine-tuning, or do not want to install the above dependencies manually, we provide a Docker solution to quickly start finetuning in a single container. Please refer to here.
For users with multiple CPU servers, e.g. Xeon series such as SPR and ICX, we also provide a k8s distributed solution in which machines and processor sockets work together with one click. Please refer to here for how to run QLoRA on k8s.