Fix type mismatch in eval for Baichuan2 QLora example (#11117)

* During the evaluation stage, Baichuan2 raises a type mismatch error when training with bfloat16. Fix this issue by modifying `modeling_baichuan.py`, and add a doc note on how to modify this file.
Qiyuan Gong 2024-05-24 14:14:30 +08:00 committed by GitHub
parent 21a1a973c1
commit 120a0035ac
2 changed files with 18 additions and 1 deletion


@@ -175,6 +175,23 @@ bash qlora_finetune_qwen15_7b_arc_1_card.sh
##### Finetuning Baichuan2-7B examples on single Arc A770
Please download [Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat). Then modify `modeling_baichuan.py` in the model directory by adding the following two lines at line 234. This change fixes the [Baichuan2 type mismatch issue](https://github.com/baichuan-inc/Baichuan2/issues/291).
```python
if attention_mask.dtype != query_states.dtype:
    attention_mask = attention_mask.to(query_states.dtype)
```
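The cast matters because `scaled_dot_product_attention` rejects a float attention mask whose dtype differs from the query's. A minimal standalone sketch of the same guard (shapes are illustrative, runs on CPU):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes in the usual (batch, heads, seq_len, head_dim) layout.
query = torch.randn(1, 2, 4, 8, dtype=torch.bfloat16)
key = torch.randn(1, 2, 4, 8, dtype=torch.bfloat16)
value = torch.randn(1, 2, 4, 8, dtype=torch.bfloat16)

# A float mask built elsewhere in the model typically stays float32.
attention_mask = torch.zeros(1, 1, 4, 4)

# The guard from the README: cast the mask to the query dtype before SDPA.
if attention_mask.dtype != query.dtype:
    attention_mask = attention_mask.to(query.dtype)

attn_output = F.scaled_dot_product_attention(
    query, key, value, attn_mask=attention_mask
)
```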
After the modification, lines 234-236 should look like the following.
```python
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):
    if attention_mask.dtype != query_states.dtype:
        attention_mask = attention_mask.to(query_states.dtype)
    attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attn_mask=attention_mask)
```
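If editing the file by hand is inconvenient, the same change can be scripted. This is a minimal sketch (the helper name and the commented-out path are hypothetical) that keys on the `scaled_dot_product_attention` call rather than a hard-coded line number, so it tolerates small upstream shifts:

```python
from pathlib import Path

CAST_LINES = [
    "if attention_mask.dtype != query_states.dtype:",
    "    attention_mask = attention_mask.to(query_states.dtype)",
]

def insert_dtype_cast(source: str) -> str:
    """Insert the dtype cast just before the scaled_dot_product_attention call."""
    out = []
    for line in source.splitlines():
        stripped = line.lstrip()
        # Skip insertion if the file was already patched.
        if (stripped.startswith("attn_output = F.scaled_dot_product_attention")
                and "attention_mask.dtype" not in source):
            indent = line[: len(line) - len(stripped)]
            out.extend(indent + cast for cast in CAST_LINES)
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical local checkpoint path; adjust to where the model was downloaded.
# path = Path("Baichuan2-7B-Chat/modeling_baichuan.py")
# path.write_text(insert_dtype_cast(path.read_text()))
```

Re-running the helper on an already-patched file is a no-op, so it is safe to include in a setup script.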
Modify `--base_model` in `qlora_finetune_baichuan2_7b_arc_1_card.sh` to point to the local model directory, then launch the finetuning:
```bash
bash qlora_finetune_baichuan2_7b_arc_1_card.sh
```


@@ -16,6 +16,6 @@
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
python ./alpaca_qlora_finetuning.py \
-  --base_model "baichuan-inc/Baichuan2-7B-Chat" \
+  --base_model "path/to/Baichuan2-7B-Chat" \
--data_path "yahma/alpaca-cleaned" \
--output_dir "./ipex-llm-qlora-alpaca"