Fix type mismatch in eval for Baichuan2 QLora example (#11117)
* During the evaluation stage, Baichuan2 raises a type mismatch error when trained with bfloat16. Fix the issue by modifying `modeling_baichuan.py`, and add documentation describing how to modify this file.
This commit is contained in:
parent 21a1a973c1
commit 120a0035ac
2 changed files with 18 additions and 1 deletion
@@ -175,6 +175,23 @@ bash qlora_finetune_qwen15_7b_arc_1_card.sh

##### Finetuning Baichuan2-7B examples on single Arc A770

Please download [Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat). Then modify `modeling_baichuan.py` in the model directory by inserting the following two lines at line 234. This change fixes the [Baichuan2 type mismatch issue](https://github.com/baichuan-inc/Baichuan2/issues/291).

```python
if(attention_mask.dtype != query_states.dtype):
    attention_mask = attention_mask.to(query_states.dtype)
```

After the modification, lines 234-236 should look like the following:

```python
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):
    if(attention_mask.dtype != query_states.dtype):
        attention_mask = attention_mask.to(query_states.dtype)
    attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attn_mask = attention_mask)
```
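
The cast above is a plain dtype-normalization guard. As an illustration only, here is a minimal self-contained sketch of the same pattern, using a hypothetical `FakeTensor` class and `normalize_mask` helper in place of the real `torch.Tensor` objects inside `modeling_baichuan.py`:

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only tracks a dtype name."""
    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Mimic Tensor.to(dtype): return a converted copy.
        return FakeTensor(dtype)


def normalize_mask(attention_mask, query_states):
    # Same guard as the patch: cast the mask only when dtypes differ,
    # so e.g. a float32 mask is converted to match bfloat16 query states.
    if attention_mask.dtype != query_states.dtype:
        attention_mask = attention_mask.to(query_states.dtype)
    return attention_mask


mask = normalize_mask(FakeTensor("float32"), FakeTensor("bfloat16"))
print(mask.dtype)  # bfloat16
```

With the guard in place, a mask that already has the right dtype passes through untouched, so the change is a no-op for plain float32 training.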
Modify `--base_model` in `qlora_finetune_baichuan2_7b_arc_1_card.sh`, then launch the finetuning:

```bash
bash qlora_finetune_baichuan2_7b_arc_1_card.sh
```

@@ -16,6 +16,6 @@
# You could also specify `--base_model` to the local path of the huggingface model checkpoint folder and `--data_path` to the local path of the dataset JSON file
python ./alpaca_qlora_finetuning.py \
-    --base_model "baichuan-inc/Baichuan2-7B-Chat" \
+    --base_model "path/to/Baichuan2-7B-Chat" \
     --data_path "yahma/alpaca-cleaned" \
     --output_dir "./ipex-llm-qlora-alpaca"