# Running LLM Finetuning using IPEX-LLM on Intel GPU

This folder contains examples of running different training modes with IPEX-LLM on Intel GPU:
- LoRA: examples of running LoRA finetuning
- QLoRA: examples of running QLoRA finetuning
- QA-LoRA: examples of running QA-LoRA finetuning
- ReLora: examples of running ReLora finetuning
- DPO: examples of running DPO finetuning
- common: common templates and utility classes used by the finetuning examples
- HF-PEFT: run finetuning on Intel GPU using Hugging Face PEFT code without modification
- axolotl: LLM finetuning on Intel GPU using axolotl, without writing code
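Several of the modes above (LoRA, QLoRA, QA-LoRA, ReLora) are built on the same low-rank-adaptation idea: the frozen pretrained weight `W` is augmented with a trainable low-rank product `B @ A`. The following is a minimal NumPy sketch of that idea only, for illustration; it is not the IPEX-LLM implementation, and the sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # hidden size and LoRA rank; r << d

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, initialized to zero

# LoRA forward pass: y = x @ (W + B @ A).T
# During finetuning only A and B receive gradient updates.
x = rng.standard_normal((1, d))
y = x @ (W + B @ A).T

# With B initialized to zero, the adapted layer starts out identical
# to the base layer, so training begins from the pretrained behavior.
print(np.allclose(y, x @ W.T))  # True
```

Note the parameter saving: the adapter trains `2 * d * r` values here instead of `d * d`, which is what makes these modes practical on a single GPU.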
## Verified Models
| Model | Finetune mode | Frameworks Support |
|---|---|---|
| LLaMA 2/3 | LoRA, QLoRA, QA-LoRA, ReLora | HF-PEFT, axolotl |
| Mistral | LoRA, QLoRA | DPO |
| ChatGLM 3 | QLoRA | HF-PEFT |
| Qwen-1.5 | QLoRA | HF-PEFT |
| Baichuan2 | QLoRA | HF-PEFT |
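Unlike the supervised modes in the table, DPO finetunes on preference pairs (a chosen and a rejected answer) rather than labels. A minimal NumPy sketch of the standard DPO objective, for illustration only (the example log-probabilities below are invented, not taken from any model):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    The policy is rewarded for increasing its log-probability margin
    on the chosen answer relative to a frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin))
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# When the policy prefers the chosen answer more than the reference does,
# the margin is positive and the loss drops below log(2).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
print(loss < np.log(2))  # True
```

At a zero margin the loss equals `log(2)`, so values below that indicate the policy has moved toward the preferred responses.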
## Troubleshooting
- If you fail to finetune on multiple cards because of the following error message:

  ```
  RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
  ```

  please try `sudo apt install level-zero-dev` to fix it.
- Please raise the system open file limit using `ulimit -n 1048576`. Otherwise, the error `Too many open files` may occur.
- If the application raises `wandb.errors.UsageError: api_key not configured (no-tty)`, please log in to wandb or disable wandb login with this command:

  ```bash
  export WANDB_MODE=offline
  ```
- If the application raises Hugging Face related errors, e.g., `NewConnectionError` or `Failed to download`, please download the models and datasets in advance, set the model and data paths, then set `HF_HUB_OFFLINE` with this command:

  ```bash
  export HF_HUB_OFFLINE=1
  ```