## Fine-tune LLM with BigDL LLM Container

The following shows how to fine-tune an LLM with quantization (QLoRA built on BigDL-LLM 4-bit optimizations) in a Docker environment, accelerated by Intel XPU.

### 1. Prepare Docker Image

You can pull the image directly from Docker Hub:

```bash
docker pull intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT
```

Or build the image from source:

```bash
export HTTP_PROXY=your_http_proxy
export HTTPS_PROXY=your_https_proxy

docker build \
  --build-arg http_proxy=${HTTP_PROXY} \
  --build-arg https_proxy=${HTTPS_PROXY} \
  -t intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT \
  -f ./Dockerfile .
```

### 2. Prepare Base Model, Data and Container

Here, we fine-tune [Llama2-7b](https://huggingface.co/meta-llama/Llama-2-7b) on the [English Quotes](https://huggingface.co/datasets/Abirate/english_quotes) dataset. Download both, then start a Docker container with the files mounted as follows:

```bash
export BASE_MODEL_PATH=your_downloaded_base_model_path
export DATA_PATH=your_downloaded_data_path
export HTTP_PROXY=your_http_proxy
export HTTPS_PROXY=your_https_proxy

docker run -itd \
  --net=host \
  --device=/dev/dri \
  --memory="32G" \
  --name=bigdl-llm-finetune-qlora-xpu \
  -e http_proxy=${HTTP_PROXY} \
  -e https_proxy=${HTTPS_PROXY} \
  -v $BASE_MODEL_PATH:/model \
  -v $DATA_PATH:/data/english_quotes \
  --shm-size="16g" \
  intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT
```

Downloading the base model and dataset and mounting them into the container is the standard fine-tuning workflow. For a quick start, you can skip this step and let the fine-tuning code download the needed files automatically:

```bash
export HTTP_PROXY=your_http_proxy
export HTTPS_PROXY=your_https_proxy

docker run -itd \
  --net=host \
  --device=/dev/dri \
  --memory="32G" \
  --name=bigdl-llm-finetune-qlora-xpu \
  -e http_proxy=${HTTP_PROXY} \
  -e https_proxy=${HTTPS_PROXY} \
  --shm-size="16g" \
  intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT
```

However, we recommend handling the downloads manually, because the automatic download can be blocked by restricted Internet access or Hugging Face authentication requirements depending on your environment, and the manual method lets you fine-tune in a custom way (with a different base model and dataset). One way to fetch the files manually is sketched below.
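The following is a minimal sketch (not part of the original instructions) of one way to download the model and dataset on the host using the `huggingface-cli` tool from the `huggingface_hub` package. It assumes your account has been granted access to the gated Llama-2 repository; depending on what format the fine-tuning script expects, you may need the Transformers-format variant (`meta-llama/Llama-2-7b-hf`) instead of the repository linked above.

```bash
# Sketch only: requires a reasonably recent huggingface_hub on the host.
pip install -U huggingface_hub

# Llama-2 is a gated model, so authenticate with a Hugging Face token that
# has been granted access before downloading.
huggingface-cli login

# Download the base model and the dataset to local directories. Adjust the
# target paths as you like, then point the exports used by `docker run`
# (BASE_MODEL_PATH and DATA_PATH) at these directories.
huggingface-cli download meta-llama/Llama-2-7b --local-dir ./Llama-2-7b
huggingface-cli download Abirate/english_quotes --repo-type dataset --local-dir ./english_quotes

export BASE_MODEL_PATH=$(pwd)/Llama-2-7b
export DATA_PATH=$(pwd)/english_quotes
```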
### 3. Start Fine-Tuning

Enter the running container:

```bash
docker exec -it bigdl-llm-finetune-qlora-xpu bash
```

Then, start QLoRA fine-tuning:

```bash
bash start-qlora-finetuning-on-xpu.sh
```

After a few minutes, you should see output similar to:

```bash
{'loss': 2.256, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 1.8869, 'learning_rate': 0.00017777777777777779, 'epoch': 0.06}
{'loss': 1.5334, 'learning_rate': 0.00015555555555555556, 'epoch': 0.1}
{'loss': 1.4975, 'learning_rate': 0.00013333333333333334, 'epoch': 0.13}
{'loss': 1.3245, 'learning_rate': 0.00011111111111111112, 'epoch': 0.16}
{'loss': 1.2622, 'learning_rate': 8.888888888888889e-05, 'epoch': 0.19}
{'loss': 1.3944, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.22}
{'loss': 1.2481, 'learning_rate': 4.4444444444444447e-05, 'epoch': 0.26}
{'loss': 1.3442, 'learning_rate': 2.2222222222222223e-05, 'epoch': 0.29}
{'loss': 1.3256, 'learning_rate': 0.0, 'epoch': 0.32}
{'train_runtime': 204.4633, 'train_samples_per_second': 3.913, 'train_steps_per_second': 0.978, 'train_loss': 1.5072882556915284, 'epoch': 0.32}
100%|██████████████████████████████████████████████████████████████████████████████████████| 200/200 [03:24<00:00,  1.02s/it]
TrainOutput(global_step=200, training_loss=1.5072882556915284, metrics={'train_runtime': 204.4633, 'train_samples_per_second': 3.913, 'train_steps_per_second': 0.978, 'train_loss': 1.5072882556915284, 'epoch': 0.32})
```
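Once training finishes, the LoRA adapter checkpoints are written inside the container. As a rough sketch, assuming the script saves them under an `outputs` directory in the container's working directory (a hypothetical path; inspect `start-qlora-finetuning-on-xpu.sh` for the actual location), you could copy them back to the host with `docker cp`:

```bash
# Sketch only: the in-container path `/bigdl/outputs` is an assumption;
# check the fine-tuning script for the real output directory.
docker cp bigdl-llm-finetune-qlora-xpu:/bigdl/outputs ./qlora-checkpoints

# The copied directory should contain the saved LoRA adapter checkpoints,
# which can later be merged into the base model or loaded with PEFT.
ls ./qlora-checkpoints
```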