LLM: Add qlora cpu distributed readme (#9561)

* init readme

* add distributed guide

* update
Wang, Jian4 2023-11-30 13:42:30 +08:00 committed by GitHub
parent c8e0c2ed48
commit a0a80d232e
3 changed files with 38 additions and 0 deletions

View file

@@ -3,6 +3,12 @@
This example demonstrates how to finetune a llama2-7b model using BigDL-LLM 4bit optimizations on [Intel CPUs](../README.md).
## Distributed Training Guide
1. Single node with single socket: [simple example](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning#example-finetune-llama2-7b-using-qlora)
or [alpaca example](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora)
2. [Single node with multiple sockets](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning/alpaca-qlora#guide-to-finetuning-qlora-on-one-node-with-multiple-sockets)
3. Multiple nodes with multiple sockets (a rough launch sketch follows this list)
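There is no dedicated multi-node script in this example yet; the sketch below is only a rough, hypothetical extension of the single-node launcher (host names, the hostfile, and rank counts are placeholders), not a documented procedure:
```bash
# hypothetical two-node launch, one rank per CPU socket on each node;
# the code, base model and dataset must be available on every node
export MASTER_ADDR=node1             # placeholder: reachable address of the first node
export SOCKET_CORES=48               # physical cores per socket on each node
printf "node1\nnode2\n" > hostfile   # placeholder host names
mpirun -n 4 -ppn 2 -f hostfile \
       --bind-to socket \
       -genv OMP_NUM_THREADS=$SOCKET_CORES \
    python alpaca_qlora_finetuning_cpu.py \
       --base_model "meta-llama/Llama-2-7b-hf" \
       --data_path "yahma/alpaca-cleaned" \
       --output_dir "./bigdl-qlora-alpaca"
```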
## Example: Finetune llama2-7b using QLoRA
This example is ported from [bnb-4bit-training](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k).

View file

@@ -44,6 +44,20 @@ python ./alpaca_qlora_finetuning_cpu.py \
1%|█ | 8/1164 [xx:xx<xx:xx:xx, xx s/it]
```
### Guide to finetuning QLoRA on one node with multiple sockets
1. Install the extra library needed for `mpirun`:
```bash
# run the stand-alone alpaca example above first
# oneccl_bind_pt provides the oneCCL bindings used by mpirun
pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable
```
2. Modify the configuration in `finetune_one_node_two_sockets.sh` as needed, then run (a sanity-check sketch follows this list):
```bash
source ${conda_env}/lib/python3.9/site-packages/oneccl_bindings_for_pytorch/env/setvars.sh
bash finetune_one_node_two_sockets.sh
```
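Before launching, a quick sanity check can help. This is a minimal sketch, not part of the example's scripts; the `python3.9` path segment depends on the Python version in your environment:
```bash
# confirm that the oneCCL bindings installed in step 1 can be imported
python -c "import oneccl_bindings_for_pytorch; print('oneCCL bindings OK')"

# one way to resolve ${conda_env}: CONDA_PREFIX points at the active conda environment
conda_env=${CONDA_PREFIX}
source ${conda_env}/lib/python3.9/site-packages/oneccl_bindings_for_pytorch/env/setvars.sh
```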
### Guide to using different prompts or datasets
The current prompter expects datasets with `instruction`, `input` (optional) and `output` fields. To use a different dataset,
add a template file xxx.json under `templates`, then update the `generate_prompt` method in `utils/prompter.py` and the `generate_and_tokenize_prompt` method to fit your dataset. A hypothetical template sketch follows.
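For illustration only, such a template might look like the following; the file name `my_dataset.json` and the field names (`prompt_input`, `prompt_no_input`, `response_split`) are assumptions modeled on the alpaca-style template, so check the existing file under `templates/` and adapt:
```bash
# hypothetical template file; {instruction} and {input} are filled in by the prompter
cat > templates/my_dataset.json <<'EOF'
{
  "description": "Template for a custom instruction dataset.",
  "prompt_input": "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n",
  "prompt_no_input": "### Instruction:\n{instruction}\n\n### Response:\n",
  "response_split": "### Response:"
}
EOF
```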

View file

@@ -0,0 +1,18 @@
# Launch QLoRA finetuning on one node with two CPU sockets, one MPI rank per socket.
export MASTER_ADDR=127.0.0.1   # distributed rendezvous address; localhost on a single node
export SOCKET_CORES=48         # physical cores per socket; adjust to your machine
# set up the BigDL-LLM CPU environment
source bigdl-llm-init -t
# two ranks, each bound to its own socket and using SOCKET_CORES OpenMP threads
mpirun -n 2 \
       --bind-to socket \
       -genv OMP_NUM_THREADS=$SOCKET_CORES \
       -genv KMP_AFFINITY="granularity=fine,none" \
       -genv KMP_BLOCKTIME=1 \
   python alpaca_qlora_finetuning_cpu.py \
       --gradient_checkpointing False \
       --batch_size 128 \
       --micro_batch_size 8 \
       --max_steps -1 \
       --base_model "meta-llama/Llama-2-7b-hf" \
       --data_path "yahma/alpaca-cleaned" \
       --output_dir "./bigdl-qlora-alpaca"
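The `SOCKET_CORES` value above should match the number of physical cores per socket on the target machine; one way to check (assuming `lscpu` is available):
```bash
# print the socket count and physical cores per socket
lscpu | grep -E "^(Socket\(s\)|Core\(s\) per socket)"
```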