From 499100daf1ced16202d9f238b76d4f00ed2c7cb9 Mon Sep 17 00:00:00 2001
From: binbin Deng <108676127+plusbang@users.noreply.github.com>
Date: Fri, 8 Dec 2023 10:51:55 +0800
Subject: [PATCH] LLM: Add solution to fix `oneccl` related error (#9630)

---
 .../example/GPU/QLoRA-FineTuning/alpaca-qlora/README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora/README.md b/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora/README.md
index bc8ca67e..98cfbae3 100644
--- a/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora/README.md
+++ b/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora/README.md
@@ -125,3 +125,10 @@ python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --
 ```
 
 Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference.
+
+### 5. Troubleshooting
+- If you fail to finetune on multiple cards because of the following error message:
+  ```bash
+  RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
+  ```
+  please try `sudo apt install level-zero-dev` to fix it.