LLM: Add solution to fix oneccl related error (#9630)
parent d204125e88
commit 499100daf1
1 changed file with 7 additions and 0 deletions
@@ -125,3 +125,10 @@ python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --
```
Then you can use `./outputs/checkpoint-200-merged` as a normal Hugging Face Transformers model for inference.
### 5. Troubleshooting
- If fine-tuning on multiple cards fails with the following error message:
```bash
RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
```
Install the Level Zero development package with `sudo apt install level-zero-dev`, then rerun the fine-tuning.
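
Before installing, it may help to confirm whether the package is actually missing. The following is a minimal sketch, assuming a Debian/Ubuntu system where `dpkg-query` is available; the `pkg_status` helper is hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not part of this repository):
# prints "installed" if the given Debian/Ubuntu package is installed,
# otherwise prints "missing".
pkg_status() {
    local status
    # dpkg-query prints "install ok installed" for an installed package.
    status="$(dpkg-query -W -f='${Status}' "$1" 2>/dev/null)" || true
    if [ "$status" = "install ok installed" ]; then
        echo "installed"
    else
        echo "missing"
    fi
}

if [ "$(pkg_status level-zero-dev)" = "installed" ]; then
    echo "level-zero-dev is installed"
else
    echo "level-zero-dev is missing; install it with: sudo apt install level-zero-dev"
fi
```

If the package is reported as installed and the oneCCL error persists, the cause is likely elsewhere and this fix does not apply.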