LLM: Add solution to fix oneccl related error (#9630)
				
					
				
parent d204125e88
commit 499100daf1
1 changed file with 7 additions and 0 deletions

@@ -125,3 +125,10 @@ python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --
 ```
 
 Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference.
+
+### 5. Troubleshooting
+- If you fail to finetune on multi cards because of following error message:
+  ```bash
+  RuntimeError: oneCCL: comm_selector.cpp:57 create_comm_impl: EXCEPTION: ze_data was not initialized
+  ```
+  Please try `sudo apt install level-zero-dev` to fix it.
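
The troubleshooting entry added above could be paired with a quick pre-flight check before launching a multi-card run. The sketch below is not part of this commit. It assumes that the `ze_data was not initialized` error appears when the unversioned Level Zero loader library, `libze_loader.so`, cannot be opened, and that `level-zero-dev` is the package that provides that file. Both points are assumptions, and `can_load` is a hypothetical helper name.

```python
import ctypes


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # ctypes raises OSError when the library cannot be found or loaded.
        return False


if __name__ == "__main__":
    # Assumption: this unversioned file name is what level-zero-dev installs.
    loader = "libze_loader.so"
    if can_load(loader):
        print(f"{loader} can be loaded")
    else:
        print(f"{loader} cannot be loaded; try: sudo apt install level-zero-dev")
```

A successful load only shows that the loader library is visible to the process. It does not confirm that oneCCL or the GPU driver is configured correctly.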