update troubleshooting for llama.cpp and ollama (#11890)
* update troubleshooting for llama.cpp and ollama
* update
* update
This commit is contained in:
parent c1d07bc626
commit 5a8fc1baa2
2 changed files with 14 additions and 0 deletions

@@ -296,6 +296,18 @@ Log end
### Troubleshooting
#### Unable to run the initialization script

If you are unable to run `init-llama-cpp.bat`, please make sure you have installed `ipex-llm[cpp]` in your conda environment. If it is installed, check that you have activated the correct conda environment. On Windows, also make sure you run the script in a terminal opened with administrator privileges.
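
A quick way to check both conditions, assuming the conda environment is named `llm-cpp` (the name is illustrative):

```bash
# Activate the conda environment used for ipex-llm (environment name is an example)
conda activate llm-cpp

# Verify that ipex-llm is installed in this environment; no output means it is missing
pip show ipex-llm
```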
#### `DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)` error

On Linux, this error occurs when no devices starting with `[ext_oneapi_level_zero]` are found. Please make sure you have installed Level Zero and have sourced `/opt/intel/oneapi/setvars.sh` before running the command.
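
To verify, you can source the oneAPI environment and list the visible SYCL devices, assuming the default oneAPI install location:

```bash
# Load the oneAPI environment so the Level Zero backend can be found
source /opt/intel/oneapi/setvars.sh

# List SYCL devices; the GPU should show up with the [ext_oneapi_level_zero] backend
sycl-ls
```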
#### `Prompt is too long` error

If you encounter `main: prompt is too long (xxx tokens, max xxx)`, please increase the `-c` parameter to set a larger context size.
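
For example, to raise the context window to 2048 tokens (the model path and other flags below are illustrative):

```bash
# -c sets the context size in tokens; increase it until the prompt fits
./main -m ./models/model.gguf -c 2048 -n 128 -p "Your long prompt here"
```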
#### `gemm: cannot allocate memory on host` error / `could not create an engine` error

If you meet the `oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host` error, or the `could not create an engine` error on Linux, it is probably caused by oneAPI dependencies installed through pip. Avoid installing them via commands like `pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0`; install them with `apt` on Linux instead. Please refer to [this guide](./install_linux_gpu.md) for more details.
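
If those pip packages are present, one possible cleanup is sketched below; the exact `apt` packages to install afterwards should be taken from the linked guide (`intel-basekit` here is an assumption):

```bash
# Remove the oneAPI runtime packages that were installed through pip
pip uninstall -y dpcpp-cpp-rt mkl-dpcpp onednn

# Reinstall oneAPI system-wide via apt instead (see install_linux_gpu.md for details)
sudo apt install intel-basekit
```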
#### Fail to quantize model

If you encounter `main: failed to quantize model from xxx`, please make sure the output directory exists before quantizing.
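
For example, creating the output directory before running the quantize tool (paths and quantization type are illustrative):

```bash
# quantize does not create missing directories, so create the output path first
mkdir -p ./models/quantized
./quantize ./models/model-f16.gguf ./models/quantized/model-q4_0.gguf Q4_0
```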

@@ -185,6 +185,8 @@ An example process of interacting with model with `ollama run example` looks like
</a>

### Troubleshooting
#### Unable to run the initialization script

If you are unable to run `init-ollama.bat`, please make sure you have installed `ipex-llm[cpp]` in your conda environment. If it is installed, check that you have activated the correct conda environment. On Windows, also make sure you run the script in a terminal opened with administrator privileges.
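
As with `init-llama-cpp`, you can verify the environment before running the script (the environment name is illustrative):

```bash
# Confirm the intended conda environment is active and ipex-llm is installed in it
conda activate llm-cpp
pip show ipex-llm
```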
#### Why the model is reloaded after several minutes

By default, Ollama unloads the model from GPU memory after 5 minutes of inactivity. With recent versions of Ollama, you can set `OLLAMA_KEEP_ALIVE=-1` to keep the model loaded in memory. Reference issue: https://github.com/intel-analytics/ipex-llm/issues/11608
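
For example, when starting the server (binary path is illustrative):

```bash
# Keep loaded models in GPU memory indefinitely instead of unloading after 5 minutes
export OLLAMA_KEEP_ALIVE=-1
./ollama serve
```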