update troubleshooting for llama.cpp and ollama (#11890)

* update troubleshooting for llama.cpp and ollama

* update

* update
Ch1y0q authored on 2024-08-26 20:55:23 +08:00, committed by GitHub
parent c1d07bc626
commit 5a8fc1baa2
2 changed files with 14 additions and 0 deletions

@@ -296,6 +296,18 @@ Log end
### Troubleshooting
#### Unable to run the initialization script
If you are unable to run `init-llama-cpp.bat`, please make sure you have installed `ipex-llm[cpp]` in your conda environment. If you have installed it, please check whether you have activated the correct conda environment. Also, on Windows, please make sure you run the script from a terminal with administrator privileges.
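For example, a quick check from an administrator prompt (a sketch, assuming your conda environment is named `llm-cpp` as in the quickstart):

```cmd
conda activate llm-cpp
:: confirm ipex-llm is installed in this environment
pip show ipex-llm
:: then run the script again
init-llama-cpp.bat
```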
#### `DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)` error
On Linux, this error occurs when no devices starting with `[ext_oneapi_level_zero]` are found. Please make sure you have installed Level Zero, and have sourced `/opt/intel/oneapi/setvars.sh` before running the command.
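You can verify which devices are visible with the `sycl-ls` tool from the oneAPI toolkit:

```bash
source /opt/intel/oneapi/setvars.sh
# devices prefixed with [ext_oneapi_level_zero] should appear in the output
sycl-ls
```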
#### `Prompt is too long` error
If you encounter `main: prompt is too long (xxx tokens, max xxx)`, please increase the `-c` parameter to set a larger context size.
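For example (a sketch; the model path and prompt are illustrative):

```bash
# request a 4096-token context window instead of the default
./main -m mistral-7b-v0.1.Q4_0.gguf -c 4096 -p "Your long prompt here"
```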
#### `gemm: cannot allocate memory on host` error / `could not create an engine` error
If you encounter the `oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host` error or the `could not create an engine` error on Linux, it is probably caused by oneAPI dependencies installed via pip. Please avoid installing them with pip (for example, `pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0`); install oneAPI through `apt` on Linux instead. Please refer to [this guide](./install_linux_gpu.md) for more details.
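To check whether these packages came from pip and remove them (a sketch):

```bash
# list any oneAPI components that were installed through pip
pip list | grep -iE "dpcpp-cpp-rt|mkl-dpcpp|onednn"
# remove them, then install oneAPI through apt as described in the linked guide
pip uninstall -y dpcpp-cpp-rt mkl-dpcpp onednn
```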
#### Fail to quantize model
If you encounter `main: failed to quantize model from xxx`, please make sure you have created the related output directory.
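For example, create the output directory before quantizing (a sketch; the paths are illustrative, and newer llama.cpp builds name the tool `llama-quantize`):

```bash
# make sure the output directory exists before quantizing
mkdir -p ./models/quantized
./quantize ./models/model-f16.gguf ./models/quantized/model-q4_0.gguf q4_0
```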

@@ -185,6 +185,8 @@ An example process of interacting with model with `ollama run example` looks like this:
</a>
### Troubleshooting
#### Unable to run the initialization script
If you are unable to run `init-ollama.bat`, please make sure you have installed `ipex-llm[cpp]` in your conda environment. If you have installed it, please check whether you have activated the correct conda environment. Also, on Windows, please make sure you run the script from a terminal with administrator privileges.
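A quick check from an administrator prompt (a sketch, assuming the `llm-cpp` environment from the quickstart):

```cmd
conda activate llm-cpp
:: confirm ipex-llm is installed in this environment
pip show ipex-llm
:: then run the script again
init-ollama.bat
```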
#### Why the model is reloaded after several minutes
By default, Ollama unloads the model from GPU memory after 5 minutes of inactivity. For the latest version of ollama, you can set `OLLAMA_KEEP_ALIVE=-1` to keep the model loaded in memory. Reference issue: https://github.com/intel-analytics/ipex-llm/issues/11608
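For example, set the variable before starting the server:

```bash
# keep loaded models in GPU memory indefinitely
export OLLAMA_KEEP_ALIVE=-1
./ollama serve
```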