update troubleshooting for llama.cpp and ollama (#11890)
* update troubleshooting for llama.cpp and ollama
* update
* update
This commit is contained in:
parent c1d07bc626
commit 5a8fc1baa2
2 changed files with 14 additions and 0 deletions

@@ -296,6 +296,18 @@ Log end
### Troubleshooting
#### Unable to run the initialization script

If you are unable to run `init-llama-cpp.bat`, please make sure you have installed `ipex-llm[cpp]` in your conda environment. If it is installed, check that you have activated the correct conda environment. On Windows, also make sure you run the script in a terminal opened with administrator privileges.
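
A quick way to check both conditions, assuming the conda environment is named `llm-cpp` (the name is illustrative):

```bash
# Activate the conda environment used for ipex-llm (environment name is an example)
conda activate llm-cpp

# Verify that ipex-llm is installed in this environment; no output means it is missing
pip show ipex-llm
```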
#### `DeviceList is empty. -30 (PI_ERROR_INVALID_VALUE)` error

On Linux, this error occurs when no devices starting with `[ext_oneapi_level_zero]` are found. Please make sure you have installed Level Zero and have sourced `/opt/intel/oneapi/setvars.sh` before running the command.
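
To verify, you can source the oneAPI environment and list the visible SYCL devices, assuming the default oneAPI install location:

```bash
# Load the oneAPI environment so the Level Zero backend can be found
source /opt/intel/oneapi/setvars.sh

# List SYCL devices; the GPU should show up with the [ext_oneapi_level_zero] backend
sycl-ls
```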
#### `Prompt is too long` error

If you encounter `main: prompt is too long (xxx tokens, max xxx)`, please increase the `-c` parameter to set a larger context size.
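
For example, to raise the context window to 2048 tokens (the model path and other flags below are illustrative):

```bash
# -c sets the context size in tokens; increase it until the prompt fits
./main -m ./models/model.gguf -c 2048 -n 128 -p "Your long prompt here"
```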
#### `gemm: cannot allocate memory on host` error / `could not create an engine` error

If you meet the `oneapi::mkl::oneapi::mkl::blas::gemm: cannot allocate memory on host` error, or the `could not create an engine` error on Linux, it is probably caused by oneAPI dependencies installed through pip. Avoid installing them via commands like `pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0`; install them with `apt` on Linux instead. Please refer to [this guide](./install_linux_gpu.md) for more details.
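
If those pip packages are present, one possible cleanup is sketched below; the exact `apt` packages to install afterwards should be taken from the linked guide (`intel-basekit` here is an assumption):

```bash
# Remove the oneAPI runtime packages that were installed through pip
pip uninstall -y dpcpp-cpp-rt mkl-dpcpp onednn

# Reinstall oneAPI system-wide via apt instead (see install_linux_gpu.md for details)
sudo apt install intel-basekit
```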
#### Fail to quantize model

If you encounter `main: failed to quantize model from xxx`, please make sure the output directory exists before quantizing.
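
For example, creating the output directory before running the quantize tool (paths and quantization type are illustrative):

```bash
# quantize does not create missing directories, so create the output path first
mkdir -p ./models/quantized
./quantize ./models/model-f16.gguf ./models/quantized/model-q4_0.gguf Q4_0
```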

@@ -185,6 +185,8 @@ An example process of interacting with model with `ollama run example` looks like
</a>

### Troubleshooting
#### Unable to run the initialization script

If you are unable to run `init-ollama.bat`, please make sure you have installed `ipex-llm[cpp]` in your conda environment. If it is installed, check that you have activated the correct conda environment. On Windows, also make sure you run the script in a terminal opened with administrator privileges.
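
As with `init-llama-cpp`, you can verify the environment before running the script (the environment name is illustrative):

```bash
# Confirm the intended conda environment is active and ipex-llm is installed in it
conda activate llm-cpp
pip show ipex-llm
```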
#### Why the model is reloaded after several minutes

By default, Ollama unloads the model from GPU memory after 5 minutes of inactivity. With recent versions of Ollama, you can set `OLLAMA_KEEP_ALIVE=-1` to keep the model loaded in memory. Reference issue: https://github.com/intel-analytics/ipex-llm/issues/11608
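
For example, when starting the server (binary path is illustrative):

```bash
# Keep loaded models in GPU memory indefinitely instead of unloading after 5 minutes
export OLLAMA_KEEP_ALIVE=-1
./ollama serve
```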