updated llama.cpp and ollama quickstart (#11732)
* updated llama.cpp and ollama quickstart.md
* added qwen2-1.5B sample output
* revision on quickstart updates
* revision on quickstart updates
* revision on qwen2 readme
* added 2 troubleshoots
* troubleshoot revision
Parent: 54cc9353db
Commit: d0c89fb715
3 changed files with 52 additions and 4 deletions
@@ -40,7 +40,7 @@ Visit the [Install IPEX-LLM on Linux with Intel GPU](./install_linux_gpu.md), fo

#### Windows (Optional)

-Please make sure your GPU driver version is equal or newer than `31.0.101.5333`. If it is not, follow the instructions in [this section](./install_windows_gpu.md#optional-update-gpu-driver) to update your GPU driver; otherwise, you might encounter gibberish output.
+Please make sure your GPU driver version is equal or newer than `31.0.101.5522`. If it is not, follow the instructions in [this section](./install_windows_gpu.md#optional-update-gpu-driver) to update your GPU driver; otherwise, you might encounter gibberish output.

### 1. Install IPEX-LLM for llama.cpp
@@ -146,7 +146,7 @@ Before running, you should download or copy community GGUF model to your current

- For **Linux users**:

  ```bash
-  ./main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -t 8 -e -ngl 33 --color
+  ./main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -t 8 -e -ngl 99 --color
  ```

> **Note**:
@@ -158,7 +158,7 @@ Before running, you should download or copy community GGUF model to your current

Please run the following command in Miniforge Prompt.

  ```cmd
-  main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -t 8 -e -ngl 33 --color
+  main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" -t 8 -e -ngl 99 --color
  ```

> **Note**:
@@ -316,3 +316,13 @@ Also, you can use `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]` to select device

If you run the llama.cpp program on Windows and find that it crashes or produces abnormal output when given Chinese prompts, open `Region -> Administrative -> Change system locale...`, check the `Beta: Use Unicode UTF-8 for worldwide language support` option, and then restart your computer.

For detailed instructions on how to do this, see [this issue](https://github.com/intel-analytics/ipex-llm/issues/10989#issuecomment-2105600469).

#### `sycl7.dll` not found error

If you encounter `System Error: sycl7.dll not found` on Windows, or a similar error on Linux, please check:

1. on Windows, whether you have installed conda and are in the conda environment that has the oneAPI dependencies installed via pip
2. on Linux, whether you have executed `source /opt/intel/oneapi/setvars.sh`
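On Linux, one quick way to check the second point is to look for the needed libraries in the directories on `LD_LIBRARY_PATH`, which is where sourcing `setvars.sh` puts the SYCL/MKL runtime libraries. A minimal POSIX-shell sketch (the library name passed at the end is illustrative; use the name from your actual error message, which may differ between oneAPI releases):

```shell
# Search each directory on LD_LIBRARY_PATH for a given shared library.
find_in_ld_path() {
    lib="$1"
    old_ifs="$IFS"; IFS=:
    for dir in $LD_LIBRARY_PATH; do
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            IFS="$old_ifs"
            echo "$lib: found in $dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    echo "$lib: NOT found on LD_LIBRARY_PATH - try 'source /opt/intel/oneapi/setvars.sh' first"
    return 1
}

# Example call; 'libsycl.so.7' is an assumed Linux counterpart of sycl7.dll.
find_in_ld_path libsycl.so.7 || true
```

If the library is reported missing, source `setvars.sh` in the same shell and re-run the check before launching llama.cpp.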

#### Check your driver first when you get garbage output

If you get garbage output, please check whether your GPU driver version is >= [31.0.101.5522](https://www.intel.cn/content/www/cn/zh/download/785597/823163/intel-arc-iris-xe-graphics-windows.html). If not, please follow the instructions in [this section](./install_windows_gpu.md#optional-update-gpu-driver) to update your GPU driver.
@@ -183,3 +183,24 @@ An example process of interacting with model with `ollama run example` looks lik

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_gguf_demo_image.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_gguf_demo_image.png" width=100%; />
</a>

### Troubleshooting

#### Why is the model loaded again after several minutes?

By default, Ollama unloads the model from GPU memory after 5 minutes of inactivity. With the latest version of Ollama, you can set `OLLAMA_KEEP_ALIVE=-1` to keep the model loaded in memory. Reference issue: https://github.com/intel-analytics/ipex-llm/issues/11608
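As a sketch (Linux/macOS shell; on Windows set the variable via `set` or the system environment instead), assuming a recent Ollama build that reads this variable:

```shell
# -1 means never unload; a duration string such as "10m" is also accepted.
# Must be set in the environment of the server process, before it starts.
export OLLAMA_KEEP_ALIVE=-1

# Then start the server as usual (not run here):
# ollama serve
```

Note that the variable affects the `ollama serve` process, so exporting it in a different shell than the one running the server has no effect.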

#### `exit status 0xc0000135` error when executing `ollama serve`

When executing `ollama serve`, if you see `llama runner process has terminated: exit status 0xc0000135` on Windows, or `ollama_llama_server: error while loading shared libraries: libmkl_core.so.2: cannot open shared object file` on Linux, this is most likely caused by missing sycl dependencies. Please check:

1. on Windows, whether you have installed conda and are in the conda environment that has the oneAPI dependencies installed via pip
2. on Linux, whether you have executed `source /opt/intel/oneapi/setvars.sh`
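On Linux, a rough heuristic for the second point is to check whether the oneAPI directories appear on `LD_LIBRARY_PATH` in the current shell before launching `ollama serve`. A minimal sketch (this only detects that `setvars.sh` was sourced, not that every library is present):

```shell
# Report whether the oneAPI environment script appears to have been sourced.
check_oneapi_env() {
    case ":$LD_LIBRARY_PATH:" in
        *oneapi*) echo "oneAPI environment appears active" ;;
        *)        echo "oneAPI environment not detected - run: source /opt/intel/oneapi/setvars.sh" ;;
    esac
}

check_oneapi_env
```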

#### Program hangs during the initial model loading stage

When launching `ollama serve` for the first time on Windows, it may get stuck during the model loading phase. If you notice that the program hangs for a long time during the first run, you can manually type a space or another character on the server side to make sure the program is still running.

#### How to distinguish the community version of Ollama from the ipex-llm version

In the server log of the community version of Ollama, you may see `source=payload_common.go:139 msg="Dynamic LLM libraries [rocm_v60000 cpu_avx2 cuda_v11 cpu cpu_avx]"`, whereas in the server log of the ipex-llm version you should only see `source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2]"`.

#### Ollama hangs when several different questions are asked or the context is long

If you find that Ollama hangs when several different questions are asked or the context is long, and you see `update_slots : failed to free spaces in the KV cache` in the server log, this may be because the LLM context is larger than the default `n_ctx` value; increase `n_ctx` and try again.
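In Ollama the corresponding setting is the `num_ctx` parameter, and one way to raise it is through a Modelfile. A sketch (the base model name here is only illustrative; substitute the model you are actually running):

```
FROM qwen2:1.5b
PARAMETER num_ctx 4096
```

You can then build and run the larger-context variant with `ollama create qwen2-4k -f Modelfile` followed by `ollama run qwen2-4k`; recent Ollama builds also accept `/set parameter num_ctx 4096` interactively inside `ollama run`.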
@@ -1,5 +1,5 @@

# Qwen2

-In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) as a reference InternLM model.
+In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Qwen2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) and [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) as reference Qwen2 models.

## 0. Requirements

To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@@ -131,4 +131,21 @@ Inference time: xxxx s

What is AI?
-------------------- Output --------------------
AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans and mimic their actions. The term may
```

##### [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct)

```log
Inference time: 0.33887791633605957 s
-------------------- Prompt --------------------
AI是什么?
-------------------- Output --------------------
AI是人工智能的简称,是一种计算机科学和技术领域,旨在使机器能够完成通常需要人类智能的任务。这包括识别和理解语言、图像处理
```

```log
Inference time: 0.340407133102417 s
-------------------- Prompt --------------------
What is AI?
-------------------- Output --------------------
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and work like humans. It involves creating computer programs, algorithms
```