Minor fix for quick start (#10857)
* Fix typo and space in quick start.
Parent: 5eee1976ac
Commit: bce99a5b00
8 changed files with 29 additions and 55 deletions
@@ -1,6 +1,6 @@
 # Run Performance Benchmarking with IPEX-LLM

-We can do benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.
+We can perform benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.

 ## Prepare The Environment

@@ -13,7 +13,7 @@ pip install omegaconf

 ## Prepare The Scripts

-Navigate to your local workspace and then download IPEX-LLM from GitHub. Modify the `config.yaml` under `all-in-one` folder for your own benchmark configurations.
+Navigate to your local workspace and then download IPEX-LLM from GitHub. Modify the `config.yaml` under `all-in-one` folder for your benchmark configurations.

 ```
 cd your/local/workspace
@@ -47,15 +47,15 @@ Some parameters in the yaml file that you can configure:
 - warm_up: The number of runs as warmup trials, executed before performance benchmarking.
 - num_trials: The number of runs for performance benchmarking. The final benchmark result would be the average of all the trials.
 - low_bit: The low_bit precision you want to convert to for benchmarking.
-- batch_size: The number of samples on which the models makes predictions in one forward pass.
+- batch_size: The number of samples on which the models make predictions in one forward pass.
 - in_out_pairs: Input sequence length and output sequence length combined by '-'.
 - test_api: Use different test functions on different machines.
   - `transformer_int4_gpu` on Intel GPU for Linux
   - `transformer_int4_gpu_win` on Intel GPU for Windows
   - `transformer_int4` on Intel CPU
-- cpu_embedding: Whether to put embedding on CPU (only avaiable now for windows gpu related test_api).
+- cpu_embedding: Whether to put embedding on CPU (only available now for windows gpu related test_api).

-Remark: If you want to benchmark the performance without warmup, you can set `warm_up: 0` as well as `num_trials: 1` in `config.yaml`, and run each single model and in_out_pair separately.
+Remark: If you want to benchmark the performance without warmup, you can set `warm_up: 0` and `num_trials: 1` in `config.yaml`, and run each single model and in_out_pair separately.

 ## Run on Windows

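For reference, the parameters in the hunk above are keys in `config.yaml`. Below is a minimal, illustrative sketch (not the repository's actual `config.yaml`) of how such a configuration could be expressed and loaded with `omegaconf`, which the guide installs via `pip install omegaconf`; the key names follow the list above, while the values are placeholders to check against the real `all-in-one/config.yaml`.

```python
# Illustrative only: mirrors the documented keys; values are assumptions.
from omegaconf import OmegaConf

conf = OmegaConf.create(
    {
        "warm_up": 1,            # warmup trials run before timing starts
        "num_trials": 3,         # timed runs; the reported result is their average
        "low_bit": "sym_int4",   # assumed precision name; check config.yaml for valid values
        "batch_size": 1,         # samples per forward pass
        "in_out_pairs": ["32-32", "1024-128"],  # input-output lengths joined by '-'
        "test_api": ["transformer_int4_gpu"],   # e.g. Intel GPU on Linux
        "cpu_embedding": False,  # only relevant for the Windows GPU test_api variants
    }
)
print(OmegaConf.to_yaml(conf))
```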
@@ -148,4 +148,4 @@ Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overvie

 ## Result

-After the benchmarking completes, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for the benchmark results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
+After the benchmarking is completed, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for the benchmark results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
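As a quick way to act on the result-checking advice in the hunk above, here is a hedged sketch (pandas is an assumption, and the CSV file name is a placeholder) that prints the two latency columns and flags rows where the actual token counts differ from the configured ones.

```python
# Sketch for inspecting the benchmark CSV; adjust the file name to the one
# generated under your current folder.
import pandas as pd

df = pd.read_csv("benchmark_results.csv")  # placeholder name
print(df[["1st token avg latency (ms)", "2+ avg latency (ms/token)"]])

mismatch = df[df["actual input/output tokens"] != df["input/output tokens"]]
if not mismatch.empty:
    print("Rows where actual tokens differ from the configured in/out pairs:")
    print(mismatch)
```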
@@ -8,7 +8,7 @@ This guide helps you migrate your `bigdl-llm` application to use `ipex-llm`.
 .. note::
    This step assumes you have already installed `bigdl-llm`.
 ```
-You need to uninstall `bigdl-llm` and install `ipex-llm`With your `bigdl-llm` conda envionment activated, exeucte the folloiwng command according to your device type and location: 
+You need to uninstall `bigdl-llm` and install `ipex-llm`With your `bigdl-llm` conda environment activated, execute the following command according to your device type and location:

 ### For CPU

@@ -37,7 +37,6 @@ Choose either US or CN website for `extra-index-url`:
          pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
 ```

-
 ## Migrate `bigdl-llm` code to `ipex-llm`
 There are two options to migrate `bigdl-llm` code to `ipex-llm`.

@@ -62,4 +61,3 @@ model = AutoModelForCausalLM.from_pretrained(model_path,
                                              load_in_4bit=True,
                                              trust_remote_code=True)
 ```
-
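The code in the hunk above is the tail of a migration example from that guide. For context, a self-contained sketch of the import-swap style of migration might look like the following; the model id is a placeholder, and the exact module path should be verified against the `ipex-llm` documentation.

```python
# Hedged sketch: swap the bigdl-llm import for its ipex-llm counterpart and
# keep the rest of the loading code unchanged.
# was: from bigdl.llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
```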
@@ -1,6 +1,6 @@
 # Run Local RAG using Langchain-Chatchat on Intel CPU and GPU

-[chatchat-space/Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat) is a Knowledge Base QA application using RAG pipeline; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run ***local RAG pipelines*** using [Langchain-Chatchat](https://github.com/intel-analytics/Langchain-Chatchat) with LLMs and Embedding models on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); 
+[chatchat-space/Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat) is a Knowledge Base QA application using RAG pipeline; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run ***local RAG pipelines*** using [Langchain-Chatchat](https://github.com/intel-analytics/Langchain-Chatchat) with LLMs and Embedding models on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max).

 *See the demos of running LLaMA2-7B (English) and ChatGLM-3-6B (Chinese) on an Intel Core Ultra laptop below.*

@@ -15,7 +15,6 @@
 </tr>
 </table>

-
 >You can change the UI language in the left-side menu. We currently support **English** and **简体中文** (see video demos below).

 ## Langchain-Chatchat Architecture
@@ -26,8 +25,6 @@ See the Langchain-Chatchat architecture below ([source](https://github.com/chatc

 ## Quickstart

-
-  
 ### Install and Run

 Follow the guide that corresponds to your specific system and device from the links provided below:
@@ -48,7 +45,6 @@ Follow the guide that corresponds to your specific system and device from the li
 - Upload knowledge files from your computer and allow some time for the upload to complete. Once finished, click on `Add files to Knowledge Base` button to build the vector store. Note: this process may take several minutes.
   <p align="center"><img src="https://llm-assets.readthedocs.io/en/latest/_images/build-kb.png" alt="image1" width="70%" align="center"></p>

-
 #### Step 2: Chat with RAG

 You can now click `Dialogue` on the left-side menu to return to the chat UI. Then in `Knowledge base settings` menu, choose the Knowledge Base you just created, e.g, "test". Now you can start chatting.
@@ -59,8 +55,6 @@ You can now click `Dialogue` on the left-side menu to return to the chat UI. The

 For more information about how to use Langchain-Chatchat, refer to Official Quickstart guide in [English](./README_en.md#), [Chinese](./README_chs.md#), or the [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/).

-
-
 ### Trouble Shooting & Tips

 #### 1. Version Compatibility
@@ -16,14 +16,10 @@ See the demos of using Continue with [Mistral-7B-Instruct-v0.1](https://huggingf
 </tr>
 </table>

-
-
-
 ## Quickstart

 This guide walks you through setting up and running **Continue** within _Visual Studio Code_, empowered by local large language models served via [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui/) with `ipex-llm` optimizations.

-
 ### 1. Install and Run Text Generation WebUI

 Visit [Run Text Generation WebUI Quickstart Guide](webui_quickstart.html), and follow the steps 1) [Install IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-ipex-llm), 2) [Install WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-the-webui) and 3) [Start the Server](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#start-the-webui-server) to install and start the Text Generation WebUI API Service. **Please pay attention to below items during installation:**
@@ -34,8 +30,6 @@ Visit [Run Text Generation WebUI Quickstart Guide](webui_quickstart.html), and f
   ```
 - Remember to launch the server **with API service** as specified in [Launch the Server](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#launch-the-server)

-
-
 ### 2. Use WebUI to Load Model

 #### Access the WebUI
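Once the server is up with the API service enabled, a quick reachability check can save debugging time later in Continue. The sketch below is an assumption-heavy convenience, not part of the guide: the host, port, and `/v1` path are placeholders, so use the API URL printed in the WebUI terminal output.

```python
# Hedged sketch: probe the OpenAI-compatible endpoint the WebUI API service is
# assumed to expose; replace api_base with the URL shown in the server log.
import requests

api_base = "http://localhost:5000/v1"  # placeholder
resp = requests.get(f"{api_base}/models", timeout=10)
resp.raise_for_status()
print("API reachable; models:", [m.get("id") for m in resp.json().get("data", [])])
```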
@@ -45,7 +39,6 @@ Upon successful launch, URLs to access the WebUI will be displayed in the termin
   <img src="https://llm-assets.readthedocs.io/en/latest/_images/continue_quickstart_launch_server.jpeg" width=100%; />
 </a>

-
 #### Model Download and Loading

 Here's a list of models that can be used for coding copilot on local PC.
@@ -63,8 +56,6 @@ Follow the steps in [Model Download](https://ipex-llm.readthedocs.io/en/latest/d
   If you don't need to use the API service anymore, you can follow the instructions in refer to `Exit WebUI <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#exit-the-webui>`_  to stop the service.
 ```

-
-
 ### 3. Install `Continue` Extension
 1. Click `Install` on the [Continue extension in the Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue)
 2. This will open the Continue extension page in VS Code, where you will need to click `Install` again
@@ -80,8 +71,6 @@ Follow the steps in [Model Download](https://ipex-llm.readthedocs.io/en/latest/d
    Note: We strongly recommend moving Continue to VS Code's right sidebar. This helps keep the file explorer open while using Continue, and the sidebar can be toggled with a simple keyboard shortcut.
 ```

-
-
 ### 4. Configure `Continue`

 <a href="https://llm-assets.readthedocs.io/en/latest/_images/continue_quickstart_configuration.png" target="_blank">
@@ -122,13 +111,8 @@ You can ask Continue to edit your highlighted code with the command `/edit`.
   <img src="https://llm-assets.readthedocs.io/en/latest/_images/continue_quickstart_sample_usage2.png" width=100%; />
 </a>

-
 ### Troubleshooting

 #### Failed to load the extension `openai`

 If you encounter `TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'` when you run `python server.py --load-in-4bit --api`, please make sure you are using `Python 3.11` instead of lower versions.
-
-
-
-
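The troubleshooting note above boils down to a Python version requirement. As a hedged convenience (not from the guide), a check like the following can be run before launching `server.py` to fail fast with a clearer message.

```python
# The `X | None` union syntax behind the reported TypeError needs a modern
# Python; the guide asks for 3.11, so refuse to continue on anything older.
import sys

if sys.version_info < (3, 11):
    raise SystemExit(
        f"Python 3.11 is required, found {sys.version.split()[0]}; "
        "recreate the conda environment with python=3.11."
    )
print("Python version OK:", sys.version.split()[0])
```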
@@ -8,7 +8,6 @@ It applies to Intel Core Core 12 - 14 gen integrated GPUs (iGPUs) and Intel Arc
 > - WSL2 support is required during the installation process.
 > - This installation method requires at least 35GB of free disk space on C drive.

-
 ## Install Docker on Windows
 **Getting Started with Docker:**
 1. **For New Users:**
@@ -74,7 +74,7 @@ Under your current directory, exceuting below command to do inference with Llama
         main -ngl 33 -m <model_dir>/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun doing something" -e -ngl 33 --color --no-mmap
 ```

-Under your current directory, you can also exceute below command to have interative chat with Llama3:
+Under your current directory, you can also execute below command to have interactive chat with Llama3:

 ```eval_rst
 .. tabs::
@@ -96,7 +96,6 @@ Under your current directory, you can also exceute below command to have interat
 Below is a sample output on Intel Arc GPU:
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/llama3-cpp-arc-demo.png" width=100%; />

-
 ### 2. Run Llama3 using Ollama

 #### 2.1 Install IPEX-LLM for Ollama and Initialize
@@ -10,7 +10,7 @@ See the demo of running LLaMA2-7B on Intel Arc GPU below.
 This quickstart guide walks you through installing and running `llama.cpp` with `ipex-llm`.

 ### 0 Prerequisites
-IPEX-LLM's support for `llama.cpp` now is avaliable for Linux system and Windows system.
+IPEX-LLM's support for `llama.cpp` now is available for Linux system and Windows system.

 #### Linux
 For Linux system, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
@@ -10,7 +10,7 @@ See the demo of running LLaMA2-7B on Intel Arc GPU below.

 ### 1 Install IPEX-LLM for Ollama

-IPEX-LLM's support for `ollama` now is avaliable for Linux system and Windows system.
+IPEX-LLM's support for `ollama` now is available for Linux system and Windows system.

 Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), and follow the instructions in section [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to setup and section [Install IPEX-LLM cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) to install the IPEX-LLM with Ollama binaries.
