Minor fix for quick start (#10857)
* Fix typo and space in quick start.
parent 5eee1976ac
commit bce99a5b00
8 changed files with 29 additions and 55 deletions
@@ -1,6 +1,6 @@
 # Run Performance Benchmarking with IPEX-LLM

-We can do benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.
+We can perform benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.

 ## Prepare The Environment
@@ -13,7 +13,7 @@ pip install omegaconf

 ## Prepare The Scripts

-Navigate to your local workspace and then download IPEX-LLM from GitHub. Modify the `config.yaml` under `all-in-one` folder for your own benchmark configurations.
+Navigate to your local workspace and then download IPEX-LLM from GitHub. Modify the `config.yaml` under `all-in-one` folder for your benchmark configurations.

 ```
 cd your/local/workspace
@@ -47,15 +47,15 @@ Some parameters in the yaml file that you can configure:
 - warm_up: The number of runs as warmup trials, executed before performance benchmarking.
 - num_trials: The number of runs for performance benchmarking. The final benchmark result would be the average of all the trials.
 - low_bit: The low_bit precision you want to convert to for benchmarking.
-- batch_size: The number of samples on which the models makes predictions in one forward pass.
+- batch_size: The number of samples on which the models make predictions in one forward pass.
 - in_out_pairs: Input sequence length and output sequence length combined by '-'.
 - test_api: Use different test functions on different machines.
   - `transformer_int4_gpu` on Intel GPU for Linux
   - `transformer_int4_gpu_win` on Intel GPU for Windows
   - `transformer_int4` on Intel CPU
-- cpu_embedding: Whether to put embedding on CPU (only avaiable now for windows gpu related test_api).
+- cpu_embedding: Whether to put embedding on CPU (only available now for windows gpu related test_api).

-Remark: If you want to benchmark the performance without warmup, you can set `warm_up: 0` as well as `num_trials: 1` in `config.yaml`, and run each single model and in_out_pair separately.
+Remark: If you want to benchmark the performance without warmup, you can set `warm_up: 0` and `num_trials: 1` in `config.yaml`, and run each single model and in_out_pair separately.
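Since `config.yaml` drives the whole run, it can help to sanity-check it before benchmarking. Below is a small, hypothetical Python sketch (not part of the benchmark scripts) that loads the file with `omegaconf` (installed in the environment-preparation step) and prints the parameters described above; the exact keys and defaults in the repository's template may differ.

```python
# Hypothetical sanity check for config.yaml; the key names follow the list
# above, and the actual template under the all-in-one folder may have more.
from omegaconf import OmegaConf

conf = OmegaConf.load("config.yaml")               # run from the all-in-one folder
print("warm_up:      ", conf.warm_up)              # untimed warmup runs
print("num_trials:   ", conf.num_trials)           # timed runs, averaged in the result
print("low_bit:      ", conf.low_bit)              # e.g. a 4-bit precision setting
print("batch_size:   ", conf.batch_size)           # samples per forward pass
print("in_out_pairs: ", list(conf.in_out_pairs))   # e.g. ['32-32', '1024-128']
print("test_api:     ", list(conf.test_api))       # e.g. ['transformer_int4_gpu']
print("cpu_embedding:", conf.cpu_embedding)        # keep embedding on CPU (Windows GPU APIs)
```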

 ## Run on Windows
@@ -148,4 +148,4 @@ Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overvie

 ## Result

-After the benchmarking completes, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for the benchmark results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
+After the benchmarking is completed, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for the benchmark results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
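As a convenience, here is a small sketch (assuming `pandas` is installed; the result file name is generated by the script and will differ per run) for pulling out the latency columns mentioned above:

```python
# Hypothetical post-processing of the benchmark CSV; the column names are the
# ones quoted in this guide, and any that are missing are simply skipped.
import glob

import pandas as pd

csv_path = sorted(glob.glob("*.csv"))[-1]   # pick one result file from the current folder
df = pd.read_csv(csv_path)
cols = ["1st token avg latency (ms)", "2+ avg latency (ms/token)",
        "input/output tokens", "actual input/output tokens"]
print(df[[c for c in cols if c in df.columns]])
```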
@@ -8,7 +8,7 @@ This guide helps you migrate your `bigdl-llm` application to use `ipex-llm`.
 .. note::
 This step assumes you have already installed `bigdl-llm`.
 ```

-You need to uninstall `bigdl-llm` and install `ipex-llm`With your `bigdl-llm` conda envionment activated, exeucte the folloiwng command according to your device type and location:
+You need to uninstall `bigdl-llm` and install `ipex-llm`With your `bigdl-llm` conda environment activated, execute the following command according to your device type and location:

 ### For CPU
@@ -37,7 +37,6 @@ Choose either US or CN website for `extra-index-url`:
 pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
 ```

 ## Migrate `bigdl-llm` code to `ipex-llm`
 There are two options to migrate `bigdl-llm` code to `ipex-llm`.
@@ -62,4 +61,3 @@ model = AutoModelForCausalLM.from_pretrained(model_path,
 load_in_4bit=True,
 trust_remote_code=True)
 ```
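For reference, here is a minimal sketch of what the migrated loading code looks like when you take the option of switching the import to `ipex_llm`; the `model_path` below is a placeholder for your own checkpoint.

```python
# After migration, only the package in the import changes; the
# from_pretrained arguments shown in the diff above stay the same.
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "path/to/your/model"  # placeholder for a local or Hugging Face checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
```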
@@ -1,6 +1,6 @@
 # Run Local RAG using Langchain-Chatchat on Intel CPU and GPU

-[chatchat-space/Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat) is a Knowledge Base QA application using RAG pipeline; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run ***local RAG pipelines*** using [Langchain-Chatchat](https://github.com/intel-analytics/Langchain-Chatchat) with LLMs and Embedding models on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max);
+[chatchat-space/Langchain-Chatchat](https://github.com/chatchat-space/Langchain-Chatchat) is a Knowledge Base QA application using RAG pipeline; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run ***local RAG pipelines*** using [Langchain-Chatchat](https://github.com/intel-analytics/Langchain-Chatchat) with LLMs and Embedding models on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max).

 *See the demos of running LLaMA2-7B (English) and ChatGLM-3-6B (Chinese) on an Intel Core Ultra laptop below.*
@@ -15,7 +15,6 @@
 </tr>
 </table>

 >You can change the UI language in the left-side menu. We currently support **English** and **简体中文** (see video demos below).

 ## Langchain-Chatchat Architecture
@@ -26,8 +25,6 @@ See the Langchain-Chatchat architecture below ([source](https://github.com/chatc

 ## Quickstart

 ### Install and Run

 Follow the guide that corresponds to your specific system and device from the links provided below:
@@ -48,7 +45,6 @@ Follow the guide that corresponds to your specific system and device from the li
 - Upload knowledge files from your computer and allow some time for the upload to complete. Once finished, click on `Add files to Knowledge Base` button to build the vector store. Note: this process may take several minutes.
 <p align="center"><img src="https://llm-assets.readthedocs.io/en/latest/_images/build-kb.png" alt="image1" width="70%" align="center"></p>

 #### Step 2: Chat with RAG

 You can now click `Dialogue` on the left-side menu to return to the chat UI. Then in `Knowledge base settings` menu, choose the Knowledge Base you just created, e.g, "test". Now you can start chatting.
@@ -59,8 +55,6 @@ You can now click `Dialogue` on the left-side menu to return to the chat UI. The

 For more information about how to use Langchain-Chatchat, refer to Official Quickstart guide in [English](./README_en.md#), [Chinese](./README_chs.md#), or the [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/).

 ### Trouble Shooting & Tips

 #### 1. Version Compatibility
@@ -16,14 +16,10 @@ See the demos of using Continue with [Mistral-7B-Instruct-v0.1](https://huggingf
 </tr>
 </table>

 ## Quickstart

 This guide walks you through setting up and running **Continue** within _Visual Studio Code_, empowered by local large language models served via [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui/) with `ipex-llm` optimizations.

 ### 1. Install and Run Text Generation WebUI

 Visit [Run Text Generation WebUI Quickstart Guide](webui_quickstart.html), and follow the steps 1) [Install IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-ipex-llm), 2) [Install WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-the-webui) and 3) [Start the Server](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#start-the-webui-server) to install and start the Text Generation WebUI API Service. **Please pay attention to below items during installation:**
@@ -34,8 +30,6 @@ Visit [Run Text Generation WebUI Quickstart Guide](webui_quickstart.html), and f
 ```
 - Remember to launch the server **with API service** as specified in [Launch the Server](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#launch-the-server)

 ### 2. Use WebUI to Load Model

 #### Access the WebUI
@@ -45,7 +39,6 @@ Upon successful launch, URLs to access the WebUI will be displayed in the termin
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/continue_quickstart_launch_server.jpeg" width=100%; />
 </a>

 #### Model Download and Loading

 Here's a list of models that can be used for coding copilot on local PC.
@@ -63,8 +56,6 @@ Follow the steps in [Model Download](https://ipex-llm.readthedocs.io/en/latest/d
 If you don't need to use the API service anymore, you can follow the instructions in refer to `Exit WebUI <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#exit-the-webui>`_ to stop the service.
 ```

 ### 3. Install `Continue` Extension
 1. Click `Install` on the [Continue extension in the Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue)
 2. This will open the Continue extension page in VS Code, where you will need to click `Install` again
@@ -80,8 +71,6 @@ Follow the steps in [Model Download](https://ipex-llm.readthedocs.io/en/latest/d
 Note: We strongly recommend moving Continue to VS Code's right sidebar. This helps keep the file explorer open while using Continue, and the sidebar can be toggled with a simple keyboard shortcut.
 ```

 ### 4. Configure `Continue`

 <a href="https://llm-assets.readthedocs.io/en/latest/_images/continue_quickstart_configuration.png" target="_blank">
@@ -122,13 +111,8 @@ You can ask Continue to edit your highlighted code with the command `/edit`.
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/continue_quickstart_sample_usage2.png" width=100%; />
 </a>

 ### Troubleshooting

 #### Failed to load the extension `openai`

 If you encounter `TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'` when you run `python server.py --load-in-4bit --api`, please make sure you are using `Python 3.11` instead of lower versions.
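For background, this error comes from PEP 604 union annotations (`X | None`), which the WebUI code (or its dependencies) relies on and which only evaluate on Python 3.10 or newer. Below is a minimal, self-contained illustration, not taken from the WebUI source:

```python
# On Python 3.9 or older, this def statement itself raises
# "TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'",
# because "str | None" is evaluated when the function is defined.
# On Python 3.10+ (e.g. the recommended 3.11) it runs fine.
def load_model(path: str, device: str | None = None) -> None:
    print(f"loading {path} on {device or 'the default device'}")

load_model("demo-model")
```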
@@ -8,7 +8,6 @@ It applies to Intel Core Core 12 - 14 gen integrated GPUs (iGPUs) and Intel Arc
 > - WSL2 support is required during the installation process.
 > - This installation method requires at least 35GB of free disk space on C drive.

 ## Install Docker on Windows
 **Getting Started with Docker:**
 1. **For New Users:**
@@ -1,4 +1,4 @@
-# Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM
+# Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM

 [Llama 3](https://llama.meta.com/llama3/) is the latest Large Language Models released by [Meta](https://llama.meta.com/) which provides state-of-the-art performance and excels at language nuances, contextual understanding, and complex tasks like translation and dialogue generation.
@@ -74,7 +74,7 @@ Under your current directory, exceuting below command to do inference with Llama
 main -ngl 33 -m <model_dir>/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun doing something" -e -ngl 33 --color --no-mmap
 ```

-Under your current directory, you can also exceute below command to have interative chat with Llama3:
+Under your current directory, you can also execute below command to have interactive chat with Llama3:

 ```eval_rst
 .. tabs::
@@ -96,7 +96,6 @@ Under your current directory, you can also exceute below command to have interat
 Below is a sample output on Intel Arc GPU:
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/llama3-cpp-arc-demo.png" width=100%; />

 ### 2. Run Llama3 using Ollama

 #### 2.1 Install IPEX-LLM for Ollama and Initialize
@@ -10,7 +10,7 @@ See the demo of running LLaMA2-7B on Intel Arc GPU below.
 This quickstart guide walks you through installing and running `llama.cpp` with `ipex-llm`.

 ### 0 Prerequisites
-IPEX-LLM's support for `llama.cpp` now is avaliable for Linux system and Windows system.
+IPEX-LLM's support for `llama.cpp` now is available for Linux system and Windows system.

 #### Linux
 For Linux system, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
@@ -10,7 +10,7 @@ See the demo of running LLaMA2-7B on Intel Arc GPU below.

 ### 1 Install IPEX-LLM for Ollama

-IPEX-LLM's support for `ollama` now is avaliable for Linux system and Windows system.
+IPEX-LLM's support for `ollama` now is available for Linux system and Windows system.

 Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), and follow the instructions in section [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to setup and section [Install IPEX-LLM cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) to install the IPEX-LLM with Ollama binaries.
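Once the rest of that guide has the `ollama` service from those binaries up and serving a Llama 3 model, one hypothetical way to exercise it from Python is through Ollama's REST endpoint; the model name and port below are assumptions based on Ollama's defaults, not something this guide prescribes.

```python
# Hypothetical smoke test against a locally running ollama server.
# Assumes the server listens on Ollama's default port 11434 and that a
# model named "llama3" has already been pulled or created.
import json
import urllib.request

payload = json.dumps({
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,          # request a single JSON response instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```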