revise ragflow quickstart (#11363)

* revise ragflow quickstart

* update titles and split the quickstart into sections

* update
Shengsheng Huang 2024-06-19 22:24:31 +08:00 committed by GitHub
parent 5283df0078
commit 13727635e8
3 changed files with 121 additions and 87 deletions


@@ -69,7 +69,7 @@
using DeepSpeed AutoTP and FastApi</a>
                </li>
                <li>
                  <a href="doc/LLM/Quickstart/ragflow_quickstart.html">Run RAGFlow with IPEX_LLM on Intel GPU</a>
                </li>
              </ul>
            </li>


@@ -8,12 +8,22 @@ IPEX-LLM Quickstart
This section includes efficient guides to show you how to:
=================
Install
=================
* |bigdl_llm_migration_guide|_
* `Install IPEX-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_
* `Install IPEX-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
* `Install IPEX-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_
* `Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) <./docker_benchmark_quickstart.html>`_
=================
Inference
=================
* `Run Performance Benchmarking with IPEX-LLM <./benchmark_quickstart.html>`_
* `Run Local RAG using Langchain-Chatchat on Intel GPU <./chatchat_quickstart.html>`_
* `Run Text Generation WebUI on Intel GPU <./webui_quickstart.html>`_
* `Run Open WebUI on Intel GPU <./open_webui_with_ollama_quickstart.html>`_
@@ -23,12 +33,21 @@ This section includes efficient guide to show you how to:
* `Run llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_
* `Run Ollama with IPEX-LLM on Intel GPU <./ollama_quickstart.html>`_
* `Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM <./llama3_llamacpp_ollama_quickstart.html>`_
* `Run RAGFlow with IPEX_LLM on Intel GPU <./ragflow_quickstart.html>`_
=================
Serving
=================
* `Run IPEX-LLM Serving with FastChat <./fastchat_quickstart.html>`_
* `Run IPEX-LLM Serving with vLLM on Intel GPU <./vLLM_quickstart.html>`_
* `Run IPEX-LLM serving on Multiple Intel GPUs using DeepSpeed AutoTP and FastApi <./deepspeed_autotp_fastapi_quickstart.html>`_
=================
Finetune
=================
* `Finetune LLM with Axolotl on Intel GPU <./axolotl_quickstart.html>`_
.. |bigdl_llm_migration_guide| replace:: ``bigdl-llm`` Migration Guide
.. _bigdl_llm_migration_guide: bigdl_llm_migration.html


@@ -1,18 +1,12 @@
# Run RAGFlow with IPEX_LLM on Intel GPU

[RAGFlow](https://github.com/infiniflow/ragflow) is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding; by integrating it with [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily leverage local LLMs running on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max).

*See the demo of RAGFlow running Qwen2:7B on Intel Arc A770 below.*

<video src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-record.mp4" width="100%" controls></video>

## Quickstart
@@ -22,17 +16,18 @@ See the demo of running Qwen2-7B on Intel Arc GPU below.
- RAM >= 16 GB
- Disk >= 50 GB
- Docker >= 24.0.0 & Docker Compose >= v2.26.1

### 1. Install and Start `Ollama` Service on Intel GPU

Follow the steps in [Run Ollama with IPEX-LLM on Intel GPU Guide](./ollama_quickstart.md) to install and run Ollama on Intel GPU. Ensure that `ollama serve` is running correctly and can be accessed through a local URL (e.g., `http://127.0.0.1:11434`) or a remote URL (e.g., `http://your_ip:11434`).
```eval_rst
.. important::

   If `RAGFlow` is not deployed on the same machine where Ollama is running (which means `RAGFlow` needs to connect to a remote Ollama service), you must configure the Ollama service to accept connections from any IP address. To achieve this, set or export the environment variable `OLLAMA_HOST=0.0.0.0` before executing the command `ollama serve`.

.. tip::
@@ -43,11 +38,9 @@ Visit [Run Ollama with IPEX-LLM on Intel GPU](./ollama_quickstart.html), and fol
      export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
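
Before continuing, you may want to verify that the Ollama service is actually reachable. The following sketch relies on Ollama's standard behavior (its root endpoint replies with `Ollama is running`); it is not a step from the original guide:

```bash
# If RAGFlow will connect from another machine or container, export this
# before starting `ollama serve` so it listens on all interfaces:
export OLLAMA_HOST=0.0.0.0

# Quick reachability check against the default Ollama port:
curl http://127.0.0.1:11434
# Expected reply: "Ollama is running"
```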
### 2. Pull Model

Now we need to pull a model for RAG using Ollama. Here we use the [Qwen/Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) model as an example. Open a new terminal window and run the following command to pull [`qwen2:latest`](https://ollama.com/library/qwen2).

```eval_rst
@@ -61,7 +54,7 @@ Now we need to pull a model for coding. Here we use [Qwen/Qwen2-7B](https://hugg
   .. tab:: Windows

      Please run the following command in Miniforge or Anaconda Prompt.

      .. code-block:: cmd
@@ -70,43 +63,51 @@ Now we need to pull a model for coding. Here we use [Qwen/Qwen2-7B](https://hugg
   .. seealso::

      Besides Qwen2, there are other LLM models you might want to explore, such as Llama3, Phi3, Mistral, etc. You can find all available models in the `Ollama model library <https://ollama.com/library>`_. Simply search for the model, pull it in a similar manner, and give it a try.
```
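
The pull command itself sits in the tabbed block above; for reference, a typical invocation with the Ollama CLI looks like the following sketch (adjust the binary path to wherever you initialized the IPEX-LLM Ollama):

```bash
# Pull the model tag used throughout this guide:
./ollama pull qwen2:latest

# Confirm the download by listing local models:
./ollama list
```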
### 3. Start `RAGFlow` Service

#### 3.1 Download `RAGFlow`
You can either clone the repository or download the source zip from [GitHub](https://github.com/infiniflow/ragflow/archive/refs/heads/main.zip):
```bash
$ git clone https://github.com/infiniflow/ragflow.git
```
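
If you prefer the zip route mentioned above, a minimal sketch (assuming `wget` and `unzip` are available):

```bash
$ wget https://github.com/infiniflow/ragflow/archive/refs/heads/main.zip
$ unzip main.zip   # unpacks into ragflow-main/
```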
#### 3.2 Environment Settings
Ensure `vm.max_map_count` is set to at least 262144. To check the current value of `vm.max_map_count`, use:
```bash
$ sysctl vm.max_map_count
```
##### Changing `vm.max_map_count`
To set the value temporarily, use:
```bash
$ sudo sysctl -w vm.max_map_count=262144
```
To make the change permanent and ensure it persists after a reboot, add or update the following line in `/etc/sysctl.conf`:
```bash
vm.max_map_count=262144
```
#### 3.3 Start the `RAGFlow` server using Docker
Start up the server using the pre-built Docker images:

```eval_rst
.. note::

   Running the following commands automatically downloads the *dev* version RAGFlow Docker image. To download and run a specified Docker version, update `RAGFLOW_VERSION` in **docker/.env** to the intended version, for example `RAGFLOW_VERSION=v0.7.0`, before running the following commands.
```
```bash
$ export no_proxy=localhost,127.0.0.1
@@ -115,8 +116,11 @@ $ chmod +x ./entrypoint.sh
$ docker compose up -d
```
```eval_rst
.. note::

   The core image is about 9 GB in size and may take a while to load.
```
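
Because the containers start detached, you can first confirm they are up; this is a generic Docker check rather than a step from the guide:

```bash
$ docker ps --filter "name=ragflow"
```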
Check the server status once it is up and running:
@@ -124,7 +128,7 @@ Check the server status after having the server up and running:
$ docker logs -f ragflow-server
```
Upon successful deployment, you will see logs in the terminal similar to the following:

```bash
    ____                 ______ __
@@ -139,15 +143,12 @@ _The following output confirms a successful launch of the system:_
 * Running on http://x.x.x.x:9380
INFO:werkzeug:Press CTRL+C to quit
```
Open a browser and navigate to the URL displayed in the terminal logs. Look for messages like `Running on http://ip:port`. For local deployment, you can usually access the web portal at `http://127.0.0.1:9380`. For remote access, use `http://your_ip:9380`.
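
If the portal does not load right away, the server may still be initializing. A quick reachability probe from the host (a hypothetical check; adjust the host and port to your deployment):

```bash
$ curl -I http://127.0.0.1:9380
```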
### 4. Using `RAGFlow`
```eval_rst
.. note::
@@ -158,20 +159,21 @@ In [service_conf.yaml](./docker/service_conf.yaml), select the desired LLM facto
#### Log-in

If this is your first time using RAGFlow, you will need to register. After registering, log in with your new account to access the portal.

<div style="display: flex; gap: 5px;">
  <a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login.png" target="_blank" style="flex: 1;">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login.png" style="width: 100%;" />
  </a>
  <a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login2.png" target="_blank" style="flex: 1;">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login2.png" style="width: 100%;" />
  </a>
</div>
#### Configure `Ollama` service URL

Access the Ollama settings through **Settings -> Model Providers** in the menu. Fill out the **Base URL**, and then click the **OK** button at the bottom.

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-add-ollama.png" target="_blank">
@@ -191,36 +193,40 @@ If the connection is successful, you will see the model listed down **Show more
```
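
One point worth noting, based on general Docker networking rather than anything in this guide: RAGFlow itself runs inside a container, so a Base URL of `http://127.0.0.1:11434` would point at the RAGFlow container rather than the machine running Ollama. If Ollama runs on the Docker host, a host-gateway address often works:

```bash
# Hypothetical Base URL when Ollama runs on the Docker host:
#   http://host.docker.internal:11434
# Verify it is reachable from inside the RAGFlow container
# (assumes curl is available in the image):
$ docker exec ragflow-server curl http://host.docker.internal:11434
```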
#### Create Knowledge Base

Go to **Knowledge Base** by clicking on **Knowledge Base** in the top bar. Click the **+Create knowledge base** button on the right. You will be prompted to input a name for the knowledge base.
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase.png" width="100%" />
</a>
#### Edit Knowledge Base

After entering a name, you will be directed to edit the knowledge base. Click on **Dataset** on the left, then click **+ Add file -> Local files**. Upload your file in the pop-up window and click **OK**.

<div style="display: flex; gap: 5px;">
  <a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase2.png" target="_blank" style="flex: 1;">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase2.png" style="width: 100%;" />
  </a>
  <a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase3.png" target="_blank" style="flex: 1;">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase3.png" style="width: 100%;" />
  </a>
</div>
After the upload is successful, you will see a new record in the dataset. The _**Parsing Status**_ column will show `UNSTARTED`. Click the green start button in the _**Action**_ column to begin file parsing. Once parsing is finished, the _**Parsing Status**_ column will change to **SUCCESS**.

<div style="display: flex; gap: 5px;">
  <a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase4.png" target="_blank" style="flex: 1;">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase4.png" style="width: 100%;" />
  </a>
  <a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase5.png" target="_blank" style="flex: 1;">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase5.png" style="width: 100%;" />
  </a>
</div>

Next, go to **Configuration** on the left menu and click **Save** at the bottom to save the changes.
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase6.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase6.png" width="100%" />
@@ -228,27 +234,36 @@ Then you can go to **Configuration** and hit **Save** at the bottom to save the
#### Chat with the Model

Start new conversations by clicking **Chat** in the top navbar.

On the left side, create a conversation by clicking **Create an Assistant**. Under **Assistant Setting**, give it a name and select your knowledge bases.
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-chat.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-chat.png" width="100%" />
</a>
Next, go to **Model Setting**, choose your model added by Ollama, and disable the **Max Tokens** toggle. Finally, click **OK** to start.

```eval_rst
.. tip::

   Enabling the **Max Tokens** toggle may result in very short answers.
```
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-chat2.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-chat2.png" width="100%" />
</a>
<br/>

Input your questions into the **Message Resume Assistant** textbox at the bottom, and click the button on the right to get responses.
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-chat3.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-chat3.png" width="100%" />
</a>
#### Exit

To shut down the RAGFlow server, use **Ctrl+C** in the terminal where the RAGFlow server is running, then close your browser tab.
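
Note that the server was started detached (`docker compose up -d`), so **Ctrl+C** only interrupts a foreground `docker logs -f` session. To actually stop the containers, a standard Docker Compose command (assumed here, not taken from the guide) can be run from the `ragflow/docker` directory:

```bash
$ docker compose stop
```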