Update llamaindex examples (#11940)

* modify rag.py

* update readme of gpu example

* update llamaindex cpu example and readme

* add llamaindex doc

* update note style

* import before instancing IpexLLMEmbedding

* update index in readme

* update links

* update link

* update related links
hxsz1997 2024-08-28 09:03:44 +03:00 committed by GitHub
parent 23f51f87f0
commit e23549f63f
4 changed files with 42 additions and 24 deletions

File: README.md (CPU LlamaIndex example)

@@ -14,12 +14,16 @@ The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index R
 * **Install LlamaIndex Packages**
   ```bash
-  pip install llama-index-readers-file llama-index-vector-stores-postgres llama-index-embeddings-huggingface
+  pip install llama-index-llms-ipex-llm==0.1.8
+  pip install llama-index-embeddings-ipex-llm==0.1.5
+  pip install llama-index-readers-file==0.1.33
+  pip install llama-index-vector-stores-postgres==0.1.14
+  pip install pymupdf
   ```
-* **Install IPEX-LLM**
-  Ensure `ipex-llm` is installed by following the [IPEX-LLM Installation Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) before proceeding with the examples provided here.
+  > [!NOTE]
+  > - You can refer to [llama-index-llms-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/llm/ipex_llm/) and [llama-index-embeddings-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/embeddings/ipex_llm/) for more information.
+  > - The installation of `llama-index-llms-ipex-llm` or `llama-index-embeddings-ipex-llm` will also install `IPEX-LLM` and its dependencies.
+  > - `IpexLLMEmbedding` currently only provides optimization for Hugging Face BGE models.

 * **Database Setup (using PostgreSQL)**:
   * Installation:
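For reference, a minimal sketch of how the packages above fit together on CPU. The class and keyword names come from the rag.py diff further down; the model names are placeholders, and any arguments beyond those shown are left at their defaults:

```python
# Hedged usage sketch; model names are placeholder assumptions.
import torch  # import torch first (see the core-dump note in the GPU README below)

from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM

# IpexLLMEmbedding currently only optimizes Hugging Face BGE models.
embed_model = IpexLLMEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Keyword names follow the rag.py diff; other options (context window,
# generation length, ...) are left at their defaults here.
llm = IpexLLM.from_model_id(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
)
```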

File: rag.py (CPU LlamaIndex example)

@@ -16,7 +16,6 @@
 import torch
-from llama_index.embeddings.huggingface import HuggingFaceEmbedding
 from sqlalchemy import make_url
 from llama_index.vector_stores.postgres import PGVectorStore
 # from llama_index.llms.llama_cpp import LlamaCPP

@@ -161,10 +160,11 @@ def messages_to_prompt(messages):
     return prompt

 def main(args):
-    embed_model = HuggingFaceEmbedding(model_name=args.embedding_model_path)
+    from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
+    embed_model = IpexLLMEmbedding(model_name=args.embedding_model_path)

     # Use custom LLM in BigDL
-    from ipex_llm.llamaindex.llms import IpexLLM
+    from llama_index.llms.ipex_llm import IpexLLM
     llm = IpexLLM.from_model_id(
         model_name=args.model_path,
         tokenizer_name=args.tokenizer_path,
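The swap above only changes which class constructs the embedding model; downstream usage is unchanged, since `IpexLLMEmbedding` implements LlamaIndex's standard `BaseEmbedding` interface. A hedged sketch (the model name is a placeholder):

```python
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

# get_text_embedding is part of LlamaIndex's BaseEmbedding interface.
embed_model = IpexLLMEmbedding(model_name="BAAI/bge-small-en-v1.5")
vector = embed_model.get_text_embedding(
    "How does Llama 2 compare to other open-source models?"
)
print(len(vector))  # embedding dimension, e.g. 384 for bge-small-en-v1.5
```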

File: README.md (GPU LlamaIndex example)

@@ -8,17 +8,31 @@ This folder contains examples showcasing how to use [**LlamaIndex**](https://git
 ## Retrieval-Augmented Generation (RAG) Example

 The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index RAG example](https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval.html). This example builds a pipeline to ingest data (e.g. llama2 paper in pdf format) into a vector database (e.g. PostgreSQL), and then build a retrieval pipeline from that vector database.

+### 1. Install Prerequisites
+To benefit from IPEX-LLM on Intel GPUs, there are several prerequisite steps for tools installation and environment preparation.
+
+If you are a Windows user, visit the [Install IPEX-LLM on Windows with Intel GPU Guide](../../../../../docs/mddocs/Quickstart/install_windows_gpu.md), and follow [Install Prerequisites](../../../../../docs/mddocs/Quickstart/install_windows_gpu.md#install-prerequisites) to update GPU driver (optional) and install Conda.
+
+If you are a Linux user, visit the [Install IPEX-LLM on Linux with Intel GPU](../../../../../docs/mddocs/Quickstart/install_linux_gpu.md), and follow [Install Prerequisites](../../../../../docs/mddocs/Quickstart/install_linux_gpu.md#install-prerequisites) to install GPU driver, Intel® oneAPI Base Toolkit 2024.0, and Conda.
+
-### 1. Setting up Dependencies
+### 2. Setting up Dependencies
 * **Install LlamaIndex Packages**
   ```bash
-  pip install llama-index-readers-file llama-index-vector-stores-postgres llama-index-embeddings-huggingface
+  conda activate <your-conda-env-name>
+  pip install llama-index-llms-ipex-llm[xpu]==0.1.8 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+  pip install llama-index-embeddings-ipex-llm[xpu]==0.1.5 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+  pip install llama-index-readers-file==0.1.33
+  pip install llama-index-vector-stores-postgres==0.1.14
+  pip install pymupdf
   ```
-* **Install IPEX-LLM**
-  Follow the instructions in [GPU Install Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install ipex-llm.
+  > [!NOTE]
+  > - You can refer to [llama-index-llms-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/llm/ipex_llm_gpu/) and [llama-index-embeddings-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/embeddings/ipex_llm_gpu/) for more information.
+  > - The installation of `llama-index-llms-ipex-llm` or `llama-index-embeddings-ipex-llm` will also install `IPEX-LLM` and its dependencies.
+  > - You can also use `https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/` as the `extra-index-url`.
+  > - `IpexLLMEmbedding` currently only provides optimization for Hugging Face BGE models.

 * **Database Setup (using PostgreSQL)**:
   * Linux
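After these installs, a quick sanity check (a sketch, assuming the oneAPI environment from the step below is already configured) confirms that the XPU build of PyTorch can see the GPU:

```python
# Sanity-check sketch: the [xpu] extras pull in the Intel XPU build of
# torch plus intel_extension_for_pytorch, which registers the "xpu" device.
import torch
import intel_extension_for_pytorch  # noqa: F401  (registers the xpu backend)

print(torch.xpu.is_available())      # expect True on a working setup
print(torch.xpu.get_device_name(0))  # e.g. an Intel Arc A-Series device
```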
@@ -71,7 +85,7 @@ The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index R
 wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"
 ```

-### 2. Configures OneAPI environment variables for Linux
+### 3. Configures OneAPI environment variables for Linux

 > [!NOTE]
 > Skip this step if you are running on Windows.
@@ -82,9 +96,9 @@ This is a required step on Linux for APT or offline installed oneAPI. Skip this
 source /opt/intel/oneapi/setvars.sh
 ```

-### 3. Runtime Configurations
+### 4. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.

-#### 3.1 Configurations for Linux
+#### 4.1 Configurations for Linux
 <details>
 <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
@@ -121,7 +135,7 @@ export BIGDL_LLM_XMX_DISABLED=1
 </details>

-#### 3.2 Configurations for Windows
+#### 4.2 Configurations for Windows
 <details>
 <summary>For Intel iGPU</summary>
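The variables shown in the `export`/`set` snippets can also be applied from Python, provided that happens before `torch` and IPEX are imported; a hedged sketch (which variables to set depends on your device, per the sections above):

```python
import os

# Apply the README's recommended runtime configuration programmatically;
# this must run before torch and ipex are imported.
os.environ.setdefault("SYCL_CACHE_PERSISTENT", "1")
os.environ.setdefault("BIGDL_LLM_XMX_DISABLED", "1")  # only where the README advises it

import torch  # noqa: E402  -- imported after the environment is prepared
```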
@@ -147,7 +161,7 @@ set SYCL_CACHE_PERSISTENT=1
 > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

-### 4. Running the RAG example
+### 5. Running the RAG example
 In the current directory, run the example with command:
@@ -164,7 +178,7 @@ python rag.py -m <path_to_model> -t <path_to_tokenizer>
 - `-n N_PREDICT`: max predict tokens
 - `-t TOKENIZER_PATH`: **Required**, path to the tokenizer model

-### 5. Example Output
+### 6. Example Output
 A query such as **"How does Llama 2 compare to other open-source models?"** with the Llama2 paper as the data source, using the `Llama-2-7b-chat-hf` model, will produce the output like below:
@@ -178,6 +192,6 @@ However, it's important to note that the performance of Llama 2 can vary dependi
 In conclusion, while Llama 2 performs well on most benchmarks compared to other open-source models, its performance
 ```

-### 6. Trouble shooting
-#### 6.1 Core dump
+### 7. Trouble shooting
+#### 7.1 Core dump
 If you encounter a core dump error in your Python code, it is crucial to verify that the `import torch` statement is placed at the top of your Python file, just as what we did in `rag.py`.
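Concretely, the safe import order looks like this (a sketch mirroring the top of rag.py):

```python
# Correct: torch comes first, before any ipex-llm-backed LlamaIndex
# integrations, to avoid the core dump described above.
import torch

from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM
```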

File: rag.py (GPU LlamaIndex example)

@@ -15,7 +15,6 @@
 #
 import torch
-from llama_index.embeddings.huggingface import HuggingFaceEmbedding
 from sqlalchemy import make_url
 from llama_index.vector_stores.postgres import PGVectorStore
 # from llama_index.llms.llama_cpp import LlamaCPP

@@ -160,10 +159,11 @@ def messages_to_prompt(messages):
     return prompt

 def main(args):
-    embed_model = HuggingFaceEmbedding(model_name=args.embedding_model_path)
+    from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
+    embed_model = IpexLLMEmbedding(model_name=args.embedding_model_path, device="xpu")

     # Use custom LLM in BigDL
-    from ipex_llm.llamaindex.llms import IpexLLM
+    from llama_index.llms.ipex_llm import IpexLLM
     llm = IpexLLM.from_model_id(
         model_name=args.model_path,
         tokenizer_name=args.tokenizer_path,
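For the GPU variant, the embedding model is pinned to the XPU explicitly. A hedged sketch of the full wiring: `device="xpu"` on the embedding comes from this diff, while `device_map="xpu"` on the LLM is an assumption based on the llama-index-llms-ipex-llm GPU docs and may need adjusting; model names are placeholders.

```python
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM

# device="xpu" appears in this diff; model names are placeholders.
embed_model = IpexLLMEmbedding(model_name="BAAI/bge-small-en-v1.5", device="xpu")

# device_map="xpu" is assumed from the upstream docs, not from this diff.
llm = IpexLLM.from_model_id(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="xpu",
)
```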