Update llamaindex examples (#11940)
* modify rag.py
* update readme of gpu example
* update llamaindex cpu example and readme
* add llamaindex doc
* update note style
* import before instantiating IpexLLMEmbedding
* update index in readme
* update links
* update link
* update related links
parent 23f51f87f0
commit e23549f63f

4 changed files with 42 additions and 24 deletions

**CPU example README**

````diff
@@ -14,12 +14,16 @@ The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index R
 * **Install LlamaIndex Packages**
     ```bash
-    pip install llama-index-readers-file llama-index-vector-stores-postgres llama-index-embeddings-huggingface
+    pip install llama-index-llms-ipex-llm==0.1.8
+    pip install llama-index-embeddings-ipex-llm==0.1.5
+    pip install llama-index-readers-file==0.1.33
+    pip install llama-index-vector-stores-postgres==0.1.14
+    pip install pymupdf
     ```
 
-* **Install IPEX-LLM**
-
-Ensure `ipex-llm` is installed by following the [IPEX-LLM Installation Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) before proceeding with the examples provided here.
+> [!NOTE]
+> - You can refer to [llama-index-llms-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/llm/ipex_llm/) and [llama-index-embeddings-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/embeddings/ipex_llm/) for more information.
+> - Installing `llama-index-llms-ipex-llm` or `llama-index-embeddings-ipex-llm` will also install `IPEX-LLM` and its dependencies.
+> - `IpexLLMEmbedding` currently only provides optimization for Hugging Face BGE models.
 
 * **Database Setup (using PostgreSQL)**:
     * Installation: 
````
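
Once these packages are installed, the new embedding class can be exercised directly. Below is a minimal sketch (not part of the diff) of using `IpexLLMEmbedding` on CPU; the BGE model name is a placeholder, chosen to match the note above, and `get_text_embedding` is the standard LlamaIndex embedding method.

```python
# Minimal sketch: verify the IPEX-LLM embedding install on CPU.
# The model name is a placeholder (a BGE model, per the note above).
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding

embed_model = IpexLLMEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Embed one sentence and inspect the vector size.
vector = embed_model.get_text_embedding("IPEX-LLM accelerates local LLM inference.")
print(len(vector))  # e.g. 384 for a bge-small model
```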

**CPU example rag.py**

````diff
@@ -16,7 +16,6 @@
 
 
 import torch
-from llama_index.embeddings.huggingface import HuggingFaceEmbedding
 from sqlalchemy import make_url
 from llama_index.vector_stores.postgres import PGVectorStore
 # from llama_index.llms.llama_cpp import LlamaCPP
@@ -161,10 +160,11 @@ def messages_to_prompt(messages):
     return prompt
 
 def main(args):
-    embed_model = HuggingFaceEmbedding(model_name=args.embedding_model_path)
+    from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
+    embed_model = IpexLLMEmbedding(model_name=args.embedding_model_path)
     
     # Use custom LLM in BigDL
-    from ipex_llm.llamaindex.llms import IpexLLM
+    from llama_index.llms.ipex_llm import IpexLLM
     llm = IpexLLM.from_model_id(
         model_name=args.model_path,
         tokenizer_name=args.tokenizer_path,
````
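
For context on the new import path, here is a hedged sketch of how `IpexLLM.from_model_id` is typically called in the LlamaIndex integration. The model and tokenizer names are placeholders, and the keyword arguments beyond `model_name`/`tokenizer_name` (`context_window`, `max_new_tokens`, `generate_kwargs`) follow the upstream llama-index-llms-ipex-llm examples rather than this commit.

```python
# Sketch: loading an LLM through the new llama-index-llms-ipex-llm
# integration, mirroring the from_model_id call rag.py now uses.
from llama_index.llms.ipex_llm import IpexLLM

llm = IpexLLM.from_model_id(
    model_name="HuggingFaceH4/zephyr-7b-alpha",      # placeholder model
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",  # placeholder tokenizer
    context_window=512,          # assumed kwargs, per upstream examples
    max_new_tokens=128,
    generate_kwargs={"do_sample": False},
)

response = llm.complete("What is a retrieval-augmented generation pipeline?")
print(response.text)
```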

**GPU example README**

````diff
@@ -8,17 +8,31 @@ This folder contains examples showcasing how to use [**LlamaIndex**](https://git
 ## Retrieval-Augmented Generation (RAG) Example
 The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index RAG example](https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval.html). This example builds a pipeline to ingest data (e.g. the llama2 paper in PDF format) into a vector database (e.g. PostgreSQL), and then builds a retrieval pipeline from that vector database. 
 
-### 1. Setting up Dependencies 
+### 1. Install Prerequisites
+
+To benefit from IPEX-LLM on Intel GPUs, there are several prerequisite steps for tools installation and environment preparation.
+
+If you are a Windows user, visit the [Install IPEX-LLM on Windows with Intel GPU Guide](../../../../../docs/mddocs/Quickstart/install_windows_gpu.md), and follow [Install Prerequisites](../../../../../docs/mddocs/Quickstart/install_windows_gpu.md#install-prerequisites) to update the GPU driver (optional) and install Conda.
+
+If you are a Linux user, visit the [Install IPEX-LLM on Linux with Intel GPU Guide](../../../../../docs/mddocs/Quickstart/install_linux_gpu.md), and follow [Install Prerequisites](../../../../../docs/mddocs/Quickstart/install_linux_gpu.md#install-prerequisites) to install the GPU driver, Intel® oneAPI Base Toolkit 2024.0, and Conda.
+
+### 2. Setting up Dependencies 
 
 * **Install LlamaIndex Packages**
     ```bash
-    pip install llama-index-readers-file llama-index-vector-stores-postgres llama-index-embeddings-huggingface
+    conda activate <your-conda-env-name>
+    pip install llama-index-llms-ipex-llm[xpu]==0.1.8 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+    pip install llama-index-embeddings-ipex-llm[xpu]==0.1.5 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
+    pip install llama-index-readers-file==0.1.33
+    pip install llama-index-vector-stores-postgres==0.1.14
+    pip install pymupdf
     ```
-* **Install IPEX-LLM**
-
-    Follow the instructions in [GPU Install Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install ipex-llm.
+> [!NOTE]
+> - You can refer to [llama-index-llms-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/llm/ipex_llm_gpu/) and [llama-index-embeddings-ipex-llm](https://docs.llamaindex.ai/en/stable/examples/embeddings/ipex_llm_gpu/) for more information.
+> - Installing `llama-index-llms-ipex-llm` or `llama-index-embeddings-ipex-llm` will also install `IPEX-LLM` and its dependencies.
+> - You can also use `https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/` as the `extra-index-url`.
+> - `IpexLLMEmbedding` currently only provides optimization for Hugging Face BGE models.
 
 * **Database Setup (using PostgreSQL)**:
     * Linux
````
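
A quick way to confirm the `[xpu]` installation worked is to check that PyTorch can see the Intel GPU. The sketch below assumes the `[xpu]` extras pulled in an XPU-enabled PyTorch together with `intel_extension_for_pytorch`; it is a sanity check, not part of the example itself.

```python
# Sanity check (assumes XPU-enabled PyTorch + intel_extension_for_pytorch).
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device  # noqa: F401

print(torch.xpu.is_available())      # expect True on a working setup
print(torch.xpu.get_device_name(0))  # e.g. an Intel Arc / Flex device name
```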
````diff
@@ -71,7 +85,7 @@ The RAG example ([rag.py](./rag.py)) is adapted from the [Official llama index R
     wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"
     ```
 
-### 2. Configures OneAPI environment variables for Linux
+### 3. Configure oneAPI environment variables for Linux
 
 > [!NOTE]
 > Skip this step if you are running on Windows.
````
````diff
@@ -82,9 +96,9 @@ This is a required step on Linux for APT or offline installed oneAPI. Skip this
 source /opt/intel/oneapi/setvars.sh
 ```
 
-### 3. Runtime Configurations
+### 4. Runtime Configurations
 For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
-#### 3.1 Configurations for Linux
+#### 4.1 Configurations for Linux
 <details>
 
 <summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>
````
````diff
@@ -121,7 +135,7 @@ export BIGDL_LLM_XMX_DISABLED=1
 
 </details>
 
-#### 3.2 Configurations for Windows
+#### 4.2 Configurations for Windows
 <details>
 
 <summary>For Intel iGPU</summary>
````
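
The `export`/`set` commands in these collapsed sections can also be applied from Python before `torch` is imported, which is convenient in notebooks. A small sketch, using the variable names visible in the hunks above:

```python
import os

# Mirror the shell configuration from the runtime sections above.
# Set these before importing torch / ipex-llm so they take effect.
os.environ["SYCL_CACHE_PERSISTENT"] = "1"   # persist the SYCL kernel cache
os.environ["BIGDL_LLM_XMX_DISABLED"] = "1"  # only for the devices noted above

import torch  # imported after the environment is configured
```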
````diff
@@ -147,7 +161,7 @@ set SYCL_CACHE_PERSISTENT=1
 > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
 
 
-### 4. Running the RAG example
+### 5. Running the RAG example
 
 In the current directory, run the example with the command:
 
````
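
For readers who want the shape of what `rag.py` does without opening the file, here is a condensed, hedged sketch of the ingestion-plus-retrieval flow, following the upstream low-level RAG example this README links to. The database credentials, table name, model names, and embedding dimension are placeholders, and the sketch uses the higher-level `VectorStoreIndex` wrapper where `rag.py` itself works at a lower level.

```python
# Condensed sketch of the RAG flow (placeholders: DB credentials, table, models).
import torch  # keep torch first, as the troubleshooting section advises

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM
from llama_index.readers.file import PyMuPDFReader
from llama_index.vector_stores.postgres import PGVectorStore

# 1. Ingest the PDF downloaded above.
documents = PyMuPDFReader().load(file_path="./data/llama2.pdf")

# 2. Point at PostgreSQL (pgvector) for storage.
vector_store = PGVectorStore.from_params(
    database="llama_demo",  # placeholder credentials
    host="localhost",
    password="password",
    port="5432",
    user="postgres",
    table_name="llama2_paper",
    embed_dim=384,          # must match the embedding model's output size
)

# 3. Build the index and query it on the Intel GPU.
embed_model = IpexLLMEmbedding(model_name="BAAI/bge-small-en-v1.5", device="xpu")
llm = IpexLLM.from_model_id(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # placeholder model
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",  # placeholder tokenizer
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
query_engine = index.as_query_engine(llm=llm)
print(query_engine.query("How does Llama 2 compare to other open-source models?"))
```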
````diff
@@ -164,7 +178,7 @@ python rag.py -m <path_to_model> -t <path_to_tokenizer>
 - `-n N_PREDICT`: max predict tokens
 - `-t TOKENIZER_PATH`: **Required**, path to the tokenizer model
 
-### 5. Example Output
+### 6. Example Output
 
 A query such as **"How does Llama 2 compare to other open-source models?"** with the Llama2 paper as the data source, using the `Llama-2-7b-chat-hf` model, will produce output like the following:
 
````
````diff
@@ -178,6 +192,6 @@ However, it's important to note that the performance of Llama 2 can vary dependi
 In conclusion, while Llama 2 performs well on most benchmarks compared to other open-source models, its performance
 ```
 
-### 6. Trouble shooting
-#### 6.1 Core dump
+### 7. Troubleshooting
+#### 7.1 Core dump
 If you encounter a core dump error in your Python code, it is crucial to verify that the `import torch` statement is placed at the top of your Python file, just as we did in `rag.py`.
````
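
To make that core-dump guidance concrete, a minimal sketch of the import ordering, using only imports that appear in the diffs on this page:

```python
# Correct ordering, mirroring rag.py: torch first, then everything else.
import torch  # must come first to avoid the core dump described above

from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM
```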

**GPU example rag.py**

````diff
@@ -15,7 +15,6 @@
 #
 
 import torch
-from llama_index.embeddings.huggingface import HuggingFaceEmbedding
 from sqlalchemy import make_url
 from llama_index.vector_stores.postgres import PGVectorStore
 # from llama_index.llms.llama_cpp import LlamaCPP
@@ -160,10 +159,11 @@ def messages_to_prompt(messages):
     return prompt
 
 def main(args):
-    embed_model = HuggingFaceEmbedding(model_name=args.embedding_model_path)
+    from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
+    embed_model = IpexLLMEmbedding(model_name=args.embedding_model_path, device="xpu")
     
     # Use custom LLM in BigDL
-    from ipex_llm.llamaindex.llms import IpexLLM
+    from llama_index.llms.ipex_llm import IpexLLM
     llm = IpexLLM.from_model_id(
         model_name=args.model_path,
         tokenizer_name=args.tokenizer_path,
````
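
Compared to the CPU variant, the GPU `rag.py` mainly needs the embedding model placed on `xpu`. A short hedged sketch of the GPU-side construction follows; the model names are placeholders, and `device_map="xpu"` for the LLM follows the upstream llama-index-llms-ipex-llm GPU examples and is an assumption here, since this hunk does not show the rest of the `from_model_id` call.

```python
# GPU-side construction (sketch; model names are placeholders).
from llama_index.embeddings.ipex_llm import IpexLLMEmbedding
from llama_index.llms.ipex_llm import IpexLLM

embed_model = IpexLLMEmbedding(
    model_name="BAAI/bge-small-en-v1.5",  # BGE models are the optimized path
    device="xpu",                         # matches the change in main() above
)
llm = IpexLLM.from_model_id(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # placeholder
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",  # placeholder
    device_map="xpu",  # assumption, per upstream GPU examples
)
```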