Update GraphRAG QuickStart (#11995)
* Update GraphRAG QuickStart
* Further updates
* Small fixes
* Small fix
This commit is contained in:
parent 01099f08ee
commit 643458d8f0
1 changed file with 80 additions and 19 deletions
			@ -9,12 +9,16 @@ The [GraphRAG project](https://github.com/microsoft/graphrag) is designed to lev
- [Setup Python Environment for GraphRAG](#3-setup-python-environment-for-graphrag)
- [Index GraphRAG](#4-index-graphrag)
- [Query GraphRAG](#5-query-graphrag)
- [Troubleshooting](#troubleshooting)

## Quickstart
### 1. Install and Start `Ollama` Service on Intel GPU 
Follow the steps in [Run Ollama with IPEX-LLM on Intel GPU Guide](./ollama_quickstart.md) to install `ipex-llm[cpp]==2.1.0` and run Ollama on Intel GPU. Ensure that `ollama serve` is running correctly and can be accessed through a local URL (e.g., `http://127.0.0.1:11434`).
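To double-check that the service is reachable before moving on, you could send a quick request to the local endpoint (shown with the default address; adjust it if you changed the port):

```bash
# should print "Ollama is running" if the service is up
curl http://127.0.0.1:11434

# optionally list the models currently available to the server
curl http://127.0.0.1:11434/api/tags
```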
**Please note that for GraphRAG, we highly recommend using the stable version of ipex-llm through `pip install ipex-llm[cpp]==2.1.0`**.
### 2. Prepare LLM and Embedding Model
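As a minimal sketch of this step, assuming the `mistral` LLM and `nomic-embed-text` embedding model used in the `settings.yml` examples later in this guide, the models could be pulled through the Ollama service started in step 1 (on Linux, invoke the binary as `./ollama` if that is how you run it):

```bash
# pull the chat model and the embedding model into Ollama;
# substitute other model names if you plan to configure different ones in settings.yml
ollama pull mistral
ollama pull nomic-embed-text
```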
			@ -57,6 +61,7 @@ conda create -n graphrag-local-ollama python=3.10
conda activate graphrag-local-ollama

pip install -e .
pip install future

pip install ollama
pip install plotly
			@ -64,6 +69,9 @@ pip install plotly
in which `pip install ollama` enables the RESTful APIs through Python, and `pip install plotly` is used for visualizing the knowledge graph.
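Optionally, assuming the Ollama server from step 1 is still running on the default port, you could confirm from this new environment that the `ollama` Python package can reach it:

```bash
# lists the models known to the local Ollama server through the Python client
python -c "import ollama; print(ollama.Client(host='http://localhost:11434').list())"
```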
> [!NOTE]
> Please note that the Python environment set up here for GraphRAG is separate from the one used for the Ollama server on Intel GPUs.
### 4. Index GraphRAG
The environment is now ready for GraphRAG with local LLMs and embedding models running on Intel GPUs. Before querying GraphRAG, it is necessary to first index GraphRAG, which could be a resource-intensive operation.
@ -114,24 +122,25 @@ Prepare the input corpus, and then initialize the workspace:
#### Update `settings.yml`
In the `settings.yml` file inside the `ragtest` folder, add the configuration `request_timeout: 1800.0` for `llm`. In addition, if you would like to use LLMs or embedding models other than `mistral` or `nomic-embed-text`, you need to update the `settings.yml` in the `ragtest` folder accordingly:
```yml
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: mistral # change it accordingly if using another LLM
  model_supports_json: true
  request_timeout: 1800.0 # add this configuration; you could also increase the request_timeout
  api_base: http://localhost:11434/v1

embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic_embed_text # change it accordingly if using another embedding model
    api_base: http://localhost:11434/api
```
#### Conduct GraphRAG indexing
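The exact command for this step follows the GraphRAG CLI; as a rough sketch, assuming the `ragtest` workspace prepared above, indexing would be started with something like:

```bash
# build the GraphRAG index over the ragtest workspace;
# this can take a while depending on corpus size and hardware
python -m graphrag.index --root ./ragtest
```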
			@ -197,3 +206,55 @@ The Transformer model has been very successful in various natural language proce
Since its initial introduction, the Transformer model has been further developed and improved upon. Variants of the Transformer architecture, such as BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Pretraining Approach), have achieved state-of-the-art performance on a wide range of natural language processing tasks [Data: Reports (1, 2, 34, 46, 64, +more)].
```
### Troubleshooting
#### `failed to find free space in the KV cache, retrying with smaller n_batch` when conducting GraphRAG Indexing, and `JSONDecodeError` when querying GraphRAG
If you observe the Ollama server log showing `failed to find free space in the KV cache, retrying with smaller n_batch` while conducting GraphRAG indexing, and receive a `JSONDecodeError` when querying GraphRAG, try increasing the context length of the LLM and then index/query GraphRAG again.

Here is how to make the LLM support a larger context. To do this, first create a file named `Modelfile`:
```
FROM mistral:latest
PARAMETER num_ctx 4096
```
> [!TIP]
> Here we increase `num_ctx` to 4096 as an example. You could adjust it accordingly.

and then use the following commands to create a new model in Ollama named `mistral:latest-nctx4096`:
- For **Linux users**:
  ```bash
  ./ollama create mistral:latest-nctx4096 -f Modelfile
  ```
- For **Windows users**:

  Please run the following command in Miniforge or Anaconda Prompt.
  ```cmd
  ollama create mistral:latest-nctx4096 -f Modelfile
  ```
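On either platform, you could optionally verify that the new model has been registered before updating the configuration (on Linux, use `./ollama list` if you invoke the binary from the current directory):

```bash
# the new entry mistral:latest-nctx4096 should appear in the model list
ollama list
```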
Finally, update `settings.yml` inside the `ragtest` folder to use `llm` model `mistral:latest-nctx4096`:
```yml
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: mistral:latest-nctx4096 # change it accordingly if using another LLM, or LLM model with larger num_ctx
  model_supports_json: true
  request_timeout: 1800.0 # add this configuration; you could also increase the request_timeout
  api_base: http://localhost:11434/v1

embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic_embed_text # change it accordingly if using another embedding model
    api_base: http://localhost:11434/api
```
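With the configuration updated, index and then query GraphRAG again so the larger-context model takes effect. A rough sketch, assuming the `ragtest` workspace and the standard GraphRAG CLI entry points used in this guide:

```bash
# rebuild the index with the larger-context model, then retry the query
python -m graphrag.index --root ./ragtest
python -m graphrag.query --root ./ragtest --method global "your question here"
```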