LangChain Example
The examples in this folder show how to use LangChain with ipex-llm on Intel GPUs.
Note
Please refer here for the upstream LangChain LLM documentation with ipex-llm, and here for the upstream LangChain embedding documentation with ipex-llm.
0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, there are some recommended requirements for your machine. Please refer here for more information.
1. Install
1.1 Installation on Linux
We suggest using conda to manage the environment:
conda create -n llm python=3.11
conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
1.2 Installation on Windows
We suggest using conda to manage the environment:
conda create -n llm python=3.11 libuv
conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
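After installation, you can optionally verify that the environment is set up correctly. This one-liner is a quick check we suggest here (not part of the original example scripts); it simply confirms that torch and the ipex-llm transformers API import cleanly:
python -c "import torch; from ipex_llm.transformers import AutoModelForCausalLM"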
2. Configure OneAPI Environment Variables for Linux
Note
Skip this step if you are running on Windows.
This is a required step on Linux for APT- or offline-installed oneAPI. Skip this step for PIP-installed oneAPI.
source /opt/intel/oneapi/setvars.sh
3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
3.1 Configurations for Linux
For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
For Intel Data Center GPU Max Series
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
export ENABLE_SDP_FUSION=1
Note: libtcmalloc.so can be installed by conda install -c conda-forge -y gperftools=2.10.
For Intel iGPU
export SYCL_CACHE_PERSISTENT=1
3.2 Configurations for Windows
For Intel iGPU and Intel Arc™ A-Series Graphics
set SYCL_CACHE_PERSISTENT=1
Note
The first time each model runs on an Intel iGPU or Intel Arc™ A300-Series or Pro A60 GPU, it may take several minutes to compile.
4. Run examples with LangChain
4.1. Example: Streaming Chat
Install LangChain dependencies:
pip install -U langchain langchain-community
In the current directory, run the example with the following command:
python chat.py -m MODEL_PATH -q QUESTION
Additional Parameters for Configuration:
-m MODEL_PATH: required, path to the model.
-q QUESTION: question to ask. Default is What is AI?.
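For orientation, below is a minimal sketch of a chat pipeline in the spirit of chat.py, built on the IpexLLM wrapper from langchain_community. The model path is a hypothetical placeholder, and passing device through model_kwargs is an assumption about how the wrapper targets the Intel GPU:
# Minimal sketch of a LangChain chat pipeline over ipex-llm
from langchain_community.llms import IpexLLM

llm = IpexLLM.from_model_id(
    model_id="MODEL_PATH",  # hypothetical: replace with your local model path
    model_kwargs={
        "temperature": 0,
        "max_length": 256,
        "trust_remote_code": True,
        "device": "xpu",  # assumption: wrapper forwards the device to ipex-llm
    },
)
print(llm.invoke("What is AI?"))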
4.2. Example: Retrieval Augmented Generation (RAG)
The RAG example (rag.py) shows how to load the input text into a vector database, and then use LangChain to build a retrieval pipeline.
Install LangChain dependencies:
pip install -U langchain langchain-community langchain-chroma sentence-transformers==3.0.1
In the current directory, run the example with the following command:
python rag.py -m <path_to_llm_model> -e <path_to_embedding_model> [-q QUESTION] [-i INPUT_PATH]
Additional Parameters for Configuration:
-m LLM_MODEL_PATH: required, path to the model.
-e EMBEDDING_MODEL_PATH: required, path to the embedding model.
-q QUESTION: question to ask. Default is What is IPEX-LLM?.
-i INPUT_PATH: path to the input doc.
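To make the flow concrete, here is a hedged sketch of a retrieval pipeline in the spirit of rag.py, using IpexLLMBgeEmbeddings and Chroma. All paths are hypothetical placeholders, and running both models on the xpu device via model_kwargs is an assumption:
# Hedged sketch of a RAG flow: split -> embed -> store -> retrieve -> answer
from langchain.chains import RetrievalQA
from langchain.text_splitter import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import IpexLLMBgeEmbeddings
from langchain_community.llms import IpexLLM

# Split the input document into chunks
with open("INPUT_PATH") as f:  # hypothetical input doc
    chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_text(f.read())

# Embed the chunks and load them into a Chroma vector store
embeddings = IpexLLMBgeEmbeddings(
    model_name="EMBEDDING_MODEL_PATH",  # hypothetical embedding model path
    model_kwargs={"device": "xpu"},     # assumption: embed on the Intel GPU
    encode_kwargs={"normalize_embeddings": True},
)
vectorstore = Chroma.from_texts(texts=chunks, embedding=embeddings)

# Answer questions with a retrieval QA chain over the store
llm = IpexLLM.from_model_id(
    model_id="LLM_MODEL_PATH",  # hypothetical LLM path
    model_kwargs={"trust_remote_code": True, "device": "xpu"},
)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
print(qa.invoke({"query": "What is IPEX-LLM?"}))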
4.3. Example: Low Bit
The low_bit example (low_bit.py) showcases how to use LangChain with a low_bit optimized model.
With save_low_bit, we save the weights of the low_bit model into the target folder.
Note
save_low_bit only saves the weights of the model. Users could copy the tokenizer model into the target folder or specify tokenizer_id during initialization.
Install LangChain dependencies:
pip install -U langchain langchain-community
In the current directory, run the example with the following command:
python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]
Additional Parameters for Configuration:
-m MODEL_PATH: required, path to the model.
-t TARGET_PATH: required, path to save the low_bit model.
-q QUESTION: question to ask. Default is What is AI?.
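As a closing illustration, here is a minimal sketch of the save/load round trip that low_bit.py performs. Paths are hypothetical placeholders, and the xpu device entry in model_kwargs is an assumption:
# Save: load the model with low-bit optimization, then persist only its weights
from ipex_llm.transformers import AutoModelForCausalLM
from langchain_community.llms import IpexLLM

model = AutoModelForCausalLM.from_pretrained(
    "MODEL_PATH",  # hypothetical original model path
    load_in_4bit=True,
    trust_remote_code=True,
)
model.save_low_bit("TARGET_PATH")  # saves weights only, not the tokenizer

# Load: rebuild a LangChain LLM from the saved low-bit weights; tokenizer_id
# points back at the original model because save_low_bit skips the tokenizer
llm = IpexLLM.from_model_id_low_bit(
    model_id="TARGET_PATH",
    tokenizer_id="MODEL_PATH",
    model_kwargs={"trust_remote_code": True, "device": "xpu"},  # device is an assumption
)
print(llm.invoke("What is AI?"))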