IPEX-LLM Transformers INT4 Optimization for HuggingFace Transformers Agent

In this example, we use IPEX-LLM to apply low-bit optimizations to HuggingFace Transformers Agents, which allow LLMs to use tools such as image generation, image captioning, and text summarization.

For illustration purposes, we use lmsys/vicuna-7b-v1.5 as the reference model. We create an agent from lmsys/vicuna-7b-v1.5 and then ask the agent to generate a caption for an image from the COCO dataset, i.e. demo.jpg.
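The flow described above (load the model in 4-bit, wrap it in an agent, ask for a caption) can be sketched roughly as follows. This is a minimal sketch and not necessarily what run_agent.py does line by line: the helper name `caption_image` is hypothetical, `AutoTokenizer` is an assumed choice of tokenizer class, and `LocalAgent` is only available in transformers releases that ship the original Agents API.

```python
# Prompt taken from the sample output of this example.
PROMPT = "Generate a caption for the 'image'"


def caption_image(model_path: str, image_path: str, prompt: str = PROMPT):
    """Hypothetical helper: load the model in 4-bit, build an agent, run the prompt."""
    # Heavy dependencies are imported lazily so that defining this function
    # does not require them to be installed.
    from ipex_llm.transformers import AutoModelForCausalLM
    from PIL import Image
    from transformers import AutoTokenizer, LocalAgent

    # load_in_4bit=True asks IPEX-LLM to convert the linear layers to INT4.
    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # The agent lets the LLM choose and call tools such as an image captioner.
    agent = LocalAgent(model, tokenizer)

    image = Image.open(image_path)
    # Keyword arguments are exposed to the agent as variables, here 'image'.
    return agent.run(prompt, image=image)
```

Under these assumptions, `caption_image("lmsys/vicuna-7b-v1.5", "demo.jpg")` would run the whole example in one call.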

0. Requirements

To run this example with IPEX-LLM, your machine should meet some recommended requirements; please refer to here for more information.

1. Install

We suggest using conda to manage the environment:

conda create -n llm python=3.11
conda activate llm

pip install ipex-llm[all] # install ipex-llm with 'all' option
pip install pillow # additional package required for opening images
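After installation, a quick sanity check from Python can confirm that the packages are importable in the active environment. This is only a sketch: the helper `missing_packages` is hypothetical, and the import names `ipex_llm` and `PIL` are the module names assumed to be provided by ipex-llm and pillow.

```python
import importlib.util
import sys


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Report the interpreter version (3.11 is recommended above) and any missing modules.
print("Python version:", ".".join(map(str, sys.version_info[:2])))
print("Missing modules:", missing_packages(["ipex_llm", "PIL"]))
```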

2. Run

python ./run_agent.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --image-path IMAGE_PATH

Arguments info:

--repo-id-or-model-path REPO_ID_OR_MODEL_PATH: the HuggingFace repo id of the Vicuna model to download (e.g. lmsys/vicuna-7b-v1.5), or the path to a local HuggingFace checkpoint folder. The default is 'lmsys/vicuna-7b-v1.5'.
--image-path IMAGE_PATH: the path to the image to run inference on.
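For readers who want to see how such arguments are typically declared, here is a minimal argparse sketch. It is an illustration under assumptions, not a copy of run_agent.py: the function name `build_parser` and the help strings are hypothetical, and treating --image-path as required is an assumption based on the usage lines in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two arguments described above."""
    parser = argparse.ArgumentParser(
        description="Caption an image with a Transformers Agent"
    )
    parser.add_argument(
        "--repo-id-or-model-path",
        type=str,
        default="lmsys/vicuna-7b-v1.5",  # default stated in this README
        help="HuggingFace repo id or path to a local checkpoint folder",
    )
    parser.add_argument(
        "--image-path",
        type=str,
        required=True,  # assumption: every usage line passes this flag
        help="path to the image to run inference on",
    )
    return parser
```

Calling `build_parser().parse_args(["--image-path", "demo.jpg"])` yields a namespace whose `repo_id_or_model_path` falls back to the default, which matches the shorter commands shown in sections 2.1 and 2.2.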

Note: When loading the model in 4-bit, IPEX-LLM converts the linear layers in the model into INT4 format. In theory, an XB model saved in 16-bit requires approximately 2X GB of memory for loading, and ~0.5X GB of memory for further inference.

Please select the appropriate size of the Vicuna model based on the capabilities of your machine.
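The rule of thumb in the note follows from the storage width: 16-bit weights take 2 bytes per parameter, while INT4 weights take about 0.5 bytes per parameter. A small sketch of the arithmetic (the function name `estimate_memory_gb` is hypothetical, and the figures are theoretical estimates that ignore activations and runtime overhead):

```python
def estimate_memory_gb(params_billion: float) -> tuple[float, float]:
    """Estimate (loading, inference) memory in GB for a model with X billion parameters."""
    load_gb = 2.0 * params_billion       # 16-bit checkpoint: 2 bytes per parameter
    inference_gb = 0.5 * params_billion  # INT4 weights: about 0.5 bytes per parameter
    return load_gb, inference_gb


load_gb, inference_gb = estimate_memory_gb(7)
print(f"7B model: ~{load_gb:g} GB to load, ~{inference_gb:g} GB for inference")
# prints: 7B model: ~14 GB to load, ~3.5 GB for inference
```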

2.1 Client

On a client Windows machine, it is recommended to run directly with full utilization of all cores:

python ./run_agent.py --image-path IMAGE_PATH

2.2 Server

For optimal performance on a server, it is recommended to set several environment variables (refer to here for more information) and to run the example with all the physical cores of a single socket.

E.g. on Linux,

# set IPEX-LLM env variables
source ipex-llm-init

# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./run_agent.py --image-path IMAGE_PATH
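To adapt the numbers above to a different machine, note how the core count maps onto the command: OMP_NUM_THREADS equals the number of cores per socket, and the numactl core range runs from 0 to that number minus one. The sketch below only builds the command for the first socket; the helper name `single_socket_launch` is hypothetical, and it assumes that socket 0 owns the physical cores 0..N-1 as in the 48-core example. Sourcing ipex-llm-init still has to happen in the shell.

```python
import shlex


def single_socket_launch(cores_per_socket: int, image_path: str):
    """Return (env overrides, command) for running on all cores of socket 0."""
    if cores_per_socket < 1:
        raise ValueError("cores_per_socket must be at least 1")
    env = {"OMP_NUM_THREADS": str(cores_per_socket)}
    cmd = [
        "numactl",
        "-C", f"0-{cores_per_socket - 1}",  # bind to cores 0..N-1
        "-m", "0",                          # allocate memory on NUMA node 0
        "python", "./run_agent.py",
        "--image-path", image_path,
    ]
    return env, cmd


env, cmd = single_socket_launch(48, "IMAGE_PATH")
print(f"export OMP_NUM_THREADS={env['OMP_NUM_THREADS']}")
print(shlex.join(cmd))
```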

2.3 Sample Output

demo.jpg

lmsys/vicuna-7b-v1.5

Image path: demo.jpg
== Prompt ==
Generate a caption for the 'image'
==Explanation from the agent==
I will use the following tool: `image_captioner` to generate a caption for the image.


==Code generated by the agent==
caption = image_captioner(image)


==Result==
a little girl holding a stuffed teddy bear