IPEX-LLM Transformers INT4 Optimization for HuggingFace Transformers Agent
In this example, we apply low-bit optimizations to HuggingFace Transformers Agents using IPEX-LLM. Transformers Agents allow LLMs to use tools such as image generation, image captioning, and text summarization.
For illustration purposes, we use lmsys/vicuna-7b-v1.5 as the reference model: we create an agent from it and then ask the agent to generate a caption for an image from the COCO dataset, i.e. demo.jpg.
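The flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not necessarily what run_agent.py does: it assumes the IPEX-LLM `AutoModelForCausalLM` wrapper with `load_in_4bit=True`, and a transformers release that still ships the `LocalAgent` class (newer releases have removed it). The helper name `caption_image` is hypothetical.

```python
PROMPT = "Generate a caption for the 'image'"


def caption_image(model_path: str, image_path: str):
    """Load a model in INT4 with IPEX-LLM and ask an agent to caption an image."""
    # Heavy imports live inside the function so that this module can be
    # imported even when the optional dependencies are not installed.
    from PIL import Image
    # Assumption: the installed transformers release still provides LocalAgent.
    from transformers import AutoTokenizer, LocalAgent
    from ipex_llm.transformers import AutoModelForCausalLM

    # load_in_4bit=True asks IPEX-LLM to convert the linear layers to INT4.
    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Wrap the local model in an agent that can call tools such as an image captioner.
    agent = LocalAgent(model, tokenizer)
    image = Image.open(image_path)

    # Extra keyword arguments become variables available to the agent's generated code.
    return agent.run(PROMPT, image=image)
```

Calling `caption_image("lmsys/vicuna-7b-v1.5", "demo.jpg")` would download the model on first use, so it needs network access and enough memory for the chosen model.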
0. Requirements
To run this example with IPEX-LLM, we have some recommended requirements for your machine; please refer to here for more information.
1. Install
We suggest using conda to manage the environment:
conda create -n llm python=3.11
conda activate llm
pip install ipex-llm[all] # install ipex-llm with 'all' option
pip install pillow # additional package required for opening images
2. Run
python ./run_agent.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --image-path IMAGE_PATH
Arguments info:
--repo-id-or-model-path REPO_ID_OR_MODEL_PATH: argument defining the huggingface repo id for the Vicuna model (e.g. lmsys/vicuna-7b-v1.5) to be downloaded, or the path to the huggingface checkpoint folder. It defaults to 'lmsys/vicuna-7b-v1.5'.
--image-path IMAGE_PATH: argument defining the image to run inference on.
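As an illustration of how these two arguments could be declared, here is a minimal `argparse` sketch. The flag names and the model default come from the description above; marking `--image-path` as required is an assumption (no default is stated for it), and `build_parser` is a hypothetical helper name rather than a function known to exist in run_agent.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two command-line arguments described above."""
    parser = argparse.ArgumentParser(
        description="Run a Transformers agent with IPEX-LLM low-bit optimizations"
    )
    parser.add_argument(
        "--repo-id-or-model-path",
        type=str,
        default="lmsys/vicuna-7b-v1.5",
        help="HuggingFace repo id to download, or path to a local checkpoint folder",
    )
    parser.add_argument(
        "--image-path",
        type=str,
        required=True,  # assumption: no default is documented for this flag
        help="Path to the image to run inference on",
    )
    return parser


# Parse an explicit argument list instead of sys.argv to keep the example deterministic.
args = build_parser().parse_args(["--image-path", "demo.jpg"])
print(args.repo_id_or_model_path, args.image_path)
# prints: lmsys/vicuna-7b-v1.5 demo.jpg
```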
Note: When loading the model in 4-bit, IPEX-LLM converts linear layers in the model into INT4 format. In theory, an XB model saved in 16-bit requires approximately 2X GB of memory for loading and ~0.5X GB of memory for further inference.
Please select the appropriate size of the Vicuna model based on the capabilities of your machine.
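The rule of thumb in the note can be turned into a quick back-of-the-envelope estimate. The sketch below simply multiplies the parameter count (in billions) by 2 bytes for 16-bit weights and by 0.5 bytes for INT4 weights; `estimate_memory_gb` is a hypothetical helper, and the figures cover weights only, so actual usage will be somewhat higher once activations and the runtime are included.

```python
def estimate_memory_gb(params_billions: float) -> tuple[float, float]:
    """Return (load_gb, inference_gb) for a model with the given parameter count."""
    load_gb = params_billions * 2.0       # 16-bit checkpoint: 2 bytes per parameter
    inference_gb = params_billions * 0.5  # INT4 weights: 0.5 bytes per parameter
    return load_gb, inference_gb


for size in (7, 13):
    load_gb, inference_gb = estimate_memory_gb(size)
    print(f"{size}B model: ~{load_gb:.1f} GB to load, ~{inference_gb:.1f} GB for inference")
# prints:
# 7B model: ~14.0 GB to load, ~3.5 GB for inference
# 13B model: ~26.0 GB to load, ~6.5 GB for inference
```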
2.1 Client
On a client Windows machine, it is recommended to run directly with full utilization of all cores:
python ./run_agent.py --image-path IMAGE_PATH
2.2 Server
For optimal performance on a server, it is recommended to set several environment variables (refer to here for more information), and run the example with all the physical cores of a single socket.
E.g. on Linux,
# set IPEX-LLM env variables
source ipex-llm-init
# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./run_agent.py --image-path IMAGE_PATH
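For machines with a different core count, the thread setting and the `numactl` command line above can be derived from the number of physical cores per socket. The following is a small sketch with a hypothetical helper, `single_socket_launch`; it assumes, as in the example above, that the first socket owns cores `0` to `N-1` and NUMA node `0`, which may not hold on every system.

```python
import shlex


def single_socket_launch(cores_per_socket: int, script_args: list[str]):
    """Return (env, command) for running run_agent.py on the first socket only."""
    # One OpenMP thread per physical core of the socket.
    env = {"OMP_NUM_THREADS": str(cores_per_socket)}
    # Assumption: socket 0 owns cores 0..N-1 and NUMA memory node 0.
    command = [
        "numactl", "-C", f"0-{cores_per_socket - 1}", "-m", "0",
        "python", "./run_agent.py", *script_args,
    ]
    return env, command


env, command = single_socket_launch(48, ["--image-path", "demo.jpg"])
print(env["OMP_NUM_THREADS"])
print(shlex.join(command))
# prints:
# 48
# numactl -C 0-47 -m 0 python ./run_agent.py --image-path demo.jpg
```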
2.3 Sample Output
demo.jpg
lmsys/vicuna-7b-v1.5
Image path: demo.jpg
== Prompt ==
Generate a caption for the 'image'
==Explanation from the agent==
I will use the following tool: `image_captioner` to generate a caption for the image.
==Code generated by the agent==
caption = image_captioner(image)
==Result==
a little girl holding a stuffed teddy bear