
BigDL-LLM Transformers INT4 Optimization for HuggingFace Transformers Agent

In this example, we apply BigDL-LLM low-bit optimizations to a HuggingFace Transformers Agent. The Transformers Agents framework allows an LLM to use tools such as image generation, image captioning, and text summarization.

For illustration purposes, we use lmsys/vicuna-7b-v1.5 as the reference model to create an agent, and then ask the agent to generate a caption for an image from the COCO dataset, i.e. demo.jpg.
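
At its core, the example builds the agent roughly as in the sketch below (a minimal illustration assuming BigDL-LLM's load_in_4bit loading option and the transformers LocalAgent API; the actual run_agent.py may differ in details):

from PIL import Image
from transformers import AutoTokenizer, LocalAgent
from bigdl.llm.transformers import AutoModelForCausalLM

# Load Vicuna with BigDL-LLM INT4 optimization:
# linear layers are converted to INT4 format at load time
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5",
                                             load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# Wrap the optimized model in a Transformers Agent and ask it to caption an image;
# extra keyword arguments (here, `image`) are made available to the agent's generated code
agent = LocalAgent(model, tokenizer)
image = Image.open("demo.jpg")
caption = agent.run("Generate a caption for the 'image'", image=image)
print(caption)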

0. Requirements

To run this example with BigDL-LLM, we have some recommended requirements for your machine; please refer to here for more information.

1. Install

We suggest using conda to manage the environment:

conda create -n llm python=3.9
conda activate llm

pip install bigdl-llm[all] # install bigdl-llm with 'all' option
pip install pillow # additional package required for opening images
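
After installation, you can optionally sanity-check the setup with a quick import (the import path below is BigDL-LLM's transformers-style API):

python -c "from bigdl.llm.transformers import AutoModelForCausalLM"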

2. Run

python ./run_agent.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --image-path IMAGE_PATH

Arguments info:

  • --repo-id-or-model-path REPO_ID_OR_MODEL_PATH: argument defining the huggingface repo id for the Vicuna model (e.g. lmsys/vicuna-7b-v1.5) to be downloaded, or the path to the huggingface checkpoint folder. It defaults to 'lmsys/vicuna-7b-v1.5'.
  • --image-path IMAGE_PATH: argument defining the image to be captioned. It defaults to demo.jpg.
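
For example, to run with the default model and image spelled out explicitly:

python ./run_agent.py --repo-id-or-model-path lmsys/vicuna-7b-v1.5 --image-path demo.jpg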

Note: When loading the model in 4-bit, BigDL-LLM converts linear layers in the model into INT4 format. In theory, an X B model saved in 16-bit requires approximately 2X GB of memory for loading, and about 0.5X GB of memory for further inference. For example, the 7B Vicuna model takes roughly 14 GB to load in 16-bit, but roughly 3.5 GB for inference once converted to INT4.

Please select the appropriate size of the Vicuna model based on the capabilities of your machine.

2.1 Client

On a client Windows machine, it is recommended to run directly with full utilization of all cores:

python ./run_agent.py

2.2 Server

For optimal performance on a server, it is recommended to set several environment variables (refer to here for more information), and to run the example with all the physical cores of a single socket.

E.g. on Linux,

# set BigDL-Nano env variables
source bigdl-nano-init

# e.g. for a server with 48 cores per socket
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./run_agent.py

2.3 Sample Output

lmsys/vicuna-7b-v1.5

Image path: demo.jpg
== Prompt ==
Generate a caption for the 'image'
==Explanation from the agent==
I will use the following tool: `image_captioner` to generate a caption for the image.


==Code generated by the agent==
caption = image_captioner(image)


==Result==
a little girl holding a stuffed teddy bear