Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583)

* Add --modelscope option for glm-v4 and MiniCPM-V-2_6

* glm-edge

* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Xu, Shuo 2024-12-20 13:54:17 +08:00 committed by GitHub
parent f3b5fad3be
commit b0338c5529
8 changed files with 142 additions and 39 deletions

@ -1,5 +1,5 @@
# GLM-Edge # GLM-Edge
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on GLM-Edge models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/glm-edge-1.5b-chat](https://huggingface.co/THUDM/glm-edge-1.5b-chat) and [THUDM/glm-edge-4b-chat](https://huggingface.co/THUDM/glm-edge-4b-chat) as reference GLM-Edge models. In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on GLM-Edge models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/glm-edge-1.5b-chat](https://huggingface.co/THUDM/glm-edge-1.5b-chat) and [THUDM/glm-edge-4b-chat](https://huggingface.co/THUDM/glm-edge-4b-chat) (or [ZhipuAI/glm-edge-1.5b-chat](https://www.modelscope.cn/models/ZhipuAI/glm-edge-1.5b-chat) and [ZhipuAI/glm-edge-4b-chat](https://www.modelscope.cn/models/ZhipuAI/glm-edge-4b-chat) for ModelScope) as reference GLM-Edge models.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -17,6 +17,9 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.47.0 pip install transformers==4.47.0
pip install accelerate==0.33.0 pip install accelerate==0.33.0
pip install "trl<0.12.0" pip install "trl<0.12.0"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 1.2 Installation on Windows ### 1.2 Installation on Windows
@ -32,6 +35,9 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install transformers==4.47.0 pip install transformers==4.47.0
pip install accelerate==0.33.0 pip install accelerate==0.33.0
pip install "trl<0.12.0" pip install "trl<0.12.0"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
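Because `modelscope` is an optional dependency in the install steps above, a script could check for it before honoring `--modelscope`. The following is a minimal sketch, not part of the example scripts; the helper name `has_package` is hypothetical.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    # find_spec returns None when no importable top-level module has this name.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not has_package("modelscope"):
        # Version pin copied from the install instructions in this diff.
        print("modelscope is not installed; "
              "run `pip install modelscope==1.11.0` to use --modelscope")
```

Checking up front gives a clearer message than the `ImportError` that a bare `from modelscope import AutoTokenizer` would raise.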
## 2. Configures OneAPI environment variables for Linux ## 2. Configures OneAPI environment variables for Linux
@ -102,14 +108,19 @@ set SYCL_CACHE_PERSISTENT=1
### Example 1: Predict Tokens using `generate()` API ### Example 1: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a GLM-Edge model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs. In the example [generate.py](./generate.py), we show a basic use case for a GLM-Edge model to predict the next N tokens using `generate()` API, with IPEX-LLM INT4 optimizations on Intel GPUs.
``` ```bash
# for Hugging Face model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
# for ModelScope model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --modelscope
``` ```
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the GLM-Edge model (e.g. `THUDM/glm-edge-1.5b-chat` or `THUDM/glm-edge-4b-chat`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'THUDM/glm-edge-4b-chat'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the GLM-Edge model (e.g. `THUDM/glm-edge-1.5b-chat` or `THUDM/glm-edge-4b-chat`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'THUDM/glm-edge-4b-chat'` for **Hugging Face** or `'ZhipuAI/glm-edge-4b-chat'` for **ModelScope**.
- `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么'`. - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'AI是什么'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
#### Sample Output #### Sample Output
#### [THUDM/glm-edge-1.5b-chat](https://huggingface.co/THUDM/glm-edge-1.5b-chat) #### [THUDM/glm-edge-1.5b-chat](https://huggingface.co/THUDM/glm-edge-1.5b-chat)

@ -19,21 +19,32 @@ import time
import argparse import argparse
from ipex_llm.transformers import AutoModelForCausalLM from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for GLM-Edge model') parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for GLM-Edge model')
parser.add_argument('--repo-id-or-model-path', type=str, default="THUDM/glm-edge-4b-chat", parser.add_argument('--repo-id-or-model-path', type=str,
help='The huggingface repo id for the GLM-Edge model to be downloaded' help='The Hugging Face or ModelScope repo id for the GLM-Edge model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument('--prompt', type=str, default="AI是什么", parser.add_argument('--prompt', type=str, default="AI是什么",
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=32, parser.add_argument('--n-predict', type=int, default=32,
help='Max tokens to predict') help='Max tokens to predict')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
model_path = args.repo_id_or_model_path
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path if args.repo_id_or_model_path else \
("ZhipuAI/glm-edge-4b-chat" if args.modelscope else "THUDM/glm-edge-4b-chat")
# Load model in 4 bit, # Load model in 4 bit,
# which convert the relevant layers in the model into INT4 format # which convert the relevant layers in the model into INT4 format
@ -43,7 +54,8 @@ if __name__ == '__main__':
load_in_4bit=True, load_in_4bit=True,
optimize_model=True, optimize_model=True,
trust_remote_code=True, trust_remote_code=True,
use_cache=True) use_cache=True,
model_hub=model_hub)
model = model.half().to("xpu") model = model.half().to("xpu")
# Load tokenizer # Load tokenizer
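The hub selection and default fallback introduced in this hunk can be isolated into a small function that runs without IPEX-LLM installed. This is a sketch under assumptions: the names `resolve_model_source`, `parse_args`, and `DEFAULT_REPO_IDS` are hypothetical, while the repo ids and the `'huggingface'`/`'modelscope'` strings are taken from the diff.

```python
import argparse

# Default repo ids per hub, copied from the GLM-Edge example in this diff.
DEFAULT_REPO_IDS = {
    "huggingface": "THUDM/glm-edge-4b-chat",
    "modelscope": "ZhipuAI/glm-edge-4b-chat",
}


def resolve_model_source(repo_id_or_model_path, use_modelscope):
    """Return (model_path, model_hub) following the fallback rule in the diff."""
    model_hub = "modelscope" if use_modelscope else "huggingface"
    # An explicit repo id or local path always wins over the per-hub default.
    model_path = repo_id_or_model_path or DEFAULT_REPO_IDS[model_hub]
    return model_path, model_hub


def parse_args(argv=None):
    """Parse only the two flags that affect where the model comes from."""
    parser = argparse.ArgumentParser()
    # No default here, so an omitted flag yields None and triggers the fallback.
    parser.add_argument("--repo-id-or-model-path", type=str)
    parser.add_argument("--modelscope", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(resolve_model_source(args.repo_id_or_model_path, args.modelscope))
```

Dropping `default=` from `--repo-id-or-model-path` is what makes the per-hub default possible: the script can tell "not given" apart from "given" and pick the repo id that matches the chosen hub.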

@ -1,5 +1,5 @@
# MiniCPM-V-2_6 # MiniCPM-V-2_6
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM-V-2_6 model on [Intel GPUs](../../../README.md). For illustration purposes, we utilize [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) as reference MiniCPM-V-2_6 model. In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM-V-2_6 model on [Intel GPUs](../../../README.md). For illustration purposes, we utilize [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) (or [OpenBMB/MiniCPM-V-2_6](https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) for ModelScope) as reference MiniCPM-V-2_6 model.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -16,6 +16,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.40.0 "trl<0.12.0" pip install transformers==4.40.0 "trl<0.12.0"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
@ -28,6 +31,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.40.0 "trl<0.12.0" pip install transformers==4.40.0 "trl<0.12.0"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 2. Configures OneAPI environment variables for Linux ### 2. Configures OneAPI environment variables for Linux
@ -96,31 +102,48 @@ set SYCL_CACHE_PERSISTENT=1
### 4. Running examples ### 4. Running examples
- chat without streaming mode: - chat without streaming mode:
``` ```bash
# for Hugging Face model hub
python ./chat.py --prompt 'What is in the image?' python ./chat.py --prompt 'What is in the image?'
# for ModelScope model hub
python ./chat.py --prompt 'What is in the image?' --modelscope
``` ```
- chat in streaming mode: - chat in streaming mode:
``` ```bash
# for Hugging Face model hub
python ./chat.py --prompt 'What is in the image?' --stream python ./chat.py --prompt 'What is in the image?' --stream
# for ModelScope model hub
python ./chat.py --prompt 'What is in the image?' --stream --modelscope
``` ```
- save model with low-bit optimization (if `LOWBIT_MODEL_PATH` does not exist) - save model with low-bit optimization (if `LOWBIT_MODEL_PATH` does not exist)
``` ```bash
# for Hugging Face model hub
python ./chat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --lowbit-path LOWBIT_MODEL_PATH --prompt 'What is in the image?' python ./chat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --lowbit-path LOWBIT_MODEL_PATH --prompt 'What is in the image?'
# for ModelScope model hub
python ./chat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --lowbit-path LOWBIT_MODEL_PATH --prompt 'What is in the image?' --modelscope
``` ```
- chat with saved model with low-bit optimization (if `LOWBIT_MODEL_PATH` exists): - chat with saved model with low-bit optimization (if `LOWBIT_MODEL_PATH` exists):
``` ```bash
# for Hugging Face model hub
python ./chat.py --lowbit-path LOWBIT_MODEL_PATH --prompt 'What is in the image?' python ./chat.py --lowbit-path LOWBIT_MODEL_PATH --prompt 'What is in the image?'
# for ModelScope model hub
python ./chat.py --lowbit-path LOWBIT_MODEL_PATH --prompt 'What is in the image?' --modelscope
``` ```
> [!TIP] > [!TIP]
> For chatting in streaming mode, it is recommended to set the environment variable `PYTHONUNBUFFERED=1`. > For chatting in streaming mode, it is recommended to set the environment variable `PYTHONUNBUFFERED=1`.
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the MiniCPM-V-2_6 (e.g. `openbmb/MiniCPM-V-2_6`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'openbmb/MiniCPM-V-2_6'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the **Hugging Face** or **ModelScope** repo id for the MiniCPM-V-2_6 (e.g. `openbmb/MiniCPM-V-2_6`) to be downloaded, or the path to the checkpoint folder. It is default to be `'openbmb/MiniCPM-V-2_6'` for **Hugging Face** or `'OpenBMB/MiniCPM-V-2_6'` for **ModelScope**.
- `--lowbit-path LOWBIT_MODEL_PATH`: argument defining the path to save/load the model with IPEX-LLM low-bit optimization. If it is an empty string, the original pretrained model specified by `REPO_ID_OR_MODEL_PATH` will be loaded. If it is an existing path, the saved model with low-bit optimization in `LOWBIT_MODEL_PATH` will be loaded. If it is a non-existing path, the original pretrained model specified by `REPO_ID_OR_MODEL_PATH` will be loaded, and the optimized low-bit model will be saved into `LOWBIT_MODEL_PATH`. It is default to be `''`, i.e. an empty string. - `--lowbit-path LOWBIT_MODEL_PATH`: argument defining the path to save/load the model with IPEX-LLM low-bit optimization. If it is an empty string, the original pretrained model specified by `REPO_ID_OR_MODEL_PATH` will be loaded. If it is an existing path, the saved model with low-bit optimization in `LOWBIT_MODEL_PATH` will be loaded. If it is a non-existing path, the original pretrained model specified by `REPO_ID_OR_MODEL_PATH` will be loaded, and the optimized low-bit model will be saved into `LOWBIT_MODEL_PATH`. It is default to be `''`, i.e. an empty string.
- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be infered. It is default to be `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`. - `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be infered. It is default to be `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`.
- `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is in the image?'`. - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is in the image?'`.
- `--stream`: flag to chat in streaming mode - `--stream`: flag to chat in streaming mode
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
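The three cases described for `--lowbit-path` above amount to a small decision. The sketch below only illustrates that decision; the function name `plan_lowbit_action` and the returned labels are hypothetical, and the actual loading and saving calls in `chat.py` are not reproduced here.

```python
import os
import tempfile


def plan_lowbit_action(lowbit_path: str) -> str:
    """Map a --lowbit-path value to one of the three documented behaviors."""
    if not lowbit_path:
        # Empty string: load the original pretrained model only.
        return "load_original"
    if os.path.exists(lowbit_path):
        # Existing path: load the previously saved low-bit model.
        return "load_lowbit"
    # Non-existing path: load the original model, then save the low-bit copy.
    return "load_original_then_save"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as existing_dir:
        missing_dir = os.path.join(existing_dir, "not-created-yet")
        for path in ("", existing_dir, missing_dir):
            print(repr(path), "->", plan_lowbit_action(path))
```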
#### Sample Output #### Sample Output

@ -22,14 +22,14 @@ import requests
import torch import torch
from PIL import Image from PIL import Image
from ipex_llm.transformers import AutoModel from ipex_llm.transformers import AutoModel
from transformers import AutoTokenizer, AutoProcessor from transformers import AutoProcessor
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `chat()` API for openbmb/MiniCPM-V-2_6 model') parser = argparse.ArgumentParser(description='Predict Tokens using `chat()` API for openbmb/MiniCPM-V-2_6 model')
parser.add_argument('--repo-id-or-model-path', type=str, default="openbmb/MiniCPM-V-2_6", parser.add_argument('--repo-id-or-model-path', type=str,
help='The huggingface repo id for the openbmb/MiniCPM-V-2_6 model to be downloaded' help='The Hugging Face or ModelScope repo id for the MiniCPM-V-2_6 model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument("--lowbit-path", type=str, parser.add_argument("--lowbit-path", type=str,
default="", default="",
help="The path to the saved model folder with IPEX-LLM low-bit optimization. " help="The path to the saved model folder with IPEX-LLM low-bit optimization. "
@ -44,9 +44,20 @@ if __name__ == '__main__':
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--stream', action='store_true', parser.add_argument('--stream', action='store_true',
help='Whether to chat in streaming mode') help='Whether to chat in streaming mode')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
model_path = args.repo_id_or_model_path
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path if args.repo_id_or_model_path else \
("OpenBMB/MiniCPM-V-2_6" if args.modelscope else "openbmb/MiniCPM-V-2_6")
image_path = args.image_url_or_path image_path = args.image_url_or_path
lowbit_path = args.lowbit_path lowbit_path = args.lowbit_path
@ -61,7 +72,8 @@ if __name__ == '__main__':
optimize_model=True, optimize_model=True,
trust_remote_code=True, trust_remote_code=True,
use_cache=True, use_cache=True,
modules_to_not_convert=["vpm", "resampler"]) modules_to_not_convert=["vpm", "resampler"],
model_hub=model_hub)
tokenizer = AutoTokenizer.from_pretrained(model_path, tokenizer = AutoTokenizer.from_pretrained(model_path,
trust_remote_code=True) trust_remote_code=True)

@ -1,5 +1,5 @@
# GLM-4V # GLM-4V
In this directory, you will find examples on how you could apply IPEX-LLM FP8 optimizations on GLM-4V models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) as a reference GLM-4V model. In this directory, you will find examples on how you could apply IPEX-LLM FP8 optimizations on GLM-4V models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) (or [ZhipuAI/glm-4v-9b](https://www.modelscope.cn/models/ZhipuAI/glm-4v-9b) for ModelScope) as a reference GLM-4V model.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -16,6 +16,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install tiktoken transformers==4.42.4 "trl<0.12.0" pip install tiktoken transformers==4.42.4 "trl<0.12.0"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
@ -28,6 +31,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install tiktoken transformers==4.42.4 "trl<0.12.0" pip install tiktoken transformers==4.42.4 "trl<0.12.0"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 2. Configures OneAPI environment variables for Linux ### 2. Configures OneAPI environment variables for Linux
@ -95,15 +101,20 @@ set SYCL_CACHE_PERSISTENT=1
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```bash
python ./generate.py --prompt 'What is in the image?' # for Hugging Face model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --image-url-or-path IMAGE_URL_OR_PATH
# for ModelScope model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --image-url-or-path IMAGE_URL_OR_PATH --modelscope
``` ```
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the GLM-4V model (e.g. `THUDM/glm-4v-9b`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'THUDM/glm-4v-9b'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the **Hugging Face** or **ModelScope** repo id for the GLM-4V model (e.g. `THUDM/glm-4v-9b`) to be downloaded, or the path to the checkpoint folder. It is default to be `'THUDM/glm-4v-9b'` for **Hugging Face** or `'ZhipuAI/glm-4v-9b'` for **ModelScope**.
- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be infered. It is default to be `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`. - `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be infered. It is default to be `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`.
- `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is in the image?'`. - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is in the image?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
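Since `--image-url-or-path` accepts either a URL or a local path, a script has to tell the two apart before opening the image. The diff does not show how `generate.py` does this, so the following is only one possible approach; the helper name `is_remote_image` is hypothetical.

```python
from urllib.parse import urlparse


def is_remote_image(image_url_or_path: str) -> bool:
    """Return True for http(s) URLs, False for anything treated as a local path."""
    # Only http and https count as remote; a Windows drive letter such as
    # "C:" parses as a scheme too, but it is not in this set.
    return urlparse(image_url_or_path).scheme in ("http", "https")


if __name__ == "__main__":
    default_image = "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"
    for source in (default_image, "./images/cat.jpg"):
        kind = "download first" if is_remote_image(source) else "open from disk"
        print(source, "->", kind)
```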
#### Sample Output #### Sample Output
#### [THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b) #### [THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)

@ -22,13 +22,12 @@ import requests
from PIL import Image from PIL import Image
from ipex_llm.transformers import AutoModelForCausalLM from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for THUDM/glm-4v-9b model') parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for THUDM/glm-4v-9b model')
parser.add_argument('--repo-id-or-model-path', type=str, default="THUDM/glm-4v-9b", parser.add_argument('--repo-id-or-model-path', type=str,
help='The huggingface repo id for the THUDM/glm-4v-9b model to be downloaded' help='The Hugging Face or ModelScope repo id for the glm-4v model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument('--image-url-or-path', type=str, parser.add_argument('--image-url-or-path', type=str,
default='http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg', default='http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg',
help='The URL or path to the image to infer') help='The URL or path to the image to infer')
@ -36,9 +35,20 @@ if __name__ == '__main__':
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=32, parser.add_argument('--n-predict', type=int, default=32,
help='Max tokens to predict') help='Max tokens to predict')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
model_path = args.repo_id_or_model_path
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path if args.repo_id_or_model_path else \
("ZhipuAI/glm-4v-9b" if args.modelscope else "THUDM/glm-4v-9b")
image_path = args.image_url_or_path image_path = args.image_url_or_path
# Load model in 4 bit, # Load model in 4 bit,
@ -49,7 +59,9 @@ if __name__ == '__main__':
load_in_4bit=True, load_in_4bit=True,
optimize_model=True, optimize_model=True,
trust_remote_code=True, trust_remote_code=True,
use_cache=True).half().to('xpu') use_cache=True,
model_hub=model_hub)
model = model.half().to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

@ -22,22 +22,32 @@ import requests
import torch import torch
from PIL import Image from PIL import Image
from ipex_llm.transformers import AutoModelForCausalLM from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer, CLIPImageProcessor from transformers import CLIPImageProcessor
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `chat()` API for OpenGVLab/InternVL2-4B model') parser = argparse.ArgumentParser(description='Predict Tokens using `chat()` API for OpenGVLab/InternVL2-4B model')
parser.add_argument('--repo-id-or-model-path', type=str, default="OpenGVLab/InternVL2-4B", parser.add_argument('--repo-id-or-model-path', type=str, default="OpenGVLab/InternVL2-4B",
help='The huggingface repo id for the OpenGVLab/InternVL2-4B model to be downloaded' help='The Hugging Face or ModelScope repo id for the InternVL2 model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument('--image-url-or-path', type=str, parser.add_argument('--image-url-or-path', type=str,
default='https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg', default='https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg',
help='The URL or path to the image to infer') help='The URL or path to the image to infer')
parser.add_argument('--prompt', type=str, default="What is in the image?", parser.add_argument('--prompt', type=str, default="What is in the image?",
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=64, help='Max tokens to predict') parser.add_argument('--n-predict', type=int, default=64, help='Max tokens to predict')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path model_path = args.repo_id_or_model_path
image_path = args.image_url_or_path image_path = args.image_url_or_path
n_predict = args.n_predict n_predict = args.n_predict
@ -48,7 +58,8 @@ if __name__ == '__main__':
# This will allow the memory-intensive embedding layer to utilize the CPU instead of iGPU. # This will allow the memory-intensive embedding layer to utilize the CPU instead of iGPU.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True,
load_in_low_bit="sym_int4", load_in_low_bit="sym_int4",
modules_to_not_convert=["vision_model"]) modules_to_not_convert=["vision_model"],
model_hub=model_hub)
model = model.half().to('xpu') model = model.half().to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path, tokenizer = AutoTokenizer.from_pretrained(model_path,
trust_remote_code=True) trust_remote_code=True)

@ -1,5 +1,5 @@
# InternVL2 # InternVL2
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternVL2 model on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [OpenGVLab/InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B) as a reference InternVL2 model. In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on InternVL2 model on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [OpenGVLab/InternVL2-4B](https://huggingface.co/OpenGVLab/InternVL2-4B) (or [OpenGVLab/InternVL2-4B](https://www.modelscope.cn/models/OpenGVLab/InternVL2-4B) for ModelScope) as a reference InternVL2 model.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -17,6 +17,9 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install einops timm pip install einops timm
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
@ -30,6 +33,9 @@ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-exte
pip install einops timm pip install einops timm
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 2. Configures OneAPI environment variables for Linux ### 2. Configures OneAPI environment variables for Linux
@ -98,15 +104,20 @@ set SYCL_CACHE_PERSISTENT=1
### 4. Running examples ### 4. Running examples
- chat with specified prompt: - chat with specified prompt:
``` ```bash
python ./chat.py --prompt 'What is in the image?' # for Hugging Face model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --image-url-or-path IMAGE_URL_OR_PATH
# for ModelScope model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --image-url-or-path IMAGE_URL_OR_PATH --modelscope
``` ```
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the InternVL2 (e.g. `OpenGVLab/InternVL2-4B`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'OpenGVLab/InternVL2-4B'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the **Hugging Face** or **ModelScope** repo id for the InternVL2 (e.g. `OpenGVLab/InternVL2-4B`) to be downloaded, or the path to the checkpoint folder. It is default to be `'OpenGVLab/InternVL2-4B'`.
- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be infered. It is default to be `'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'`. - `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be infered. It is default to be `'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'`.
- `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is in the image?'`. - `--prompt PROMPT`: argument defining the prompt to be infered (with integrated prompt format for chat). It is default to be `'What is in the image?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `64`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `64`.
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
#### Sample Output #### Sample Output