Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 (#12564)

* Add --modelscope for more models

* minicpm

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Xu, Shuo 2024-12-19 17:25:46 +08:00 committed by GitHub
parent 3eeb02f1be
commit 47da3c999f
6 changed files with 102 additions and 28 deletions
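
The pattern this commit applies to each `generate.py` can be sketched in isolation. This is a minimal sketch rather than the example scripts themselves: the helper name `resolve_hub_and_path` and the `DEFAULT_REPO_IDS` table are hypothetical, and the repo ids are the MiniCPM3 defaults that appear in the diff below. The `--modelscope` flag selects the `model_hub` value, and a per-hub default repo id is used only when `--repo-id-or-model-path` is omitted.

```python
import argparse

# Hypothetical lookup table for illustration; the ids mirror the
# MiniCPM3 defaults introduced by this commit.
DEFAULT_REPO_IDS = {
    "huggingface": "openbmb/MiniCPM3-4B",
    "modelscope": "OpenBMB/MiniCPM3-4B",
}


def resolve_hub_and_path(argv=None):
    """Return (model_hub, model_path) from command-line style arguments."""
    parser = argparse.ArgumentParser(description="Hub selection sketch")
    # No argparse default here: the fallback depends on the selected hub.
    parser.add_argument("--repo-id-or-model-path", type=str)
    parser.add_argument("--modelscope", action="store_true",
                        help="Use models from ModelScope")
    args = parser.parse_args(argv)

    model_hub = "modelscope" if args.modelscope else "huggingface"
    # An explicit repo id or local path always wins over the per-hub default.
    model_path = args.repo_id_or_model_path or DEFAULT_REPO_IDS[model_hub]
    return model_hub, model_path


if __name__ == "__main__":
    print(resolve_hub_and_path([]))
    print(resolve_hub_and_path(["--modelscope"]))
```

Dropping the argparse `default` and resolving the path after parsing is what lets the two hubs use differently cased organization names (`openbmb` versus `OpenBMB`).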

View file

@ -1,5 +1,5 @@
# Baichuan # Baichuan
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Baichuan2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) as a reference Baichuan model. In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on Baichuan2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) (or [baichuan-inc/Baichuan2-7B-Chat](https://www.modelscope.cn/models/baichuan-inc/Baichuan2-7B-Chat) for ModelScope) as a reference Baichuan model.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -16,6 +16,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
@ -28,6 +31,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation pip install transformers_stream_generator # additional package required for Baichuan-7B-Chat to conduct generation
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 2. Configures OneAPI environment variables for Linux ### 2. Configures OneAPI environment variables for Linux
@ -95,14 +101,19 @@ set SYCL_CACHE_PERSISTENT=1
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```bash
# for Hugging Face model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
# for ModelScope model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --modelscope
``` ```
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the Baichuan model (e.g `baichuan-inc/Baichuan2-7B-Chat`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'baichuan-inc/Baichuan2-7B-Chat'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the **Hugging Face** or **ModelScope** repo id for the Baichuan model (e.g `baichuan-inc/Baichuan2-7B-Chat`) to be downloaded, or the path to the checkpoint folder. It is default to be `'baichuan-inc/Baichuan2-7B-Chat'`.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `'AI是什么'`. - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `'AI是什么'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
#### Sample Output #### Sample Output
#### [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) #### [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)

View file

@ -19,7 +19,6 @@ import time
import argparse import argparse
from ipex_llm.transformers import AutoModelForCausalLM from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# prompt format referred from https://github.com/baichuan-inc/Baichuan2/issues/227 # prompt format referred from https://github.com/baichuan-inc/Baichuan2/issues/227
# and https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/generation_utils.py#L7-L49 # and https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/generation_utils.py#L7-L49
@ -29,14 +28,24 @@ BAICHUAN_PROMPT_FORMAT = "<reserved_106> {prompt} <reserved_107>"
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Baichuan model') parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Baichuan model')
parser.add_argument('--repo-id-or-model-path', type=str, default="baichuan-inc/Baichuan2-7B-Chat", parser.add_argument('--repo-id-or-model-path', type=str, default="baichuan-inc/Baichuan2-7B-Chat",
help='The huggingface repo id for the Baichuan model to be downloaded' help='The Hugging Face repo id for the Baichuan model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument('--prompt', type=str, default="AI是什么", parser.add_argument('--prompt', type=str, default="AI是什么",
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=32, parser.add_argument('--n-predict', type=int, default=32,
help='Max tokens to predict') help='Max tokens to predict')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path model_path = args.repo_id_or_model_path
# Load model in 4 bit, # Load model in 4 bit,
@ -50,7 +59,8 @@ if __name__ == '__main__':
model = AutoModelForCausalLM.from_pretrained(model_path, model = AutoModelForCausalLM.from_pretrained(model_path,
load_in_4bit=True, load_in_4bit=True,
trust_remote_code=True, trust_remote_code=True,
use_cache=True) use_cache=True,
model_hub=model_hub)
model = model.half().to('xpu') model = model.half().to('xpu')
# Load tokenizer # Load tokenizer

View file

@ -1,5 +1,5 @@
# MiniCPM # MiniCPM
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) as a reference MiniCPM model. In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) and [openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) (or [OpenBMB/MiniCPM-2B-sft-bf16](https://www.modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-bf16) and [OpenBMB/MiniCPM-1B-sft-bf16](https://www.modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16) for ModelScope) as a reference MiniCPM model.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -15,6 +15,9 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install "transformers>=4.36" pip install "transformers>=4.36"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
@ -26,6 +29,9 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default # below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install "transformers>=4.36" pip install "transformers>=4.36"
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 2. Configures OneAPI environment variables for Linux ### 2. Configures OneAPI environment variables for Linux
@ -93,14 +99,19 @@ set SYCL_CACHE_PERSISTENT=1
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```bash
python ./generate.py --prompt 'What is AI?' # for Hugging Face model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
# for ModelScope model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --modelscope
``` ```
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the MiniCPM model (e.g. `openbmb/MiniCPM-2B-sft-bf16`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'openbmb/MiniCPM-2B-sft-bf16'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the **Hugging Face** or **ModelScope** repo id for the MiniCPM model (e.g. `openbmb/MiniCPM-2B-sft-bf16` or `openbmb/MiniCPM-1B-sft-bf16`) to be downloaded, or the path to the checkpoint folder. It is default to be `'openbmb/MiniCPM-2B-sft-bf16'` for **Hugging Face** and `'OpenBMB/MiniCPM-2B-sft-bf16'` for **ModelScope**.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `'What is AI?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
#### Sample Output #### Sample Output
#### [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) #### [openbmb/MiniCPM-2B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)
@ -112,3 +123,12 @@ Inference time: xxxx s
-------------------- Output -------------------- -------------------- Output --------------------
<s> <用户>what is AI?<AI> AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. It is a field of computer science <s> <用户>what is AI?<AI> AI, or Artificial Intelligence, refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. It is a field of computer science
``` ```
#### [openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)
```log
-------------------- Prompt --------------------
<用户>What is AI?<AI>
-------------------- Output --------------------
<s> <用户>What is AI?<AI> Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. It involves the development of computer systems that
```

View file

@ -19,22 +19,32 @@ import time
import argparse import argparse
from ipex_llm.transformers import AutoModelForCausalLM from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for MiniCPM model') parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for MiniCPM model')
parser.add_argument('--repo-id-or-model-path', type=str, default="openbmb/MiniCPM-2B-sft-bf16", parser.add_argument('--repo-id-or-model-path', type=str,
help='The huggingface repo id for the MiniCPM model to be downloaded' help='The Hugging Face or ModelScope repo id for the MiniCPM model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument('--prompt', type=str, default="What is AI?", parser.add_argument('--prompt', type=str, default="What is AI?",
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=32, parser.add_argument('--n-predict', type=int, default=32,
help='Max tokens to predict') help='Max tokens to predict')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
model_path = args.repo_id_or_model_path
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path if args.repo_id_or_model_path else \
("OpenBMB/MiniCPM-2B-sft-bf16" if args.modelscope else "openbmb/MiniCPM-2B-sft-bf16")
# Load model in 4 bit, # Load model in 4 bit,
# which convert the relevant layers in the model into INT4 format # which convert the relevant layers in the model into INT4 format
# When running LLMs on Intel iGPUs for Windows users, we recommend setting `cpu_embedding=True` in the from_pretrained function. # When running LLMs on Intel iGPUs for Windows users, we recommend setting `cpu_embedding=True` in the from_pretrained function.
@ -43,9 +53,10 @@ if __name__ == '__main__':
load_in_4bit=True, load_in_4bit=True,
trust_remote_code=True, trust_remote_code=True,
optimize_model=True, optimize_model=True,
use_cache=True) use_cache=True,
model_hub=model_hub)
model = model.to('xpu') model = model.half().to('xpu')
# Load tokenizer # Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, tokenizer = AutoTokenizer.from_pretrained(model_path,

View file

@ -1,5 +1,5 @@
# MiniCPM3 # MiniCPM3
In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM3 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B) as a reference MiniCPM3 model. In this directory, you will find examples on how you could apply IPEX-LLM INT4 optimizations on MiniCPM3 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B) (or [OpenBMB/MiniCPM3-4B](https://www.modelscope.cn/models/OpenBMB/MiniCPM3-4B) for ModelScope) as a reference MiniCPM3 model.
## 0. Requirements ## 0. Requirements
To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information. To run these examples with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
@ -16,6 +16,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install jsonschema datamodel_code_generator pip install jsonschema datamodel_code_generator
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
#### 1.2 Installation on Windows #### 1.2 Installation on Windows
@ -28,6 +31,9 @@ conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install jsonschema datamodel_code_generator pip install jsonschema datamodel_code_generator
# [optional] only needed if you would like to use ModelScope as model hub
pip install modelscope==1.11.0
``` ```
### 2. Configures OneAPI environment variables for Linux ### 2. Configures OneAPI environment variables for Linux
@ -95,14 +101,19 @@ set SYCL_CACHE_PERSISTENT=1
> For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile. > For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.
### 4. Running examples ### 4. Running examples
``` ```bash
python ./generate.py --prompt 'What is AI?' # for Hugging Face model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
# for ModelScope model hub
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT --modelscope
``` ```
Arguments info: Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the MiniCPM3 model (e.g. `openbmb/MiniCPM3-4B`) to be downloaded, or the path to the huggingface checkpoint folder. It is default to be `'openbmb/MiniCPM3-4B'`. - `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the **Hugging Face** or **ModelScope** repo id for the MiniCPM3 model (e.g. `openbmb/MiniCPM3-4B`) to be downloaded, or the path to the checkpoint folder. It is default to be `'openbmb/MiniCPM3-4B'` for **Hugging Face** or `'OpenBMB/MiniCPM3-4B'` for **ModelScope**.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `'What is AI?'`. - `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). It is default to be `'What is AI?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`. - `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. It is default to be `32`.
- `--modelscope`: using **ModelScope** as model hub instead of **Hugging Face**.
#### Sample Output #### Sample Output
#### [openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B) #### [openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)

View file

@ -19,21 +19,31 @@ import time
import argparse import argparse
from ipex_llm.transformers import AutoModelForCausalLM from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
if __name__ == '__main__': if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for MiniCPM3 model') parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for MiniCPM3 model')
parser.add_argument('--repo-id-or-model-path', type=str, default="openbmb/MiniCPM3-4B", parser.add_argument('--repo-id-or-model-path', type=str,
help='The huggingface repo id for the MiniCPM3 model to be downloaded' help='The Hugging Face or ModelScope repo id for the MiniCPM3 model to be downloaded'
', or the path to the huggingface checkpoint folder') ', or the path to the checkpoint folder')
parser.add_argument('--prompt', type=str, default="What is AI?", parser.add_argument('--prompt', type=str, default="What is AI?",
help='Prompt to infer') help='Prompt to infer')
parser.add_argument('--n-predict', type=int, default=32, parser.add_argument('--n-predict', type=int, default=32,
help='Max tokens to predict') help='Max tokens to predict')
parser.add_argument('--modelscope', action="store_true", default=False,
help="Use models from modelscope")
args = parser.parse_args() args = parser.parse_args()
model_path = args.repo_id_or_model_path
if args.modelscope:
from modelscope import AutoTokenizer
model_hub = 'modelscope'
else:
from transformers import AutoTokenizer
model_hub = 'huggingface'
model_path = args.repo_id_or_model_path if args.repo_id_or_model_path else \
("OpenBMB/MiniCPM3-4B" if args.modelscope else "openbmb/MiniCPM3-4B")
# Load model in 4 bit, # Load model in 4 bit,
# which convert the relevant layers in the model into INT4 format # which convert the relevant layers in the model into INT4 format
@ -43,7 +53,8 @@ if __name__ == '__main__':
load_in_4bit=True, load_in_4bit=True,
trust_remote_code=True, trust_remote_code=True,
optimize_model=True, optimize_model=True,
use_cache=True) use_cache=True,
model_hub=model_hub)
model = model.half().to('xpu') model = model.half().to('xpu')
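
Across all three scripts, the top-level `from transformers import AutoTokenizer` is removed and the import is made under the `--modelscope` branch instead, so `modelscope` only has to be installed when that hub is actually used. The same idea can be sketched with `importlib`; this is an illustration under assumptions, not code from the commit, and the helper names `tokenizer_module_name` and `load_auto_tokenizer` are hypothetical.

```python
import importlib


def tokenizer_module_name(use_modelscope: bool) -> str:
    """Name of the package that provides AutoTokenizer for the chosen hub."""
    return "modelscope" if use_modelscope else "transformers"


def load_auto_tokenizer(use_modelscope: bool):
    """Import AutoTokenizer lazily so the unused hub package stays optional."""
    module_name = tokenizer_module_name(use_modelscope)
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Surface a clearer message than the bare import failure.
        raise ImportError(
            f"'{module_name}' is required for this model hub; "
            "install it with pip before running the example"
        ) from exc
    return module.AutoTokenizer
```

A caller would then use `load_auto_tokenizer(args.modelscope).from_pretrained(model_path, trust_remote_code=True)`, with the matching `model_hub` string passed to `AutoModelForCausalLM.from_pretrained` as in the diff above.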