update npu multimodal readme (#11979)

* update npu readme of multimodal
* small fix
* meet comment

parent 4811a490ef
commit 79978e6f36
2 changed files with 43 additions and 8 deletions
@@ -6,6 +6,8 @@ In this directory, you will find examples on how you could apply IPEX-LLM INT4 o
| Model | Model Link |
|------------|----------------------------------------------------------------|
| Phi-3-Vision | [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) |
| MiniCPM-Llama3-V-2_5 | [openbmb/MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) |
| MiniCPM-V-2_6 | [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6) |
## 0. Requirements
To run these examples with IPEX-LLM on Intel NPUs, make sure you have installed the latest driver for your Intel NPU.
@@ -22,14 +24,12 @@ We suggest using conda to manage environment:
```bash
conda create -n llm python=3.10 libuv
conda activate llm

# install ipex-llm with 'npu' option
pip install --pre --upgrade ipex-llm[npu]
pip install torchvision

# [optional] for MiniCPM-V-2_6
pip install timm torch==2.1.2 torchvision==0.16.2

pip install transformers==4.40
```
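After installing, it can help to confirm that the pinned versions above actually resolved. The helper below is a hypothetical sketch (not part of the example scripts) that compares installed package versions against the pins from this README:

```python
# Hypothetical sanity check (not part of the examples): compare installed
# package versions against the pins used in the MiniCPM-V-2_6 setup above.
from importlib.metadata import version, PackageNotFoundError

PINS = {"transformers": "4.40", "torch": "2.1.2", "torchvision": "0.16.2"}

def check_pins(pins):
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for pkg, want in pins.items():
        try:
            got = version(pkg)
        except PackageNotFoundError:
            got = None
        report[pkg] = (got, got is not None and got.startswith(want))
    return report

if __name__ == "__main__":
    for pkg, (got, ok) in check_pins(PINS).items():
        print(f"{pkg}: installed={got} matches_pin={ok}")
```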
### 2. Runtime Configurations
@@ -64,7 +64,7 @@ Arguments info:
- `--load_in_low_bit`: argument defining the `load_in_low_bit` format used. The default is `sym_int8`; `sym_int4` can also be used.
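The flag above can be illustrated with a minimal parser sketch; the flag name and the two formats come from this README, but the parser itself is hypothetical, not the example's actual code:

```python
# Hypothetical sketch of the --load_in_low_bit flag described above;
# only "sym_int8" (default) and "sym_int4" are mentioned in this README.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--load_in_low_bit", type=str, default="sym_int8",
                    choices=["sym_int8", "sym_int4"],
                    help="Low-bit format used to load the model")

args = parser.parse_args([])   # no CLI args -> defaults
print(args.load_in_low_bit)    # sym_int8
```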
#### Sample Output
##### [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)
```log
Inference time: xxxx s
```
@@ -82,3 +82,38 @@ The sample input image is (which is fetched from [COCO dataset](https://cocodata
<a href="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"><img width=400px src="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg" ></a>

## 4. Run Optimized Models (Experimental)
The examples below show how to run the **_optimized HuggingFace model implementations_** on Intel NPU, including
- [MiniCPM-Llama3-V-2_5](./minicpm-llama3-v2.5.py)
- [MiniCPM-V-2_6](./minicpm_v_2_6.py)

### Run

```bash
# to run MiniCPM-Llama3-V-2_5
python minicpm-llama3-v2.5.py

# to run MiniCPM-V-2_6
python minicpm_v_2_6.py
```

Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the huggingface repo id for the model (e.g. `openbmb/MiniCPM-Llama3-V-2_5`) to be downloaded, or the path to the huggingface checkpoint folder.
- `--image-url-or-path IMAGE_URL_OR_PATH`: argument defining the image to be inferred. The default is `'http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg'`.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). The default is `What is in the image?`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. The default is `32`.
- `--max-output-len MAX_OUTPUT_LEN`: argument defining the maximum sequence length for both input and output tokens. The default is `1024`.
- `--max-prompt-len MAX_PROMPT_LEN`: argument defining the maximum number of tokens that the input prompt can contain. The default is `512`.
- `--disable-transpose-value-cache`: argument to disable the optimization of transposing value cache.
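How `--image-url-or-path` distinguishes a remote URL from a local file is not spelled out above; the following is a minimal sketch of one plausible resolution rule (the helper name is hypothetical, not the example script's actual code):

```python
# Hypothetical sketch: treat http(s) values as remote images to download,
# anything else as a local file path. Not the example script's actual code.
from urllib.parse import urlparse

DEFAULT_IMAGE = "http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg"

def is_remote_image(image_url_or_path: str) -> bool:
    """Return True when the argument looks like an http(s) URL."""
    return urlparse(image_url_or_path).scheme in ("http", "https")

print(is_remote_image(DEFAULT_IMAGE))   # True
print(is_remote_image("./sample.jpg"))  # False
```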

#### Sample Output
##### [openbmb/MiniCPM-V-2_6](https://huggingface.co/openbmb/MiniCPM-V-2_6)

```log
Inference time: xx.xx s
-------------------- Input --------------------
http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg
-------------------- Prompt --------------------
What is in this image?
-------------------- Output --------------------
The image features a young child holding and showing off a white teddy bear wearing a pink dress. The background includes some red flowers and a stone wall, suggesting an outdoor setting.
```
@@ -37,7 +37,7 @@ if __name__ == '__main__':
                        help='Prompt to infer')
    parser.add_argument("--n-predict", type=int, default=32, help="Max tokens to predict")
    parser.add_argument("--max-output-len", type=int, default=1024)
    parser.add_argument("--max-prompt-len", type=int, default=512)
    parser.add_argument("--disable-transpose-value-cache", action="store_true", default=False)
    parser.add_argument("--intra-pp", type=int, default=None)
    parser.add_argument("--inter-pp", type=int, default=None)
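The diff lines above can be collected into a self-contained sketch of the script's argument parser. The `--repo-id-or-model-path`, `--image-url-or-path`, and `--prompt` definitions sit outside this hunk, so their exact form here is reconstructed from the Arguments info section and is an assumption:

```python
# Sketch of the example's parser assembled from the diff above; the first
# three arguments are reconstructed from the README and are assumptions.
import argparse

parser = argparse.ArgumentParser(description="MiniCPM NPU example (sketch)")
parser.add_argument("--repo-id-or-model-path", type=str, default="openbmb/MiniCPM-V-2_6")
parser.add_argument("--image-url-or-path", type=str,
                    default="http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg")
parser.add_argument("--prompt", type=str, default="What is in the image?",
                    help='Prompt to infer')
parser.add_argument("--n-predict", type=int, default=32, help="Max tokens to predict")
parser.add_argument("--max-output-len", type=int, default=1024)
parser.add_argument("--max-prompt-len", type=int, default=512)
parser.add_argument("--disable-transpose-value-cache", action="store_true", default=False)
parser.add_argument("--intra-pp", type=int, default=None)
parser.add_argument("--inter-pp", type=int, default=None)

args = parser.parse_args([])
print(args.max_prompt_len)   # 512, matching the updated default in this commit
```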