Remove unused example for now (#8538)

Yuwen Hu 2023-07-14 17:32:50 +08:00 committed by GitHub
parent b397e40015
commit e0f0def279
7 changed files with 5 additions and 106 deletions

View file

@@ -37,7 +37,7 @@ python ./generate.py
```
#### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.
E.g. on Linux,
```bash
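# The exact variables are listed in the linked README; as a rough sketch only,
# assuming the first socket has 48 physical cores (an illustrative number),
# one might pin the run to that socket and give OpenMP all of its cores:
export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./generate.py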

View file

@@ -37,7 +37,7 @@ python ./generate.py
```
#### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.
E.g. on Linux,
```bash

View file

@@ -36,7 +36,7 @@ python ./generate.py
```
#### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.
E.g. on Linux,
```bash

View file

@@ -37,7 +37,7 @@ python ./generate.py
```
#### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.
E.g. on Linux,
```bash

View file

@@ -37,7 +37,7 @@ python ./generate.py
```
#### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.
E.g. on Linux,
```bash

View file

@@ -1,63 +0,0 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import torch
import time
import argparse

from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import LlamaTokenizer, AutoTokenizer


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Transformer INT4 example')
    parser.add_argument('--repo-id-or-model-path', type=str, default="decapoda-research/llama-7b-hf",
                        choices=['decapoda-research/llama-7b-hf', 'THUDM/chatglm-6b'],
                        help='The huggingface repo id for the large language model to be downloaded'
                             ', or the path to the huggingface checkpoint folder')
    args = parser.parse_args()
    model_path = args.repo_id_or_model_path

    if model_path == 'decapoda-research/llama-7b-hf':
        # load_in_4bit=True in bigdl.llm.transformers will convert
        # the relevant layers in the model into int4 format
        model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
        tokenizer = LlamaTokenizer.from_pretrained(model_path)
        input_str = "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun"

        with torch.inference_mode():
            st = time.time()
            input_ids = tokenizer.encode(input_str, return_tensors="pt")
            output = model.generate(input_ids, do_sample=False, max_new_tokens=32)
            output_str = tokenizer.decode(output[0], skip_special_tokens=True)
            end = time.time()
            print('Prompt:', input_str)
            print('Output:', output_str)
            print(f'Inference time: {end-st} s')
    elif model_path == 'THUDM/chatglm-6b':
        # ChatGLM requires trust_remote_code=True to load its custom modeling code
        model = AutoModel.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True)
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        # Chinese prompt: "What should I do if I can't sleep at night?"
        input_str = "晚上睡不着应该怎么办"

        with torch.inference_mode():
            st = time.time()
            input_ids = tokenizer.encode(input_str, return_tensors="pt")
            output = model.generate(input_ids, do_sample=False, max_new_tokens=32)
            output_str = tokenizer.decode(output[0], skip_special_tokens=True)
            end = time.time()
            print('Prompt:', input_str)
            print('Output:', output_str)
            print(f'Inference time: {end-st} s')

View file

@@ -1,38 +0,0 @@
# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Models
In this example, we show a pipeline to apply BigDL-LLM INT4 optimizations to any Hugging Face Transformers model, and then run inference on the optimized INT4 model.
## Prepare Environment
We suggest using conda to manage the environment:
```bash
conda create -n llm python=3.9
conda activate llm
pip install --pre --upgrade bigdl-llm[all]
```
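As an optional sanity check, you can verify that the `bigdl.llm` transformers API imports cleanly before running the example (a minimal sketch; adjust to your environment):
```bash
# Optional: confirm bigdl-llm installed correctly
python -c "from bigdl.llm.transformers import AutoModelForCausalLM; print('bigdl-llm is ready')"
```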
## Run Example
```bash
python ./transformers_int4_pipeline.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
```
arguments info:
- `--repo-id-or-model-path MODEL_PATH`: argument defining the huggingface repo id for the large language model to be downloaded, or the path to the huggingface checkpoint folder.
> **Note** In this example, `--repo-id-or-model-path MODEL_PATH` is limited to one of `['decapoda-research/llama-7b-hf', 'THUDM/chatglm-6b']` to better demonstrate English and Chinese support, and it defaults to `'decapoda-research/llama-7b-hf'`.
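For example, the two supported checkpoints can be run as follows (illustrative invocations; the models are downloaded from Hugging Face on first use):
```bash
# English demo with the default LLaMA checkpoint
python ./transformers_int4_pipeline.py --repo-id-or-model-path decapoda-research/llama-7b-hf

# Chinese demo with ChatGLM-6B
python ./transformers_int4_pipeline.py --repo-id-or-model-path THUDM/chatglm-6b
```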
## Sample Output for Inference
### 'decapoda-research/llama-7b-hf' Model
```log
Prompt: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun
Output: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to be a hero. She wanted to be a hero, but she didn't know how. She didn't know how to be a
Inference time: xxxx s
```
### 'THUDM/chatglm-6b' Model
```log
Prompt: 晚上睡不着应该怎么办
Output: 晚上睡不着应该怎么办 晚上睡不着可能会让人感到焦虑和不安,但以下是一些可能有用的建议:
1. 放松身体和思维:尝试进行深呼吸、渐进性
Inference time: xxxx s
```