Remove unused example for now (#8538)

parent b397e40015
commit e0f0def279

7 changed files with 5 additions and 106 deletions
@@ -37,7 +37,7 @@ python ./generate.py
 ```

 #### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.

 E.g. on Linux,
 ```bash

@@ -37,7 +37,7 @@ python ./generate.py
 ```

 #### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.

 E.g. on Linux,
 ```bash

@@ -36,7 +36,7 @@ python ./generate.py
 ```

 #### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.

 E.g. on Linux,
 ```bash

@@ -37,7 +37,7 @@ python ./generate.py
 ```

 #### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.

 E.g. on Linux,
 ```bash

@@ -37,7 +37,7 @@ python ./generate.py
 ```

 #### 2.2 Server
-For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration) for more information), and run the example with all the physical cores of a single socket.
+For optimal performance on server, it is recommended to set several environment variables (refer to [here](../README.md#best-known-configuration-on-linux) for more information), and run the example with all the physical cores of a single socket.

 E.g. on Linux,
 ```bash

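The `#best-known-configuration-on-linux` section these five READMEs now point to is not part of this diff, so the following is only an illustrative sketch of the kind of setup that text describes: pinning the run to all the physical cores of a single socket. The variable values and core range are assumptions, not taken from the repo.

```bash
# Illustrative sketch only -- the authoritative settings live in
# ../README.md#best-known-configuration-on-linux.
export OMP_NUM_THREADS=48        # assumed: number of physical cores on one socket
# Bind the process to socket 0's cores and memory, then run the example.
numactl -C 0-47 -m 0 python ./generate.py
```
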
@@ -1,63 +0,0 @@
-#
-# Copyright 2016 The BigDL Authors.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-import torch
-import os
-import time
-import argparse
-from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
-from transformers import LlamaTokenizer, AutoTokenizer
-
-if __name__ == '__main__':
-    parser = argparse.ArgumentParser(description='Transformer INT4 example')
-    parser.add_argument('--repo-id-or-model-path', type=str, default="decapoda-research/llama-7b-hf",
-                        choices=['decapoda-research/llama-7b-hf', 'THUDM/chatglm-6b'],
-                        help='The huggingface repo id for the large language model to be downloaded'
-                             ', or the path to the huggingface checkpoint folder')
-    args = parser.parse_args()
-    model_path = args.repo_id_or_model_path
-    if model_path == 'decapoda-research/llama-7b-hf':
-        # load_in_4bit=True in bigdl.llm.transformers will convert
-        # the relevant layers in the model into int4 format
-        model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
-        tokenizer = LlamaTokenizer.from_pretrained(model_path)
-
-        input_str = "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun"
-
-        with torch.inference_mode():
-            st = time.time()
-            input_ids = tokenizer.encode(input_str, return_tensors="pt")
-            output = model.generate(input_ids, do_sample=False, max_new_tokens=32)
-            output_str = tokenizer.decode(output[0], skip_special_tokens=True)
-            end = time.time()
-            print('Prompt:', input_str)
-            print('Output:', output_str)
-            print(f'Inference time: {end-st} s')
-    elif model_path == 'THUDM/chatglm-6b':
-        model = AutoModel.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True)
-        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
-
-        input_str = "晚上睡不着应该怎么办"
-
-        with torch.inference_mode():
-            st = time.time()
-            input_ids = tokenizer.encode(input_str, return_tensors="pt")
-            output = model.generate(input_ids, do_sample=False, max_new_tokens=32)
-            output_str = tokenizer.decode(output[0], skip_special_tokens=True)
-            end = time.time()
-            print('Prompt:', input_str)
-            print('Output:', output_str)
-            print(f'Inference time: {end-st} s')

@@ -1,38 +0,0 @@
-# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Model
-
-In this example, we show a pipeline to apply BigDL-LLM INT4 optimizations to any Hugging Face Transformers model, and then run inference on the optimized INT4 model.
-
-## Prepare Environment
-We suggest using conda to manage the environment:
-```bash
-conda create -n llm python=3.9
-conda activate llm
-
-pip install --pre --upgrade bigdl-llm[all]
-```
-
-## Run Example
-```bash
-python ./transformers_int4_pipeline.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH
-```
-arguments info:
-- `--repo-id-or-model-path MODEL_PATH`: argument defining the huggingface repo id for the large language model to be downloaded, or the path to the huggingface checkpoint folder.
-
-> **Note** In this example, `--repo-id-or-model-path MODEL_PATH` is limited to be one of `['decapoda-research/llama-7b-hf', 'THUDM/chatglm-6b']` to better demonstrate English and Chinese support, and it defaults to `'decapoda-research/llama-7b-hf'`.
-
-## Sample Output for Inference
-### 'decapoda-research/llama-7b-hf' Model
-```log
-Prompt: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun
-Output: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to be a hero. She wanted to be a hero, but she didn't know how. She didn't know how to be a
-Inference time: xxxx s
-```
-
-### 'THUDM/chatglm-6b' Model
-```log
-Prompt: 晚上睡不着应该怎么办
-Output: 晚上睡不着应该怎么办 晚上睡不着可能会让人感到焦虑和不安,但以下是一些可能有用的建议:
-
-1. 放松身体和思维:尝试进行深呼吸、渐进性
-Inference time: xxxx s
-```
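For reference, the pattern the removed example demonstrated reduces to a few lines: load a Hugging Face checkpoint through `bigdl.llm.transformers` with `load_in_4bit=True`, then generate as usual. This is a minimal sketch distilled from the deleted file above, not a replacement example; the model id is simply the deleted example's default.

```python
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

model_path = "decapoda-research/llama-7b-hf"  # default from the removed example
# load_in_4bit=True converts the relevant layers in the model into int4 format
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = LlamaTokenizer.from_pretrained(model_path)

with torch.inference_mode():
    input_ids = tokenizer.encode("Once upon a time", return_tensors="pt")
    output = model.generate(input_ids, do_sample=False, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```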