Fix Baichuan2 prompt format (#10334)

* Fix Baichuan2 prompt format

* Fix Baichuan2 README

* Change baichuan2 prompt info

* Change baichuan2 prompt info
Yuxuan Xia 2024-03-19 12:48:07 +08:00 committed by GitHub
parent 0451103a43
commit 74e7490fda
6 changed files with 22 additions and 29 deletions


@@ -54,15 +54,15 @@ numactl -C 0-47 -m 0 python ./generate.py
```log
Inference time: xxxx s
-------------------- Prompt --------------------
<human>AI是什么 <bot>
<reserved_106> AI是什么 <reserved_107>
-------------------- Output --------------------
<human>AI是什么 <bot>人工智能（AI）是指由计算机系统或其他数字设备模拟、扩展和增强人类智能的科学和技术。它涉及到多个领域，如机器学习、计算机视觉、
<reserved_106> AI是什么 <reserved_107> 人工智能（AI）是指由计算机系统执行的任务，这些任务通常需要人类智能才能完成。AI的目标是使计算机能够模拟人类的思维过程，从而
```
```log
Inference time: xxxx s
-------------------- Prompt --------------------
<human>解释一下“温故而知新” <bot>
<reserved_106> 解释一下“温故而知新” <reserved_107>
-------------------- Output --------------------
<human>解释一下“温故而知新” <bot>这句话出自《论语·为政》篇,意思是通过回顾过去的事情来获取新的理解和认识。简单来说就是:温习学过的知识,可以从中
<reserved_106> 解释一下“温故而知新” <reserved_107> 温故而知新是一个成语,出自《论语·为政》篇。这个成语的意思是:通过回顾和了解过去的事情,可以更好地理解新的知识和
```
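For context, `<reserved_106>` and `<reserved_107>` are the string forms of the dedicated user and assistant tokens that Baichuan2-7B-Chat's `generation_utils.py` places around each turn; the issue and file linked in the code changes below describe the mapping. Below is a minimal sketch for checking this locally — the model id and the expectation that each marker encodes as a single special token are assumptions drawn from those links, not restated here as fact:

```python
from transformers import AutoTokenizer

# Sketch: check that the new prompt markers map to single special tokens,
# as expected from Baichuan2's generation_utils.py (see links in the diff below).
tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan2-7B-Chat", trust_remote_code=True
)

user_id = tokenizer.convert_tokens_to_ids("<reserved_106>")
assistant_id = tokenizer.convert_tokens_to_ids("<reserved_107>")
print(user_id, assistant_id)  # should match the user/assistant token ids
                              # used by build_chat_input in generation_utils.py

# The formatted prompt should therefore tokenize with one id per marker.
print(tokenizer.encode("<reserved_106> AI是什么 <reserved_107>"))
```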


@@ -22,8 +22,10 @@ import numpy as np
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# you could tune the prompt based on your own model,
BAICHUAN_PROMPT_FORMAT = "<human>{prompt} <bot>"
# prompt format is based on https://github.com/baichuan-inc/Baichuan2/issues/227
# and https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/generation_utils.py#L7-L49
# For English prompts, it is recommended to adjust the prompt format accordingly.
BAICHUAN_PROMPT_FORMAT = "<reserved_106> {prompt} <reserved_107>"
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Baichuan model')
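Downstream, the example script only interpolates the user text into this constant before tokenizing and generating. A condensed sketch of that flow under the example's own imports — the model path is a placeholder, the argument parsing of the real generate.py is omitted, and the max_new_tokens value is arbitrary:

```python
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

BAICHUAN_PROMPT_FORMAT = "<reserved_106> {prompt} <reserved_107>"

model_path = "baichuan-inc/Baichuan2-7B-Chat"  # placeholder; use your local path
# Load the chat model with bigdl-llm INT4 optimization, as in the example.
model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = BAICHUAN_PROMPT_FORMAT.format(prompt="AI是什么")
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```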


@@ -109,18 +109,10 @@ Arguments info:
#### Sample Output
#### [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
```log
-------------------- Prompt --------------------
<human>AI是什么 <bot>
-------------------- Output --------------------
<human>AI是什么 <bot>
AI是人工智能（Artificial Intelligence）的缩写，它是指让计算机或机器模拟、扩展和辅助人类的智能。AI技术已经广泛应用于各个领域
```
```log
Inference time: xxxx s
-------------------- Prompt --------------------
<human>What is AI? <bot>
<reserved_106> AI是什么 <reserved_107>
-------------------- Output --------------------
<human>What is AI? <bot>Artificial Intelligence (AI) refers to the development of computer systems that can perform tasks that would typically require human intelligence. These tasks include learning, reasoning, problem
```
<reserved_106> AI是什么 <reserved_107>AI是人工智能（Artificial Intelligence）的缩写，它是指让计算机或其他设备模拟人类智能的技术。通过使用大量数据和算法，AI可以学习、
```


@@ -21,8 +21,10 @@ import argparse
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# you could tune the prompt based on your own model,
BAICHUAN_PROMPT_FORMAT = "<human>{prompt} <bot>"
# prompt format is based on https://github.com/baichuan-inc/Baichuan2/issues/227
# and https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/generation_utils.py#L7-L49
# For English prompts, it is recommended to adjust the prompt format accordingly.
BAICHUAN_PROMPT_FORMAT = "<reserved_106> {prompt} <reserved_107>"
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Baichuan model')


@@ -114,13 +114,8 @@ In the example, several arguments can be passed to satisfy your requirements:
#### [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat)
```log
Inference time: xxxx s
-------------------- Prompt --------------------
<reserved_106> AI是什么 <reserved_107>
-------------------- Output --------------------
<human>AI是什么 <bot>
AI是人工智能（Artificial Intelligence）的缩写，它是指让计算机或机器模拟、扩展和辅助人类的智能。AI技术已经广泛应用于各个领域
```
```log
Inference time: xxxx s
-------------------- Output --------------------
<human>What is AI? <bot>Artificial Intelligence (AI) refers to the development of computer systems that can perform tasks that would typically require human intelligence. These tasks include learning, reasoning, problem
<reserved_106> AI是什么 <reserved_107>AI是人工智能（Artificial Intelligence）的缩写，它是指让计算机或其他设备模拟人类智能的技术。通过使用大量数据和算法，AI可以学习、
```


@@ -21,8 +21,10 @@ import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer
from bigdl.llm import optimize_model
# you could tune the prompt based on your own model,
BAICHUAN2_PROMPT_FORMAT = "<human>{prompt} <bot>"
# prompt format is based on https://github.com/baichuan-inc/Baichuan2/issues/227
# and https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/generation_utils.py#L7-L49
# For English prompts, it is recommended to adjust the prompt format accordingly.
BAICHUAN2_PROMPT_FORMAT = "<reserved_106> {prompt} <reserved_107>"
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for Baichuan2 model')
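This last file takes the `optimize_model` route instead of the bigdl `AutoModelForCausalLM` wrapper, but the prompt handling is identical. A minimal sketch of that variant, keeping the file's existing `BAICHUAN2_PROMPT_FORMAT` name — the model path and generation settings are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from bigdl.llm import optimize_model

BAICHUAN2_PROMPT_FORMAT = "<reserved_106> {prompt} <reserved_107>"

model_path = "baichuan-inc/Baichuan2-7B-Chat"  # placeholder; use your local path
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", low_cpu_mem_usage=True
)
model = optimize_model(model)  # bigdl-llm low-bit optimization, in place of load_in_4bit

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt = BAICHUAN2_PROMPT_FORMAT.format(prompt="AI是什么")
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```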