Add tokenizer_id in Langchain (#10588)

* fix low-bit

* fix

* fix style

---------

Co-authored-by: arda <arda@arda-arc12.sh.intel.com>
Zhicun 2024-04-03 14:25:35 +08:00 committed by GitHub
parent f6fef09933
commit b827f534d5
5 changed files with 195 additions and 27 deletions


@@ -18,47 +18,47 @@ pip install -U pandas==2.0.3
### Example: Chat
The chat example ([chat.py](./transformers_int4/chat.py)) shows how to use `LLMChain` to build a chat pipeline.
The chat example ([chat.py](./chat.py)) shows how to use `LLMChain` to build a chat pipeline.
To run the example, execute the following command in the current directory:
```bash
python transformers_int4/chat.py -m <path_to_model> [-q <your_question>]
python chat.py -m <path_to_model> [-q <your_question>]
```
> Note: If `-q` is not specified, it will use `What is AI` by default.
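For reference, the core of a chat pipeline like chat.py boils down to wiring a `PromptTemplate` and the IPEX-LLM `TransformersLLM` into an `LLMChain`. The sketch below is illustrative only; the prompt wording and model kwargs are assumptions, not necessarily what chat.py uses:

```python
from ipex_llm.langchain.llms import TransformersLLM
from langchain import PromptTemplate, LLMChain

# Illustrative prompt; the actual chat.py template may differ.
template = "Q: {question}\nA:"
prompt = PromptTemplate(template=template, input_variables=["question"])

# Load the model with IPEX-LLM INT4 optimizations (same API used elsewhere in this commit).
llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)

llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run("What is AI?"))
```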
### Example: RAG (Retrieval Augmented Generation)
The RAG example ([rag.py](./transformers_int4/rag.py)) shows how to load the input text into a vector database and then use `load_qa_chain` to build a retrieval pipeline.
The RAG example ([rag.py](./rag.py)) shows how to load the input text into a vector database and then use `load_qa_chain` to build a retrieval pipeline.
To run the example, execute the following command in the current directory:
```bash
python transformers_int4/rag.py -m <path_to_model> [-q <your_question>] [-i <path_to_input_txt>]
python rag.py -m <path_to_model> [-q <your_question>] [-i <path_to_input_txt>]
```
> Note: If `-i` is not specified, it will use a short introduction to BigDL as input by default. If `-q` is not specified, `What is IPEX LLM?` will be used by default.
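Conceptually, the retrieval pipeline splits the input text, embeds the chunks into a vector store, retrieves the chunks relevant to the question, and answers over them with `load_qa_chain`. The rough sketch below is only an approximation: the embedding class, the FAISS vector store, and the chunk size are assumptions and may differ from what rag.py actually uses:

```python
from ipex_llm.langchain.llms import TransformersLLM
from langchain.embeddings import HuggingFaceEmbeddings  # assumption: rag.py may use IPEX-LLM's own embeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain

with open("<path_to_input_txt>") as f:
    text = f.read()

# Split the document and index the chunks in a vector store.
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_text(text)
vectorstore = FAISS.from_texts(chunks, HuggingFaceEmbeddings())

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 512, "trust_remote_code": True},
)

# Retrieve the relevant chunks and answer the question over them.
question = "What is IPEX LLM?"
docs = vectorstore.similarity_search(question)
chain = load_qa_chain(llm, chain_type="stuff")
print(chain.run(input_documents=docs, question=question))
```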
### Example: Math
The math example ([llm_math.py](./transformers_int4/llm_math.py)) shows how to build a chat pipeline specialized in solving math questions. For example, you can ask `What is 13 raised to the .3432 power?`
The math example ([llm_math.py](./llm_math.py)) shows how to build a chat pipeline specialized in solving math questions. For example, you can ask `What is 13 raised to the .3432 power?`
To run the example, execute the following command in the current directory:
```bash
python transformers_int4/llm_math.py -m <path_to_model> [-q <your_question>]
python llm_math.py -m <path_to_model> [-q <your_question>]
```
> Note: If `-q` is not specified, it will use `What is 13 raised to the .3432 power?` by default.
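Under the hood, a math-specialized chain can be built with LangChain's `LLMMathChain`, which asks the LLM for a numeric expression and then evaluates it. This is a minimal sketch under that assumption; the actual llm_math.py may be structured differently, and `LLMMathChain` requires `numexpr` to be installed:

```python
from ipex_llm.langchain.llms import TransformersLLM
from langchain.chains import LLMMathChain

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 256, "trust_remote_code": True},
)

# LLMMathChain prompts the model for an expression and evaluates it.
llm_math = LLMMathChain.from_llm(llm, verbose=True)
print(llm_math.run("What is 13 raised to the .3432 power?"))
```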
### Example: Voice Assistant
The voice assistant example ([voiceassistant.py](./transformers_int4/voiceassistant.py)) showcases how to use LangChain to build a pipeline that takes your speech as input in real time, uses an ASR model (e.g. [Whisper-Medium](https://huggingface.co/openai/whisper-medium)) to turn the speech into text, and then feeds the text into a large language model to get a response.
The voice assistant example ([voiceassistant.py](./voiceassistant.py)) showcases how to use LangChain to build a pipeline that takes your speech as input in real time, uses an ASR model (e.g. [Whisper-Medium](https://huggingface.co/openai/whisper-medium)) to turn the speech into text, and then feeds the text into a large language model to get a response.
To run the example, execute the following command in the current directory:
```bash
python transformers_int4/voiceassistant.py -m <path_to_model> [-q <your_question>]
python voiceassistant.py -m <path_to_model> [-q <your_question>]
```
**Runtime Arguments Explained**:
- `-m MODEL_PATH`: **Required**, the path to the
@@ -67,6 +67,23 @@ python transformers_int4/voiceassistant.py -m <path_to_model> [-q <your_question>]
- `-l LANGUAGE`: you can specify a language such as "english" or "chinese"
- `-d True|False`: whether the model path specified in `-m` is a saved low-bit model.
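For a sense of the flow, the pipeline is essentially speech-to-text followed by the LLM chain. The sketch below is a simplification that reads a prerecorded audio file through the Hugging Face ASR pipeline; the real voiceassistant.py captures microphone input in real time and handles the language and low-bit options listed above:

```python
from transformers import pipeline
from ipex_llm.langchain.llms import TransformersLLM
from langchain import PromptTemplate, LLMChain

# Speech-to-text with Whisper (illustrative; voiceassistant.py records from the microphone instead).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
user_text = asr("<path_to_audio.wav>")["text"]

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 256, "trust_remote_code": True},
)
llm_chain = LLMChain(
    prompt=PromptTemplate(template="{question}", input_variables=["question"]),
    llm=llm,
)
print(llm_chain.run(user_text))
```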
### Example: Low Bit
The low_bit example ([low_bit.py](./low_bit.py)) showcases how to use LangChain with a low-bit optimized model.
With `save_low_bit`, we save the weights of the low-bit model into the target folder.
> Note: `save_low_bit` only saves the weights of the model.
> Users can either copy the tokenizer files into the target folder or specify `tokenizer_id` during initialization.
```bash
python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]
```
**Runtime Arguments Explained**:
- `-m MODEL_PATH`: **Required**, the path to the model
- `-t TARGET_PATH`: **Required**, the path to save the low-bit model
- `-q QUESTION`: the question to ask. Default is `What is AI?`.
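Condensed from the low_bit.py added in this commit, the key steps are saving the optimized weights and reloading them with `tokenizer_id` pointing back at the original model folder:

```python
from ipex_llm.langchain.llms import TransformersLLM

# Load with INT4 optimizations, then persist only the low-bit weights.
llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
llm.model.save_low_bit("<path_to_target>")
del llm

# tokenizer_id points back at the original model folder, since save_low_bit
# does not copy the tokenizer files into the target folder.
low_bit_llm = TransformersLLM.from_model_id_low_bit(
    model_id="<path_to_target>",
    tokenizer_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
```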
### Legacy (Native INT4 examples)
IPEX-LLM also provides LangChain integrations using native INT4 mode. Those examples can be found in the [native_int4](./native_int4/) folder. For detailed instructions on setting up and running the `native_int4` examples, refer to [Native INT4 Examples README](./README_nativeint4.md).


@@ -0,0 +1,60 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import argparse
from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
from langchain import PromptTemplate, LLMChain
from langchain import HuggingFacePipeline


def main(args):
    question = args.question
    model_path = args.model_path
    low_bit_model_path = args.target_path

    # Simple pass-through prompt: the question is sent to the model as-is.
    template = """{question}"""
    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Load the original model with IPEX-LLM INT4 optimizations.
    llm = TransformersLLM.from_model_id(
        model_id=model_path,
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
    )

    # Save only the low-bit weights to the target folder, then free the full model.
    llm.model.save_low_bit(low_bit_model_path)
    del llm

    # Reload from the low-bit checkpoint; tokenizer_id points back to the
    # original model folder because save_low_bit does not copy tokenizer files.
    low_bit_llm = TransformersLLM.from_model_id_low_bit(
        model_id=low_bit_model_path,
        tokenizer_id=model_path,
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True}
    )

    llm_chain = LLMChain(prompt=prompt, llm=low_bit_llm)
    output = llm_chain.run(question)
    print("====output=====")
    print(output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='TransformersLLM Langchain Chat Example')
    parser.add_argument('-m', '--model-path', type=str, required=True,
                        help='the path to the transformers model')
    parser.add_argument('-t', '--target-path', type=str, required=True,
                        help='the path to save the low-bit model')
    parser.add_argument('-q', '--question', type=str, default='What is AI?',
                        help='question you want to ask.')
    args = parser.parse_args()
    main(args)


@@ -100,4 +100,19 @@ python rag.py -m <path_to_model> [-q QUESTION] [-i INPUT_PATH]
arguments info:
- `-m MODEL_PATH`: **required**, path to the model.
- `-q QUESTION`: question to ask. Default is `What is IPEX?`.
- `-i INPUT_PATH`: path to the input doc.
- `-i INPUT_PATH`: path to the input doc.
#### 5.2. Low Bit
The low_bit example ([low_bit.py](./low_bit.py)) showcases how to use LangChain with a low-bit optimized model.
With `save_low_bit`, we save the weights of the low-bit model into the target folder.
> Note: `save_low_bit` only saves the weights of the model.
> Users can either copy the tokenizer files into the target folder or specify `tokenizer_id` during initialization.
```bash
python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]
```
**Runtime Arguments Explained**:
- `-m MODEL_PATH`: **Required**, the path to the model
- `-t TARGET_PATH`: **Required**, the path to save the low-bit model
- `-q QUESTION`: the question to ask. Default is `What is AI?`.
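As in the CPU variant, the example saves the low-bit weights and reloads them with `tokenizer_id`; the GPU version of low_bit.py (shown below in this commit) additionally passes `device_map='xpu'` so the model runs on the Intel GPU:

```python
low_bit_llm = TransformersLLM.from_model_id_low_bit(
    model_id="<path_to_target>",
    tokenizer_id="<path_to_model>",  # tokenizer comes from the original model folder
    device_map='xpu',                # place the low-bit model on the Intel GPU
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
```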


@@ -0,0 +1,64 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import argparse
from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
from langchain import PromptTemplate, LLMChain
from langchain import HuggingFacePipeline
from torch import device


def main(args):
    question = args.question
    model_path = args.model_path
    low_bit_model_path = args.target_path

    # Simple pass-through prompt: the question is sent to the model as-is.
    template = """{question}"""
    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Load the original model with IPEX-LLM INT4 optimizations on the Intel GPU.
    llm = TransformersLLM.from_model_id(
        model_id=model_path,
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
        device_map='xpu'
    )

    # Save only the low-bit weights to the target folder, then free the full model.
    llm.model.save_low_bit(low_bit_model_path)
    del llm

    # Reload from the low-bit checkpoint on the GPU; tokenizer_id points back to
    # the original model folder because save_low_bit does not copy tokenizer files.
    low_bit_llm = TransformersLLM.from_model_id_low_bit(
        model_id=low_bit_model_path,
        tokenizer_id=model_path,
        device_map='xpu',
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True}
    )

    llm_chain = LLMChain(prompt=prompt, llm=low_bit_llm)
    output = llm_chain.run(question)
    print("====output=====")
    print(output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='TransformersLLM Langchain Chat Example')
    parser.add_argument('-m', '--model-path', type=str, required=True,
                        help='the path to the transformers model')
    parser.add_argument('-t', '--target-path', type=str, required=True,
                        help='the path to save the low-bit model')
    parser.add_argument('-q', '--question', type=str, default='What is AI?',
                        help='question you want to ask.')
    args = parser.parse_args()
    main(args)


@@ -48,7 +48,7 @@
import importlib.util
import logging
from typing import Any, List, Mapping, Optional
from ipex_llm.utils.common.log4Error import invalidInputError
from pydantic import Extra
from langchain.callbacks.manager import CallbackManagerForLLMRun
@@ -90,13 +90,14 @@ class TransformersLLM(LLM):
model_id: str,
model_kwargs: Optional[dict] = None,
device_map: str = 'cpu',
tokenizer_id: str = None,
**kwargs: Any,
) -> LLM:
"""
Construct object from model_id
Args:
model_id: Path for the huggingface repo id to be downloaded or
the huggingface checkpoint folder.
model_kwargs: Keyword arguments that will be passed to the model and tokenizer.
@@ -114,21 +115,28 @@ class TransformersLLM(LLM):
from transformers import AutoTokenizer, LlamaTokenizer
except ImportError:
raise ValueError(
invalidInputError(
"Could not import transformers python package. "
"Please install it with `pip install transformers`."
)
_model_kwargs = model_kwargs or {}
# TODO: may refactor this code in the future
try:
tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
except:
tokenizer = LlamaTokenizer.from_pretrained(model_id, **_model_kwargs)
if tokenizer_id is not None:
try:
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
except:
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
else:
try:
tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
except:
tokenizer = LlamaTokenizer.from_pretrained(model_id, **_model_kwargs)
# TODO: may refactor this code in the future
try:
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, **_model_kwargs)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True,
**_model_kwargs)
except:
model = AutoModel.from_pretrained(model_id, load_in_4bit=True, **_model_kwargs)
@@ -155,13 +163,12 @@ class TransformersLLM(LLM):
model_id: str,
model_kwargs: Optional[dict] = None,
device_map: str = 'cpu',
tokenizer_id: str = None,
**kwargs: Any,
) -> LLM:
"""
Construct low_bit object from model_id
Args:
model_id: Path for the bigdl transformers low-bit model checkpoint folder.
model_kwargs: Keyword arguments that will be passed to the model and tokenizer.
kwargs: Extra arguments that will be passed to the model and tokenizer.
@@ -177,24 +184,29 @@ class TransformersLLM(LLM):
from transformers import AutoTokenizer, LlamaTokenizer
except ImportError:
raise ValueError(
invalidInputError(
"Could not import transformers python package. "
"Please install it with `pip install transformers`."
)
_model_kwargs = model_kwargs or {}
# TODO: may refactor this code in the future
try:
tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
except:
tokenizer = LlamaTokenizer.from_pretrained(model_id, **_model_kwargs)
if tokenizer_id is not None:
try:
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
except:
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
else:
try:
tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
except:
tokenizer = LlamaTokenizer.from_pretrained(model_id, **_model_kwargs)
# TODO: may refactor this code in the future
try:
model = AutoModelForCausalLM.load_low_bit(model_id, **_model_kwargs)
except:
model = AutoModel.load_low_bit(model_id, **_model_kwargs)
# TODO: may refactor this code in the future
if 'xpu' in device_map:
model = model.to(device_map)
@@ -260,5 +272,5 @@ class TransformersLLM(LLM):
else:
stopping_criteria = None
output = self.model.generate(input_ids, stopping_criteria=stopping_criteria, **kwargs)
text = self.tokenizer.decode(output[0], skip_special_tokens=True)[len(prompt) :]
text = self.tokenizer.decode(output[0], skip_special_tokens=True)[len(prompt):]
return text
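
To summarize the new behaviour in `TransformersLLM`: when `tokenizer_id` is omitted, the tokenizer is loaded from `model_id` as before (so the tokenizer files must sit next to the low-bit weights); when it is given, the tokenizer is loaded from that path instead. A short usage sketch:

```python
from ipex_llm.langchain.llms import TransformersLLM

# Tokenizer files live alongside the low-bit checkpoint (previous behaviour).
llm = TransformersLLM.from_model_id_low_bit(model_id="<path_to_low_bit_model>")

# Tokenizer is loaded from the original model folder via the new tokenizer_id argument.
llm = TransformersLLM.from_model_id_low_bit(
    model_id="<path_to_low_bit_model>",
    tokenizer_id="<path_to_original_model>",
)
```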