Add tokenizer_id in Langchain (#10588)
* fix low-bit
* fix
* fix style

---------

Co-authored-by: arda <arda@arda-arc12.sh.intel.com>
parent f6fef09933
commit b827f534d5
5 changed files with 195 additions and 27 deletions

@@ -18,47 +18,47 @@ pip install -U pandas==2.0.3

### Example: Chat

The chat example ([chat.py](./chat.py)) shows how to use `LLMChain` to build a chat pipeline.

To run the example, execute the following command in the current directory:

```bash
python chat.py -m <path_to_model> [-q <your_question>]
```

> Note: if `-q` is not specified, it will use `What is AI` by default.
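
For reference, a minimal sketch of the pipeline this example builds, following the `TransformersLLM` + `LLMChain` pattern used by the `low_bit.py` file added in this commit; the exact prompt template in `chat.py` may differ:

```python
# Minimal sketch of an LLMChain chat pipeline (assumes ipex-llm and langchain are installed;
# the real chat.py may use a different prompt template and generation settings).
from ipex_llm.langchain.llms import TransformersLLM
from langchain import PromptTemplate, LLMChain

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
prompt = PromptTemplate(template="{question}", input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run("What is AI?"))
```
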
### Example: RAG (Retrieval Augmented Generation)

The RAG example ([rag.py](./rag.py)) shows how to load the input text into a vector database, and then use `load_qa_chain` to build a retrieval pipeline.

To run the example, execute the following command in the current directory:

```bash
python rag.py -m <path_to_model> [-q <your_question>] [-i <path_to_input_txt>]
```

> Note: If `-i` is not specified, it will use a short introduction to BigDL as input by default. If `-q` is not specified, `What is IPEX LLM?` will be used by default.
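
As a rough sketch of the flow (not the exact contents of `rag.py`): embed the input text, store it in a vector database, retrieve relevant chunks, and answer with `load_qa_chain`. The FAISS store and `HuggingFaceEmbeddings` below are illustrative choices and an assumption, not necessarily what the example uses:

```python
# Illustrative RAG flow; rag.py may use a different vector store, chunking, and embedding model.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from ipex_llm.langchain.llms import TransformersLLM

texts = open("<path_to_input_txt>").read().split("\n\n")  # naive chunking, for illustration only
embeddings = HuggingFaceEmbeddings()                       # downloads a small sentence-transformers model
db = FAISS.from_texts(texts, embeddings)                   # build the vector database

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 256, "trust_remote_code": True},
)
chain = load_qa_chain(llm, chain_type="stuff")

question = "What is IPEX LLM?"
docs = db.similarity_search(question)                      # retrieve the most relevant chunks
print(chain.run(input_documents=docs, question=question))
```
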
### Example: Math

The math example ([llm_math.py](./llm_math.py)) shows how to build a chat pipeline specialized in solving math questions. For example, you can ask `What is 13 raised to the .3432 power?`

To run the example, execute the following command in the current directory:

```bash
python llm_math.py -m <path_to_model> [-q <your_question>]
```

> Note: if `-q` is not specified, it will use `What is 13 raised to the .3432 power?` by default.
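
One plausible shape for such a pipeline is LangChain's `LLMMathChain`, which prompts the model to emit a numeric expression and then evaluates it; whether `llm_math.py` uses exactly this chain is an assumption:

```python
# Hedged sketch: a math-solving chain on top of an IPEX-LLM model (requires numexpr).
from langchain.chains import LLMMathChain
from ipex_llm.langchain.llms import TransformersLLM

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 512, "trust_remote_code": True},
)
math_chain = LLMMathChain.from_llm(llm, verbose=True)  # parses and evaluates the model's expression
print(math_chain.run("What is 13 raised to the .3432 power?"))
```
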
### Example: Voice Assistant

The voice assistant example ([voiceassistant.py](./voiceassistant.py)) showcases how to use LangChain to build a pipeline that takes your speech as input in real time, uses an ASR model (e.g. [Whisper-Medium](https://huggingface.co/openai/whisper-medium)) to turn speech into text, and then feeds the text into a large language model to get a response.

To run the example, execute the following command in the current directory:

```bash
python voiceassistant.py -m <path_to_model> [-q <your_question>]
```

**Runtime Arguments Explained**:
- `-m MODEL_PATH`: **Required**, the path to the

@@ -67,6 +67,23 @@ python transformers_int4/voiceassistant.py -m <path_to_model> [-q <your_question

- `-l LANGUAGE`: you can specify a language such as "english" or "chinese"
- `-d True|False`: whether the model path specified in `-m` is a saved low-bit model.
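
To make the flow concrete, here is a heavily simplified sketch of the speech-to-text-to-LLM loop. The real example records from the microphone in real time; this snippet just transcribes a pre-recorded WAV file with a `transformers` ASR pipeline, which is an illustrative substitute rather than the example's exact code:

```python
# Simplified voice-assistant flow: transcribe audio, then answer with the LLM.
from transformers import pipeline
from ipex_llm.langchain.llms import TransformersLLM
from langchain import PromptTemplate, LLMChain

asr = pipeline("automatic-speech-recognition", model="openai/whisper-medium")
question = asr("question.wav")["text"]   # speech -> text ("question.wav" is a hypothetical input file)

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 128, "trust_remote_code": True},
)
prompt = PromptTemplate(template="{question}", input_variables=["question"])
chain = LLMChain(prompt=prompt, llm=llm)
print(chain.run(question))               # text -> LLM response
```
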
### Example: Low Bit

The low_bit example ([low_bit.py](./low_bit.py)) showcases how to use LangChain with a low-bit optimized model.
By `save_low_bit` we save the weights of the low-bit model into the target folder.
> Note: `save_low_bit` only saves the weights of the model.
> Users could copy the tokenizer model into the target folder or specify `tokenizer_id` during initialization.

```bash
python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]
```

**Runtime Arguments Explained**:
- `-m MODEL_PATH`: **Required**, the path to the model
- `-t TARGET_PATH`: **Required**, the path to save the low_bit model
- `-q QUESTION`: the question
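
The core of the example (taken from the [low_bit.py](./low_bit.py) file added further down in this commit) is saving the low-bit weights and then reloading them with `tokenizer_id` pointing back at the original model folder:

```python
# Save low-bit weights, then reload them while borrowing the tokenizer from the original model.
from ipex_llm.langchain.llms import TransformersLLM

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
llm.model.save_low_bit("<path_to_target>")   # weights only; no tokenizer files are written
del llm

low_bit_llm = TransformersLLM.from_model_id_low_bit(
    model_id="<path_to_target>",
    tokenizer_id="<path_to_model>",          # load the tokenizer from the original model path
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
```
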
### Legacy (Native INT4 examples)

IPEX-LLM also provides LangChain integrations using native INT4 mode. Those examples can be found in the [native_int4](./native_int4/) folder. For detailed instructions on setting up and running `native_int4` examples, refer to [Native INT4 Examples README](./README_nativeint4.md).

python/llm/example/CPU/LangChain/low_bit.py (new file, 60 lines)

@@ -0,0 +1,60 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


import argparse

from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
from langchain import PromptTemplate, LLMChain
from langchain import HuggingFacePipeline


def main(args):
    question = args.question
    model_path = args.model_path
    low_bit_model_path = args.target_path
    template = """{question}"""

    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Load the original model once and save its low-bit weights into the target folder.
    llm = TransformersLLM.from_model_id(
        model_id=model_path,
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
    )
    llm.model.save_low_bit(low_bit_model_path)
    del llm
    # Reload from the low-bit weights; the tokenizer is loaded from the original model path.
    low_bit_llm = TransformersLLM.from_model_id_low_bit(
        model_id=low_bit_model_path,
        tokenizer_id=model_path,
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True}
    )
    llm_chain = LLMChain(prompt=prompt, llm=low_bit_llm)

    output = llm_chain.run(question)
    print("====output=====")
    print(output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='TransformersLLM Langchain Chat Example')
    parser.add_argument('-m', '--model-path', type=str, required=True,
                        help='the path to transformers model')
    parser.add_argument('-t', '--target-path', type=str, required=True,
                        help='the path to save the low bit model')
    parser.add_argument('-q', '--question', type=str, default='What is AI?',
                        help='question you want to ask.')
    args = parser.parse_args()

    main(args)

@@ -100,4 +100,19 @@ python rag.py -m <path_to_model> [-q QUESTION] [-i INPUT_PATH]

arguments info:
- `-m MODEL_PATH`: **required**, path to the model.
- `-q QUESTION`: question to ask. Default is `What is IPEX?`.
- `-i INPUT_PATH`: path to the input doc.

#### 5.2. Low Bit

The low_bit example ([low_bit.py](./low_bit.py)) showcases how to use LangChain with a low-bit optimized model.
By `save_low_bit` we save the weights of the low-bit model into the target folder.
> Note: `save_low_bit` only saves the weights of the model.
> Users could copy the tokenizer model into the target folder or specify `tokenizer_id` during initialization.

```bash
python low_bit.py -m <path_to_model> -t <path_to_target> [-q <your question>]
```

**Runtime Arguments Explained**:
- `-m MODEL_PATH`: **Required**, the path to the model
- `-t TARGET_PATH`: **Required**, the path to save the low_bit model
- `-q QUESTION`: the question
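
On GPU the flow is the same; as the [low_bit.py](./low_bit.py) file shown below illustrates, the only difference is passing `device_map='xpu'` so the model runs on the Intel GPU:

```python
# Same save/reload flow as on CPU, with the model placed on an Intel GPU (XPU).
from ipex_llm.langchain.llms import TransformersLLM

llm = TransformersLLM.from_model_id(
    model_id="<path_to_model>",
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
    device_map='xpu',
)
llm.model.save_low_bit("<path_to_target>")
del llm

low_bit_llm = TransformersLLM.from_model_id_low_bit(
    model_id="<path_to_target>",
    tokenizer_id="<path_to_model>",          # tokenizer comes from the original model path
    device_map='xpu',                        # move the reloaded low-bit model to the GPU
    model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
)
```
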
python/llm/example/GPU/LangChain/low_bit.py (new file, 64 lines)

@@ -0,0 +1,64 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


import argparse

from ipex_llm.langchain.llms import TransformersLLM, TransformersPipelineLLM
from langchain import PromptTemplate, LLMChain
from langchain import HuggingFacePipeline
from torch import device


def main(args):
    question = args.question
    model_path = args.model_path
    low_bit_model_path = args.target_path
    template = """{question}"""

    prompt = PromptTemplate(template=template, input_variables=["question"])

    # Load the original model on the Intel GPU (XPU) and save its low-bit weights.
    llm = TransformersLLM.from_model_id(
        model_id=model_path,
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True},
        device_map='xpu'
    )
    llm.model.save_low_bit(low_bit_model_path)
    del llm
    # Reload from the low-bit weights on XPU; the tokenizer is loaded from the original model path.
    low_bit_llm = TransformersLLM.from_model_id_low_bit(
        model_id=low_bit_model_path,
        tokenizer_id=model_path,
        device_map='xpu',
        model_kwargs={"temperature": 0, "max_length": 64, "trust_remote_code": True}
    )
    llm_chain = LLMChain(prompt=prompt, llm=low_bit_llm)

    output = llm_chain.run(question)
    print("====output=====")
    print(output)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='TransformersLLM Langchain Chat Example')
    parser.add_argument('-m', '--model-path', type=str, required=True,
                        help='the path to transformers model')
    parser.add_argument('-t', '--target-path', type=str, required=True,
                        help='the path to save the low bit model')
    parser.add_argument('-q', '--question', type=str, default='What is AI?',
                        help='question you want to ask.')
    args = parser.parse_args()

    main(args)

@@ -48,7 +48,7 @@

import importlib.util
import logging
from typing import Any, List, Mapping, Optional
from ipex_llm.utils.common.log4Error import invalidInputError
from pydantic import Extra

from langchain.callbacks.manager import CallbackManagerForLLMRun

@@ -90,13 +90,14 @@ class TransformersLLM(LLM):

            model_id: str,
            model_kwargs: Optional[dict] = None,
            device_map: str = 'cpu',
            tokenizer_id: str = None,
            **kwargs: Any,
    ) -> LLM:
        """
        Construct object from model_id

        Args:

            model_id: Path for the huggingface repo id to be downloaded or
                      the huggingface checkpoint folder.
            model_kwargs: Keyword arguments that will be passed to the model and tokenizer.

@@ -114,21 +115,28 @@

            from transformers import AutoTokenizer, LlamaTokenizer

        except ImportError:
            invalidInputError(
                "Could not import transformers python package. "
                "Please install it with `pip install transformers`."
            )

        _model_kwargs = model_kwargs or {}
        # TODO: may refactor this code in the future
        if tokenizer_id is not None:
            try:
                tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
            except:
                tokenizer = LlamaTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
        else:
            try:
                tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
            except:
                tokenizer = LlamaTokenizer.from_pretrained(model_id, **_model_kwargs)

        # TODO: may refactor this code in the future
        try:
            model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True,
                                                         **_model_kwargs)
        except:
            model = AutoModel.from_pretrained(model_id, load_in_4bit=True, **_model_kwargs)

@@ -155,13 +163,12 @@ class TransformersLLM(LLM):

            model_id: str,
            model_kwargs: Optional[dict] = None,
            device_map: str = 'cpu',
            tokenizer_id: str = None,
            **kwargs: Any,
    ) -> LLM:
        """
        Construct low_bit object from model_id

        Args:

            model_id: Path for the bigdl transformers low-bit model checkpoint folder.
            model_kwargs: Keyword arguments that will be passed to the model and tokenizer.
            kwargs: Extra arguments that will be passed to the model and tokenizer.

@@ -177,24 +184,29 @@

            from transformers import AutoTokenizer, LlamaTokenizer

        except ImportError:
            invalidInputError(
                "Could not import transformers python package. "
                "Please install it with `pip install transformers`."
            )

        _model_kwargs = model_kwargs or {}
        # TODO: may refactor this code in the future
        if tokenizer_id is not None:
            try:
                tokenizer = AutoTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
            except:
                tokenizer = LlamaTokenizer.from_pretrained(tokenizer_id, **_model_kwargs)
        else:
            try:
                tokenizer = AutoTokenizer.from_pretrained(model_id, **_model_kwargs)
            except:
                tokenizer = LlamaTokenizer.from_pretrained(model_id, **_model_kwargs)

        # TODO: may refactor this code in the future
        try:
            model = AutoModelForCausalLM.load_low_bit(model_id, **_model_kwargs)
        except:
            model = AutoModel.load_low_bit(model_id, **_model_kwargs)

        # TODO: may refactor this code in the future
        if 'xpu' in device_map:
            model = model.to(device_map)

@@ -260,5 +272,5 @@

        else:
            stopping_criteria = None
        output = self.model.generate(input_ids, stopping_criteria=stopping_criteria, **kwargs)
        text = self.tokenizer.decode(output[0], skip_special_tokens=True)[len(prompt):]
        return text