SichengStevenLi 1a1a97c9e4

Update mddocs for part of Overview (2/2) and Inference (#11377 )

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

2024-06-21 12:07:50 +08:00

LangChain API

You may run the models using the LangChain API in ipex-llm.

Using Hugging Face `transformers` INT4 Format

You may run any Hugging Face Transformers model (with INT4 optimizations applied) using the LangChain API as follows:

from ipex_llm.langchain.llms import TransformersLLM
from ipex_llm.langchain.embeddings import TransformersEmbeddings
from langchain.chains.question_answering import load_qa_chain

# model_path must be defined beforehand; it identifies the Hugging Face Transformers model to load
embeddings = TransformersEmbeddings.from_model_id(model_id=model_path)
ipex_llm = TransformersLLM.from_model_id(model_id=model_path, ...)

# build a question-answering chain on top of the optimized LLM and run it
doc_chain = load_qa_chain(ipex_llm, ...)
output = doc_chain.run(...)

Tip

See the examples here
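The elided arguments above depend on your application. As a rough illustration only, the sketch below fills them in for the simplest case: every text is passed straight to a "stuff" chain, with no retrieval step and no embeddings. The function name `run_doc_qa` and its parameters are hypothetical, and the sketch assumes a LangChain release that still provides `load_qa_chain`, `Chain.run`, and `langchain.docstore.document.Document`. The imports are deferred into the function body so that the definition itself loads without `ipex_llm` or `langchain` installed.

```python
def run_doc_qa(model_path, texts, question):
    """Answer `question` over `texts` with an INT4-optimized Transformers model.

    Hypothetical helper, not part of ipex-llm. It assumes ipex_llm and a
    LangChain version that still ships load_qa_chain are installed.
    """
    # Deferred imports: only needed when the function is actually called.
    from ipex_llm.langchain.llms import TransformersLLM
    from langchain.chains.question_answering import load_qa_chain
    from langchain.docstore.document import Document

    # Load the model through the same entry point as in the snippet above.
    llm = TransformersLLM.from_model_id(model_id=model_path)

    # Wrap each raw string in a LangChain Document.
    docs = [Document(page_content=text) for text in texts]

    # "stuff" places all documents into a single prompt (assumed chain type).
    doc_chain = load_qa_chain(llm, chain_type="stuff")
    return doc_chain.run(input_documents=docs, question=question)
```

For larger document sets you would normally retrieve only the relevant chunks first (for example with the embeddings object shown earlier) rather than passing every text to the chain.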

Using Native INT4 Format

You may also convert Hugging Face Transformers models into native INT4 format, and then run the converted models using the LangChain API as follows.

Note

Currently, only the llama, bloom, gptneox, and starcoder model families are supported; for other models, use the Hugging Face `transformers` INT4 format described above.

Load the converted model with the classes that match its model family, for example `LlamaLLM` and `LlamaEmbeddings` for llama models.

from ipex_llm.langchain.llms import LlamaLLM
from ipex_llm.langchain.embeddings import LlamaEmbeddings
from langchain.chains.question_answering import load_qa_chain

# for other model families, switch to GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings
embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin')
# for other model families, switch to GptneoxLLM/BloomLLM/StarcoderLLM
ipex_llm = LlamaLLM(model_path='/path/to/converted/model.bin')

# build a question-answering chain on top of the native INT4 model and run it
doc_chain = load_qa_chain(ipex_llm, ...)
output = doc_chain.run(...)

Tip

See the examples here for more information.
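Because each native model family has its own pair of classes, it can help to select them by name instead of editing the imports by hand. The following is a minimal sketch under that assumption: the class names and module paths come from the snippet above, while `NATIVE_CLASSES`, `native_class_names`, and `load_native_classes` are hypothetical helpers that are not part of ipex-llm.

```python
import importlib

# family -> (LLM class name, embeddings class name), as listed in the
# comments of the snippet above.
NATIVE_CLASSES = {
    "llama": ("LlamaLLM", "LlamaEmbeddings"),
    "bloom": ("BloomLLM", "BloomEmbeddings"),
    "gptneox": ("GptneoxLLM", "GptneoxEmbeddings"),
    "starcoder": ("StarcoderLLM", "StarcoderEmbeddings"),
}


def native_class_names(family):
    """Return the (LLM, embeddings) class names for a native model family."""
    key = family.strip().lower()
    if key not in NATIVE_CLASSES:
        supported = ", ".join(sorted(NATIVE_CLASSES))
        raise ValueError(
            f"unsupported family {family!r}; expected one of: {supported}"
        )
    return NATIVE_CLASSES[key]


def load_native_classes(family):
    """Import and return the (LLM, embeddings) classes for `family`.

    Assumes ipex_llm is installed; the modules are imported only on call.
    """
    llm_name, emb_name = native_class_names(family)
    llms = importlib.import_module("ipex_llm.langchain.llms")
    embeddings = importlib.import_module("ipex_llm.langchain.embeddings")
    return getattr(llms, llm_name), getattr(embeddings, emb_name)
```

With `ipex_llm` installed, `LLM, Embeddings = load_native_classes("bloom")` would give you the classes to instantiate with `model_path='/path/to/converted/model.bin'`, exactly as in the llama example.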
