
# LangChain API

You may run models using the LangChain API in `bigdl-llm`.

## Using Hugging Face `transformers` INT4 Format

You may run any Hugging Face *Transformers* model (with INT4 optimizations applied) using the LangChain API as follows:

```python
from bigdl.llm.langchain.llms import TransformersLLM
from bigdl.llm.langchain.embeddings import TransformersEmbeddings
from langchain.chains.question_answering import load_qa_chain

# model_path can be a Hugging Face model id or a local model folder
embeddings = TransformersEmbeddings.from_model_id(model_id=model_path)
bigdl_llm = TransformersLLM.from_model_id(model_id=model_path, ...)

doc_chain = load_qa_chain(bigdl_llm, ...)
output = doc_chain.run(...)
```

```eval_rst
.. seealso::

   See the examples `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/langchain/transformers_int4>`_.
```
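
As a more complete illustration, below is a minimal end-to-end question-answering sketch. The model id, `model_kwargs` values, document text, and question are illustrative assumptions, not required settings:

```python
from bigdl.llm.langchain.llms import TransformersLLM
from langchain.chains.question_answering import load_qa_chain
from langchain.docstore.document import Document

# Load a Hugging Face transformers model with INT4 optimizations applied;
# model_kwargs are forwarded to the underlying transformers model (assumed values)
llm = TransformersLLM.from_model_id(
    model_id="meta-llama/Llama-2-7b-chat-hf",  # assumed model id
    model_kwargs={"temperature": 0, "max_length": 512, "trust_remote_code": True},
)

# Wrap the reference text as LangChain documents
docs = [Document(page_content="BigDL-LLM is a library for running LLMs with INT4 optimizations.")]

# "stuff" simply concatenates the documents into the prompt
doc_chain = load_qa_chain(llm, chain_type="stuff")
output = doc_chain.run(input_documents=docs, question="What is BigDL-LLM?")
print(output)
```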

## Using Native INT4 Format

You may also convert Hugging Face Transformers models into native INT4 format, and then run the converted models using the LangChain API as follows.

```eval_rst
.. note::

   * Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Hugging Face ``transformers`` INT4 format as described `above <./langchain_api.html#using-hugging-face-transformers-int4-format>`_.

   * You may choose the corresponding API developed for specific native models to load the converted model.
```

```python
from bigdl.llm.langchain.llms import LlamaLLM
from bigdl.llm.langchain.embeddings import LlamaEmbeddings
from langchain.chains.question_answering import load_qa_chain

# switch to ChatGLMEmbeddings/GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings to load other models
embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin')
# switch to ChatGLMLLM/GptneoxLLM/BloomLLM/StarcoderLLM to load other models
bigdl_llm = LlamaLLM(model_path='/path/to/converted/model.bin')

doc_chain = load_qa_chain(bigdl_llm, ...)
doc_chain.run(...)
```

```eval_rst
.. seealso::

   See the examples `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/langchain/native_int4>`_.
```
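
For reference, the conversion step itself can be performed with `bigdl-llm`'s `llm_convert` API. A minimal sketch, assuming a local Llama model folder (all paths below are placeholders):

```python
from bigdl.llm import llm_convert

# convert a Hugging Face transformers model folder into native INT4 format;
# the returned path points to the converted binary
converted_model_path = llm_convert(
    model='/path/to/llama/model/',  # placeholder: original model folder
    outfile='/path/to/output/',     # placeholder: output directory
    outtype='int4',
    model_family='llama',           # see the note above for supported families
)
```

The converted `model.bin` produced this way is what `LlamaLLM` and `LlamaEmbeddings` above expect as `model_path`.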