
binbin Deng 70bc8ea8ae LLM: update langchain and cpp-python style API examples (#8456 )		2023-07-06 14:36:42 +08:00
int4_inference.py	LLM: update langchain and cpp-python style API examples (#8456 )	2023-07-06 14:36:42 +08:00
README.md	LLM: update langchain and cpp-python style API examples (#8456 )	2023-07-06 14:36:42 +08:00

BigDL-LLM INT4 Inference Using Llama-Cpp-Python Format API

In this example, we show how to run inference on a converted INT4 model using the llama-cpp-python format API.

Note: Currently, the LLaMA, GPT-NeoX, BLOOM and StarCoder model families are supported.
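In llama-cpp-python, calling a model object with a prompt returns an OpenAI-style completion dictionary. With `stream=True`, it returns an iterator of chunks in the same layout instead.

This is a minimal sketch of extracting the generated text from such a result. It assumes, without confirmation here, that the BigDL-LLM model classes used by this example return the same layout. `completion_text` and `streamed_text` are hypothetical helper names, and the sample dictionaries are hand-written stand-ins, not real model output.

```python
from typing import Any, Dict, Iterable

# ASSUMPTION: results follow llama-cpp-python's completion layout, e.g.
# {"id": ..., "object": "text_completion",
#  "choices": [{"text": ..., "index": 0, "logprobs": None, "finish_reason": ...}],
#  "usage": {...}}


def completion_text(completion: Dict[str, Any]) -> str:
    """Return the generated text of the first choice in a completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    return choices[0]["text"]


def streamed_text(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the text of streamed chunks (each chunk has the same layout)."""
    return "".join(completion_text(chunk) for chunk in chunks)


if __name__ == "__main__":
    # Hand-written stand-in for a model result (not actual model output).
    fake = {
        "object": "text_completion",
        "choices": [{"text": " AI stands for artificial intelligence.",
                     "index": 0, "logprobs": None, "finish_reason": "stop"}],
    }
    print(completion_text(fake))
    print(streamed_text([{"choices": [{"text": " AI"}]},
                         {"choices": [{"text": " is"}]}]))
```

Reading only the first choice matches the common case of requesting a single completion per prompt.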

Prepare Environment

We suggest using conda to manage the environment:

conda create -n llm python=3.9
conda activate llm

pip install --pre --upgrade bigdl-llm[all]

Then follow the instructions in Convert model to convert your model to INT4 format, and run the example:

python ./int4_inference.py -m CONVERTED_MODEL_PATH -x MODEL_FAMILY -p PROMPT -t THREAD_NUM
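PROMPT usually contains spaces, so it must be quoted on the command line. The default prompt also contains a `?`, which some shells treat as a glob character.

If you want to launch the example from Python, the sketch below shows one way to build the command safely. `build_command` and the model path are hypothetical. The helper assembles the arguments as a list, and the standard `shlex.join` prints a correctly quoted shell form.

```python
import shlex
from typing import List

# Families accepted by -x, as listed in this README's arguments info.
MODEL_FAMILIES = ("llama", "gptneox", "bloom", "starcoder")


def build_command(model_path: str, model_family: str,
                  prompt: str = "What is AI?", thread_num: int = 2) -> List[str]:
    """Hypothetical helper: return the example's command line as an argument list."""
    if model_family not in MODEL_FAMILIES:
        raise ValueError(f"unsupported model family: {model_family!r}")
    return ["python", "./int4_inference.py",
            "-m", model_path, "-x", model_family,
            "-p", prompt, "-t", str(thread_num)]


if __name__ == "__main__":
    cmd = build_command("/path/to/converted_model.bin", "llama")
    # shlex.join single-quotes the prompt because it contains a space and a '?'.
    print(shlex.join(cmd))
    # → python ./int4_inference.py -m /path/to/converted_model.bin -x llama -p 'What is AI?' -t 2
```

Passing the list directly to `subprocess.run(cmd, check=True)` avoids the shell entirely, so no quoting is needed in that case.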

Arguments info:

-m CONVERTED_MODEL_PATH: required, path to the converted model
-x MODEL_FAMILY: required, the model family of the model specified in -m, available options are llama, gptneox, bloom and starcoder
-p PROMPT: the question to ask. Default is "What is AI?".
-t THREAD_NUM: specify the number of threads to use for inference. Default is 2.
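The arguments above map naturally onto Python's standard `argparse`. This sketch is not the actual `int4_inference.py`. It only reproduces the documented flags, choices, and defaults. The `dest` names (`model_path`, `model_family`, `prompt`, `thread_num`) and `build_parser` are assumptions for illustration.

```python
import argparse

# Model families accepted by -x, as documented in the arguments info.
MODEL_FAMILIES = ["llama", "gptneox", "bloom", "starcoder"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        description="INT4 inference with a llama-cpp-python style API (sketch)")
    parser.add_argument("-m", dest="model_path", required=True,
                        metavar="CONVERTED_MODEL_PATH",
                        help="path to the converted model")
    parser.add_argument("-x", dest="model_family", required=True,
                        choices=MODEL_FAMILIES, metavar="MODEL_FAMILY",
                        help="model family of the model given by -m")
    parser.add_argument("-p", dest="prompt", default="What is AI?",
                        metavar="PROMPT", help="question to ask")
    parser.add_argument("-t", dest="thread_num", type=int, default=2,
                        metavar="THREAD_NUM",
                        help="number of threads to use for inference")
    return parser


if __name__ == "__main__":
    # Parse a fixed sample argument list (hypothetical path) to show the defaults.
    args = build_parser().parse_args(["-m", "/path/to/converted_model.bin", "-x", "llama"])
    print(args)
```

Using `choices` makes `argparse` reject an unsupported family with a usage error before any model is loaded.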