ipex-llm/python/llm/example/cpp-python
2023-07-06 14:36:42 +08:00
..
int4_inference.py LLM: update langchain and cpp-python style API examples (#8456) 2023-07-06 14:36:42 +08:00
README.md LLM: update langchain and cpp-python style API examples (#8456) 2023-07-06 14:36:42 +08:00

BigDL-LLM INT4 Inference Using Llama-Cpp-Python Format API

In this example, we show how to run inference on converted INT4 model using llama-cpp-python format API.

Note

: Currently model family LLaMA, GPT-NeoX, BLOOM and StarCoder are supported.

Prepare Environment

We suggest using conda to manage environment:

conda create -n llm python=3.9
conda activate llm

pip install --pre --upgrade bigdl-llm[all]

Convert Models using bigdl-llm

Follow the instructions in Convert model.

Run the example

python ./int4_inference.py -m CONVERTED_MODEL_PATH -x MODEL_FAMILY -p PROMPT -t THREAD_NUM

arguments info:

  • -m CONVERTED_MODEL_PATH: required, path to the converted model
  • -x MODEL_FAMILY: required, the model family of the model specified in -m, available options are llama, gptneox, bloom and starcoder
  • -p PROMPT: question to ask. Default is What is AI?.
  • -t THREAD_NUM: specify the number of threads to use for inference. Default is 2.