
# CLI (Command Line Interface) Tool

> [!NOTE]
> Currently the `ipex-llm` CLI supports *LLaMA* (e.g., vicuna), *GPT-NeoX* (e.g., redpajama), *BLOOM* (e.g., phoenix) and *GPT2* (e.g., starcoder) model architectures; for other models, you may use the transformers-style or LangChain APIs.
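For reference, below is a minimal sketch of the transformers-style API mentioned in the note, assuming `ipex-llm` and `transformers` are installed; the model path is a placeholder for any Hugging Face checkpoint:

```python
# Load a Hugging Face checkpoint with INT4 quantization applied on the fly
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "/path/to/model/"  # placeholder: a Hugging Face model directory

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Simple text completion
input_ids = tokenizer.encode("Once upon a time,", return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```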

## Convert Model

You may convert the downloaded model into native INT4 format using `llm-convert`.

```bash
# convert a PyTorch (fp16 or fp32) model;
# the llama/bloom/gptneox/starcoder model families are currently supported
llm-convert "/path/to/model/" --model-format pth --model-family "bloom" --outfile "/path/to/output/"

# convert a GPTQ-4bit model;
# only the llama model family is currently supported
llm-convert "/path/to/model/" --model-format gptq --model-family "llama" --outfile "/path/to/output/"
```

## Run Model

You may run the converted model using `llm-cli` or `llm-chat` (both built on top of `main.cpp` in `llama.cpp`).

```bash
# show the help message (-x specifies the model family);
# the llama/bloom/gptneox/starcoder model families are currently supported
llm-cli -x gptneox -h

# text completion (-t: number of threads, -m: converted model path, -p: prompt);
# the llama/bloom/gptneox/starcoder model families are currently supported
llm-cli -t 16 -x gptneox -m "/path/to/output/model.bin" -p 'Once upon a time,'

# chat mode;
# the llama/gptneox model families are currently supported
llm-chat -m "/path/to/output/model.bin" -x llama
```