From 0f8201c730bf021efed7a08c44a3a7dd8973b2ab Mon Sep 17 00:00:00 2001
From: Jason Dai
Date: Mon, 24 Jul 2023 09:47:49 +0800
Subject: [PATCH] llm readme update (#8595)

---
 python/llm/README.md                              | 126 ++++++++++++--------
 .../transformers/native_int4/README.md            |   2 +-
 2 files changed, 75 insertions(+), 53 deletions(-)

diff --git a/python/llm/README.md b/python/llm/README.md
index f9f8005d..4216fbe8 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -16,7 +16,7 @@ See the ***optimized performance*** of `phoenix-inst-chat-7b`, `vicuna-13b-v1.1`
 We may use any Hugging Face Transformer models on `bigdl-llm`, and the following models have been verified on Intel laptops.
 | Model     | Example                                                    |
 |-----------|------------------------------------------------------------|
-| LLaMA     | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) |
+| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) |
 | MPT       | [link](example/transformers/transformers_int4/mpt)         |
 | Falcon    | [link](example/transformers/transformers_int4/falcon)      |
 | ChatGLM   | [link](example/transformers/transformers_int4/chatglm)     |
@@ -39,10 +39,9 @@ We may use any Hugging Face Transformer models on `bigdl-llm`, and the following
   - [Install](#install)
   - [Download Model](#download-model)
   - [Run Model](#run-model)
-    - [CLI Tool](#cli-tool)
-    - [Hugging Face `transformers`-style API](#hugging-face-transformers-style-api)
+    - [Hugging Face `transformers` API](#hugging-face-transformers-api)
     - [LangChain API](#langchain-api)
-    - [`llama-cpp-python`-style API](#llama-cpp-python-style-api)
+    - [CLI Tool](#cli-tool)
   - [`bigdl-llm` Dependence](#bigdl-llm-dependence)
@@ -59,47 +58,11 @@ You may download any PyTorch model in Hugging Face *Transformers* format (includ

 #### Run Model
 You may run the models using **`bigdl-llm`** through one of the following APIs:
-1. [CLI (command line interface) Tool](#cli-tool)
-2. [Hugging Face `transformers`-style API](#hugging-face-transformers-style-api)
-3. [LangChain API](#langchain-api)
-4. [`llama-cpp-python`-style API](#llama-cpp-python-style-api)
+1. [Hugging Face `transformers` API](#hugging-face-transformers-api)
+2. [LangChain API](#langchain-api)
+3. [CLI (command line interface) Tool](#cli-tool)
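For orientation, the "Download Model" step referenced in the hunk header above ("You may download any PyTorch model in Hugging Face *Transformers* format…") might look roughly like the sketch below using the `huggingface_hub` client. The client choice and the repo id are illustrative assumptions only; any method that yields a local Hugging Face *Transformers* checkpoint works.

```python
# Minimal sketch of the "Download Model" step (assumption: huggingface_hub is
# installed; the repo id below is only an illustrative placeholder).
from huggingface_hub import snapshot_download

# Fetch the full checkpoint (config, tokenizer, weights) to a local folder
model_path = snapshot_download(repo_id="lmsys/vicuna-13b-v1.3")
print(model_path)  # this local folder is what the bigdl-llm APIs consume
```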

-#### CLI Tool
->**Note**: Currently `bigdl-llm` CLI supports *LLaMA* (e.g., *vicuna*), *GPT-NeoX* (e.g., *redpajama*), *BLOOM* (e.g., *pheonix*) and *GPT2* (e.g., *starcoder*) model architecture; for other models, you may use the `transformers`-style or LangChain APIs.
-
- - ##### Convert model
-
-   You may convert the downloaded model into native INT4 format using `llm-convert`.
-
-   ```bash
-   #convert PyTorch (fp16 or fp32) model;
-   #llama/bloom/gptneox/starcoder model family is currently supported
-   llm-convert "/path/to/model/" --model-format pth --model-family "bloom" --outfile "/path/to/output/"
-
-   #convert GPTQ-4bit model
-   #only llama model family is currently supported
-   llm-convert "/path/to/model/" --model-format gptq --model-family "llama" --outfile "/path/to/output/"
-   ```
-
- - ##### Run model
-
-   You may run the converted model using `llm-cli` or `llm-chat` (*built on top of `main.cpp` in [llama.cpp](https://github.com/ggerganov/llama.cpp)*)
-
-   ```bash
-   #help
-   #llama/bloom/gptneox/starcoder model family is currently supported
-   llm-cli -x gptneox -h
-
-   #text completion
-   #llama/bloom/gptneox/starcoder model family is currently supported
-   llm-cli -t 16 -x gptneox -m "/path/to/output/model.bin" -p 'Once upon a time,'
-
-   #chat mode
-   #llama/gptneox model family is currently supported
-   llm-chat -m "/path/to/output/model.bin" -x llama
-   ```
-
-#### Hugging Face `transformers`-style API
+#### Hugging Face `transformers` API
 You may run the models using `transformers`-style API in `bigdl-llm`.

 - ##### Using Hugging Face `transformers` INT4 format
@@ -128,7 +91,7 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
   >See the complete example [here](example/transformers/transformers_low_bit/).

-  After the model is optimizaed using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
+  After the model is optimized using INT4 (or INT5/INT8), you may save and load the optimized model as follows:

   ```python
   model.save_low_bit(model_path)
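To show the flow this hunk documents end to end, here is a rough sketch of the `transformers`-style INT4 API around the `save_low_bit` call shown above. The paths and the prompt are placeholders, and `load_low_bit` as the counterpart of `save_low_bit` is an assumption based on the save/load description in this README.

```python
# Sketch: load a Hugging Face Transformers model with bigdl-llm INT4
# optimizations, run generation, then save/reload the optimized weights.
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "/path/to/model/"  # placeholder checkpoint folder
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("Once upon a time,", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Save the low-bit weights once, then load them back without re-converting
# (load_low_bit is assumed to mirror the save_low_bit call in the hunk above).
model.save_low_bit("/path/to/saved/")
model = AutoModelForCausalLM.load_low_bit("/path/to/saved/")
```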
@@ -202,16 +165,75 @@ You may run the models using the LangChain API in `bigdl-llm`.

   See the examples [here](example/langchain/native_int4).

-#### `llama-cpp-python`-style API
-
-You may also run the converted models using the `llama-cpp-python`-style API in `bigdl-llm` as follows.
-
-```python
-from bigdl.llm.models import Llama, Bloom, Gptneox
-
-llm = Bloom("/path/to/converted/model.bin", n_threads=4)
-result = llm("what is ai")
-```
+#### CLI Tool
+>**Note**: Currently the `bigdl-llm` CLI supports the *LLaMA* (e.g., *vicuna*), *GPT-NeoX* (e.g., *redpajama*), *BLOOM* (e.g., *phoenix*) and *GPT2* (e.g., *starcoder*) model architectures; for other models, you may use the `transformers` or LangChain APIs.
+
+ - ##### Convert model
+
+   You may convert the downloaded model into native INT4 format using `llm-convert`.
+
+   ```bash
+   #convert PyTorch (fp16 or fp32) model;
+   #llama/bloom/gptneox/starcoder model family is currently supported
+   llm-convert "/path/to/model/" --model-format pth --model-family "bloom" --outfile "/path/to/output/"
+
+   #convert GPTQ-4bit model
+   #only llama model family is currently supported
+   llm-convert "/path/to/model/" --model-format gptq --model-family "llama" --outfile "/path/to/output/"
+   ```
+
+ - ##### Run model
+
+   You may run the converted model using `llm-cli` or `llm-chat` (*built on top of `main.cpp` in [llama.cpp](https://github.com/ggerganov/llama.cpp)*).
+
+   ```bash
+   #help
+   #llama/bloom/gptneox/starcoder model family is currently supported
+   llm-cli -x gptneox -h
+
+   #text completion
+   #llama/bloom/gptneox/starcoder model family is currently supported
+   llm-cli -t 16 -x gptneox -m "/path/to/output/model.bin" -p 'Once upon a time,'
+
+   #chat mode
+   #llama/gptneox model family is currently supported
+   llm-chat -m "/path/to/output/model.bin" -x llama
+   ```
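The CLI commands in the added section can also be driven from a Python script via a thin standard-library wrapper; the sketch below reuses the exact flags shown in the diff, with the model path as a placeholder.

```python
# Sketch: invoke the llm-cli text-completion command shown above from Python.
import subprocess

completed = subprocess.run(
    ["llm-cli", "-t", "16", "-x", "gptneox",
     "-m", "/path/to/output/model.bin",  # placeholder converted model
     "-p", "Once upon a time,"],
    capture_output=True, text=True, check=True,
)
print(completed.stdout)  # the completion text is expected on stdout
```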

 ### `bigdl-llm` Dependence
 The native code/lib in `bigdl-llm` has been built using the following tools; in particular, a lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.

diff --git a/python/llm/example/transformers/native_int4/README.md b/python/llm/example/transformers/native_int4/README.md
index 3152ff31..c64a1df4 100644
--- a/python/llm/example/transformers/native_int4/README.md
+++ b/python/llm/example/transformers/native_int4/README.md
@@ -2,7 +2,7 @@

 In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model.

-> **Note**: BigDL-LLM native INT4 format currently supports model family **LLaMA**(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX**(such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.
+> **Note**: BigDL-LLM native INT4 format currently supports model family **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.

 ## Prepare Environment
 We suggest using conda to manage the environment:
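As a sketch of the convert-then-infer pipeline this example README describes, the conversion step might look like the following in Python, assuming `bigdl.llm` exposes an `llm_convert` helper mirroring the `llm-convert` CLI shown earlier; the paths, output type, and model family are placeholders.

```python
# Sketch of the native INT4 conversion step (assumption: bigdl.llm provides an
# llm_convert helper mirroring the llm-convert CLI; all values are placeholders).
from bigdl.llm import llm_convert

bigdl_llm_path = llm_convert(
    model="/path/to/model/",     # Hugging Face Transformers checkpoint
    outfile="/path/to/output/",  # destination folder for the native .bin file
    outtype="int4",
    model_family="llama",
)
print(bigdl_llm_path)  # path of the converted INT4 model binary
```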