From a38f927fc0cc77b30f7c21791dcafd7291aeb098 Mon Sep 17 00:00:00 2001
From: Jason Dai
Date: Mon, 3 Jul 2023 14:59:55 +0800
Subject: [PATCH] Update README.md (#8439)

---
 python/llm/README.md | 52 ++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/python/llm/README.md b/python/llm/README.md
index 12676286..04e3a702 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -77,29 +77,7 @@ Currently `bigdl-llm` CLI supports *LLaMA* (e.g., *vicuna*), *GPT-NeoX* (e.g., *
   ```
 
 #### Hugging Face `transformers`-style API
-You may run the models using `transformers`-style API in `bigdl-llm`
-
-- ##### Using native INT4 format
-
-  You may convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
-
-  *(Currently only llama/bloom/gptneox/starcoder model family is supported; for other models, you may use the [Hugging Face `transformers` INT4 format](#using-hugging-face-transformers-int4-format)).*
-
-  ```python
-  #convert the model
-  from bigdl.llm import llm_convert
-  bigdl_llm_path = llm_convert(model='/path/to/model/',
-          outfile='/path/to/output/', outtype='int4', model_family="llama")
-
-  #load the converted model
-  from bigdl.llm.transformers import BigdlForCausalLM
-  llm = BigdlForCausalLM.from_pretrained("/path/to/output/model.bin",...)
-
-  #run the converted model
-  input_ids = llm.tokenize(prompt)
-  output_ids = llm.generate(input_ids, ...)
-  output = llm.batch_decode(output_ids)
-  ```
+You may run the models using the `transformers`-style API in `bigdl-llm`.
 
 - ##### Using Hugging Face `transformers` INT4 format
 
@@ -118,6 +96,32 @@ You may run the models using `transformers`-style API in `bigdl-llm`
   output = tokenizer.batch_decode(output_ids)
   ```
 
+  See the complete example [here](example/transformers/transformers_int4_pipeline.py).
+
+- ##### Using native INT4 format
+
+  You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
+
+  *(Currently only the llama/bloom/gptneox/starcoder model families are supported; for other models, you may use the Transformers INT4 format as described above).*
+
+  ```python
+  # convert the model
+  from bigdl.llm import llm_convert
+  bigdl_llm_path = llm_convert(model='/path/to/model/',
+          outfile='/path/to/output/', outtype='int4', model_family="llama")
+
+  # load the converted model
+  from bigdl.llm.transformers import BigdlForCausalLM
+  llm = BigdlForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
+
+  # run the converted model
+  input_ids = llm.tokenize(prompt)
+  output_ids = llm.generate(input_ids, ...)
+  output = llm.batch_decode(output_ids)
+  ```
+
+  See the complete example [here](example/transformers/native_int4_pipeline.py).
+
 #### LangChain API
 You may convert Hugging Face *Transformers* models into *native INT4* format (currently only *llama*/*bloom*/*gptneox*/*starcoder* model family is supported), and then run the converted models using the LangChain API in `bigdl-llm` as follows.
 
@@ -135,6 +139,8 @@
 doc_chain = load_qa_chain(bigdl_llm, ...)
 doc_chain.run(...)
 ```
 
+See the examples [here](example/langchain).
+
 #### `llama-cpp-python`-style API
 You may also run the converted models using the `llama-cpp-python`-style API in `bigdl-llm` as follows.
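
For quick reference, the native INT4 snippet that this patch moves into the README can be stitched into a single runnable script. The sketch below is illustrative only: the input/output paths and the prompt are placeholders, `llm_convert` is assumed to return the path of the converted `model.bin`, and `max_new_tokens` is an assumed generation argument standing in for the `...` elided in the README snippet.

```python
# A minimal sketch of the native INT4 pipeline shown in the patch above.
# Assumptions: paths and prompt are placeholders; llm_convert is assumed to
# return the converted model.bin path; max_new_tokens is an assumed kwarg
# standing in for the "..." elided in the README.
from bigdl.llm import llm_convert
from bigdl.llm.transformers import BigdlForCausalLM

# convert the model (llama/bloom/gptneox/starcoder families only, per the patch)
bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/',
                             outtype='int4',
                             model_family="llama")

# load the converted model
llm = BigdlForCausalLM.from_pretrained(bigdl_llm_path)

# run the converted model
prompt = "What is AI?"  # placeholder prompt
input_ids = llm.tokenize(prompt)
output_ids = llm.generate(input_ids, max_new_tokens=32)
output = llm.batch_decode(output_ids)
print(output)
```

As the patch notes, the native format trades generality for performance: only the four listed model families are supported, whereas the Hugging Face `transformers` INT4 format covers other models as well.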