ipex-llm/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md at 7897eb4b51f6858016aa6bfa3a7aaf7c8acf0994

Yuwen Hu cf6a620bae [LLM] BigDL-LLM Documentation Initial Version (#8833 )

Co-authored-by: plusbang <binbin1.deng@intel.com>
Co-authored-by: binbin Deng <108676127+plusbang@users.noreply.github.com>

2023-09-06 15:38:45 +08:00


General PyTorch Model Supports

You can apply BigDL-LLM optimizations to any PyTorch model, not only Hugging Face *Transformers* models. With BigDL-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantization; the supported precisions include INT4, INT5, and INT8.
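To give a rough feel for what the INT4/INT5/INT8 precisions trade off, here is a minimal sketch of generic symmetric round-to-nearest quantization in plain Python. It is an illustration only: the helper names `quantize_symmetric` and `dequantize` are hypothetical, and the scheme is an assumption, not a description of how BigDL-LLM quantizes weights internally.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integer codes using one shared scale.

    Generic illustration, NOT BigDL-LLM's actual scheme.
    Codes lie in [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0]
    for bits in (4, 5, 8):
        codes, scale = quantize_symmetric(weights, bits)
        restored = dequantize(codes, scale)
        max_error = max(abs(w - r) for w, r in zip(weights, restored))
        print(f"INT{bits}: codes={codes} max_error={max_error:.5f}")
```

Fewer bits mean a coarser grid of representable values, so the rounding error per weight grows as the precision drops, in exchange for a smaller memory footprint.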

You can enable BigDL-LLM INT4 optimizations on any PyTorch model as follows:

# Create or load any PyTorch model
model = ...

# Add only two lines to enable BigDL-LLM INT4 optimizations on the model
from bigdl.llm import optimize_model
model = optimize_model(model)

After optimization, you can run the model directly: its API is unchanged, and inference latency is lower.
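One way to check the latency claim on your own model is to time the same call before and after optimization. The sketch below is a hedged example: `try_optimize` and `mean_latency` are hypothetical helpers written for this page, and falling back to the unoptimized model when `bigdl.llm` cannot be imported is an assumption of the sketch, not library behavior. Only the `from bigdl.llm import optimize_model` import and the `optimize_model(model)` call come from the snippet above.

```python
import time
from typing import Any, Callable


def try_optimize(model: Any) -> Any:
    """Return the BigDL-LLM-optimized model.

    Assumption of this sketch: if bigdl-llm is not installed,
    return the model unchanged so the script still runs.
    """
    try:
        from bigdl.llm import optimize_model  # same import as above
    except ImportError:
        return model
    return optimize_model(model)


def mean_latency(fn: Callable[[], Any], warmup: int = 2, runs: int = 10) -> float:
    """Average wall-clock seconds per call of fn, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Usage with a PyTorch model `model` and an input tensor `x`
# (both assumed to exist already):
#
#   baseline = mean_latency(lambda: model(x))
#   model = try_optimize(model)
#   optimized = mean_latency(lambda: model(x))  # same call, API unchanged
#   print(f"baseline={baseline:.4f}s optimized={optimized:.4f}s")
```

Because the call site `model(x)` is identical before and after, the comparison also doubles as a quick check that the optimized model keeps the original interface.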

.. seealso::

   See the examples for Hugging Face *Transformers* models `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/transformers/general_int4>`_. Examples for other general PyTorch models can be found `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/pytorch-model>`_.
