diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 9cba0641..094ebb4d 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -38,12 +38,12 @@ subtrees:
                   title: "Key Features"
                   subtrees:
                     - entries:
+                      - file: doc/LLM/Overview/KeyFeatures/optimize_model
                       - file: doc/LLM/Overview/KeyFeatures/transformers_style_api
                         subtrees:
                           - entries:
                             - file: doc/LLM/Overview/KeyFeatures/hugging_face_format
                             - file: doc/LLM/Overview/KeyFeatures/native_format
-                      - file: doc/LLM/Overview/KeyFeatures/optimize_model
                       - file: doc/LLM/Overview/KeyFeatures/langchain_api
                       # - file: doc/LLM/Overview/KeyFeatures/cli
                       - file: doc/LLM/Overview/KeyFeatures/gpu_supports
diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst
index 4914196b..823df5a1 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst
+++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/index.rst
@@ -3,12 +3,12 @@ BigDL-LLM Key Features
 
 You may run the LLMs using ``bigdl-llm`` through one of the following APIs:
 
+* `PyTorch API <./optimize_model.html>`_
 * |transformers_style_api|_
 
   * |hugging_face_transformers_format|_
   * `Native Format <./native_format.html>`_
 
-* `General PyTorch Model Supports <./langchain_api.html>`_
 * `LangChain API <./langchain_api.html>`_
 * `GPU Supports <./gpu_supports.html>`_
 
diff --git a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md
index eeb7a3c1..ac510688 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/KeyFeatures/optimize_model.md
@@ -1,22 +1,27 @@
-## General PyTorch Model Supports
+## PyTorch API
 
-You may apply BigDL-LLM optimizations on any Pytorch models, not only Hugging Face *Transformers* models for acceleration. With BigDL-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantizations (supported precisions include INT4/INT5/INT8).
+In general, you only need one line of code with `optimize_model` to optimize any loaded PyTorch model, regardless of the library or API you use to load it. With BigDL-LLM, PyTorch models (in FP16/BF16/FP32) can be optimized with low-bit quantizations (supported precisions include INT4, INT5, INT8, etc.).
 
-You can easily enable BigDL-LLM INT4 optimizations on any Pytorch models just as follows:
+First, load your model with any PyTorch API you like. To illustrate the process, here we use the [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) class `LlamaForCausalLM` to load the popular model [Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) as an example:
 
 ```python
-# Create or load any Pytorch model
-model = ...
+# Create or load any PyTorch model, taking Llama-2-7b-chat-hf as an example
+from transformers import LlamaForCausalLM
+model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', torch_dtype='auto', low_cpu_mem_usage=True)
+```
 
-# Add only two lines to enable BigDL-LLM INT4 optimizations on model
+Then, simply call `optimize_model` on the loaded model; INT4 optimization is applied by default:
+```python
 from bigdl.llm import optimize_model
+
+# Only one line is needed to enable BigDL-LLM INT4 optimization
 model = optimize_model(model)
 ```
 
-After optimizing the model, you may straightly run the optimized model with no API changed and less inference latency.
+After optimizing the model, BigDL-LLM does not require any changes to your inference code. You can use any library to run the optimized model, now with much lower inference latency.
 
 ```eval_rst
 .. seealso::
 
-   See the examples for Hugging Face *Transformers* models `here `_. And examples for other general Pytorch models can be found `here `_.
+   * For more detailed usage of ``optimize_model``, please refer to the `API documentation `_.
 ```
diff --git a/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md b/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md
index 763fd09a..5c2642db 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/install_cpu.md
@@ -5,9 +5,11 @@
 Install BigDL-LLM for CPU supports using pip through:
 
 ```bash
-pip install bigdl-llm[all]
+pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option
 ```
 
+Please refer to [Environment Setup](#environment-setup) for more information.
+
 ```eval_rst
 .. note::
 
@@ -43,7 +45,7 @@ First we recommend using [Conda](https://docs.conda.io/en/latest/miniconda.html
 conda create -n llm python=3.9
 conda activate llm
 
-pip install bigdl-llm[all] # install bigdl-llm for CPU with 'all' option
+pip install --pre --upgrade bigdl-llm[all] # install the latest bigdl-llm nightly build with 'all' option
 ```
 
 Then for running a LLM model with BigDL-LLM optimizations (taking an `example.py` an example):
diff --git a/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md b/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md
index 5429c150..0d36c39f 100644
--- a/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/Overview/install_gpu.md
@@ -5,9 +5,11 @@
 Install BigDL-LLM for GPU supports using pip through:
 
 ```bash
-pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
+pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu # install bigdl-llm for GPU
 ```
 
+Please refer to [Environment Setup](#environment-setup) for more information.
+
 ```eval_rst
 .. note::
 
@@ -25,6 +27,12 @@ BigDL-LLM for GPU supports has been verified on:
 * Intel Arc™ A-Series Graphics
 * Intel Data Center GPU Flex Series
 
+```eval_rst
+.. note::
+
+   We currently support the Ubuntu 20.04 operating system or later. Windows support is in progress.
+```
+
 To apply Intel GPU acceleration, there're several steps for tools installation and environment preparation:
 
 * Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered.
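Review note: to sanity-check the documented flow end to end, here is a minimal sketch that strings this patch's snippets together, loading with plain Transformers APIs, applying the one-line optimization, then running inference unchanged. The tokenizer usage, prompt, and generation parameters are illustrative assumptions, not part of this patch:

```python
# A minimal sketch of the workflow documented in optimize_model.md above.
# Assumptions: standard Hugging Face tokenizer/generate APIs; the prompt
# and max_new_tokens value are arbitrary illustrations.
from transformers import LlamaForCausalLM, LlamaTokenizer

from bigdl.llm import optimize_model

# Step 1: load the FP16/BF16/FP32 model with any PyTorch-based API
model = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype='auto',
    low_cpu_mem_usage=True,
)
tokenizer = LlamaTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')

# Step 2: one line to enable BigDL-LLM INT4 optimization
model = optimize_model(model)

# Step 3: inference code stays exactly as it was before optimization
inputs = tokenizer("What is AI?", return_tensors='pt')
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```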
diff --git a/docs/readthedocs/source/doc/LLM/index.rst b/docs/readthedocs/source/doc/LLM/index.rst
index f18aa1ab..e13cb0aa 100644
--- a/docs/readthedocs/source/doc/LLM/index.rst
+++ b/docs/readthedocs/source/doc/LLM/index.rst
@@ -32,8 +32,8 @@ BigDL-LLM
 
         +++
 
+        :bdg-link:`PyTorch <./Overview/KeyFeatures/optimize_model.html>` |
         :bdg-link:`transformers-style <./Overview/KeyFeatures/transformers_style_api.html>` |
-        :bdg-link:`Optimize Model <./Overview/KeyFeatures/optimize_model.html>` |
         :bdg-link:`LangChain <./Overview/KeyFeatures/langchain_api.html>` |
         :bdg-link:`GPU <./Overview/KeyFeatures/gpu_supports.html>`
 
diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst
index ea8d4fc0..6d6e38e1 100644
--- a/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst
+++ b/docs/readthedocs/source/doc/PythonAPI/LLM/index.rst
@@ -4,6 +4,6 @@ BigDL-LLM API
 .. toctree::
    :maxdepth: 3
 
+   optimize.rst
    transformers.rst
    langchain.rst
-   optimize.rst
diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst
index a6949247..01903ada 100644
--- a/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst
+++ b/docs/readthedocs/source/doc/PythonAPI/LLM/optimize.rst
@@ -1,4 +1,4 @@
-BigDL-LLM Optimize API
+BigDL-LLM PyTorch API
 =====================
 
 llm.optimize
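Review note: since the renamed `llm.optimize` page now fronts the PyTorch API docs, a short usage sketch of a non-default precision may help readers landing there. This assumes `optimize_model` accepts a `low_bit` keyword taking strings such as `'sym_int4'`, `'sym_int5'`, or `'sym_int8'`; consult the generated API documentation for the authoritative signature:

```python
# Hedged sketch: selecting INT8 instead of the default INT4 precision.
# The low_bit parameter name and its accepted values are assumptions to be
# verified against the llm.optimize API reference.
from transformers import LlamaForCausalLM

from bigdl.llm import optimize_model

model = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype='auto',
    low_cpu_mem_usage=True,
)

# Request symmetric INT8 quantization rather than the INT4 default
model = optimize_model(model, low_bit='sym_int8')
```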