diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples.rst b/docs/readthedocs/source/doc/LLM/Overview/examples.rst index e61f1c4b..c531d8b7 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples.rst +++ b/docs/readthedocs/source/doc/LLM/Overview/examples.rst @@ -1,7 +1,7 @@ BigDL-LLM Examples ================================ -You can use BigDL-LLM to run any Huggingface *Transfomers* models with INT4 optimizations on either servers or laptops. +You can use BigDL-LLM to run any PyTorch model with INT4 optimizations on Intel XPU (from Laptop to GPU to Cloud). Here, we provide examples to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Please refer to the appropriate guide based on your device: diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md b/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md index 7fdb934b..462231fb 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/examples_cpu.md @@ -6,21 +6,59 @@ To run these examples, please first refer to [here](./install_cpu.html) for more The following models have been verified on either servers or laptops with Intel CPUs. 
-| Model | Example | -|-----------|----------------------------------------------------------| -| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/vicuna) | -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/llama2) | -| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/mpt) | -| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/falcon) | -| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/chatglm) | -| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/chatglm2) | -| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/qwen) | -| MOSS | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/moss) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/baichuan) | -| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/dolly_v1) | -| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/dolly_v2) | -| RedPajama | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/redpajama) | -| Phoenix | 
[link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/phoenix) |
-| StarCoder | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/native_int4), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/starcoder) |
-| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/internlm) |
-| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/transformers/transformers_int4/whisper) |
+## Example of PyTorch API
+
+| Model | Example of PyTorch API |
+|------------|-------------------------------------------------------|
+| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llama2) |
+| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/chatglm) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/mistral) |
+| Bark | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bark) |
+| BERT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/bert) |
+| OpenAI Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper) |
+
+```eval_rst
+.. important::
+
+   In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_.
+```
+
+
+## Example of `transformers`-style API
+
+| Model | Example of `transformers`-style API |
+|------------|-------------------------------------------------------|
+| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna) |
+| LLaMA 2 | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2) |
+| ChatGLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mistral) |
+| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon) |
+| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt) |
+| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) |
+| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) |
+| Replit Code|
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/replit) | +| RedPajama | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama) | +| Phoenix | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix) | +| StarCoder | [link1](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Native-Models), [link2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder) | +| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan) | +| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | +| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen) | +| Aquila | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila) | +| MOSS | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss) | +| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper) | + +```eval_rst +.. important:: + + In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). 
You may apply other low bit optimizations through ``transformers``-style API as `example `_. +``` + + +```eval_rst +.. seealso:: + + See the complete examples `here `_. +``` + diff --git a/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md b/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md index 48b83b59..b5504cbb 100644 --- a/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md +++ b/docs/readthedocs/source/doc/LLM/Overview/examples_gpu.md @@ -12,15 +12,59 @@ To run these examples, please first refer to [here](./install_gpu.html) for more The following models have been verified on either servers or laptops with Intel GPUs. -| Model | Example | -|-----------|----------------------------------------------------------| -| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/llama2) | -| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/mpt) | -| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/falcon) | -| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/chatglm2) | -| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/qwen) | -| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/baichuan) | -| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/starcoder) | -| InternLM | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/internlm) | -| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/whisper) | -| GPT-J | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu/gpt-j) | +## Example of PyTorch API + +| Model | Example of PyTorch API | +|------------|-------------------------------------------------------| +| LLaMA 2 | 
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/llama2) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/mistral) |
+| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan) |
+| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/baichuan2) |
+| Replit | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/replit) |
+| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/starcoder) |
+| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v1) |
+| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/PyTorch-Models/Model/dolly-v2) |
+
+```eval_rst
+.. important::
+
+   In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through PyTorch API as `example `_.
+```
+
+
+## Example of `transformers`-style API
+
+| Model | Example of `transformers`-style API |
+|------------|-------------------------------------------------------|
+| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/vicuna) |
+| LLaMA 2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2) |
+| ChatGLM2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2) |
+| Mistral | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral) |
+| Falcon | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon) |
+| MPT | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt) |
+| Dolly-v1 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly_v1) |
+| Dolly-v2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/dolly_v2) |
+| Replit | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/replit) |
+| StarCoder | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder) |
+| Baichuan | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan) |
+| Baichuan2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2) |
+| InternLM |
[link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm) | +| Qwen | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen) | +| Aquila | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/aquila) | +| Whisper | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper) | +| Chinese Llama2 | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2) | +| GPT-J | [link](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j) | + +```eval_rst +.. important:: + + In addition to INT4 optimization, BigDL-LLM also provides other low bit optimizations (such as INT8, INT5, NF4, etc.). You may apply other low bit optimizations through ``transformers``-style API as `example `_. +``` + + +```eval_rst +.. seealso:: + + See the complete examples `here `_. +```
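The pages above contrast a "PyTorch API" with a "`transformers`-style API" but never show either in code. The following is a minimal sketch of both styles, based on BigDL-LLM's documented entry points (`bigdl.llm.optimize_model` and `bigdl.llm.transformers.AutoModelForCausalLM`); it assumes `bigdl-llm` is installed (`pip install bigdl-llm[all]`), and the checkpoint path is a placeholder, not a real file.

```python
# Sketch of BigDL-LLM's two API styles for low-bit inference.
# Assumptions: bigdl-llm is installed; MODEL_PATH is a hypothetical
# placeholder for a local Hugging Face checkpoint directory.

MODEL_PATH = "path/to/llama-2-7b-chat-hf"  # hypothetical local checkpoint


def load_transformers_style(model_path: str):
    """`transformers`-style API: AutoModel drop-in, INT4 applied at load time."""
    from bigdl.llm.transformers import AutoModelForCausalLM
    # load_in_4bit=True applies the default INT4 optimization; passing
    # load_in_low_bit="nf4" (or "sym_int8", ...) selects another precision.
    return AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)


def load_pytorch_style(model_path: str):
    """PyTorch API: load any PyTorch model first, then optimize it."""
    from transformers import AutoModelForCausalLM
    from bigdl.llm import optimize_model
    model = AutoModelForCausalLM.from_pretrained(model_path)
    # low_bit defaults to "sym_int4"; other values ("nf4", "sym_int8", ...)
    # correspond to the low-bit options mentioned in the notes above.
    return optimize_model(model, low_bit="sym_int4")
```

Either loader returns a model whose `generate` method is used exactly as with stock `transformers`; for the GPU examples, the optimized model is additionally moved to the device (e.g. `model.to("xpu")`) before generation.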