diff --git a/python/llm/example/transformers/transformers_int4/GPU/README.md b/python/llm/example/transformers/transformers_int4/GPU/README.md index adf52685..d5c4dff3 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/README.md @@ -1,12 +1,17 @@ -# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel® Arc™ A-Series Graphics -You can use BigDL-LLM to run almost every Huggingface Transformer models with INT4 optimizations on your laptops with Intel® Arc™ A-Series Graphics. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. +# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs +You can use BigDL-LLM to run almost every Huggingface Transformer model with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. + +## Verified Hardware Platforms + +- Intel Arc™ A-Series Graphics +- Intel Data Center GPU Flex Series ## Recommended Requirements -To apply Intel® Arc™ A-Series Graphics acceleration, there’re several steps for tools installation and environment preparation. +To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. -Step 2, please refer to our [drive installation](https://dgpu-docs.intel.com/installation-guides/index.html#intel-arc-gpus) for general purpose GPU capabilities. 
+Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional. > **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0. diff --git a/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md b/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md index e8ff34cd..e264fdb7 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md @@ -1,11 +1,11 @@ # Baichuan -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md b/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md index 55bf6b17..5fbc72db 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md @@ -1,12 +1,12 @@ # ChatGLM2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example 1: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md b/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md index e9b29b47..695cd170 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md @@ -1,12 +1,12 @@ # Falcon -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md b/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md index f00a82bb..9e4cd903 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md @@ -1,11 +1,11 @@ # InternLM -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for an InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md b/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md index 547d5918..6d7bef2e 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md @@ -1,11 +1,11 @@ # Llama2 -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md b/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md index 560c310f..c7594390 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md @@ -1,11 +1,11 @@ # MPT -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MPT models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md b/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md index 21712bc9..b43617cd 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md @@ -1,11 +1,11 @@ # Qwen -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md b/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md index b733f19d..ad8000cc 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md +++ b/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md @@ -1,11 +1,11 @@ # StarCoder -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. 
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for an StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md b/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md index ab5943fb..e5e6c015 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md @@ -1,13 +1,13 @@ # Voice Assistant -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the following models: +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the following models: - [openai/whisper-small](https://huggingface.co/openai/whisper-small) and [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) as reference whisper models. - [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models. ## 0. 
Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Predict Tokens using `generate()` API -In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recoginzed text as the input for Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recognized text as the input for the Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash diff --git a/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md b/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md index de0733c7..901a5b9e 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md +++ b/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md @@ -1,12 +1,12 @@ # Whisper -In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model. +In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on [Intel GPUs](../README.md). 
For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model. ## 0. Requirements -To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. +To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information. ## Example: Recognize Tokens using `generate()` API -In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics. +In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs. ### 1. Install We suggest using conda to manage environment: ```bash
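Every model folder touched by this patch repeats the same "1. Install" pattern. As a minimal sketch of that shared setup (the extra wheel index URL and the default oneAPI install path below follow Intel's published instructions and may differ on your system):

```shell
# Create and activate an isolated conda environment
# (Python 3.9 is an assumption; use the version your folder's README pins).
conda create -n llm python=3.9 -y
conda activate llm

# Install BigDL-LLM with Intel XPU support; the extra index (-f) hosts the
# matching Intel Extension for PyTorch (IPEX) wheels.
pip install --pre --upgrade "bigdl-llm[xpu]" -f https://developer.intel.com/ipex-whl-stable-xpu

# Expose the oneAPI DPC++ compiler and oneMKL before running any example
# (default install location shown; adjust if oneAPI lives elsewhere).
source /opt/intel/oneapi/setvars.sh
```

Each folder's `generate.py` (or `recognize.py` for Whisper) then typically loads the checkpoint through the `bigdl.llm.transformers` auto-model classes with `load_in_4bit=True` and moves it to the `'xpu'` device before calling `generate()`.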