LLM GPU Example Update for Windows Support (#9902)
* Update README in LLM GPU Examples
* Update reference of Intel GPU
* add cpu_embedding=True in comment
* small fixes
* update GPU/README.md and add explanation for cpu_embedding=True
* address comments
* fix small typos
* add backtick for cpu_embedding=True
* remove extra backtick in the doc
* add period mark
* update readme
parent e0db44dcb6
commit bc9cff51a8
57 changed files with 231 additions and 129 deletions
@ -74,7 +74,7 @@ You could choose to use [PyTorch API](./optimize_model.html) or [`transformers`-
.. tip::

   When running LLMs on Intel iGPUs for Windows users, we recommend setting ``cpu_embedding=True``` in the ``from_pretrained`` function. This will allow the memory-intensive embedding layer to utilize the CPU instead of iGPU.

   When running LLMs on Intel iGPUs for Windows users, we recommend setting ``cpu_embedding=True`` in the ``from_pretrained`` function. This will allow the memory-intensive embedding layer to utilize the CPU instead of iGPU.

   See the `API doc <../../../PythonAPI/LLM/transformers.html#hugging-face-transformers-automodel>`_ to find more information.
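For readers of this PR, here is a minimal illustrative sketch of the recommendation above, using BigDL-LLM's `transformers`-style API (the model path is a placeholder; the other arguments mirror the examples touched by this commit):

```python
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint

# cpu_embedding=True keeps the memory-intensive embedding layer on the CPU,
# which is the recommendation above for Intel iGPUs on Windows.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             optimize_model=True,
                                             cpu_embedding=True,
                                             trust_remote_code=True)
model = model.to('xpu')  # the rest of the model runs on the Intel GPU
```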
@ -1,26 +1,5 @@
# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs

You can use BigDL-LLM to run almost every Huggingface Transformer model with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.

## Verified Hardware Platforms

- Intel Arc™ A-Series Graphics
- Intel Data Center GPU Flex Series
- Intel Data Center GPU Max Series

## Recommended Requirements

To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.

Step 1, only Linux systems are supported now; Ubuntu 22.04 is preferred.

Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities.
> **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html).

Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed; others are optional.
> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version == 2023.2.0.

## Best Known Configuration on Linux

For better performance, it is recommended to set environment variables on Linux:
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
@ -7,7 +7,7 @@ In this directory, you will find examples on how you could apply BigDL-LLM INT4
> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed.

## Requirements
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for an Aquila model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations.

@ -7,7 +7,7 @@ In this directory, you will find examples on how you could apply BigDL-LLM INT4

> BigDL-LLM optimizes the *Transformers* model in INT4 precision at runtime, and thus no explicit conversion is needed.

## Requirements
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for an Aquila2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations.
@ -1,8 +1,8 @@
# Baichuan
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# Baichuan
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as a reference Baichuan model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as a reference Baichuan model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# BlueLM
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on BlueLM models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on BlueLM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as a reference BlueLM model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,13 +1,15 @@
# ChatGLM2

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example 1: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.9
@ -15,21 +17,85 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.9 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```

### 2. Configure OneAPI environment variables
#### 2.1 Configurations for Linux
```bash
source /opt/intel/oneapi/setvars.sh
```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.

### 3. Run
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
<details>

For optimal performance on Arc, it is recommended to set several environment variables.
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```

</details>

<details>

<summary>For Intel Data Center GPU Max Series</summary>

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>

#### 3.2 Configurations for Windows
<details>

<summary>For Intel iGPU</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```

</details>

<details>

<summary>For Intel Arc™ A300-Series or Pro A60</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

</details>

<details>

<summary>For other Intel dGPU Series</summary>

There is no need to set further environment variables.

</details>

> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
```
python ./generate.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --prompt PROMPT --n-predict N_PREDICT
```
@ -68,6 +134,7 @@ Inference time: xxxx s
## Example 2: Stream Chat using `stream_chat()` API
In the example [streamchat.py](./streamchat.py), we show a basic use case for a ChatGLM2 model to stream chat, with BigDL-LLM INT4 optimizations.

### 1. Install
#### 1.1 Installation on Linux
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.9
@ -75,21 +142,85 @@ conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```
#### 1.2 Installation on Windows
We suggest using conda to manage environment:
```bash
conda create -n llm python=3.9 libuv
conda activate llm
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```

### 2. Configure OneAPI environment variables
#### 2.1 Configurations for Linux
```bash
source /opt/intel/oneapi/setvars.sh
```
#### 2.2 Configurations for Windows
```cmd
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
> Note: Please make sure you are using **CMD** (**Anaconda Prompt** if using conda) to run the command as PowerShell is not supported.

### 3. Run
### 3. Runtime Configurations
For optimal performance, it is recommended to set several environment variables. Please check out the suggestions based on your device.
#### 3.1 Configurations for Linux
<details>

For optimal performance on Arc, it is recommended to set several environment variables.
<summary>For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series</summary>

```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```

</details>

<details>

<summary>For Intel Data Center GPU Max Series</summary>

```bash
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1
```
> Note: `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
</details>

#### 3.2 Configurations for Windows
<details>

<summary>For Intel iGPU</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```

</details>

<details>

<summary>For Intel Arc™ A300-Series or Pro A60</summary>

```cmd
set SYCL_CACHE_PERSISTENT=1
```

</details>

<details>

<summary>For other Intel dGPU Series</summary>

There is no need to set further environment variables.

</details>

> Note: For the first time that each model runs on Intel iGPU/Intel Arc™ A300-Series or Pro A60, it may take several minutes to compile.

### 4. Running examples
**Stream Chat using `stream_chat()` API**:
```
python ./streamchat.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --question QUESTION
@ -41,6 +41,8 @@ if __name__ == '__main__':
    # Load model in 4 bit,
    # which converts the relevant layers in the model into INT4 format
    # When running LLMs on Intel iGPUs for Windows users, we recommend setting `cpu_embedding=True` in the from_pretrained function.
    # This will allow the memory-intensive embedding layer to utilize the CPU instead of iGPU.
    model = AutoModel.from_pretrained(model_path,
                                      load_in_4bit=True,
                                      optimize_model=True,
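As context for the diff above, a hedged sketch of how such examples typically continue (the prompt and token count are illustrative; `model` and `model_path` come from the snippet above):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from transformers import AutoTokenizer

model = model.to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is AI?"  # illustrative prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```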
@ -39,6 +39,8 @@ if __name__ == '__main__':
    # Load model in 4 bit,
    # which converts the relevant layers in the model into INT4 format
    # When running LLMs on Intel iGPUs for Windows users, we recommend setting `cpu_embedding=True` in the from_pretrained function.
    # This will allow the memory-intensive embedding layer to utilize the CPU instead of iGPU.
    model = AutoModel.from_pretrained(model_path,
                                      load_in_4bit=True,
                                      trust_remote_code=True,
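Similarly, a hedged sketch of the streaming loop such a script typically runs afterwards, assuming the ChatGLM2 checkpoint's `stream_chat()` helper (the question is illustrative; `model` and `model_path` come from the snippet above):

```python
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from transformers import AutoTokenizer

model = model.to('xpu')
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

question = "What is AI?"  # illustrative question
printed = ""
# stream_chat() yields the partial response as it grows, so print only the new part.
for response, history in model.stream_chat(tokenizer, question, history=[]):
    print(response[len(printed):], end="", flush=True)
    printed = response
print()
```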
@ -1,9 +1,9 @@
# ChatGLM3

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM3 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM3 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as a reference ChatGLM3 model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example 1: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# Chinese Llama2
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Chinese LLaMA models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [LinkSoul/Chinese-Llama-2-7b](https://huggingface.co/LinkSoul/Chinese-Llama-2-7b) as reference Chinese LLaMA models.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Chinese LLaMA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [LinkSoul/Chinese-Llama-2-7b](https://huggingface.co/LinkSoul/Chinese-Llama-2-7b) as reference Chinese LLaMA models.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# CodeLlama
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on CodeLlama models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as a reference CodeLlama model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on CodeLlama models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as a reference CodeLlama model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,9 +1,9 @@
# Distil-Whisper

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Distil-Whisper models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Distil-Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Recognize Tokens using `generate()` API
In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using `pipeline()` API for long audio input, with BigDL-LLM INT4 optimizations.
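For orientation, a rough sketch of the `pipeline()` pattern mentioned above, shown with plain Hugging Face `transformers` and a placeholder audio file (the BigDL-LLM INT4 loading step used in recognize.py is omitted here):

```python
from transformers import pipeline

# chunk_length_s lets the pipeline split long audio into manageable segments.
pipe = pipeline("automatic-speech-recognition",
                model="distil-whisper/distil-large-v2",
                chunk_length_s=15)
result = pipe("example_audio.wav")  # placeholder audio file
print(result["text"])
```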
@ -2,7 +2,7 @@
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as a reference Dolly v1 model.

## 0. Requirements
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations.

@ -2,7 +2,7 @@

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) as a reference Dolly v2 model.

## 0. Requirements
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations.

@ -1,9 +1,9 @@

# Falcon

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@
# Flan-t5
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Flan-t5 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Flan-t5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# GPT-J
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on GPT-J models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) as reference GPT-J models.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on GPT-J models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) as reference GPT-J models.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a GPT-J model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# InternLM
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for an InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@
# Llama2
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# Mistral
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mistral models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mistral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

**Important: According to [Mistral Troubleshooting](https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting), please make sure you have installed `transformers==4.34.0` to run the example.**

@ -1,8 +1,8 @@

# Mixtral
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mixtral models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Mixtral models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

**Important: Please make sure you have installed `transformers==4.36.0` to run the example.**
@ -1,8 +1,8 @@
# MPT
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MPT models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MPT models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# phi-1_5
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi-1_5 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@
# Qwen-VL
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen-VL models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen-VL models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Multimodal chat using `chat()` API
In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using `chat()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
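A hedged sketch of what such a multimodal `chat()` call looks like, assuming Qwen-VL's remote-code tokenizer utilities and a model/tokenizer already loaded as in chat.py (the image URL and question are illustrative):

```python
# Assumes `model` and `tokenizer` were loaded from Qwen/Qwen-VL-Chat with trust_remote_code=True
# and the model was moved to 'xpu' with BigDL-LLM INT4 optimizations.
query = tokenizer.from_list_format([
    {'image': 'https://example.com/demo.jpeg'},  # illustrative image URL
    {'text': 'What is in the picture?'},         # illustrative question
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```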
@ -1,8 +1,8 @@
# Qwen
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# Replit
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Replit models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as a reference Replit model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Replit models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as a reference Replit model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -1,8 +1,8 @@

# SOLAR-10.7B
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on SOLAR-10.7B models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR-10.7B model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on SOLAR-10.7B models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR-10.7B model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a SOLAR-10.7B model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@
# StarCoder
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -2,7 +2,7 @@

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Vicuna models. For illustration purposes, we utilize the [lmsys/vicuna-13b-v1.3](https://huggingface.co/lmsys/vicuna-13b-v1.3) and [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1) as reference Vicuna models.

## 0. Requirements
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Vicuna model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations.
@ -1,10 +1,10 @@
# Voice Assistant
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the following models:
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the following models:
- [openai/whisper-small](https://huggingface.co/openai/whisper-small) and [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) as reference whisper models.
- [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recognized text as the input for the Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,9 +1,9 @@
# Whisper

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Recognize Tokens using `generate()` API
In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
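A hedged sketch of the `generate()`-based transcription flow described above, assuming BigDL-LLM exposes an `AutoModelForSpeechSeq2Seq` wrapper analogous to its other Auto classes (the sample audio comes from a small public test dataset):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from datasets import load_dataset
from transformers import WhisperProcessor
from bigdl.llm.transformers import AutoModelForSpeechSeq2Seq  # assumed wrapper

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", load_in_4bit=True)
model = model.to('xpu')

# Tiny dummy speech sample, only for illustration.
sample = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")[0]["audio"]
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"],
                           return_tensors="pt").input_features.to('xpu')
with torch.inference_mode():
    predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```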
@ -1,8 +1,8 @@
# Yi
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yi models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model.
In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model.

## 0. Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,26 +1,6 @@
# BigDL-LLM INT4 Optimization for Large Language Model on Intel GPUs

You can use the `optimize_model` API to accelerate general PyTorch models on Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.

## Verified Hardware Platforms

- Intel Arc™ A-Series Graphics
- Intel Data Center GPU Flex Series
- Intel Data Center GPU Max Series

## Recommended Requirements

To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.

Step 1, only Linux systems are supported now; Ubuntu 22.04 is preferred.

Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities.
> **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html).

Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed; others are optional.
> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version == 2023.2.0.

## Best Known Configuration on Linux

For better performance, it is recommended to set environment variables on Linux:
```bash
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
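A minimal sketch of the `optimize_model` flow described above (the checkpoint id and prompt are illustrative; INT4 is the default optimization, and the `'xpu'` move mirrors the GPU examples in this PR):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the 'xpu' device)
from transformers import AutoModelForCausalLM, AutoTokenizer
from bigdl.llm import optimize_model

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype='auto',
                                             low_cpu_mem_usage=True)
model = optimize_model(model)  # INT4 optimization by default
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to('xpu')
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```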
@ -2,7 +2,7 @@
In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Aquila2 models. For illustration purposes, we utilize the [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B) as reference Aquila2 models.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for an Aquila2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Baichuan models. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as reference Baichuan models.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate Baichuan2 models. For illustration purposes, we utilize the [baichuan-inc/Baichuan2-7B-Chat](https://huggingface.co/baichuan-inc/Baichuan-7B-Chat) as reference Baichuan2 models.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a Baichuan2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.

@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate BlueLM models. For illustration purposes, we utilize the [vivo-ai/BlueLM-7B-Chat](https://huggingface.co/vivo-ai/BlueLM-7B-Chat) as reference BlueLM models.

## Requirements
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API
In the example [generate.py](./generate.py), we show a basic use case for a BlueLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@
|
|||
In this directory, you will find examples on how you could use BigDL-LLM `optimize_model` API to accelerate ChatGLM2 models. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as reference ChatGLM2 models.
|
||||
|
||||
## Requirements
|
||||
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
|
||||
To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../../../README.md#requirements) for more information.
|
||||
|
||||
## Example 1: Predict Tokens using `generate()` API
|
||||
In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate ChatGLM3 models. For illustration purposes, we utilize the [THUDM/chatglm3-6b](https://huggingface.co/THUDM/chatglm3-6b) as reference ChatGLM3 models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example 1: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM3 model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate CodeLlama models. For illustration purposes, we utilize the [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) as reference CodeLlama models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a CodeLlama model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,9 +1,9 @@

# Distil-Whisper

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Distil-Whisper models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model.

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API on Distil-Whisper models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2) as a reference Distil-Whisper model.

## 0. Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Recognize Tokens using `generate()` API

In the example [recognize.py](./recognize.py), we show a basic use case for a Distil-Whisper model to conduct transcription using the `pipeline()` API for long audio input, with BigDL-LLM INT4 optimizations.
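Below is a minimal sketch of that flow through the `optimize_model` path; the model id, audio file name, and chunk length are placeholders, and the real [recognize.py](./recognize.py) may differ in details.

```python
# Minimal sketch: long-audio transcription with a Distil-Whisper model,
# optimized to INT4 at runtime and run on an Intel GPU.
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from bigdl.llm import optimize_model

model_path = "distil-whisper/distil-large-v2"  # placeholder model id

model = AutoModelForSpeechSeq2Seq.from_pretrained(model_path)
model = optimize_model(model)  # INT4 optimization at runtime
model = model.to("xpu")

processor = AutoProcessor.from_pretrained(model_path)
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=15,  # split long audio into chunks the model can handle
    # with some transformers versions you may also need device="xpu" here
)
print(pipe("audio.wav")["text"])  # placeholder audio file
```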
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Dolly v1 models. For illustration purposes, we utilize the [databricks/dolly-v1-6b](https://huggingface.co/databricks/dolly-v1-6b) as reference Dolly v1 models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a Dolly v1 model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Dolly v2 models. For illustration purposes, we utilize the [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b) and [databricks/dolly-v2-7b](https://huggingface.co/databricks/dolly-v2-7b) as reference Dolly v2 models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a Dolly v2 model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@

# Flan-t5

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Flan-t5 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model.

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API on Flan-t5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) as a reference Flan-t5 model.

## 0. Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a Flan-t5 model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
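Because Flan-t5 is an encoder-decoder model, a sketch of this example loads it through the sequence-to-sequence auto class rather than the causal-LM one; the model id, prompt, and token count below are placeholders.

```python
# Minimal sketch: INT4 generation with an encoder-decoder (seq2seq) model.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from bigdl.llm import optimize_model

model_path = "google/flan-t5-xxl"  # placeholder model id

model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
model = optimize_model(model)  # INT4 optimization at runtime
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path)
inputs = tokenizer("Translate to German: How old are you?",
                   return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```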
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Llama2 models. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf), [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) and [meta-llama/Llama-2-70b-chat-hf](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) as reference Llama2 models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example 1 - Basic Version: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@

# LLaVA

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on LLaVA models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) as a reference LLaVA model.

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API on LLaVA models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [liuhaotian/llava-v1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) as a reference LLaVA model.

## 0. Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Multi-turn chat centered around an image using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a LLaVA model to start a multi-turn chat centered around an image using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Mistral models. For illustration purposes, we utilize the [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) and [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as reference Mistral models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

**Important: According to [Mistral Troubleshooting](https://huggingface.co/mistralai/Mistral-7B-v0.1#troubleshooting), please make sure you have installed `transformers==4.34.0` to run the example.**
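For instance, a small guard at the top of the script can catch a mismatched version early; the pinned version below simply mirrors the note above.

```python
# Fail fast if the installed transformers version does not match the pin
# required by this example (see the note above).
import transformers

REQUIRED_VERSION = "4.34.0"
if transformers.__version__ != REQUIRED_VERSION:
    raise RuntimeError(
        f"This example expects transformers=={REQUIRED_VERSION}, "
        f"but found {transformers.__version__}; "
        f"install it with: pip install transformers=={REQUIRED_VERSION}"
    )
```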
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Mixtral models. For illustration purposes, we utilize the [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) as a reference Mixtral model.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

**Important: Please make sure you have installed `transformers==4.36.0` to run the example.**
@ -1,8 +1,8 @@

# phi-1_5

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate phi-1_5 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model.

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate phi-1_5 models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5) as a reference phi-1_5 model.

## Requirements

To run these examples with BigDL-LLM, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a phi-1_5 model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@

# Qwen-VL

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Qwen-VL models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model.

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Qwen-VL models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [Qwen/Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat) as a reference Qwen-VL model.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Multimodal chat using `chat()` API

In the example [chat.py](./chat.py), we show a basic use case for a Qwen-VL model to start a multimodal chat using the `chat()` API, with the BigDL-LLM `optimize_model` API on Intel GPUs.
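Below is a minimal sketch of a single chat turn, assuming the `optimize_model` path; the image path and question are placeholders, and the `from_list_format` and `chat` helpers come from Qwen-VL's own remote modeling code rather than BigDL-LLM, so their exact signatures follow that model's release.

```python
# Minimal sketch: one multimodal chat turn with Qwen-VL-Chat on an Intel GPU.
# Image path and question are placeholders.
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from transformers import AutoModelForCausalLM, AutoTokenizer
from bigdl.llm import optimize_model

model_path = "Qwen/Qwen-VL-Chat"

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
model = optimize_model(model)  # INT4 optimization at runtime
model = model.to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Interleave an image with a text question, then ask the model about it
# (from_list_format and chat are provided by Qwen-VL's remote code).
query = tokenizer.from_list_format([
    {"image": "demo.jpg"},
    {"text": "What is in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```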
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate Replit models. For illustration purposes, we utilize the [replit/replit-code-v1-3b](https://huggingface.co/replit/replit-code-v1-3b) as reference Replit models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a Replit model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate SOLAR-10.7B models. For illustration purposes, we utilize the [upstage/SOLAR-10.7B-Instruct-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0) as a reference SOLAR-10.7B model.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a SOLAR-10.7B model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -2,7 +2,7 @@

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API to accelerate StarCoder models. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as reference StarCoder models.

## Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -1,8 +1,8 @@

# Yi

In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Yi models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model.

In this directory, you will find examples on how you could use the BigDL-LLM `optimize_model` API on Yi models on [Intel GPUs](../../../README.md). For illustration purposes, we utilize the [01-ai/Yi-6B](https://huggingface.co/01-ai/Yi-6B) as a reference Yi model.

## 0. Requirements

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../README.md#recommended-requirements) for more information.

To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../README.md#requirements) for more information.

## Example: Predict Tokens using `generate()` API

In the example [generate.py](./generate.py), we show a basic use case for a Yi model to predict the next N tokens using the `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
@ -10,6 +10,7 @@ This folder contains examples of running BigDL-LLM on Intel GPU:

## System Support

### 1. Linux:

**Hardware**:

- Intel Arc™ A-Series Graphics
- Intel Data Center GPU Flex Series

@ -18,5 +19,12 @@ This folder contains examples of running BigDL-LLM on Intel GPU:

**Operating System**:

- Ubuntu 20.04 or later (Ubuntu 22.04 is preferred)

### 2. Windows

**Hardware**:

- Intel iGPU and dGPU

**Operating System**:

- Windows 10/11, with or without WSL
## Requirements
To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.