Add Data Center GPU Flex Series to Readme (#8835)

* Add Data Center GPU Flex Series to Readme
* remove
* update starcoder
2023-08-30 02:19:09 +08:00
commit a386ad984e
parent 7429ea0606
11 changed files with 39 additions and 34 deletions
--- a/python/llm/example/transformers/transformers_int4/GPU/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/README.md
@@ -1,12 +1,17 @@
-# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel® Arc™ A-Series Graphics
-You can use BigDL-LLM to run almost every Hugging Face Transformers model with INT4 optimizations on your laptops with Intel® Arc™ A-Series Graphics. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
+# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
+You can use BigDL-LLM to run almost every Hugging Face Transformers model with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
+
+## Verified Hardware Platforms
+
+- Intel Arc™ A-Series Graphics
+- Intel Data Center GPU Flex Series

 ## Recommended Requirements
-To apply Intel® Arc™ A-Series Graphics acceleration, there are several steps for tool installation and environment preparation.
+To apply Intel GPU acceleration, there are several steps for tool installation and environment preparation.

 Step 1, only Linux is supported for now; Ubuntu 22.04 is preferred.

-Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/installation-guides/index.html#intel-arc-gpus) for general purpose GPU capabilities.
+Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities.

 Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional.
 > **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
--- a/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md
@@ -1,11 +1,11 @@
 # Baichuan
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Baichuan models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [baichuan-inc/Baichuan-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat) as a reference Baichuan model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a Baichuan model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md
@@ -1,12 +1,12 @@
 # ChatGLM2

-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on ChatGLM2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) as a reference ChatGLM2 model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example 1: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a ChatGLM2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md
@@ -1,12 +1,12 @@
 # Falcon

-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Falcon models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [tiiuae/falcon-7b-instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) as a reference Falcon model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a Falcon model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md
@@ -1,11 +1,11 @@
 # InternLM
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on InternLM models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [internlm/internlm-chat-7b-8k](https://huggingface.co/internlm/internlm-chat-7b-8k) as a reference InternLM model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for an InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for an InternLM model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md
@@ -1,11 +1,11 @@
 # Llama2
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Llama2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md
@@ -1,11 +1,11 @@
 # MPT
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MPT models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on MPT models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat) as a reference MPT model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for an MPT model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md
@@ -1,11 +1,11 @@
 # Qwen
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Qwen models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) as a reference Qwen model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a Qwen model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md
@@ -1,11 +1,11 @@
 # StarCoder
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on StarCoder models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) as a reference StarCoder model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a StarCoder model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md
@@ -1,13 +1,13 @@
 # Voice Assistant
-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the following models: 
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper and Llama2 models on [Intel GPUs](../README.md). For illustration purposes, we utilize the following models: 
 - [openai/whisper-small](https://huggingface.co/openai/whisper-small) and [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) as reference whisper models.
 - [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) and [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as reference Llama2 models.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Predict Tokens using `generate()` API
-In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recognized text as the input for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [generate.py](./generate.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, then use the recognized text as the input for a Llama2 model to predict the next N tokens using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash
--- a/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md
@@ -1,12 +1,12 @@
 # Whisper

-In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on any Intel® Arc™ A-Series Graphics. For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model.
+In this directory, you will find examples on how you could apply BigDL-LLM INT4 optimizations on Whisper models on [Intel GPUs](../README.md). For illustration purposes, we utilize the [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) as a reference Whisper model.

 ## 0. Requirements
-To run these examples with BigDL-LLM on Intel® Arc™ A-Series Graphics, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.
+To run these examples with BigDL-LLM on Intel GPUs, we have some recommended requirements for your machine, please refer to [here](../README.md#recommended-requirements) for more information.

 ## Example: Recognize Tokens using `generate()` API
-In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations on Intel® Arc™ A-Series Graphics.
+In the example [recognize.py](./recognize.py), we show a basic use case for a Whisper model to conduct transcription using `generate()` API, with BigDL-LLM INT4 optimizations on Intel GPUs.
 ### 1. Install
 We suggest using conda to manage environment:
 ```bash