From 37f509bb95c729e4ccabbeb5409b2745dc1fe16c Mon Sep 17 00:00:00 2001
From: Jason Dai
Date: Thu, 14 Dec 2023 19:50:21 +0800
Subject: [PATCH] Update readme (#9692)

---
 README.md                                                  | 7 +++++--
 docs/readthedocs/source/index.rst                          | 3 ++-
 .../example/GPU/HF-Transformers-AutoModels/Model/README.md | 4 ++--
 python/llm/example/GPU/PyTorch-Models/Model/README.md      | 4 ++--
 python/llm/example/GPU/README.md                           | 2 +-
 5 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 7a43ba0d..a26b6080 100644
--- a/README.md
+++ b/README.md
@@ -11,8 +11,9 @@
 > *It is built on the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [awq](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
 
-### Latest update
-- [2023/12]`bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*)
+### Latest update :fire:
+- [2023/12] `bigdl-llm` now supports [Mixtral-8x7B](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) on both Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral).
+- [2023/12] `bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*)
 - [2023/12] `bigdl-llm` now supports [FP8 and FP4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) on Intel ***GPU***.
 - [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models in to `bigdl-llm` is available.
 - [2023/11] Initial support for [vLLM continuous batching](python/llm/example/CPU/vLLM-Serving) is availabe on Intel ***CPU***.
 
@@ -95,6 +96,8 @@ pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-w
 ```
 > Note: `bigdl-llm` has been tested on Python 3.9
 
+See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
+
 ##### Run Model
 
 You may apply INT4 optimizations to any Hugging Face *Transformers* models as follows.
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index 8ebdfbae..4a696bfc 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -22,8 +22,9 @@ BigDL-LLM: low-Bit LLM library
 It is built on top of the excellent work of `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, `qlora <https://github.com/artidoro/qlora>`_, etc.
 
 ============================================
-Latest update 
+Latest update :fire:
 ============================================
+- [2023/12] ``bigdl-llm`` now supports `Mixtral-8x7B `_ on both Intel `GPU `_ and `CPU `_.
 - [2023/12] ``bigdl-llm`` now supports `QA-LoRA `_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" <https://arxiv.org/abs/2309.14717>`_).
 - [2023/12] ``bigdl-llm`` now supports `FP8 and FP4 inference `_ on Intel **GPU**.
 - [2023/11] Initial support for directly loading `GGUF `_, `AWQ `_ and `GPTQ `_ models in to ``bigdl-llm`` is available.
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md
index b98d03e3..4d35b4aa 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md
@@ -8,7 +8,7 @@ You can use BigDL-LLM to run almost every Huggingface Transformer models with IN
 - Intel Data Center GPU Max Series
 
 ## Recommended Requirements
-To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation.
+To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
 
 Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered.
 
@@ -16,7 +16,7 @@ Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/dr
 > **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version is [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html).
 
 Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional.
-> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
+> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version == 2023.2.0.
 
 ## Best Known Configuration on Linux
 For better performance, it is recommended to set environment variables on Linux:
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/README.md b/python/llm/example/GPU/PyTorch-Models/Model/README.md
index 75b7f668..31825764 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/README.md
+++ b/python/llm/example/GPU/PyTorch-Models/Model/README.md
@@ -8,7 +8,7 @@ You can use `optimize_model` API to accelerate general PyTorch models on Intel G
 - Intel Data Center GPU Max Series
 
 ## Recommended Requirements
-To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation.
+To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
 
 Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered.
 
@@ -16,7 +16,7 @@ Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/dr
 > **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version is [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html).
 
 Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional.
-> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
+> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version == 2023.2.0.
 
 ## Best Known Configuration on Linux
 For better performance, it is recommended to set environment variables on Linux:
diff --git a/python/llm/example/GPU/README.md b/python/llm/example/GPU/README.md
index 38b4f207..7ff7359b 100644
--- a/python/llm/example/GPU/README.md
+++ b/python/llm/example/GPU/README.md
@@ -19,7 +19,7 @@ This folder contains examples of running BigDL-LLM on Intel GPU:
 - Ubuntu 20.04 or later (Ubuntu 22.04 is preferred)
 
 ## Requirements
-To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation.
+To apply Intel GPU acceleration, there are several steps for tools installation and environment preparation. See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
 
 Step 1, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities.
 > **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version is [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html).
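
For reviewers, the "Run Model" step that the README hunk above links to can be sketched as follows. This is a minimal sketch, not the repository's exact example: it assumes `bigdl-llm[xpu]` (plus the oneAPI runtime) is installed and an Intel XPU device is present, and the model path and prompt template are illustrative placeholders.

```python
# Minimal sketch of applying bigdl-llm's INT4 optimization to a Hugging Face
# Transformers model on an Intel GPU. bigdl-llm mirrors the transformers
# AutoModel API; load_in_4bit=True applies the INT4 optimization at load time.

def build_prompt(question: str) -> str:
    """Wrap a user question in a minimal single-turn prompt (placeholder template)."""
    return f"Q: {question}\nA:"

def run_int4_on_xpu(model_path: str, question: str, max_new_tokens: int = 32) -> str:
    """Load a model with INT4 optimization and generate on an Intel GPU.

    Requires `bigdl-llm[xpu]` and an available XPU device; `model_path` is an
    illustrative argument (any Hugging Face model id or local path).
    """
    from bigdl.llm.transformers import AutoModelForCausalLM  # drop-in for transformers
    from transformers import AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
    model = model.to("xpu")  # move the INT4-optimized model to the Intel GPU

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to("xpu")
    output = model.generate(inputs.input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    # On a machine with an Intel GPU, e.g.:
    #   run_int4_on_xpu("meta-llama/Llama-2-7b-chat-hf", "What is BigDL-LLM?")
    print(build_prompt("What is BigDL-LLM?"))
```

The design point is that `load_in_4bit=True` on the drop-in `AutoModelForCausalLM` does the quantization, so everything after loading is ordinary `transformers` code.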