From 86b81c09d90fe90f005afe931330d4c0fff33491 Mon Sep 17 00:00:00 2001
From: SichengStevenLi <144295301+SichengStevenLi@users.noreply.github.com>
Date: Fri, 28 Jun 2024 10:41:00 +0800
Subject: [PATCH] Table of Contents in Quickstart Files (#11437)
* fixed a minor grammar mistake
* added table of contents
* added table of contents
* changed table of contents indexing
* added table of contents
* added table of contents, changed grammar
* added table of contents
* added table of contents
* added table of contents
* added table of contents
* added table of contents
* added table of contents, modified chapter numbering
* fixed troubleshooting section redirection path
* added table of contents
* added table of contents, modified section numbering
* added table of contents, modified section numbering
* added table of contents
* added table of contents, changed title size, modified numbering
* added table of contents, changed section title size and capitalization
* added table of contents, modified section numbering
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents syntax
* changed table of contents capitalization issue
* changed table of contents capitalization issue
* changed table of contents location
* changed table of contents
* changed table of contents
* changed section capitalization
* removed comments
* removed comments
* removed comments
---
docs/mddocs/Quickstart/README.md | 2 +-
docs/mddocs/Quickstart/axolotl_quickstart.md | 9 +++++++++
docs/mddocs/Quickstart/benchmark_quickstart.md | 13 ++++++++++---
docs/mddocs/Quickstart/bigdl_llm_migration.md | 5 +++++
docs/mddocs/Quickstart/chatchat_quickstart.md | 10 +++++++++-
docs/mddocs/Quickstart/continue_quickstart.md | 7 +++++++
.../deepspeed_autotp_fastapi_quickstart.md | 5 +++++
docs/mddocs/Quickstart/dify_quickstart.md | 7 +++++++
docs/mddocs/Quickstart/fastchat_quickstart.md | 5 +++++
docs/mddocs/Quickstart/install_linux_gpu.md | 9 +++++++++
docs/mddocs/Quickstart/install_windows_gpu.md | 8 ++++++++
.../llama3_llamacpp_ollama_quickstart.md | 6 ++++++
docs/mddocs/Quickstart/llama_cpp_quickstart.md | 15 ++++++++++++---
docs/mddocs/Quickstart/ollama_quickstart.md | 15 +++++++++++----
.../open_webui_with_ollama_quickstart.md | 11 +++++++++--
docs/mddocs/Quickstart/privateGPT_quickstart.md | 6 ++++++
docs/mddocs/Quickstart/ragflow_quickstart.md | 12 ++++++++++--
docs/mddocs/Quickstart/vLLM_quickstart.md | 17 ++++++++++++-----
docs/mddocs/Quickstart/webui_quickstart.md | 14 +++++++++++---
19 files changed, 152 insertions(+), 24 deletions(-)
diff --git a/docs/mddocs/Quickstart/README.md b/docs/mddocs/Quickstart/README.md
index 3294761c..2f76c59b 100644
--- a/docs/mddocs/Quickstart/README.md
+++ b/docs/mddocs/Quickstart/README.md
@@ -1,7 +1,7 @@
# IPEX-LLM Quickstart
> [!NOTE]
-> We are adding more Quickstart guide.
+> We are adding more Quickstart guides.
This section includes efficient guides to show you how to:
diff --git a/docs/mddocs/Quickstart/axolotl_quickstart.md b/docs/mddocs/Quickstart/axolotl_quickstart.md
index c0654cd2..e50b9f8e 100644
--- a/docs/mddocs/Quickstart/axolotl_quickstart.md
+++ b/docs/mddocs/Quickstart/axolotl_quickstart.md
@@ -13,6 +13,15 @@ See the demo of finetuning LLaMA2-7B on Intel Arc GPU below.
+## Table of Contents
+- [Prerequisites](./axolotl_quickstart.md#0-prerequisites)
+- [Install IPEX-LLM for Axolotl](./axolotl_quickstart.md#1-install-ipex-llm-for-axolotl)
+- [Example: Finetune Llama-2-7B with Axolotl](./axolotl_quickstart.md#2-example-finetune-llama-2-7b-with-axolotl)
+- [Finetune Llama-3-8B (Experimental)](./axolotl_quickstart.md#3-finetune-llama-3-8b-experimental)
+- [Troubleshooting](./axolotl_quickstart.md#troubleshooting)
+
+
+
## Quickstart
### 0. Prerequisites
diff --git a/docs/mddocs/Quickstart/benchmark_quickstart.md b/docs/mddocs/Quickstart/benchmark_quickstart.md
index a677398e..fc5ce949 100644
--- a/docs/mddocs/Quickstart/benchmark_quickstart.md
+++ b/docs/mddocs/Quickstart/benchmark_quickstart.md
@@ -2,7 +2,14 @@
We can perform benchmarking for IPEX-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.
-## Prepare The Environment
+## Table of Contents
+- [Prepare the Environment](./benchmark_quickstart.md#prepare-the-environment)
+- [Prepare the Scripts](./benchmark_quickstart.md#prepare-the-scripts)
+- [Run on Windows](./benchmark_quickstart.md#run-on-windows)
+- [Run on Linux](./benchmark_quickstart.md#run-on-linux)
+- [Result](./benchmark_quickstart.md#result)
+
+## Prepare the Environment
You can refer to [here](../Overview/install.md) to install IPEX-LLM in your environment. The following dependencies are also needed to run the benchmark scripts.
@@ -11,7 +18,7 @@ pip install pandas
pip install omegaconf
```
-## Prepare The Scripts
+## Prepare the Scripts
Navigate to your local workspace and then download IPEX-LLM from GitHub. Modify the `config.yaml` under the `all-in-one` folder for your benchmark configurations.
@@ -21,7 +28,7 @@ git clone https://github.com/intel-analytics/ipex-llm.git
cd ipex-llm/python/llm/dev/benchmark/all-in-one/
```
-## config.yaml
+### config.yaml
```yaml
diff --git a/docs/mddocs/Quickstart/bigdl_llm_migration.md b/docs/mddocs/Quickstart/bigdl_llm_migration.md
index 0b7643e1..f6a76f34 100644
--- a/docs/mddocs/Quickstart/bigdl_llm_migration.md
+++ b/docs/mddocs/Quickstart/bigdl_llm_migration.md
@@ -2,6 +2,11 @@
This guide helps you migrate your `bigdl-llm` application to use `ipex-llm`.
+## Table of Contents
+- [Upgrade `bigdl-llm` package to `ipex-llm`](./bigdl_llm_migration.md#upgrade-bigdl-llm-package-to-ipex-llm)
+- [Migrate `bigdl-llm` code to `ipex-llm`](./bigdl_llm_migration.md#migrate-bigdl-llm-code-to-ipex-llm)
+
+
## Upgrade `bigdl-llm` package to `ipex-llm`
> [!NOTE]
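For context on the code-migration step listed above: beyond reinstalling the package, the change typically amounts to swapping the top-level package name in your imports. A minimal sketch (the model id is illustrative, not from this guide):

```python
# Before, with bigdl-llm:
#   from bigdl.llm.transformers import AutoModelForCausalLM

# After, with ipex-llm: only the package prefix changes.
from ipex_llm.transformers import AutoModelForCausalLM

# Load a Hugging Face model with low-bit optimizations, exactly as before.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    load_in_4bit=True,
)
```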
diff --git a/docs/mddocs/Quickstart/chatchat_quickstart.md b/docs/mddocs/Quickstart/chatchat_quickstart.md
index 217d199c..8aaa4307 100644
--- a/docs/mddocs/Quickstart/chatchat_quickstart.md
+++ b/docs/mddocs/Quickstart/chatchat_quickstart.md
@@ -21,12 +21,20 @@
> [!NOTE]
> You can change the UI language in the left-side menu. We currently support **English** and **简体中文** (see video demos below).
+## Table of Contents
+- [Langchain-Chatchat Architecture](./chatchat_quickstart.md#langchain-chatchat-architecture)
+- [Install and Run](./chatchat_quickstart.md#install-and-run)
+- [How to Use RAG](./chatchat_quickstart.md#how-to-use-rag)
+- [Troubleshooting & Tips](./chatchat_quickstart.md#troubleshooting--tips)
+
+
## Langchain-Chatchat Architecture
See the Langchain-Chatchat architecture below ([source](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/docs/img/langchain%2Bchatglm.png)).
+
## Quickstart
### Install and Run
@@ -72,7 +80,7 @@ You can now click `Dialogue` on the left-side menu to return to the chat UI. The
For more information about how to use Langchain-Chatchat, refer to the official Quickstart guide in [English](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/README_en.md#), [Chinese](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/README.md#), or the [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/).
-### Trouble Shooting & Tips
+### Troubleshooting & Tips
#### 1. Version Compatibility
diff --git a/docs/mddocs/Quickstart/continue_quickstart.md b/docs/mddocs/Quickstart/continue_quickstart.md
index 9bfbd1b1..d3feb289 100644
--- a/docs/mddocs/Quickstart/continue_quickstart.md
+++ b/docs/mddocs/Quickstart/continue_quickstart.md
@@ -14,6 +14,13 @@ Below is a demo of using `Continue` with [CodeQWen1.5-7B](https://huggingface.co
+## Table of Contents
+- [Install and Run Ollama Serve](./continue_quickstart.md#1-install-and-run-ollama-serve)
+- [Pull and Prepare the Model](./continue_quickstart.md#2-pull-and-prepare-the-model)
+- [Install `Continue` Extension](./continue_quickstart.md#3-install-continue-extension)
+- [`Continue` Configuration](./continue_quickstart.md#4-continue-configuration)
+- [How to Use `Continue`](./continue_quickstart.md#5-how-to-use-continue)
+
## Quickstart
This guide walks you through setting up and running **Continue** within _Visual Studio Code_, empowered by local large language models served via [Ollama](./ollama_quickstart.md) with `ipex-llm` optimizations.
diff --git a/docs/mddocs/Quickstart/deepspeed_autotp_fastapi_quickstart.md b/docs/mddocs/Quickstart/deepspeed_autotp_fastapi_quickstart.md
index 17e51dca..0fa9888b 100644
--- a/docs/mddocs/Quickstart/deepspeed_autotp_fastapi_quickstart.md
+++ b/docs/mddocs/Quickstart/deepspeed_autotp_fastapi_quickstart.md
@@ -2,6 +2,11 @@
This example demonstrates how to run IPEX-LLM serving on multiple [Intel GPUs](../../../python/llm/example/GPU/README.md) by leveraging DeepSpeed AutoTP.
+## Table of Contents
+- [Requirements](./deepspeed_autotp_fastapi_quickstart.md#requirements)
+- [Example](./deepspeed_autotp_fastapi_quickstart.md#example)
+
+
## Requirements
To run this example with IPEX-LLM on Intel GPUs, we have some recommended requirements for your machine; please refer to [here](../../../python/llm/example/GPU/README.md#requirements) for more information. For this particular example, you will need at least two GPUs on your machine.
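The AutoTP approach mentioned above shards a model's weights across GPUs via tensor parallelism. As a rough, hypothetical sketch of the generic DeepSpeed pattern this example builds on (not the example's exact code; argument names vary across DeepSpeed versions, and the IPEX-LLM low-bit conversion and Intel GPU placement are handled in the example's own script):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Load the model on CPU first, then let DeepSpeed AutoTP shard it
# across the available devices with tensor parallelism.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    torch_dtype=torch.float16,
)
model = deepspeed.init_inference(
    model,
    mp_size=2,                        # at least two GPUs, per the requirement above
    dtype=torch.float16,
    replace_with_kernel_inject=False, # keep stock modules; no CUDA kernel injection
)
```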
diff --git a/docs/mddocs/Quickstart/dify_quickstart.md b/docs/mddocs/Quickstart/dify_quickstart.md
index d507f9bd..68c6544d 100644
--- a/docs/mddocs/Quickstart/dify_quickstart.md
+++ b/docs/mddocs/Quickstart/dify_quickstart.md
@@ -15,6 +15,13 @@
+## Table of Contents
+- [Install and Start Ollama Service on Intel GPU](./dify_quickstart.md#1-install-and-start-ollama-service-on-intel-gpu)
+- [Install and Start Dify](./dify_quickstart.md#2-install-and-start-dify)
+- [How to Use Dify](./dify_quickstart.md#3-how-to-use-dify)
+
+
+
## Quickstart
### 1. Install and Start `Ollama` Service on Intel GPU
diff --git a/docs/mddocs/Quickstart/fastchat_quickstart.md b/docs/mddocs/Quickstart/fastchat_quickstart.md
index e89c64f6..43145739 100644
--- a/docs/mddocs/Quickstart/fastchat_quickstart.md
+++ b/docs/mddocs/Quickstart/fastchat_quickstart.md
@@ -4,6 +4,11 @@ FastChat is an open platform for training, serving, and evaluating large languag
IPEX-LLM can be easily integrated into FastChat so that users can use `IPEX-LLM` as a serving backend in their deployments.
+## Table of Contents
+- [Install IPEX-LLM with FastChat](./fastchat_quickstart.md#1-install-ipex-llm-with-fastchat)
+- [Start the Service](./fastchat_quickstart.md#2-start-the-service)
+
+
## Quick Start
This quickstart guide walks you through installing and running `FastChat` with `ipex-llm`.
diff --git a/docs/mddocs/Quickstart/install_linux_gpu.md b/docs/mddocs/Quickstart/install_linux_gpu.md
index afb64e6f..8fc0c8ff 100644
--- a/docs/mddocs/Quickstart/install_linux_gpu.md
+++ b/docs/mddocs/Quickstart/install_linux_gpu.md
@@ -4,6 +4,15 @@ This guide demonstrates how to install IPEX-LLM on Linux with Intel GPUs. It app
IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates IPEX-LLM with PyTorch 2.1. Check the [Installation](../Overview/install_gpu.md#linux) page for more details.
+
+## Table of Contents
+- [Install Prerequisites](./install_linux_gpu.md#install-prerequisites)
+- [Install ipex-llm](./install_linux_gpu.md#install-ipex-llm)
+- [Verify Installation](./install_linux_gpu.md#verify-installation)
+- [Runtime Configurations](./install_linux_gpu.md#runtime-configurations)
+- [A Quick Example](./install_linux_gpu.md#a-quick-example)
+- [Tips & Troubleshooting](./install_linux_gpu.md#tips--troubleshooting)
+
## Install Prerequisites
### Install GPU Driver
diff --git a/docs/mddocs/Quickstart/install_windows_gpu.md b/docs/mddocs/Quickstart/install_windows_gpu.md
index eb7fa9f9..a77d855b 100644
--- a/docs/mddocs/Quickstart/install_windows_gpu.md
+++ b/docs/mddocs/Quickstart/install_windows_gpu.md
@@ -4,6 +4,14 @@ This guide demonstrates how to install IPEX-LLM on Windows with Intel GPUs.
It applies to Intel Core Ultra and Core 11 - 14 gen integrated GPUs (iGPUs), as well as Intel Arc Series GPUs.
+## Table of Contents
+- [Install Prerequisites](./install_windows_gpu.md#install-prerequisites)
+- [Install ipex-llm](./install_windows_gpu.md#install-ipex-llm)
+- [Verify Installation](./install_windows_gpu.md#verify-installation)
+- [Monitor GPU Status](./install_windows_gpu.md#monitor-gpu-status)
+- [A Quick Example](./install_windows_gpu.md#a-quick-example)
+- [Tips & Troubleshooting](./install_windows_gpu.md#tips--troubleshooting)
+
## Install Prerequisites
### (Optional) Update GPU Driver
diff --git a/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md b/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md
index 8ab22500..5f2dabe7 100644
--- a/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md
+++ b/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md
@@ -15,6 +15,12 @@ See the demo of running Llama-3-8B-Instruct on Intel Arc GPU using `Ollama` belo
+## Table of Contents
+- [Run Llama 3 using llama.cpp](./llama3_llamacpp_ollama_quickstart.md#1-run-llama-3-using-llamacpp)
+- [Run Llama3 using Ollama](./llama3_llamacpp_ollama_quickstart.md#2-run-llama3-using-ollama)
+
+
+
## Quick Start
This quickstart guide walks you through how to run Llama 3 on Intel GPU using `llama.cpp` / `Ollama` with IPEX-LLM.
diff --git a/docs/mddocs/Quickstart/llama_cpp_quickstart.md b/docs/mddocs/Quickstart/llama_cpp_quickstart.md
index 1297f474..adcccd39 100644
--- a/docs/mddocs/Quickstart/llama_cpp_quickstart.md
+++ b/docs/mddocs/Quickstart/llama_cpp_quickstart.md
@@ -18,6 +18,15 @@ See the demo of running LLaMA2-7B on Intel Arc GPU below.
>
> Our latest version is consistent with [62bfef5](https://github.com/ggerganov/llama.cpp/commit/62bfef5194d5582486d62da3db59bf44981b7912) of llama.cpp.
+## Table of Contents
+- [Prerequisites](./llama_cpp_quickstart.md#0-prerequisites)
+- [Install IPEX-LLM for llama.cpp](./llama_cpp_quickstart.md#1-install-ipex-llm-for-llamacpp)
+- [Setup for running llama.cpp](./llama_cpp_quickstart.md#2-setup-for-running-llamacpp)
+- [Example: Running community GGUF models with IPEX-LLM](./llama_cpp_quickstart.md#3-example-running-community-gguf-models-with-ipex-llm)
+- [Troubleshooting](./llama_cpp_quickstart.md#troubleshooting)
+
+
+
## Quick Start
This quickstart guide walks you through installing and running `llama.cpp` with `ipex-llm`.
@@ -35,7 +44,7 @@ IPEX-LLM backend for llama.cpp only supports the more recent GPU drivers. Please
If you have a lower GPU driver version, visit the [Install IPEX-LLM on Windows with Intel GPU Guide](./install_windows_gpu.md) and follow [Update GPU driver](./install_windows_gpu.md#optional-update-gpu-driver).
-### 1 Install IPEX-LLM for llama.cpp
+### 1. Install IPEX-LLM for llama.cpp
To use `llama.cpp` with IPEX-LLM, first ensure that `ipex-llm[cpp]` is installed.
@@ -59,7 +68,7 @@ To use `llama.cpp` with IPEX-LLM, first ensure that `ipex-llm[cpp]` is installed
**After the installation, you should have created a conda environment, named `llm-cpp` for instance, for running `llama.cpp` commands with IPEX-LLM.**
-### 2 Setup for running llama.cpp
+### 2. Setup for running llama.cpp
First, create a directory for `llama.cpp`; for instance, use the following command to create a `llama-cpp` directory and enter it.
```cmd
@@ -127,7 +136,7 @@ To use GPU acceleration, several environment variables are required or recommend
> export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
> ```
-### 3 Example: Running community GGUF models with IPEX-LLM
+### 3. Example: Running community GGUF models with IPEX-LLM
Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.
diff --git a/docs/mddocs/Quickstart/ollama_quickstart.md b/docs/mddocs/Quickstart/ollama_quickstart.md
index 98a8be98..4846f82c 100644
--- a/docs/mddocs/Quickstart/ollama_quickstart.md
+++ b/docs/mddocs/Quickstart/ollama_quickstart.md
@@ -18,9 +18,16 @@ See the demo of running LLaMA2-7B on Intel Arc GPU below.
>
> Our current version is consistent with [v0.1.39](https://github.com/ollama/ollama/releases/tag/v0.1.39) of ollama.
+## Table of Contents
+- [Install IPEX-LLM for Ollama](./ollama_quickstart.md#1-install-ipex-llm-for-ollama)
+- [Initialize Ollama](./ollama_quickstart.md#2-initialize-ollama)
+- [Run Ollama Serve](./ollama_quickstart.md#3-run-ollama-serve)
+- [Pull Model](./ollama_quickstart.md#4-pull-model)
+- [Using Ollama](./ollama_quickstart.md#5-using-ollama)
+
## Quickstart
-### 1 Install IPEX-LLM for Ollama
+### 1. Install IPEX-LLM for Ollama
IPEX-LLM's support for `ollama` is now available for both Linux and Windows systems.
@@ -53,7 +60,7 @@ Activate the `llm-cpp` conda environment and initialize Ollama by executing the
**Now you can use this executable file following standard Ollama usage.**
-### 3 Run Ollama Serve
+### 3. Run Ollama Serve
You may launch the Ollama service as below:
@@ -102,7 +109,7 @@ The console will display messages similar to the following:
-### 4 Pull Model
+### 4. Pull Model
Keep the Ollama service running, open another terminal, and run `./ollama pull <model_name>` on Linux (`ollama.exe pull <model_name>` on Windows) to automatically pull a model, e.g. `dolphin-phi:latest`:
@@ -110,7 +117,7 @@ Keep the Ollama service on and open another terminal and run `./ollama pull
-### 5 Using Ollama
+### 5. Using Ollama
#### Using Curl
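The curl example in this section calls Ollama's HTTP API directly. The same request can also be issued from Python; a small sketch, assuming the default port `11434` and a model such as `dolphin-phi` pulled in the previous step:

```python
import requests

# Ask the locally running Ollama service for a non-streaming completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "dolphin-phi",
        "prompt": "Why is the sky blue?",
        "stream": False,  # return a single JSON object instead of a stream
    },
)
resp.raise_for_status()
print(resp.json()["response"])
```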
diff --git a/docs/mddocs/Quickstart/open_webui_with_ollama_quickstart.md b/docs/mddocs/Quickstart/open_webui_with_ollama_quickstart.md
index 6981b464..be975a4b 100644
--- a/docs/mddocs/Quickstart/open_webui_with_ollama_quickstart.md
+++ b/docs/mddocs/Quickstart/open_webui_with_ollama_quickstart.md
@@ -13,16 +13,23 @@
+## Table of Contents
+- [Run Ollama with Intel GPU](./open_webui_with_ollama_quickstart.md#1-run-ollama-with-intel-gpu)
+- [Install the Open-Webui](./open_webui_with_ollama_quickstart.md#2-install-the-open-webui)
+- [Start the Open-WebUI](./open_webui_with_ollama_quickstart.md#3-start-the-open-webui)
+- [Using the Open-Webui](./open_webui_with_ollama_quickstart.md#4-using-the-open-webui)
+- [Troubleshooting](./open_webui_with_ollama_quickstart.md#5-troubleshooting)
+
## Quickstart
This quickstart guide walks you through setting up and using [Open WebUI](https://github.com/open-webui/open-webui) with Ollama (using the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend).
-### 1 Run Ollama with Intel GPU
+### 1. Run Ollama with Intel GPU
Follow the instructions in the [Run Ollama with Intel GPU](./ollama_quickstart.md) guide to install and run "Ollama Serve". Please ensure that the Ollama server continues to run while you're using the Open WebUI.
-### 2 Install the Open-Webui
+### 2. Install the Open-Webui
#### Install Node.js & npm
diff --git a/docs/mddocs/Quickstart/privateGPT_quickstart.md b/docs/mddocs/Quickstart/privateGPT_quickstart.md
index b95599d5..b7a53fc3 100644
--- a/docs/mddocs/Quickstart/privateGPT_quickstart.md
+++ b/docs/mddocs/Quickstart/privateGPT_quickstart.md
@@ -13,6 +13,12 @@
+## Table of Contents
+- [Install and Start `Ollama` Service on Intel GPU](./privateGPT_quickstart.md#1-install-and-start-ollama-service-on-intel-gpu)
+- [Install PrivateGPT](./privateGPT_quickstart.md#2-install-privategpt)
+- [Start PrivateGPT](./privateGPT_quickstart.md#3-start-privategpt)
+- [Using PrivateGPT](./privateGPT_quickstart.md#4-using-privategpt)
+
## Quickstart
### 1. Install and Start `Ollama` Service on Intel GPU
diff --git a/docs/mddocs/Quickstart/ragflow_quickstart.md b/docs/mddocs/Quickstart/ragflow_quickstart.md
index 254aa372..22251831 100644
--- a/docs/mddocs/Quickstart/ragflow_quickstart.md
+++ b/docs/mddocs/Quickstart/ragflow_quickstart.md
@@ -14,9 +14,17 @@
+
+## Table of Contents
+- [Prerequisites](./ragflow_quickstart.md#0-prerequisites)
+- [Install and Start Ollama Service on Intel GPU](./ragflow_quickstart.md#1-install-and-start-ollama-service-on-intel-gpu)
+- [Pull Model](./ragflow_quickstart.md#2-pull-model)
+- [Start `RAGFlow` Service](./ragflow_quickstart.md#3-start-ragflow-service)
+- [Using `RAGFlow`](./ragflow_quickstart.md#4-using-ragflow)
+
## Quickstart
-### 0 Prerequisites
+### 0. Prerequisites
- CPU >= 4 cores
- RAM >= 16 GB
@@ -95,7 +103,7 @@ To make the change permanent and ensure it persists after a reboot, add or updat
vm.max_map_count=262144
```
-### 3.3 Start the `RAGFlow` server using Docker
+#### 3.3 Start the `RAGFlow` server using Docker
Use the pre-built Docker images to start up the server:
diff --git a/docs/mddocs/Quickstart/vLLM_quickstart.md b/docs/mddocs/Quickstart/vLLM_quickstart.md
index 155fd321..764b35c1 100644
--- a/docs/mddocs/Quickstart/vLLM_quickstart.md
+++ b/docs/mddocs/Quickstart/vLLM_quickstart.md
@@ -11,6 +11,13 @@ Currently, IPEX-LLM integrated vLLM only supports the following models:
- ChatGLM series models
- Baichuan series models
+## Table of Contents
+- [Install IPEX-LLM for vLLM](./vLLM_quickstart.md#1-install-ipex-llm-for-vllm)
+- [Install vLLM](./vLLM_quickstart.md#2-install-vllm)
+- [Offline Inference/Service](./vLLM_quickstart.md#3-offline-inferenceservice)
+- [About Tensor Parallel](./vLLM_quickstart.md#4-about-tensor-parallel)
+- [Performing Benchmark](./vLLM_quickstart.md#5-performing-benchmark)
+
## Quick Start
@@ -48,9 +55,9 @@ pip install transformers_stream_generator einops tiktoken
**Now you are all set to use vLLM with IPEX-LLM**
-## 3. Offline inference/Service
+### 3. Offline Inference/Service
-### Offline inference
+#### Offline inference
To run offline inference using vLLM for a quick impression, use the following example.
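For orientation, offline inference with the stock vLLM API follows roughly the pattern below; the guide's actual example may add IPEX-LLM-specific imports and low-bit options, so treat this as a sketch only (the model id is illustrative):

```python
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=16)

# Build the engine once, then generate for a batch of prompts.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")  # illustrative model id
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
```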
@@ -87,7 +94,7 @@ Prompt: 'The capital of France is', Generated text: ' Paris.\nThe capital of Fra
Prompt: 'The future of AI is', Generated text: " bright, but it's not without challenges. As AI continues to evolve,"
```
-### Service
+#### Service
> [!NOTE]
> Because kernels are JIT-compiled, we recommend sending a few warmup requests before using the service to get the best performance.
@@ -170,7 +177,7 @@ Below shows an example output using `Qwen1.5-7B-Chat` with low-bit format `sym_i
> export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
> ```
-## 4. About Tensor parallel
+### 4. About Tensor Parallel
> [!NOTE]
> We recommend using Docker for tensor parallel deployment. Check our serving Docker image `intelanalytics/ipex-llm-serving-xpu`.
@@ -223,7 +230,7 @@ If the service have booted successfully, you should see the output similar to th
-## 5.Performing benchmark
+### 5. Performing Benchmark
To perform benchmarking, you can use the **benchmark_throughput** script originally provided by the vLLM repo.
diff --git a/docs/mddocs/Quickstart/webui_quickstart.md b/docs/mddocs/Quickstart/webui_quickstart.md
index 2775605f..6600c6c0 100644
--- a/docs/mddocs/Quickstart/webui_quickstart.md
+++ b/docs/mddocs/Quickstart/webui_quickstart.md
@@ -13,6 +13,14 @@ See the demo of running LLaMA2-7B on an Intel Core Ultra laptop below.
+## Table of Contents
+- [Install IPEX-LLM](./webui_quickstart.md#1-install-ipex-llm)
+- [Install the WebUI](./webui_quickstart.md#2-install-the-webui)
+- [Start the WebUI Server](./webui_quickstart.md#3-start-the-webui-server)
+- [Using the WebUI](./webui_quickstart.md#4-using-the-webui)
+- [Advanced Usage](./webui_quickstart.md#5-advanced-usage)
+- [Troubleshooting](./webui_quickstart.md#troubleshooting)
+
## Quickstart
This quickstart guide walks you through setting up and using the [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) with `ipex-llm`.
@@ -23,13 +31,13 @@ A preview of the WebUI in action is shown below:
-### 1 Install IPEX-LLM
+### 1. Install IPEX-LLM
To use the WebUI, first ensure that IPEX-LLM is installed. Follow the instructions on the [IPEX-LLM Installation Quickstart for Windows with Intel GPU](./install_windows_gpu.md).
**After the installation, you should have created a conda environment, named `llm` for instance, for running `ipex-llm` applications.**
-### 2 Install the WebUI
+### 2. Install the WebUI
#### Download the WebUI
Download the `text-generation-webui` with IPEX-LLM integrations from [this link](https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip). Unzip the content into a directory, e.g., `C:\text-generation-webui`.
@@ -50,7 +58,7 @@ pip install -r extensions/openai/requirements.txt
> [!NOTE]
> `extensions/openai/requirements.txt` is for the API service. If you don't need the API service, you can omit this command.
-### 3 Start the WebUI Server
+### 3. Start the WebUI Server
#### Set Environment Variables
Configure oneAPI variables by running the following command in **Miniforge Prompt**: