diff --git a/docs/readthedocs/source/_templates/sidebar_quicklinks.html b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
index 4dea56b9..3b5a36b0 100644
--- a/docs/readthedocs/source/_templates/sidebar_quicklinks.html
+++ b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -9,9 +9,15 @@
+ >
* Install drivers
@@ -45,9 +43,9 @@ We assume that you have the 6.2 kernel on your linux machine.
sudo reboot
```
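Driver installation on Ubuntu 22.04 typically follows Intel's client-GPU packaging; a minimal sketch, where the `jammy arc` repository component and the package list are assumptions current at the time of writing and may have changed since:

```bash
# Add Intel's graphics repository key and package source
wget -qO - https://repositories.intel.com/graphics/intel-graphics.key | \
    sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/graphics/ubuntu jammy arc" | \
    sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list

# Install the compute runtime packages, then reboot as shown above
sudo apt-get update
sudo apt-get install -y intel-opencl-icd intel-level-zero-gpu level-zero
```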
- >
+ >
- >
+ >
* Configure permissions
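Granting GPU access usually amounts to adding your user to the group that owns the GPU device nodes; a sketch, assuming the `render` group (some distributions use `video` instead):

```bash
# Grant the current user access to the GPU device nodes under /dev/dri
sudo gpasswd -a ${USER} render
newgrp render  # pick up the new group in the current shell, or log out and back in
```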
@@ -63,7 +61,7 @@ We assume that you have the 6.2 kernel on your linux machine.
## Setup Python Environment
-* Install the Miniconda as follows
+Install Miniconda as follows if you don't have conda installed on your machine:
```bash
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
@@ -75,7 +73,7 @@ We assume that you have the 6.2 kernel on your linux machine.
conda --version
# rm Miniconda3-latest-Linux-x86_64.sh # if you don't need this file any longer
```
- >
+ >
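With conda available, you would typically create and activate a dedicated environment before installing anything else; a sketch, where the environment name `llm` and Python 3.9 are illustrative choices:

```bash
# Create and activate an isolated environment for bigdl-llm
conda create -n llm python=3.9
conda activate llm
```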
## Install oneAPI
@@ -88,9 +86,9 @@ We assume that you have the 6.2 kernel on your linux machine.
sudo apt install intel-basekit
```
- >
+ >
- >
+ >
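To check that oneAPI can see your GPU, you can source its environment and list the available SYCL devices with the `sycl-ls` utility that ships with the Base Toolkit:

```bash
# Load the oneAPI environment, then enumerate SYCL platforms and devices
source /opt/intel/oneapi/setvars.sh
sycl-ls  # the GPU should appear as a Level Zero (and OpenCL) device
```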
## Install `bigdl-llm`
@@ -103,24 +101,24 @@ We assume that you have the 6.2 kernel on your linux machine.
pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
```
- >
+ >
- >
+ >
-* You can verfy if bigdl-llm is successfully by simply importing a few classes from the library. For example, execute the following import command in terminal:
+* You can verify that bigdl-llm is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal:
```bash
source /opt/intel/oneapi/setvars.sh
python
- > from bigdl.llm.transformers import AutoModel,AutoModelForCausalLM
+ > from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
```
- >
+ >
-## Runtime Configuration
+## Runtime Configurations
To use GPU acceleration on Linux, several environment variables are required or recommended before running a GPU example.
@@ -147,7 +145,7 @@ To use GPU acceleration on Linux, several environment variables are required or
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1
```
- Please note that libtcmalloc.so can be installed by conda install -c conda-forge -y gperftools=2.10
+ Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
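A convenient pattern is to collect these settings in a small script and source it before each run; a sketch, where the file name `env_setup.sh` is hypothetical and the `LD_PRELOAD` path assumes the conda `gperftools` install above:

```bash
# env_setup.sh -- source this in each new shell before running a GPU example
source /opt/intel/oneapi/setvars.sh
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so  # assumes gperftools from conda
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1
```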
## A Quick Example
@@ -213,5 +211,5 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface
## Tips & Troubleshooting
### Warmup for optimal performance on first run
-When running LLMs on GPU for the first time, you might notice the performance is lower than expected, with delays up to several minutes before the first token is generated. This delay occurs because the GPU kernels require compilation and initialization, which varies across different GPU models. To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks. If you're developing an application, you can incorporate this warmup step into start-up or loading routine to enhance the user experience.
+When running LLMs on GPU for the first time, you might notice the performance is lower than expected, with delays of up to several minutes before the first token is generated. This delay occurs because the GPU kernels require compilation and initialization, which varies across different GPU types. To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks. If you're developing an application, you can incorporate this warm-up step into the start-up or loading routine to enhance the user experience.