Update mddocs for DockerGuides (#11380)
* transfer files in DockerGuides from rst to md
* add some dividing lines
* adjust the title hierarchy in docker_cpp_xpu_quickstart.md
* restore
* switch to the correct branch
* small change

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Parent: 1a1a97c9e4
Commit: fed79f106b
9 changed files with 134 additions and 148 deletions
|
|
@ -1,4 +1,4 @@
|
||||||
## Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker
|
# Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
|
|
@ -6,11 +6,11 @@
|
||||||
|
|
||||||
1. Linux Installation
|
1. Linux Installation
|
||||||
|
|
||||||
Follow the instructions in this [guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_windows_gpu.html#linux) to install Docker on Linux.
|
Follow the instructions in this [guide](./docker_windows_gpu.md#linux) to install Docker on Linux.
|
||||||
|
|
||||||
2. Windows Installation
|
2. Windows Installation
|
||||||
|
|
||||||
For Windows installation, refer to this [guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_windows_gpu.html#install-docker-desktop-for-windows).
|
For Windows installation, refer to this [guide](./docker_windows_gpu.md#install-docker-desktop-for-windows).
|
||||||
|
|
||||||
#### Setting Docker on windows
|
#### Setting up Docker on Windows
|
||||||
|
|
||||||
|
|
@ -24,18 +24,18 @@ docker pull intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
||||||
|
|
||||||
### Start Docker Container
|
### Start Docker Container
|
||||||
|
|
||||||
```eval_rst
|
Choose one of the following methods to start the container:
|
||||||
.. tabs::
|
|
||||||
.. tab:: Linux
|
|
||||||
|
|
||||||
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. Select the device you are running(device type:(Max, Flex, Arc, iGPU)). And change the `/path/to/models` to mount the models. `bench_model` is used to benchmark quickly. If want to benchmark, make sure it on the `/path/to/models`
|
<details>
|
||||||
|
<summary>For <strong>Linux</strong>:</summary>
|
||||||
|
|
||||||
.. code-block:: bash
|
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. Select the device type you are running (Max, Flex, Arc, or iGPU), and change `/path/to/models` to mount your models. `bench_model` is used for quick benchmarking; if you want to benchmark, make sure the model is placed under `/path/to/models`.
|
||||||
|
|
||||||
#/bin/bash
|
```bash
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
#!/bin/bash
|
||||||
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
||||||
sudo docker run -itd \
|
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
|
||||||
|
sudo docker run -itd \
|
||||||
--net=host \
|
--net=host \
|
||||||
--device=/dev/dri \
|
--device=/dev/dri \
|
||||||
-v /path/to/models:/models \
|
-v /path/to/models:/models \
|
||||||
|
|
@ -46,17 +46,19 @@ docker pull intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
||||||
-e DEVICE=Arc \
|
-e DEVICE=Arc \
|
||||||
--shm-size="16g" \
|
--shm-size="16g" \
|
||||||
$DOCKER_IMAGE
|
$DOCKER_IMAGE
|
||||||
|
```
|
||||||
|
</details>
|
||||||
|
|
||||||
.. tab:: Windows
|
<details>
|
||||||
|
<summary>For <strong>Windows</strong>:</summary>
|
||||||
|
|
||||||
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. And change the `/path/to/models` to mount the models. Then add `--privileged` and map the `/usr/lib/wsl` to the docker.
|
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container, and change `/path/to/models` to mount the models. Then add `--privileged` and map `/usr/lib/wsl` into the container.
|
||||||
|
|
||||||
.. code-block:: bash
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
#/bin/bash
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
|
||||||
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
|
sudo docker run -itd \
|
||||||
sudo docker run -itd \
|
|
||||||
--net=host \
|
--net=host \
|
||||||
--device=/dev/dri \
|
--device=/dev/dri \
|
||||||
--privileged \
|
--privileged \
|
||||||
|
|
@ -69,9 +71,10 @@ docker pull intelanalytics/ipex-llm-inference-cpp-xpu:latest
|
||||||
-e DEVICE=Arc \
|
-e DEVICE=Arc \
|
||||||
--shm-size="16g" \
|
--shm-size="16g" \
|
||||||
$DOCKER_IMAGE
|
$DOCKER_IMAGE
|
||||||
|
```
|
||||||
|
</details>
|
||||||
|
|
||||||
```
|
---
|
||||||
|
|
||||||
|
|
||||||
After the container is booted, you could get into the container through `docker exec`.
|
After the container is booted, you could get into the container through `docker exec`.
|
||||||
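For example, a minimal sketch (the container name assumes the `CONTAINER_NAME` exported in the launch commands above):

```bash
# Attach an interactive shell to the running container
sudo docker exec -it ipex-llm-inference-cpp-xpu-container /bin/bash
```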
|
|
||||||
|
|
@ -126,7 +129,7 @@ llama_print_timings: eval time = xxx ms / 31 runs ( xxx ms per
|
||||||
llama_print_timings: total time = xxx ms / xxx tokens
|
llama_print_timings: total time = xxx ms / xxx tokens
|
||||||
```
|
```
|
||||||
|
|
||||||
Please refer to this [documentation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html) for more details.
|
Please refer to this [documentation](../Quickstart/llama_cpp_quickstart.md) for more details.
|
||||||
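For reference, a run of this kind inside the container might look like the sketch below; the binary location, model file name, and flag values are assumptions to adapt to your own setup:

```bash
# Illustrative llama.cpp invocation; /models is the mount point chosen at docker run time
cd /llm/llama-cpp        # assumed location of the llama.cpp binaries inside the image
./main -m /models/mistral-7b-v0.1.Q4_0.gguf \
       -p "Once upon a time, there existed a little girl" \
       -n 32 \
       -c 1024 \
       -ngl 99           # offload all layers to the Intel GPU
```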
|
|
||||||
|
|
||||||
### Running Ollama serving with IPEX-LLM on Intel GPU
|
### Running Ollama serving with IPEX-LLM on Intel GPU
|
||||||
|
|
@ -194,13 +197,13 @@ Sample output:
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
Please refer to this [documentation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html#pull-model) for more details.
|
Please refer to this [documentation](../Quickstart/ollama_quickstart.md#4-pull-model) for more details.
|
||||||
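For instance, once the Ollama service is up you can pull and chat with a model from a second terminal inside the container; the binary location and model name below are assumptions:

```bash
cd /llm/ollama           # assumed location of the ollama binary inside the container
./ollama pull llama2     # model name is illustrative
./ollama run llama2
```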
|
|
||||||
|
|
||||||
### Running Open WebUI with Intel GPU
|
### Running Open WebUI with Intel GPU
|
||||||
|
|
||||||
Start the ollama and load the model first, then use the open-webui to chat.
|
Start the ollama and load the model first, then use the open-webui to chat.
|
||||||
If you have difficulty accessing the huggingface repositories, you may use a mirror, e.g. add export HF_ENDPOINT=https://hf-mirror.com before running bash start.sh.
|
If you have difficulty accessing the Hugging Face repositories, you may use a mirror, e.g. add `export HF_ENDPOINT=https://hf-mirror.com` before running `bash start.sh`.
|
||||||
```bash
|
```bash
|
||||||
cd /llm/scripts/
|
cd /llm/scripts/
|
||||||
bash start-open-webui.sh
|
bash start-open-webui.sh
|
||||||
|
|
@ -218,4 +221,4 @@ INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
|
||||||
<img src="https://llm-assets.readthedocs.io/en/latest/_images/open_webui_signup.png" width="100%" />
|
<img src="https://llm-assets.readthedocs.io/en/latest/_images/open_webui_signup.png" width="100%" />
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
For how to log-in or other guide, Please refer to this [documentation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/open_webui_with_ollama_quickstart.html) for more details.
|
For how to log in and other guides, please refer to this [documentation](../Quickstart/open_webui_with_ollama_quickstart.md) for more details.
|
||||||
|
|
|
||||||
|
|
@ -2,16 +2,12 @@
|
||||||
|
|
||||||
We can run PyTorch Inference Benchmark, Chat Service and PyTorch Examples on Intel GPUs within Docker (on Linux or WSL).
|
We can run PyTorch Inference Benchmark, Chat Service and PyTorch Examples on Intel GPUs within Docker (on Linux or WSL).
|
||||||
|
|
||||||
```eval_rst
|
> [!NOTE]
|
||||||
.. note::
|
> The current Windows + WSL + Docker solution only supports Arc series dGPU. For Windows users with MTL iGPU, it is recommended to install directly via pip install in Miniforge Prompt. Refer to [this guide](../Quickstart/install_windows_gpu.md).
|
||||||
|
|
||||||
The current Windows + WSL + Docker solution only supports Arc series dGPU. For Windows users with MTL iGPU, it is recommended to install directly via pip install in Miniforge Prompt. Refer to `this guide <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html>`_.
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
## Install Docker
|
## Install Docker
|
||||||
|
|
||||||
Follow the [Docker installation Guide](./docker_windows_gpu.html#install-docker) to install docker on either Linux or Windows.
|
Follow the [Docker installation Guide](./docker_windows_gpu.md#install-docker) to install docker on either Linux or Windows.
|
||||||
|
|
||||||
## Launch Docker
|
## Launch Docker
|
||||||
|
|
||||||
|
|
@ -20,19 +16,17 @@ Prepare ipex-llm-xpu Docker Image:
|
||||||
docker pull intelanalytics/ipex-llm-xpu:latest
|
docker pull intelanalytics/ipex-llm-xpu:latest
|
||||||
```
|
```
|
||||||
|
|
||||||
Start ipex-llm-xpu Docker Container:
|
Start the ipex-llm-xpu Docker container. Choose one of the following commands:
|
||||||
|
|
||||||
```eval_rst
|
<details>
|
||||||
.. tabs::
|
<summary>For <strong>Linux</strong>:</summary>
|
||||||
.. tab:: Linux
|
|
||||||
|
|
||||||
.. code-block:: bash
|
```bash
|
||||||
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
||||||
|
export CONTAINER_NAME=my_container
|
||||||
|
export MODEL_PATH=/llm/models[change to your model path]
|
||||||
|
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
docker run -itd \
|
||||||
export CONTAINER_NAME=my_container
|
|
||||||
export MODEL_PATH=/llm/models[change to your model path]
|
|
||||||
|
|
||||||
docker run -itd \
|
|
||||||
--net=host \
|
--net=host \
|
||||||
--device=/dev/dri \
|
--device=/dev/dri \
|
||||||
--memory="32G" \
|
--memory="32G" \
|
||||||
|
|
@ -40,17 +34,19 @@ Start ipex-llm-xpu Docker Container:
|
||||||
--shm-size="16g" \
|
--shm-size="16g" \
|
||||||
-v $MODEL_PATH:/llm/models \
|
-v $MODEL_PATH:/llm/models \
|
||||||
$DOCKER_IMAGE
|
$DOCKER_IMAGE
|
||||||
|
```
|
||||||
|
</details>
|
||||||
|
|
||||||
.. tab:: Windows WSL
|
<details>
|
||||||
|
<summary>For <strong>Windows WSL</strong>:</summary>
|
||||||
|
|
||||||
.. code-block:: bash
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
||||||
|
export CONTAINER_NAME=my_container
|
||||||
|
export MODEL_PATH=/llm/models[change to your model path]
|
||||||
|
|
||||||
#/bin/bash
|
sudo docker run -itd \
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
|
||||||
export CONTAINER_NAME=my_container
|
|
||||||
export MODEL_PATH=/llm/models[change to your model path]
|
|
||||||
|
|
||||||
sudo docker run -itd \
|
|
||||||
--net=host \
|
--net=host \
|
||||||
--privileged \
|
--privileged \
|
||||||
--device /dev/dri \
|
--device /dev/dri \
|
||||||
|
|
@ -60,8 +56,10 @@ Start ipex-llm-xpu Docker Container:
|
||||||
-v $MODEL_PATH:/llm/llm-models \
|
-v $MODEL_PATH:/llm/llm-models \
|
||||||
-v /usr/lib/wsl:/usr/lib/wsl \
|
-v /usr/lib/wsl:/usr/lib/wsl \
|
||||||
$DOCKER_IMAGE
|
$DOCKER_IMAGE
|
||||||
```
|
```
|
||||||
|
</details>
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
Access the container:
|
Access the container:
|
||||||
```
|
```
|
||||||
|
|
@ -77,18 +75,13 @@ root@arda-arc12:/# sycl-ls
|
||||||
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
|
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
|
||||||
```
|
```
|
||||||
|
|
||||||
```eval_rst
|
> [!TIP]
|
||||||
.. tip::
|
> You can run the Env-Check script to verify your ipex-llm installation and runtime environment.
|
||||||
|
>
|
||||||
You can run the Env-Check script to verify your ipex-llm installation and runtime environment.
|
> ```bash
|
||||||
|
> cd /ipex-llm/python/llm/scripts
|
||||||
.. code-block:: bash
|
> bash env-check.sh
|
||||||
|
> ```
|
||||||
cd /ipex-llm/python/llm/scripts
|
|
||||||
bash env-check.sh
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
## Run Inference Benchmark
|
## Run Inference Benchmark
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -4,21 +4,18 @@ An IPEX-LLM container is a pre-configured environment that includes all necessar
|
||||||
|
|
||||||
This guide provides steps to run/develop PyTorch examples in VSCode with Docker on Intel GPUs.
|
This guide provides steps to run/develop PyTorch examples in VSCode with Docker on Intel GPUs.
|
||||||
|
|
||||||
```eval_rst
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
This guide assumes you have already installed VSCode in your environment.
|
> [!NOTE]
|
||||||
|
> This guide assumes you have already installed VSCode in your environment.
|
||||||
To run/develop on Windows, install VSCode and then follow the steps below.
|
>
|
||||||
|
> To run/develop on Windows, install VSCode and then follow the steps below.
|
||||||
To run/develop on Linux, you might open VSCode first and SSH to a remote Linux machine, then proceed with the following steps.
|
>
|
||||||
|
> To run/develop on Linux, you might open VSCode first and SSH to a remote Linux machine, then proceed with the following steps.
|
||||||
```
|
|
||||||
|
|
||||||
|
|
||||||
## Install Docker
|
## Install Docker
|
||||||
|
|
||||||
Follow the [Docker installation Guide](./docker_windows_gpu.html#install-docker) to install docker on either Linux or Windows.
|
Follow the [Docker installation Guide](./docker_windows_gpu.md#install-docker) to install docker on either Linux or Windows.
|
||||||
|
|
||||||
## Install Extensions for VSCcode
|
## Install Extensions for VSCode
|
||||||
|
|
||||||
|
|
@ -52,19 +49,18 @@ Open the Terminal in VSCode (you can use the shortcut `` Ctrl+Shift+` ``), then
|
||||||
docker pull intelanalytics/ipex-llm-xpu:latest
|
docker pull intelanalytics/ipex-llm-xpu:latest
|
||||||
```
|
```
|
||||||
|
|
||||||
Start ipex-llm-xpu Docker Container:
|
Start the ipex-llm-xpu Docker container. Choose one of the following commands:
|
||||||
|
|
||||||
```eval_rst
|
<details>
|
||||||
.. tabs::
|
<summary>For <strong>Linux</strong>:</summary>
|
||||||
.. tab:: Linux
|
|
||||||
|
|
||||||
.. code-block:: bash
|
```bash
|
||||||
|
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
||||||
export CONTAINER_NAME=my_container
|
export CONTAINER_NAME=my_container
|
||||||
export MODEL_PATH=/llm/models[change to your model path]
|
export MODEL_PATH=/llm/models[change to your model path]
|
||||||
|
|
||||||
docker run -itd \
|
docker run -itd \
|
||||||
--net=host \
|
--net=host \
|
||||||
--device=/dev/dri \
|
--device=/dev/dri \
|
||||||
--memory="32G" \
|
--memory="32G" \
|
||||||
|
|
@ -72,17 +68,19 @@ Start ipex-llm-xpu Docker Container:
|
||||||
--shm-size="16g" \
|
--shm-size="16g" \
|
||||||
-v $MODEL_PATH:/llm/models \
|
-v $MODEL_PATH:/llm/models \
|
||||||
$DOCKER_IMAGE
|
$DOCKER_IMAGE
|
||||||
|
```
|
||||||
|
</details>
|
||||||
|
|
||||||
.. tab:: Windows WSL
|
<details>
|
||||||
|
<summary>For <strong>Windows WSL</strong>:</summary>
|
||||||
|
|
||||||
.. code-block:: bash
|
```bash
|
||||||
|
#!/bin/bash
|
||||||
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
||||||
|
export CONTAINER_NAME=my_container
|
||||||
|
export MODEL_PATH=/llm/models[change to your model path]
|
||||||
|
|
||||||
#/bin/bash
|
sudo docker run -itd \
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
|
|
||||||
export CONTAINER_NAME=my_container
|
|
||||||
export MODEL_PATH=/llm/models[change to your model path]
|
|
||||||
|
|
||||||
sudo docker run -itd \
|
|
||||||
--net=host \
|
--net=host \
|
||||||
--privileged \
|
--privileged \
|
||||||
--device /dev/dri \
|
--device /dev/dri \
|
||||||
|
|
@ -92,8 +90,10 @@ Start ipex-llm-xpu Docker Container:
|
||||||
-v $MODEL_PATH:/llm/llm-models \
|
-v $MODEL_PATH:/llm/llm-models \
|
||||||
-v /usr/lib/wsl:/usr/lib/wsl \
|
-v /usr/lib/wsl:/usr/lib/wsl \
|
||||||
$DOCKER_IMAGE
|
$DOCKER_IMAGE
|
||||||
```
|
```
|
||||||
|
</details>
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Run/Develop Pytorch Examples
|
## Run/Develop Pytorch Examples
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -14,18 +14,12 @@ Follow the instructions in the [Offcial Docker Guide](https://www.docker.com/get
|
||||||
|
|
||||||
### Windows
|
### Windows
|
||||||
|
|
||||||
```eval_rst
|
> [!TIP]
|
||||||
.. tip::
|
> The installation requires at least 35GB of free disk space on the C drive.
|
||||||
|
|
||||||
The installation requires at least 35GB of free disk space on C drive.
|
> [!NOTE]
|
||||||
|
> Detailed installation instructions for Windows, including steps for enabling WSL2, can be found on the [Docker Desktop for Windows installation page](https://docs.docker.com/desktop/install/windows-install/).
|
||||||
|
|
||||||
```
|
|
||||||
```eval_rst
|
|
||||||
.. note::
|
|
||||||
|
|
||||||
Detailed installation instructions for Windows, including steps for enabling WSL2, can be found on the [Docker Desktop for Windows installation page](https://docs.docker.com/desktop/install/windows-install/).
|
|
||||||
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Install Docker Desktop for Windows
|
#### Install Docker Desktop for Windows
|
||||||
Follow the instructions in [this guide](https://docs.docker.com/desktop/install/windows-install/) to install **Docker Desktop for Windows**. Restart you machine after the installation is complete.
|
Follow the instructions in [this guide](https://docs.docker.com/desktop/install/windows-install/) to install **Docker Desktop for Windows**. Restart your machine after the installation is complete.
|
||||||
|
|
@ -34,11 +28,9 @@ Follow the instructions in [this guide](https://docs.docker.com/desktop/install/
|
||||||
|
|
||||||
Follow the instructions in [this guide](https://docs.microsoft.com/en-us/windows/wsl/install) to install **Windows Subsystem for Linux 2 (WSL2)**.
|
Follow the instructions in [this guide](https://docs.microsoft.com/en-us/windows/wsl/install) to install **Windows Subsystem for Linux 2 (WSL2)**.
|
||||||
|
|
||||||
```eval_rst
|
> [!TIP]
|
||||||
.. tip::
|
> You may verify WSL2 installation by running the command `wsl --list` in PowerShell or Command Prompt. If WSL2 is installed, you will see a list of installed Linux distributions.
|
||||||
|
|
||||||
You may verify WSL2 installation by running the command `wsl --list` in PowerShell or Command Prompt. If WSL2 is installed, you will see a list of installed Linux distributions.
|
|
||||||
```
|
|
||||||
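For example (the distributions listed in the output will vary by machine):

```bash
# Run in PowerShell or Command Prompt; --verbose also shows each distribution's WSL version
wsl --list --verbose
```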
|
|
||||||
#### Enable Docker integration with WSL2
|
#### Enable Docker integration with WSL2
|
||||||
|
|
||||||
|
|
@ -47,11 +39,10 @@ Open **Docker desktop**, and select `Settings`->`Resources`->`WSL integration`->
|
||||||
<img src="https://llm-assets.readthedocs.io/en/latest/_images/docker_desktop_new.png" width=100%; />
|
<img src="https://llm-assets.readthedocs.io/en/latest/_images/docker_desktop_new.png" width=100%; />
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
```eval_rst
|
|
||||||
.. tip::
|
|
||||||
|
|
||||||
If you encounter **Docker Engine stopped** when opening Docker Desktop, you can reopen it in administrator mode.
|
> [!TIP]
|
||||||
```
|
> If you encounter **Docker Engine stopped** when opening Docker Desktop, you can reopen it in administrator mode.
|
||||||
|
|
||||||
|
|
||||||
#### Verify Docker is enabled in WSL2
|
#### Verify Docker is enabled in WSL2
|
||||||
|
|
||||||
|
|
@ -67,11 +58,9 @@ You can see the output similar to the following:
|
||||||
<img src="https://llm-assets.readthedocs.io/en/latest/_images/docker_wsl.png" width=100%; />
|
<img src="https://llm-assets.readthedocs.io/en/latest/_images/docker_wsl.png" width=100%; />
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
```eval_rst
|
|
||||||
.. tip::
|
|
||||||
|
|
||||||
During the use of Docker in WSL, Docker Desktop needs to be kept open all the time.
|
> [!TIP]
|
||||||
```
|
> While using Docker in WSL, Docker Desktop needs to remain open the whole time.
|
||||||
|
|
||||||
|
|
||||||
## IPEX-LLM Docker Containers
|
## IPEX-LLM Docker Containers
|
||||||
|
|
@ -89,7 +78,7 @@ We have several docker images available for running LLMs on Intel GPUs. The foll
|
||||||
| intelanalytics/ipex-llm-finetune-qlora-xpu:latest| GPU Finetuning|For fine-tuning LLMs using QLora/Lora, etc.|
|
| intelanalytics/ipex-llm-finetune-qlora-xpu:latest| GPU Finetuning|For fine-tuning LLMs using QLora/Lora, etc.|
|
||||||
|
|
||||||
We have also provided several quickstarts for various usage scenarios:
|
We have also provided several quickstarts for various usage scenarios:
|
||||||
- [Run and develop LLM applications in PyTorch](./docker_pytorch_inference_gpu.html)
|
- [Run and develop LLM applications in PyTorch](./docker_pytorch_inference_gpu.md)
|
||||||
|
|
||||||
... to be added soon.
|
... to be added soon.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -4,7 +4,7 @@ This guide demonstrates how to run `FastChat` serving with `IPEX-LLM` on Intel G
|
||||||
|
|
||||||
## Install docker
|
## Install docker
|
||||||
|
|
||||||
Follow the instructions in this [guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_windows_gpu.html#linux) to install Docker on Linux.
|
Follow the instructions in this [guide](./docker_windows_gpu.md#linux) to install Docker on Linux.
|
||||||
|
|
||||||
## Pull the latest image
|
## Pull the latest image
|
||||||
|
|
||||||
|
|
@ -17,7 +17,7 @@ docker pull intelanalytics/ipex-llm-serving-xpu:latest
|
||||||
|
|
||||||
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. Change the `/path/to/models` to mount the models.
|
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. Change the `/path/to/models` to mount the models.
|
||||||
|
|
||||||
```
|
```bash
|
||||||
#/bin/bash
|
#!/bin/bash
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest
|
||||||
export CONTAINER_NAME=ipex-llm-serving-xpu-container
|
export CONTAINER_NAME=ipex-llm-serving-xpu-container
|
||||||
|
|
@ -54,9 +54,9 @@ root@arda-arc12:/# sycl-ls
|
||||||
|
|
||||||
For convenience, we have provided a script named `/llm/start-fastchat-service.sh` for you to start the service.
|
For convenience, we have provided a script named `/llm/start-fastchat-service.sh` for you to start the service.
|
||||||
|
|
||||||
However, the script only provide instructions for the most common scenarios. If this script doesn't meet your needs, you can always find the complete guidance for FastChat at [Serving using IPEX-LLM and FastChat](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/fastchat_quickstart.html#start-the-service).
|
However, the script only covers the most common scenarios. If it doesn't meet your needs, you can always find the complete guidance for FastChat at [Serving using IPEX-LLM and FastChat](../Quickstart/fastchat_quickstart.md#2-start-the-service).
|
||||||
|
|
||||||
Before starting the service, you can refer to this [section](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#runtime-configurations) to setup our recommended runtime configurations.
|
Before starting the service, you can refer to this [section](../Quickstart/install_linux_gpu.md#runtime-configurations) to set up our recommended runtime configurations.
|
||||||
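For illustration only, the runtime configuration usually comes down to exporting a few environment variables inside the container before launching the service; the variables below are an assumption modeled on a typical Intel Arc setup, so copy the actual values from the linked section for your device:

```bash
# Hypothetical example -- confirm against the linked runtime-configuration guide
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export SYCL_CACHE_PERSISTENT=1
```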
|
|
||||||
Now we can start the FastChat service, you can use our provided script `/llm/start-fastchat-service.sh` like the following way:
|
Now we can start the FastChat service. You can use our provided script `/llm/start-fastchat-service.sh` as follows:
|
||||||
|
|
||||||
|
|
@ -105,10 +105,10 @@ The `vllm_worker` may start slowly than normal `ipex_llm_worker`. The booted se
|
||||||
</a>
|
</a>
|
||||||
|
|
||||||
|
|
||||||
```eval_rst
|
|
||||||
.. note::
|
> [!NOTE]
|
||||||
To verify/use the service booted by the script, follow the instructions in `this guide <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/fastchat_quickstart.html#launch-restful-api-serve>`_.
|
> To verify/use the service booted by the script, follow the instructions in [this guide](../Quickstart/fastchat_quickstart.md#launch-restful-api-server).
|
||||||
```
|
|
||||||
|
|
||||||
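As a quick sanity check (a sketch; the port assumes the API server's default of `8000`):

```bash
# List the models registered with the OpenAI-compatible API server
curl http://localhost:8000/v1/models
```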
After a request has been sent to the `openai_api_server`, the corresponding inference time latency can be found in the worker log as shown below:
|
After a request has been sent to the `openai_api_server`, the corresponding inference time latency can be found in the worker log as shown below:
|
||||||
|
|
||||||
|
|
|
||||||
16 docs/mddocs/DockerGuides/index.md (new file)
|
|
@ -0,0 +1,16 @@
|
||||||
|
# IPEX-LLM Docker Container User Guides
|
||||||
|
|
||||||
|
|
||||||
|
In this section, you will find guides related to using IPEX-LLM with Docker, covering how to:
|
||||||
|
|
||||||
|
- [Overview of IPEX-LLM Containers](./docker_windows_gpu.md)
|
||||||
|
|
||||||
|
- Inference in Python/C++
|
||||||
|
- [GPU Inference in Python with IPEX-LLM](./docker_pytorch_inference_gpu.md)
|
||||||
|
- [VSCode LLM Development with IPEX-LLM on Intel GPU](./docker_run_pytorch_inference_in_vscode.md)
|
||||||
|
- [llama.cpp/Ollama/Open-WebUI with IPEX-LLM on Intel GPU](./docker_cpp_xpu_quickstart.md)
|
||||||
|
|
||||||
|
- Serving
|
||||||
|
- [FastChat with IPEX-LLM on Intel GPU](./fastchat_docker_quickstart.md)
|
||||||
|
- [vLLM with IPEX-LLM on Intel GPU](./vllm_docker_quickstart.md)
|
||||||
|
- [vLLM with IPEX-LLM on Intel CPU](./vllm_cpu_docker_quickstart.md)
|
||||||
|
|
@ -1,15 +0,0 @@
|
||||||
IPEX-LLM Docker Container User Guides
|
|
||||||
=====================================
|
|
||||||
|
|
||||||
In this section, you will find guides related to using IPEX-LLM with Docker, covering how to:
|
|
||||||
|
|
||||||
* `Overview of IPEX-LLM Containers <./docker_windows_gpu.html>`_
|
|
||||||
|
|
||||||
* Inference in Python/C++
|
|
||||||
* `GPU Inference in Python with IPEX-LLM <./docker_pytorch_inference_gpu.html>`_
|
|
||||||
* `VSCode LLM Development with IPEX-LLM on Intel GPU <./docker_pytorch_inference_gpu.html>`_
|
|
||||||
* `llama.cpp/Ollama/Open-WebUI with IPEX-LLM on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
|
|
||||||
* Serving
|
|
||||||
* `FastChat with IPEX-LLM on Intel GPU <./fastchat_docker_quickstart.html>`_
|
|
||||||
* `vLLM with IPEX-LLM on Intel GPU <./vllm_docker_quickstart.html>`_
|
|
||||||
* `vLLM with IPEX-LLM on Intel CPU <./vllm_cpu_docker_quickstart.html>`_
|
|
||||||
|
|
@ -18,7 +18,7 @@ docker pull intelanalytics/ipex-llm-serving-cpu:latest
|
||||||
## Start Docker Container
|
## Start Docker Container
|
||||||
|
|
||||||
To fully use your Intel CPU to run vLLM inference and serving, you should
|
To fully use your Intel CPU to run vLLM inference and serving, you should start the container as follows:
|
||||||
```
|
```bash
|
||||||
#/bin/bash
|
#!/bin/bash
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:latest
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:latest
|
||||||
export CONTAINER_NAME=ipex-llm-serving-cpu-container
|
export CONTAINER_NAME=ipex-llm-serving-cpu-container
|
||||||
|
|
@ -48,7 +48,7 @@ We have included multiple vLLM-related files in `/llm/`:
|
||||||
3. `payload-1024.lua`: Used for testing request per second using 1k-128 request
|
3. `payload-1024.lua`: Used for testing requests per second with 1k-128 requests
|
||||||
4. `start-vllm-service.sh`: Used for template for starting vLLM service
|
4. `start-vllm-service.sh`: Used as a template for starting the vLLM service
|
||||||
|
|
||||||
Before performing benchmark or starting the service, you can refer to this [section](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_cpu.html#environment-setup) to setup our recommended runtime configurations.
|
Before performing benchmark or starting the service, you can refer to this [section](../Overview/install_cpu.md#environment-setup) to setup our recommended runtime configurations.
|
||||||
|
|
||||||
### Service
|
### Service
|
||||||
|
|
||||||
|
|
@ -92,7 +92,7 @@ You can tune the service using these four arguments:
|
||||||
- `--max-num-batched-token`
|
- `--max-num-batched-token`
|
||||||
- `--max-num-seq`
|
- `--max-num-seq`
|
||||||
|
|
||||||
You can refer to this [doc](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html#service) for a detailed explaination on these parameters.
|
You can refer to this [doc](../Quickstart/vLLM_quickstart.md#service) for a detailed explanation of these parameters.
|
||||||
|
|
||||||
### Benchmark
|
### Benchmark
|
||||||
|
|
||||||
|
|
@ -115,4 +115,4 @@ wrk -t8 -c8 -d15m -s payload-1024.lua http://localhost:8000/v1/completions --tim
|
||||||
|
|
||||||
#### Offline benchmark through benchmark_vllm_throughput.py
|
#### Offline benchmark through benchmark_vllm_throughput.py
|
||||||
|
|
||||||
Please refer to this [section](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html#performing-benchmark) on how to use `benchmark_vllm_throughput.py` for benchmarking.
|
Please refer to this [section](../Quickstart/vLLM_quickstart.md#5performing-benchmark) on how to use `benchmark_vllm_throughput.py` for benchmarking.
|
||||||
|
|
|
||||||
|
|
@ -4,7 +4,7 @@ This guide demonstrates how to run `vLLM` serving with `IPEX-LLM` on Intel GPUs
|
||||||
|
|
||||||
## Install docker
|
## Install docker
|
||||||
|
|
||||||
Follow the instructions in this [guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_windows_gpu.html#linux) to install Docker on Linux.
|
Follow the instructions in this [guide](./docker_windows_gpu.md#linux) to install Docker on Linux.
|
||||||
|
|
||||||
## Pull the latest image
|
## Pull the latest image
|
||||||
|
|
||||||
|
|
@ -18,7 +18,7 @@ docker pull intelanalytics/ipex-llm-serving-xpu:latest
|
||||||
|
|
||||||
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. Change the `/path/to/models` to mount the models.
|
To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. Change the `/path/to/models` to mount the models.
|
||||||
|
|
||||||
```
|
```bash
|
||||||
#/bin/bash
|
#!/bin/bash
|
||||||
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest
|
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:latest
|
||||||
export CONTAINER_NAME=ipex-llm-serving-xpu-container
|
export CONTAINER_NAME=ipex-llm-serving-xpu-container
|
||||||
|
|
@ -58,7 +58,7 @@ We have included multiple vLLM-related files in `/llm/`:
|
||||||
3. `payload-1024.lua`: Used for testing request per second using 1k-128 request
|
3. `payload-1024.lua`: Used for testing requests per second with 1k-128 requests
|
||||||
4. `start-vllm-service.sh`: Used for template for starting vLLM service
|
4. `start-vllm-service.sh`: Used as a template for starting the vLLM service
|
||||||
|
|
||||||
Before performing benchmark or starting the service, you can refer to this [section](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#runtime-configurations) to setup our recommended runtime configurations.
|
Before performing benchmarks or starting the service, you can refer to this [section](../Quickstart/install_linux_gpu.md#runtime-configurations) to set up our recommended runtime configurations.
|
||||||
|
|
||||||
|
|
||||||
### Service
|
### Service
|
||||||
|
|
@ -82,7 +82,7 @@ If the service have booted successfully, you should see the output similar to th
|
||||||
|
|
||||||
vLLM supports to utilize multiple cards through tensor parallel.
|
vLLM supports utilizing multiple cards through tensor parallelism.
|
||||||
|
|
||||||
You can refer to this [documentation](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html#about-tensor-parallel) on how to utilize the `tensor-parallel` feature and start the service.
|
You can refer to this [documentation](../Quickstart/vLLM_quickstart.md#4-about-tensor-parallel) on how to utilize the `tensor-parallel` feature and start the service.
|
||||||
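As a rough illustration of tensor parallelism, the sketch below uses the upstream vLLM OpenAI-compatible entrypoint with a hypothetical model path; the IPEX-LLM image's `start-vllm-service.sh` may wrap the launch differently, so follow the linked documentation for the exact command:

```bash
# Hypothetical launch spreading the model across 2 GPUs
python -m vllm.entrypoints.openai.api_server \
  --model /llm/models/Qwen1.5-7B-Chat \
  --served-model-name Qwen1.5 \
  --port 8000 \
  --tensor-parallel-size 2
```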
|
|
||||||
#### Verify
|
#### Verify
|
||||||
After the service has been booted successfully, you can send a test request using `curl`. Here, `YOUR_MODEL` should be set equal to `served_model_name` in your booting script, e.g. `Qwen1.5`.
|
After the service has been booted successfully, you can send a test request using `curl`. Here, `YOUR_MODEL` should be set equal to `served_model_name` in your booting script, e.g. `Qwen1.5`.
|
||||||
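A minimal sketch of such a request (the port `8000` and the prompt are assumptions; adjust them to match your service configuration):

```bash
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "YOUR_MODEL",
        "prompt": "San Francisco is a",
        "max_tokens": 32,
        "temperature": 0
      }'
```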
|
|
@ -113,7 +113,7 @@ You can tune the service using these four arguments:
|
||||||
- `--max-num-batched-token`
|
- `--max-num-batched-token`
|
||||||
- `--max-num-seq`
|
- `--max-num-seq`
|
||||||
|
|
||||||
You can refer to this [doc](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html#service) for a detailed explaination on these parameters.
|
You can refer to this [doc](../Quickstart/vLLM_quickstart.md#service) for a detailed explanation of these parameters.
|
||||||
|
|
||||||
### Benchmark
|
### Benchmark
|
||||||
|
|
||||||
|
|
@ -143,4 +143,4 @@ The following figure shows performing benchmark on `Llama-2-7b-chat-hf` using th
|
||||||
|
|
||||||
#### Offline benchmark through benchmark_vllm_throughput.py
|
#### Offline benchmark through benchmark_vllm_throughput.py
|
||||||
|
|
||||||
Please refer to this [section](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/vLLM_quickstart.html#performing-benchmark) on how to use `benchmark_vllm_throughput.py` for benchmarking.
|
Please refer to this [section](../Quickstart/vLLM_quickstart.md#5performing-benchmark) on how to use `benchmark_vllm_throughput.py` for benchmarking.
|
||||||
|
|
|
||||||