diff --git a/docs/readthedocs/source/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.md b/docs/readthedocs/source/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.md
index e3245d41..7a3d3550 100644
--- a/docs/readthedocs/source/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.md
@@ -2,6 +2,13 @@
 
 We can run PyTorch Inference Benchmark, Chat Service and PyTorch Examples on Intel GPUs within Docker (on Linux or WSL).
 
+```eval_rst
+.. note::
+
+   The current Windows + WSL + Docker solution only supports Arc series dGPU. For Windows users with MTL iGPU, it is recommended to install directly via pip install in Anaconda Prompt. Refer to `this guide `_.
+
+```
+
 ## Install Docker
 
 Follow the [Docker installation Guide](./docker_windows_gpu.html#install-docker) to install docker on either Linux or Windows.
@@ -10,7 +17,7 @@ Follow the [Docker installation Guide](./docker_windows_gpu.html#install-docker)
 
 Prepare ipex-llm-xpu Docker Image:
 ```bash
-docker pull intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT
+docker pull intelanalytics/ipex-llm-xpu:latest
 ```
 
 Start ipex-llm-xpu Docker Container:
@@ -21,7 +28,7 @@ Start ipex-llm-xpu Docker Container:
 
       .. code-block:: bash
 
-         export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT
+         export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
          export CONTAINER_NAME=my_container
          export MODEL_PATH=/llm/models[change to your model path]
 
@@ -39,7 +46,7 @@ Start ipex-llm-xpu Docker Container:
       .. code-block:: bash
 
          #/bin/bash
-         export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT
+         export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
          export CONTAINER_NAME=my_container
         export MODEL_PATH=/llm/models[change to your model path]
 
diff --git a/docs/readthedocs/source/doc/LLM/DockerGuides/docker_windows_gpu.md b/docs/readthedocs/source/doc/LLM/DockerGuides/docker_windows_gpu.md
index 6014229b..f0b2d6e4 100644
--- a/docs/readthedocs/source/doc/LLM/DockerGuides/docker_windows_gpu.md
+++ b/docs/readthedocs/source/doc/LLM/DockerGuides/docker_windows_gpu.md
@@ -80,13 +80,13 @@ We have several docker images available for running LLMs on Intel GPUs. The foll
 
 | Image Name | Description | Use Case |
 |------------|-------------|----------|
-| intelanalytics/ipex-llm-cpu:2.1.0-SNAPSHOT | CPU Inference |For development and running LLMs using llama.cpp, Ollama and Python|
-| intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT | GPU Inference |For development and running LLMs using llama.cpp, Ollama and Python|
-| intelanalytics/ipex-llm-serving-cpu:2.1.0-SNAPSHOT | CPU Serving|For serving multiple users/requests through REST APIs using vLLM/FastChat|
-| intelanalytics/ipex-llm-serving-xpu:2.1.0-SNAPSHOT | GPU Serving|For serving multiple users/requests through REST APIs using vLLM/FastChat|
-| intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.1.0-SNAPSHOT | CPU Finetuning via Docker|For fine-tuning LLMs using QLora/Lora, etc. |
-|intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.1.0-SNAPSHOT|CPU Finetuning via Kubernetes|For fine-tuning LLMs using QLora/Lora, etc. &#13;|
-| intelanalytics/ipex-llm-finetune-qlora-xpu:2.1.0-SNAPSHOT| GPU Finetuning|For fine-tuning LLMs using QLora/Lora, etc.|
+| intelanalytics/ipex-llm-cpu:latest | CPU Inference |For development and running LLMs using llama.cpp, Ollama and Python|
+| intelanalytics/ipex-llm-xpu:latest | GPU Inference |For development and running LLMs using llama.cpp, Ollama and Python|
+| intelanalytics/ipex-llm-serving-cpu:latest | CPU Serving|For serving multiple users/requests through REST APIs using vLLM/FastChat|
+| intelanalytics/ipex-llm-serving-xpu:latest | GPU Serving|For serving multiple users/requests through REST APIs using vLLM/FastChat|
+| intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:latest | CPU Finetuning via Docker|For fine-tuning LLMs using QLora/Lora, etc. |
+|intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:latest|CPU Finetuning via Kubernetes|For fine-tuning LLMs using QLora/Lora, etc. |
+| intelanalytics/ipex-llm-finetune-qlora-xpu:latest| GPU Finetuning|For fine-tuning LLMs using QLora/Lora, etc.|
 
 We have also provided several quickstarts for various usage scenarios:
 - [Run and develop LLM applications in PyTorch](./docker_pytorch_inference_gpu.html)
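The hunks above only show the first few lines of the container setup; the remaining `docker run` flags lie outside the diff context. For readers applying this change, the snippet below is a minimal sketch of pulling the retagged image and starting a container on Linux. The device, network, volume, and shm-size flags are illustrative assumptions based on a typical Intel GPU passthrough setup, not the exact command from the guide.

```bash
# Pull the image using the new floating tag introduced by this change.
docker pull intelanalytics/ipex-llm-xpu:latest

export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
export CONTAINER_NAME=my_container
export MODEL_PATH=/llm/models   # change to your local model directory

# Assumed flags for illustration: expose the Intel GPU via /dev/dri,
# share the host network, and mount the model directory into the container.
docker run -itd \
    --net=host \
    --device=/dev/dri \
    --name=$CONTAINER_NAME \
    -v $MODEL_PATH:/llm/models \
    --shm-size="16g" \
    $DOCKER_IMAGE
```

Because the tag is now `latest`, re-running `docker pull` is the assumed way to pick up newer builds of the same image name.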