From 8fdc8fb197027f787fd0ffd92ce04e5d0757d619 Mon Sep 17 00:00:00 2001
From: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
Date: Wed, 22 May 2024 09:29:42 +0800
Subject: [PATCH] Quickstart: Run/Develop PyTorch in VSCode with Docker on Intel GPU (#11070)

* add quickstart: Run/Develop PyTorch in VSCode with Docker on Intel GPU

* add gif

* update index.rst

* update link

* update GIFs
---
 .../source/_templates/sidebar_quicklinks.html |   3 +
 docs/readthedocs/source/_toc.yml              |   1 +
 .../docker_run_pytorch_inference_in_vscode.md | 139 ++++++++++++++++++
 .../source/doc/LLM/DockerGuides/index.rst     |   3 +-
 4 files changed, 145 insertions(+), 1 deletion(-)
 create mode 100644 docs/readthedocs/source/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.md

diff --git a/docs/readthedocs/source/_templates/sidebar_quicklinks.html b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
index b71cf4e4..87d2fe47 100644
--- a/docs/readthedocs/source/_templates/sidebar_quicklinks.html
+++ b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -83,6 +83,9 @@
         • Run PyTorch Inference on an Intel GPU via Docker
+        • Run/Develop PyTorch in VSCode with Docker on Intel GPU
         • Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker
diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 75540171..3d3a5245 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -21,6 +21,7 @@ subtrees:
       - entries:
         - file: doc/LLM/DockerGuides/docker_windows_gpu
         - file: doc/LLM/DockerGuides/docker_pytorch_inference_gpu
+        - file: doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode
         - file: doc/LLM/DockerGuides/docker_cpp_xpu_quickstart
   - file: doc/LLM/Quickstart/index
     title: "Quickstart"
diff --git a/docs/readthedocs/source/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.md b/docs/readthedocs/source/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.md
new file mode 100644
index 00000000..b625ac6b
--- /dev/null
+++ b/docs/readthedocs/source/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.md
@@ -0,0 +1,139 @@
# Run/Develop PyTorch in VSCode with Docker on Intel GPU

An IPEX-LLM container is a pre-configured environment that includes all necessary dependencies for running LLMs on Intel GPUs.

This guide provides steps to run/develop PyTorch examples in VSCode with Docker on Intel GPUs.

```eval_rst
.. note::

   This guide assumes you have already installed VSCode in your environment.

   To run/develop on Windows, install VSCode and then follow the steps below.

   To run/develop on Linux, you can open VSCode and connect to a remote Linux machine via SSH, then proceed with the following steps.

```

## Install Docker

Follow the [Docker Installation Guide](./docker_windows_gpu.html#install-docker) to install Docker on either Linux or Windows.

## Install Extensions for VSCode

#### Install Dev Containers Extension

For both Linux and Windows, you will need to install the `Dev Containers` extension.

Open the Extensions view in VSCode (you can use the shortcut `Ctrl+Shift+X`), then search for and install the `Dev Containers` extension.

#### Install WSL Extension for Windows

For Windows, you will also need to install the `WSL` extension to connect VSCode to the WSL environment. Open the Extensions view in VSCode (you can use the shortcut `Ctrl+Shift+X`), then search for and install the `WSL` extension.

Press F1 to bring up the Command Palette, type `WSL: Connect to WSL Using Distro...` and select it, then choose a specific WSL distro (e.g. `Ubuntu`).

## Launch Container

Open the Terminal in VSCode (you can use the shortcut `` Ctrl+Shift+` ``), then pull the ipex-llm-xpu Docker image:

```bash
docker pull intelanalytics/ipex-llm-xpu:latest
```

Start the ipex-llm-xpu Docker container:

```eval_rst
.. tabs::
   .. tab:: Linux

      .. code-block:: bash

         export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
         export CONTAINER_NAME=my_container
         export MODEL_PATH=/llm/models[change to your model path]

         docker run -itd \
            --net=host \
            --device=/dev/dri \
            --memory="32G" \
            --name=$CONTAINER_NAME \
            --shm-size="16g" \
            -v $MODEL_PATH:/llm/models \
            $DOCKER_IMAGE

   .. tab:: Windows WSL

      .. code-block:: bash

         export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:latest
         export CONTAINER_NAME=my_container
         export MODEL_PATH=/llm/models[change to your model path]

         sudo docker run -itd \
            --net=host \
            --privileged \
            --device /dev/dri \
            --memory="32G" \
            --name=$CONTAINER_NAME \
            --shm-size="16g" \
            -v $MODEL_PATH:/llm/models \
            -v /usr/lib/wsl:/usr/lib/wsl \
            $DOCKER_IMAGE
```
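
Before attaching VSCode, you may want to confirm that the container started correctly and that the Intel GPU device nodes are visible inside it. A minimal sanity check from the same terminal might look like the following (it assumes the container name `my_container` used above):

```bash
# Confirm the container is up and note its status
docker ps --filter "name=my_container"

# Check that the Intel GPU device nodes are visible inside the container
docker exec -it my_container ls /dev/dri
```

If `/dev/dri` is empty inside the container, revisit the `--device /dev/dri` option (and, on WSL, the `/usr/lib/wsl` mount) in the `docker run` command above.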

## Run/Develop PyTorch Examples

Press F1 to bring up the Command Palette, type `Dev Containers: Attach to Running Container...` and select it, then select `my_container`.

You are now inside a running Docker container. Open the folder `/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/`.

In this folder, we provide several PyTorch examples that apply IPEX-LLM INT4 optimizations to models on Intel GPUs.

For example, if your model is Llama-2-7b-chat-hf and is mounted at `/llm/models`, you can navigate to the llama2 directory and execute the following commands to run the example:

```bash
cd llama2
python ./generate.py --repo-id-or-model-path /llm/models/Llama-2-7b-chat-hf --prompt PROMPT --n-predict N_PREDICT
```

Arguments info:
- `--repo-id-or-model-path REPO_ID_OR_MODEL_PATH`: argument defining the Hugging Face repo id of the Llama2 model (e.g. `meta-llama/Llama-2-7b-chat-hf` or `meta-llama/Llama-2-13b-chat-hf`) to be downloaded, or the path to a Hugging Face checkpoint folder. The default is `'meta-llama/Llama-2-7b-chat-hf'`.
- `--prompt PROMPT`: argument defining the prompt to be inferred (with integrated prompt format for chat). The default is `'What is AI?'`.
- `--n-predict N_PREDICT`: argument defining the max number of tokens to predict. The default is `32`.

**Sample Output**
```log
Inference time: xxxx s
-------------------- Prompt --------------------
[INST] <<SYS>>

<</SYS>>

What is AI? [/INST]
-------------------- Output --------------------
[INST] <<SYS>>

<</SYS>>

What is AI? [/INST] Artificial intelligence (AI) is the broader field of research and development aimed at creating machines that can perform tasks that typically require human intelligence,
```

You can develop your own PyTorch examples based on these examples.
diff --git a/docs/readthedocs/source/doc/LLM/DockerGuides/index.rst b/docs/readthedocs/source/doc/LLM/DockerGuides/index.rst
index c4b24d7e..9e9f02fd 100644
--- a/docs/readthedocs/source/doc/LLM/DockerGuides/index.rst
+++ b/docs/readthedocs/source/doc/LLM/DockerGuides/index.rst
@@ -6,6 +6,7 @@ In this section, you will find guides related to using IPEX-LLM with Docker, cov
 
 * `Overview of IPEX-LLM Containers for Intel GPU <./docker_windows_gpu.html>`_
 * `Run PyTorch Inference on an Intel GPU via Docker <./docker_pytorch_inference_gpu.html>`_
+* `Run/Develop PyTorch in VSCode with Docker on Intel GPU <./docker_run_pytorch_inference_in_vscode.html>`_
 * `Run llama.cpp/Ollama/open-webui with Docker on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
 * `Run IPEX-LLM integrated FastChat with Docker on Intel GPU <./fastchat_docker_quickstart>`_
-* `Run IPEX-LLM integrated vLLM with Docker on Intel GPU <./vllm_docker_quickstart>`_
\ No newline at end of file
+* `Run IPEX-LLM integrated vLLM with Docker on Intel GPU <./vllm_docker_quickstart>`_
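
As a complement to the guide added above: the bundled examples can also be launched from the host terminal without attaching VSCode at all. The sketch below assumes the container name `my_container` and the `/llm/models` mount used in the guide; depending on the image version, the example directory layout or required environment setup inside the container may differ:

```bash
# Run the bundled Llama2 example inside the running container directly from the host shell
docker exec -it my_container bash -c \
  "cd /ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2 && python ./generate.py --repo-id-or-model-path /llm/models/Llama-2-7b-chat-hf --prompt 'What is AI?' --n-predict 32"
```

Attaching VSCode as described in the guide remains the more convenient route when you want to edit and iterate on the example code itself.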