update readme docker section, fix quickstart title, remove chs figure (#11044)

* update readme and fix quickstart title, remove chs figure

* update readme according to comment

* reorganize the docker guide structure
Shengsheng Huang 2024-05-24 00:18:20 +08:00 committed by GitHub
parent 797dbc48b8
commit 7ed270a4d8
5 changed files with 34 additions and 24 deletions


@@ -77,21 +77,33 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
 [^1]: Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

 ## `ipex-llm` Quickstart
-### Install `ipex-llm`
-- [Windows GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html): installing `ipex-llm` on Windows with Intel GPU
-- [Linux GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html): installing `ipex-llm` on Linux with Intel GPU
-- [Docker](docker/llm): using `ipex-llm` dockers on Intel CPU and GPU
-- *For more details, please refer to the [installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html)*
-### Run `ipex-llm`
+### Docker
+- [GPU Inference in C++](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html): running `llama.cpp`, `ollama`, `OpenWebUI`, etc., with `ipex-llm` on Intel GPU
+- [GPU Inference in Python](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html): running HuggingFace `transformers`, `LangChain`, `LlamaIndex`, `ModelScope`, etc. with `ipex-llm` on Intel GPU
+- [GPU Dev in Visual Studio Code](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html): LLM development in Python using `ipex-llm` on Intel GPU in VSCode
+- [vLLM on GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/vllm_docker_quickstart.html): serving with `ipex-llm`-accelerated `vLLM` on Intel GPU
+- [FastChat on GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/fastchat_docker_quickstart.html): serving with `ipex-llm`-accelerated `FastChat` on Intel GPU
+### Use
 - [llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html): running **llama.cpp** (*using C++ interface of `ipex-llm` as an accelerated backend for `llama.cpp`*) on Intel GPU
 - [ollama](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html): running **ollama** (*using C++ interface of `ipex-llm` as an accelerated backend for `ollama`*) on Intel GPU
 - [vLLM](python/llm/example/GPU/vLLM-Serving): running `ipex-llm` in `vLLM` on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving)
 - [FastChat](python/llm/src/ipex_llm/serving/fastchat): running `ipex-llm` in `FastChat` serving on both Intel GPU and CPU
 - [LangChain-Chatchat RAG](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html): running `ipex-llm` in `LangChain-Chatchat` (*Knowledge Base QA using **RAG** pipeline*)
 - [Text-Generation-WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html): running `ipex-llm` in `oobabooga` **WebUI**
 - [Dify](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/dify_quickstart.html): running `ipex-llm` in `Dify` (*production-ready LLM app development platform*)
 - [Continue](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/continue_quickstart.html): using `Continue` (a coding copilot in VSCode) backed by `ipex-llm`
 - [Benchmarking](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/benchmark_quickstart.html): running (latency and throughput) benchmarks for `ipex-llm` on Intel CPU and GPU
+### Install
+- [Windows GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html): installing `ipex-llm` on Windows with Intel GPU
+- [Linux GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html): installing `ipex-llm` on Linux with Intel GPU
+- *For more details, please refer to the [installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html)*
 ### Code Examples
 - Low bit inference
   - [INT4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/Model): **INT4** LLM inference on Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model)
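For readers skimming the diff, the `INT4 inference` entry above boils down to roughly the following. This is a minimal sketch rather than the repository's maintained example: it assumes `ipex-llm` is installed with XPU (Intel GPU) support, for instance via `pip install ipex-llm[xpu]` or one of the Docker images, and it uses a placeholder Hugging Face model path.

```python
# Minimal sketch (not the repository's maintained example): INT4 loading and generation
# with ipex-llm on an Intel GPU.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in AutoModel class with low-bit loading

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any causal LM checkpoint works similarly

# load_in_4bit=True converts the weights to INT4 while loading
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")  # move the quantized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Skipping the `.to("xpu")` call keeps the same INT4 path on CPU, which is what the linked CPU example folder demonstrates.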


@@ -78,22 +78,22 @@
 </label>
 <ul class="bigdl-quicklinks-section-nav">
   <li>
-    <a href="doc/LLM/DockerGuides/docker_windows_gpu.html">Overview of IPEX-LLM Containers for Intel GPU</a>
+    <a href="doc/LLM/DockerGuides/docker_windows_gpu.html">Overview of IPEX-LLM Containers</a>
   </li>
   <li>
-    <a href="doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html">Run PyTorch Inference on an Intel GPU via Docker</a>
+    <a href="doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html">Python Inference with `ipex-llm` on Intel GPU</a>
   </li>
   <li>
-    <a href="doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html">Run/Develop PyTorch in VSCode with Docker on Intel GPU</a>
+    <a href="doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html">VSCode LLM Development with `ipex-llm` on Intel GPU</a>
   </li>
   <li>
-    <a href="doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html">Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker</a>
+    <a href="doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html">llama.cpp/Ollama/Open-WebUI with `ipex-llm` on Intel GPU</a>
   </li>
   <li>
-    <a href="doc/LLM/DockerGuides/fastchat_docker_quickstart.html">Run IPEX-LLM integrated FastChat on an Intel GPU via Docker</a>
+    <a href="doc/LLM/DockerGuides/fastchat_docker_quickstart.html">FastChat with `ipex-llm` on Intel GPU</a>
   </li>
   <li>
-    <a href="doc/LLM/DockerGuides/vllm_docker_quickstart.html">Run IPEX-LLM integrated vLLM on an Intel GPU via Docker</a>
+    <a href="doc/LLM/DockerGuides/vllm_docker_quickstart.html">vLLM with `ipex-llm` on Intel GPU</a>
   </li>
 </ul>
 </li>


@@ -1,4 +1,4 @@
-# Run PyTorch Inference on an Intel GPU via Docker
+# Python Inference using IPEX-LLM on Intel GPU
 We can run PyTorch Inference Benchmark, Chat Service and PyTorch Examples on Intel GPUs within Docker (on Linux or WSL).
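To make the "PyTorch Inference Benchmark" part concrete, a rough latency sketch is shown below. It is only an illustration under stated assumptions (an `ipex-llm` XPU environment inside the container and a placeholder model path), not the benchmark tooling the guide itself walks through.

```python
# Rough latency sketch only; the containers ship their own benchmark scripts, and this is not them.
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "/llm/models/your-model"  # placeholder; point this at a mounted checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True,
                                             trust_remote_code=True).to("xpu")
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt").to("xpu")
with torch.inference_mode():
    model.generate(inputs.input_ids, max_new_tokens=32)  # warm-up pass (first run pays one-time costs)
    torch.xpu.synchronize()                              # wait for queued GPU work before timing
    start = time.perf_counter()
    output = model.generate(inputs.input_ids, max_new_tokens=32)
    torch.xpu.synchronize()
    elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens} new tokens in {elapsed:.2f} s ({new_tokens / elapsed:.1f} tok/s)")
```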


@@ -108,8 +108,4 @@ Command: python chat.py --model-path /llm/llm-models/chatglm2-6b/
 Uptime: 29.349235 s
 Aborted
 ```
-To resolve this problem, you can disabling the iGPU in Device Manager on Windows as follows:
-<a href="https://llm-assets.readthedocs.io/en/latest/_images/disable_iGPU.png">
-  <img src="https://llm-assets.readthedocs.io/en/latest/_images/disable_iGPU.png" width=100%; />
-</a>
+To resolve this problem, you can disable the iGPU in Device Manager on Windows. For details, refer to [this guide](https://www.elevenforum.com/t/enable-or-disable-integrated-graphics-igpu-in-windows-11.18616/).
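As a side note to the workaround above: before disabling the iGPU in Device Manager, it can be worth confirming which XPU devices are actually visible to PyTorch. The sketch below is only an illustrative check, assuming `intel_extension_for_pytorch` is available (as in the ipex-llm XPU images); it is not part of the guide being modified.

```python
# Illustrative check only; assumes intel_extension_for_pytorch is installed.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  # importing it registers the 'xpu' backend

# List every XPU device PyTorch can see; an active iGPU would show up here next to the dGPU.
for i in range(torch.xpu.device_count()):
    print(f"xpu:{i} -> {torch.xpu.get_device_name(i)}")

# If more than one device is listed, the model can be pinned to a specific one,
# e.g. model.to("xpu:0"), instead of relying on the default device.
```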


@@ -3,10 +3,12 @@ IPEX-LLM Docker Container User Guides
 In this section, you will find guides related to using IPEX-LLM with Docker, covering how to:

-* `Overview of IPEX-LLM Containers for Intel GPU <./docker_windows_gpu.html>`_
-* `Run PyTorch Inference on an Intel GPU via Docker <./docker_pytorch_inference_gpu.html>`_
-* `Run/Develop PyTorch in VSCode with Docker on Intel GPU <./docker_pytorch_inference_gpu.html>`_
-* `Run llama.cpp/Ollama/open-webui with Docker on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
-* `Run IPEX-LLM integrated FastChat with Docker on Intel GPU <./fastchat_docker_quickstart.html>`_
-* `Run IPEX-LLM integrated vLLM with Docker on Intel GPU <./vllm_docker_quickstart.html>`_
+* `Overview of IPEX-LLM Containers <./docker_windows_gpu.html>`_
+* Inference in Python/C++
+
+  * `GPU Inference in Python with IPEX-LLM <./docker_pytorch_inference_gpu.html>`_
+  * `VSCode LLM Development with IPEX-LLM on Intel GPU <./docker_run_pytorch_inference_in_vscode.html>`_
+  * `llama.cpp/Ollama/Open-WebUI with IPEX-LLM on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
+* Serving
+
+  * `FastChat with IPEX-LLM on Intel GPU <./fastchat_docker_quickstart.html>`_
+  * `vLLM with IPEX-LLM on Intel GPU <./vllm_docker_quickstart.html>`_