update readme docker section, fix quickstart title, remove chs figure (#11044)
* update readme and fix quickstart title, remove chs figure
* update readme according to comment
* reorganize the docker guide structure
parent 797dbc48b8
commit 7ed270a4d8

5 changed files with 34 additions and 24 deletions

README.md
			@ -77,21 +77,33 @@ See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.i
[^1]: Performance varies by use, configuration and other factors. `ipex-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

## `ipex-llm` Quickstart

### Install `ipex-llm`

- [Windows GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html): installing `ipex-llm` on Windows with Intel GPU
- [Linux GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html): installing `ipex-llm` on Linux with Intel GPU
- [Docker](docker/llm): using `ipex-llm` Docker images on Intel CPU and GPU
- *For more details, please refer to the [installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html)*

### Run `ipex-llm`

### Docker

- [GPU Inference in C++](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html): running `llama.cpp`, `ollama`, `OpenWebUI`, etc., with `ipex-llm` on Intel GPU
- [GPU Inference in Python](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html): running HuggingFace `transformers`, `LangChain`, `LlamaIndex`, `ModelScope`, etc., with `ipex-llm` on Intel GPU
- [GPU Dev in Visual Studio Code](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html): LLM development in Python using `ipex-llm` on Intel GPU in VSCode
- [vLLM on GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/vllm_docker_quickstart.html): serving with `ipex-llm` accelerated `vLLM` on Intel GPU
- [FastChat on GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/DockerGuides/fastchat_docker_quickstart.html): serving with `ipex-llm` accelerated `FastChat` on Intel GPU
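
Both serving entries above (vLLM and FastChat) can expose an OpenAI-compatible endpoint once their containers are running. The sketch below is a minimal client example, not part of these guides: it assumes the `openai` Python package is installed, a server is already listening at the placeholder address `http://localhost:8000/v1`, and a model is served under the hypothetical name `chatglm2-6b`.

```python
# Minimal client sketch for an OpenAI-compatible endpoint started from the
# vLLM/FastChat serving guides above. Address, key and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder server address
    api_key="none",                       # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="chatglm2-6b",                  # hypothetical served model name
    messages=[{"role": "user", "content": "What is IPEX-LLM?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```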

### Use

- [llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html): running **llama.cpp** (*using C++ interface of `ipex-llm` as an accelerated backend for `llama.cpp`*) on Intel GPU
- [ollama](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html): running **ollama** (*using C++ interface of `ipex-llm` as an accelerated backend for `ollama`*) on Intel GPU
- [vLLM](python/llm/example/GPU/vLLM-Serving): running `ipex-llm` in `vLLM` on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving)
- [FastChat](python/llm/src/ipex_llm/serving/fastchat): running `ipex-llm` in `FastChat` serving on both Intel GPU and CPU
- [LangChain-Chatchat RAG](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html): running `ipex-llm` in `LangChain-Chatchat` (*Knowledge Base QA using **RAG** pipeline*)
- [Text-Generation-WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html): running `ipex-llm` in `oobabooga` **WebUI**
- [Dify](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/dify_quickstart.html): running `ipex-llm` in `Dify` (*production-ready LLM app development platform*)
- [Continue](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/continue_quickstart.html): using `Continue` (a coding copilot in VSCode) backed by `ipex-llm`
- [Benchmarking](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/benchmark_quickstart.html): running (latency and throughput) benchmarks for `ipex-llm` on Intel CPU and GPU

### Install

- [Windows GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html): installing `ipex-llm` on Windows with Intel GPU
- [Linux GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html): installing `ipex-llm` on Linux with Intel GPU
- *For more details, please refer to the [installation guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install.html)*

### Code Examples

- Low bit inference
  - [INT4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/Model): **INT4** LLM inference on Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model)
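
To make the INT4 entry above concrete, here is a minimal sketch of low-bit loading with `ipex-llm`'s `transformers`-style API on CPU. It is an illustration rather than the exact code from the linked examples; the model id and prompt are placeholders.

```python
# Minimal INT4 (4-bit) inference sketch with ipex-llm on CPU.
# Model id and prompt are placeholders; see the linked examples for details.
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement API

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# load_in_4bit=True quantizes the weights to INT4 while loading
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is IPEX-LLM?", return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```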
			@ -78,22 +78,22 @@
                </label>
                <ul class="bigdl-quicklinks-section-nav">
                    <li>
                        <a href="doc/LLM/DockerGuides/docker_windows_gpu.html">Overview of IPEX-LLM Containers for Intel GPU</a>
                        <a href="doc/LLM/DockerGuides/docker_windows_gpu.html">Overview of IPEX-LLM Containers</a>
                    </li>
                    <li>
                        <a href="doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html">Run PyTorch Inference on an Intel GPU via Docker</a>
                        <a href="doc/LLM/DockerGuides/docker_pytorch_inference_gpu.html">Python Inference with `ipex-llm` on Intel GPU</a>
                    </li>
                    <li>
                        <a href="doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html">Run/Develop PyTorch in VSCode with Docker on Intel GPU</a>
                        <a href="doc/LLM/DockerGuides/docker_run_pytorch_inference_in_vscode.html">VSCode LLM Development with `ipex-llm` on Intel GPU</a>
                    </li>
                    <li>
                        <a href="doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html">Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker</a>
                        <a href="doc/LLM/DockerGuides/docker_cpp_xpu_quickstart.html">llama.cpp/Ollama/Open-WebUI with `ipex-llm` on Intel GPU</a>
                    </li>
                    <li>
                        <a href="doc/LLM/DockerGuides/fastchat_docker_quickstart.html">Run IPEX-LLM integrated FastChat on an Intel GPU via Docker</a>
                        <a href="doc/LLM/DockerGuides/fastchat_docker_quickstart.html">FastChat with `ipex-llm` on Intel GPU</a>
                    </li>
                    <li>
                        <a href="doc/LLM/DockerGuides/vllm_docker_quickstart.html">Run IPEX-LLM integrated vLLM on an Intel GPU via Docker</a>
                        <a href="doc/LLM/DockerGuides/vllm_docker_quickstart.html">vLLM with `ipex-llm` on Intel GPU</a>
                    </li>
                </ul>
            </li>

			@ -1,4 +1,4 @@
# Run PyTorch Inference on an Intel GPU via Docker
# Python Inference using IPEX-LLM on Intel GPU

We can run PyTorch Inference Benchmark, Chat Service and PyTorch Examples on Intel GPUs within Docker (on Linux or WSL).
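
As a rough illustration of what such a PyTorch inference run looks like inside the container, the sketch below loads a model in low-bit form and generates on the Intel GPU ("xpu"). It assumes the `ipex-llm[xpu]` stack is installed in the image and a model is mounted at the placeholder path `/llm/llm-models/chatglm2-6b/` (the same path used by the chat example later in this guide); adapt both to your setup.

```python
# Rough sketch of PyTorch inference on an Intel GPU ("xpu") with ipex-llm
# inside the container. The model path below is a placeholder mount point.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device)
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModel

model_path = "/llm/llm-models/chatglm2-6b/"  # placeholder model mount

model = AutoModel.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
).to("xpu")                                   # move the low-bit model to the Intel GPU
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is IPEX-LLM?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0].cpu(), skip_special_tokens=True))
```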
			@ -108,8 +108,4 @@ Command: python chat.py --model-path /llm/llm-models/chatglm2-6b/
Uptime: 29.349235 s
Aborted
```
To resolve this problem, you can disable the iGPU in Device Manager on Windows as follows:

<a href="https://llm-assets.readthedocs.io/en/latest/_images/disable_iGPU.png">
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/disable_iGPU.png" width=100%; />
</a>
To resolve this problem, you can disable the iGPU in Device Manager on Windows. For details, refer to [this guide](https://www.elevenforum.com/t/enable-or-disable-integrated-graphics-igpu-in-windows-11.18616/).
			@ -3,10 +3,12 @@ IPEX-LLM Docker Container User Guides

In this section, you will find guides related to using IPEX-LLM with Docker, covering how to:

* `Overview of IPEX-LLM Containers <./docker_windows_gpu.html>`_
* `Overview of IPEX-LLM Containers for Intel GPU <./docker_windows_gpu.html>`_
* `Run PyTorch Inference on an Intel GPU via Docker <./docker_pytorch_inference_gpu.html>`_
* `Run/Develop PyTorch in VSCode with Docker on Intel GPU <./docker_pytorch_inference_gpu.html>`_
* `Run llama.cpp/Ollama/open-webui with Docker on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
* `Run IPEX-LLM integrated FastChat with Docker on Intel GPU <./fastchat_docker_quickstart.html>`_
* `Run IPEX-LLM integrated vLLM with Docker on Intel GPU <./vllm_docker_quickstart.html>`_
* Inference in Python/C++
   * `GPU Inference in Python with IPEX-LLM <./docker_pytorch_inference_gpu.html>`_
   * `VSCode LLM Development with IPEX-LLM on Intel GPU <./docker_run_pytorch_inference_in_vscode.html>`_
   * `llama.cpp/Ollama/Open-WebUI with IPEX-LLM on Intel GPU <./docker_cpp_xpu_quickstart.html>`_
* Serving
   * `FastChat with IPEX-LLM on Intel GPU <./fastchat_docker_quickstart.html>`_
   * `vLLM with IPEX-LLM on Intel GPU <./vllm_docker_quickstart.html>`_