IPEX-LLM Docker Containers

You can run IPEX-LLM containers (via Docker or Kubernetes) for inference, serving, and fine-tuning on Intel CPUs and GPUs. Details on how to use these containers are available in the IPEX-LLM Docker Container Guides.

Prerequisites

  • Docker on Windows or Linux
  • Windows Subsystem for Linux (WSL) is required if using Windows.

Quick Start

Pull an IPEX-LLM Docker Image

To pull IPEX-LLM Docker images from Docker Hub, use the docker pull command. For instance, to pull the CPU inference image:

docker pull intelanalytics/ipex-llm-cpu:2.2.0-SNAPSHOT
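Once the pull completes, you can optionally verify that the image is available locally by listing your local images:

docker images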

The available images on Docker Hub are:

  • intelanalytics/ipex-llm-cpu:2.2.0-SNAPSHOT - CPU Inference
  • intelanalytics/ipex-llm-xpu:2.2.0-SNAPSHOT - GPU Inference
  • intelanalytics/ipex-llm-serving-cpu:2.2.0-SNAPSHOT - CPU Serving
  • intelanalytics/ipex-llm-serving-xpu:2.2.0-SNAPSHOT - GPU Serving
  • intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.2.0-SNAPSHOT - CPU Finetuning via Docker
  • intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.2.0-SNAPSHOT - CPU Finetuning via Kubernetes
  • intelanalytics/ipex-llm-finetune-qlora-xpu:2.2.0-SNAPSHOT - GPU Finetuning

Run a Container

Use the docker run command to start an IPEX-LLM Docker container. For detailed instructions, refer to the IPEX-LLM Docker Container Guides; a minimal example is sketched below.
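As an illustrative sketch only (not the full setup from the guides), the following starts the CPU inference container interactively; the container name and the host model directory are placeholder values you would replace:

docker run -itd \
  --name my-ipex-llm-container \
  --net=host \
  -v /path/to/local/models:/models \
  intelanalytics/ipex-llm-cpu:2.2.0-SNAPSHOT

GPU containers and serving containers typically need additional device and environment options, which are covered in the guides.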

Build Docker Image

To build a Docker image from source, first clone the IPEX-LLM repository and navigate to the Dockerfile directory. For example, to build the CPU inference image, navigate to docker/llm/inference/cpu/docker.

Then, use the following command to build the image (replace your_image_name with your desired image name):

docker build \
  --build-arg no_proxy=localhost,127.0.0.1 \
  --rm --no-cache -t your_image_name .

Note: If you're working behind a proxy, also add the arguments --build-arg http_proxy=http://your_proxy_url:port and --build-arg https_proxy=https://your_proxy_url:port, as shown in the example below.
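Putting it together, a full build command behind a proxy might look like the following (the proxy address, port, and image name are placeholders to replace with your own values):

docker build \
  --build-arg http_proxy=http://your_proxy_url:port \
  --build-arg https_proxy=https://your_proxy_url:port \
  --build-arg no_proxy=localhost,127.0.0.1 \
  --rm --no-cache -t your_image_name .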