diff --git a/docker/llm/README.md b/docker/llm/README.md index 1f418bb9..eba61568 100644 --- a/docker/llm/README.md +++ b/docker/llm/README.md @@ -1,18 +1,18 @@ -# Getting started with BigDL-LLM in Docker +# Getting started with IPEX-LLM in Docker ### Index -- [Docker installation guide for BigDL-LLM on CPU](#docker-installation-guide-for-bigdl-llm-on-cpu) - - [BigDL-LLM on Windows](#bigdl-llm-on-windows) - - [BigDL-LLM on Linux/MacOS](#bigdl-llm-on-linuxmacos) -- [Docker installation guide for BigDL LLM on XPU](#docker-installation-guide-for-bigdl-llm-on-xpu) -- [Docker installation guide for BigDL LLM Serving on CPU](#docker-installation-guide-for-bigdl-llm-serving-on-cpu) -- [Docker installation guide for BigDL LLM Serving on XPU](#docker-installation-guide-for-bigdl-llm-serving-on-xpu) -- [Docker installation guide for BigDL LLM Fine Tuning on CPU](#docker-installation-guide-for-bigdl-llm-fine-tuning-on-cpu) -- [Docker installation guide for BigDL LLM Fine Tuning on XPU](#docker-installation-guide-for-bigdl-llm-fine-tuning-on-xpu) +- [Docker installation guide for IPEX-LLM on CPU](#docker-installation-guide-for-ipex-llm-on-cpu) + - [IPEX-LLM on Windows](#ipex-llm-on-windows) + - [IPEX-LLM on Linux/MacOS](#ipex-llm-on-linuxmacos) +- [Docker installation guide for IPEX LLM on XPU](#docker-installation-guide-for-ipex-llm-on-xpu) +- [Docker installation guide for IPEX LLM Serving on CPU](#docker-installation-guide-for-ipex-llm-serving-on-cpu) +- [Docker installation guide for IPEX LLM Serving on XPU](#docker-installation-guide-for-ipex-llm-serving-on-xpu) +- [Docker installation guide for IPEX LLM Fine Tuning on CPU](#docker-installation-guide-for-ipex-llm-fine-tuning-on-cpu) +- [Docker installation guide for IPEX LLM Fine Tuning on XPU](#docker-installation-guide-for-ipex-llm-fine-tuning-on-xpu) -## Docker installation guide for BigDL-LLM on CPU +## Docker installation guide for IPEX-LLM on CPU -### BigDL-LLM on Windows +### IPEX-LLM on Windows #### Install docker @@ -23,26 +23,26 @@ The instructions for installing can be accessed from [here](https://docs.docker.com/desktop/install/windows-install/). -#### Pull bigdl-llm-cpu image +#### Pull ipex-llm-cpu image To pull image from hub, you can execute command on console: ```bash -docker pull intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT ``` to check if the image is successfully downloaded, you can use: ```powershell -docker images | sls intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT +docker images | sls intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT ``` -#### Start bigdl-llm-cpu container +#### Start ipex-llm-cpu container To run the image and do inference, you could create and run a bat script on Windows. An example on Windows could be: ```bat @echo off -set DOCKER_IMAGE=intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT +set DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT set CONTAINER_NAME=my_container set MODEL_PATH=D:/llm/models[change to your model path] @@ -62,7 +62,7 @@ After the container is booted, you could get into the container through `docker docker exec -it my_container bash ``` -To run inference using `BigDL-LLM` using cpu, you could refer to this [documentation](https://github.com/intel-analytics/BigDL/tree/main/python/llm#cpu-int4). +To run inference using `IPEX-LLM` using cpu, you could refer to this [documentation](https://github.com/intel-analytics/IPEX/tree/main/python/llm#cpu-int4). 
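After attaching to the container, a quick sanity check that your model folder is visible inside it can save time later. A minimal check, assuming the `.bat` script mounts `MODEL_PATH` at `/llm/models` (as in the Linux example further below), could be:

```bash
# Inside the container: confirm the host model folder is mounted where expected
ls /llm/models
```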
#### Getting started with chat @@ -89,7 +89,7 @@ Here is a demostration: #### Getting started with tutorials -You could start a jupyter-lab serving to explore bigdl-llm-tutorial which can help you build a more sophisticated Chatbo. +You could start a Jupyter Lab service to explore the ipex-llm-tutorial, which can help you build a more sophisticated chatbot. To start serving, run the script under '/llm': ```bash @@ -107,12 +107,12 @@ Here is a demostration of how to use tutorial in explorer: -### BigDL-LLM on Linux/MacOS +### IPEX-LLM on Linux/MacOS To run container on Linux/MacOS: ```bash #/bin/bash -export DOCKER_IMAGE=intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT +export DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT export CONTAINER_NAME=my_container export MODEL_PATH=/llm/models[change to your model path] @@ -126,23 +126,23 @@ docker run -itd \ $DOCKER_IMAGE ``` -Also, you could use chat.py and bigdl-llm-tutorial for development. +Also, you could use chat.py and ipex-llm-tutorial for development. [Getting started with chat](#getting-started-with-chat) [Getting started with tutorials](#getting-started-with-tutorials) -## Docker installation guide for BigDL LLM on XPU +## Docker installation guide for IPEX LLM on XPU First, pull docker image from docker hub: ``` -docker pull intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT ``` To map the xpu into the container, you need to specify --device=/dev/dri when booting the container. An example could be: ```bash #/bin/bash -export DOCKER_IMAGE=intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT +export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT export CONTAINER_NAME=my_container export MODEL_PATH=/llm/models[change to your model path] @@ -168,20 +168,20 @@ root@arda-arc12:/# sycl-ls [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241] ``` -To run inference using `BigDL-LLM` using xpu, you could refer to this [documentation](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU). +To run inference with `IPEX-LLM` on XPU, you could refer to this [documentation](https://github.com/intel-analytics/IPEX-LLM/tree/main/python/llm/example/GPU). -## Docker installation guide for BigDL LLM Serving on CPU +## Docker installation guide for IPEX LLM Serving on CPU ### Boot container Pull image: ``` -docker pull intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT ``` You could use the following bash script to start the container. Please be noted that the CPU config is specified for Xeon CPUs, change it accordingly if you are not using a Xeon CPU. ```bash -export DOCKER_IMAGE=intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT +export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT export CONTAINER_NAME=my_container export MODEL_PATH=/llm/models[change to your model path] @@ -198,14 +198,11 @@ After the container is booted, you could get into the container through `docker ### Models -Using BigDL-LLM in FastChat does not impose any new limitations on model usage. Therefore, all Hugging Face Transformer models can be utilized in FastChat. +Using IPEX-LLM in FastChat does not impose any new limitations on model usage. Therefore, all Hugging Face Transformer models can be utilized in FastChat. -FastChat determines the Model adapter to use through path matching. 
Therefore, in order to load models using BigDL-LLM, you need to make some modifications to the model's name. +FastChat determines the Model adapter to use through path matching. Therefore, in order to load models using IPEX-LLM, you need to make some modifications to the model's name. -For instance, assuming you have downloaded the `llama-7b-hf` from [HuggingFace](https://huggingface.co/decapoda-research/llama-7b-hf). Then, to use the `BigDL-LLM` as backend, you need to change name from `llama-7b-hf` to `bigdl-7b`. -The key point here is that the model's path should include "bigdl" and should not include paths matched by other model adapters. - -A special case is `ChatGLM` models. For these models, you do not need to do any changes after downloading the model and the `BigDL-LLM` backend will be used automatically. +A special case is `ChatGLM` models. For these models, you do not need to do any changes after downloading the model and the `IPEX-LLM` backend will be used automatically. ### Start the service @@ -237,11 +234,11 @@ python3 -m fastchat.serve.gradio_web_server This is the user interface that users will interact with. -By following these steps, you will be able to serve your models using the web UI with `BigDL-LLM` as the backend. You can open your browser and chat with a model now. +By following these steps, you will be able to serve your models using the web UI with `IPEX-LLM` as the backend. You can open your browser and chat with a model now. #### Serving with OpenAI-Compatible RESTful APIs -To start an OpenAI API server that provides compatible APIs using `BigDL-LLM` backend, you need three main components: an OpenAI API Server that serves the in-coming requests, model workers that host one or more models, and a controller to coordinate the web server and model workers. +To start an OpenAI API server that provides compatible APIs using `IPEX-LLM` backend, you need three main components: an OpenAI API Server that serves the in-coming requests, model workers that host one or more models, and a controller to coordinate the web server and model workers. First, launch the controller @@ -262,13 +259,13 @@ python3 -m fastchat.serve.openai_api_server --host localhost --port 8000 ``` -## Docker installation guide for BigDL LLM Serving on XPU +## Docker installation guide for IPEX LLM Serving on XPU ### Boot container Pull image: ``` -docker pull intelanalytics/bigdl-llm-serving-xpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-serving-xpu:2.5.0-SNAPSHOT ``` To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container. @@ -276,7 +273,7 @@ To map the `xpu` into the container, you need to specify `--device=/dev/dri` whe An example could be: ```bash #/bin/bash -export DOCKER_IMAGE=intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT +export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT export CONTAINER_NAME=my_container export MODEL_PATH=/llm/models[change to your model path] export SERVICE_MODEL_PATH=/llm/models/chatglm2-6b[a specified model path for running service] @@ -331,11 +328,11 @@ python3 -m fastchat.serve.gradio_web_server This is the user interface that users will interact with. -By following these steps, you will be able to serve your models using the web UI with `BigDL-LLM` as the backend. You can open your browser and chat with a model now. +By following these steps, you will be able to serve your models using the web UI with `IPEX-LLM` as the backend. You can open your browser and chat with a model now. 
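Before opening the browser, you can optionally confirm that the model worker has registered with the controller. A minimal check, assuming FastChat's default controller port `21001` and default Gradio web server port `7860`, could be:

```bash
# Ask the FastChat controller which model workers are currently registered (default port 21001)
curl -X POST http://localhost:21001/list_models
# The Gradio web UI launched above is then typically reachable at http://localhost:7860
```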
#### Serving with OpenAI-Compatible RESTful APIs -To start an OpenAI API server that provides compatible APIs using `BigDL-LLM` backend, you need three main components: an OpenAI API Server that serves the in-coming requests, model workers that host one or more models, and a controller to coordinate the web server and model workers. +To start an OpenAI API server that provides compatible APIs using `IPEX-LLM` backend, you need three main components: an OpenAI API Server that serves the in-coming requests, model workers that host one or more models, and a controller to coordinate the web server and model workers. First, launch the controller @@ -355,7 +352,7 @@ Finally, launch the RESTful API server python3 -m fastchat.serve.openai_api_server --host localhost --port 8000 ``` -## Docker installation guide for BigDL LLM Fine Tuning on CPU +## Docker installation guide for IPEX LLM Fine Tuning on CPU ### 1. Prepare Docker Image @@ -363,10 +360,10 @@ You can download directly from Dockerhub like: ```bash # For standalone -docker pull intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT # For k8s -docker pull intelanalytics/bigdl-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT ``` Or build the image from source: @@ -379,7 +376,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT \ -f ./Dockerfile . # For k8s @@ -389,7 +386,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT \ -f ./Dockerfile.k8s . ``` @@ -405,12 +402,12 @@ export HTTPS_PROXY=your_https_proxy docker run -itd \ --net=host \ - --name=bigdl-llm-fintune-qlora-cpu \ + --name=ipex-llm-fintune-qlora-cpu \ -e http_proxy=${HTTP_PROXY} \ -e https_proxy=${HTTPS_PROXY} \ - -v $BASE_MODE_PATH:/bigdl/model \ - -v $DATA_PATH:/bigdl/data/alpaca-cleaned \ - intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT + -v $BASE_MODE_PATH:/ipex_llm/model \ + -v $DATA_PATH:/ipex_llm/data/alpaca-cleaned \ + intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT ``` The download and mount of base model and data to a docker container demonstrates a standard fine-tuning process. You can skip this step for a quick start, and in this way, the fine-tuning codes will automatically download the needed files: @@ -421,10 +418,10 @@ export HTTPS_PROXY=your_https_proxy docker run -itd \ --net=host \ - --name=bigdl-llm-fintune-qlora-cpu \ + --name=ipex-llm-fintune-qlora-cpu \ -e http_proxy=${HTTP_PROXY} \ -e https_proxy=${HTTPS_PROXY} \ - intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT + intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT ``` However, we do recommend you to handle them manually, because the automatical download can be blocked by Internet access and Huggingface authentication etc. according to different environment, and the manual method allows you to fine-tune in a custom way (with different base model and dataset). 
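If you do prepare the files manually, one possible way to fetch a base model and the cleaned Alpaca dataset onto the host before mounting them is sketched below; the exact repo IDs (and the Llama 2 license acceptance on Hugging Face) are assumptions to adapt to your own setup:

```bash
# Download a base model and the alpaca-cleaned dataset to the host, then point
# BASE_MODE_PATH and DATA_PATH at these folders before running `docker run`
pip install -U huggingface_hub
huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir ./Llama-2-7b-hf
huggingface-cli download yahma/alpaca-cleaned --repo-type dataset --local-dir ./alpaca-cleaned
```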
@@ -434,14 +431,14 @@ However, we do recommend you to handle them manually, because the automatical do Enter the running container: ```bash -docker exec -it bigdl-llm-fintune-qlora-cpu bash +docker exec -it ipex-llm-fintune-qlora-cpu bash ``` Then, start QLoRA fine-tuning: If the machine memory is not enough, you can try to set `use_gradient_checkpointing=True`. ```bash -cd /bigdl +cd /ipex_llm bash start-qlora-finetuning-on-cpu.sh ``` @@ -473,16 +470,16 @@ python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH -- Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference. -## Docker installation guide for BigDL LLM Fine Tuning on XPU +## Docker installation guide for IPEX LLM Fine Tuning on XPU -The following shows how to fine-tune LLM with Quantization (QLoRA built on BigDL-LLM 4bit optimizations) in a docker environment, which is accelerated by Intel XPU. +The following shows how to fine-tune LLM with Quantization (QLoRA built on IPEX-LLM 4bit optimizations) in a docker environment, which is accelerated by Intel XPU. ### 1. Prepare Docker Image You can download directly from Dockerhub like: ```bash -docker pull intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT ``` Or build the image from source: @@ -494,7 +491,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT \ -f ./Dockerfile . ``` @@ -512,13 +509,13 @@ docker run -itd \ --net=host \ --device=/dev/dri \ --memory="32G" \ - --name=bigdl-llm-fintune-qlora-xpu \ + --name=ipex-llm-fintune-qlora-xpu \ -e http_proxy=${HTTP_PROXY} \ -e https_proxy=${HTTPS_PROXY} \ -v $BASE_MODE_PATH:/model \ -v $DATA_PATH:/data/alpaca-cleaned \ --shm-size="16g" \ - intelanalytics/bigdl-llm-fintune-qlora-xpu:2.5.0-SNAPSHOT + intelanalytics/ipex-llm-fintune-qlora-xpu:2.5.0-SNAPSHOT ``` The download and mount of base model and data to a docker container demonstrates a standard fine-tuning process. You can skip this step for a quick start, and in this way, the fine-tuning codes will automatically download the needed files: @@ -531,11 +528,11 @@ docker run -itd \ --net=host \ --device=/dev/dri \ --memory="32G" \ - --name=bigdl-llm-fintune-qlora-xpu \ + --name=ipex-llm-fintune-qlora-xpu \ -e http_proxy=${HTTP_PROXY} \ -e https_proxy=${HTTPS_PROXY} \ --shm-size="16g" \ - intelanalytics/bigdl-llm-fintune-qlora-xpu:2.5.0-SNAPSHOT + intelanalytics/ipex-llm-fintune-qlora-xpu:2.5.0-SNAPSHOT ``` However, we do recommend you to handle them manually, because the automatical download can be blocked by Internet access and Huggingface authentication etc. according to different environment, and the manual method allows you to fine-tune in a custom way (with different base model and dataset). 
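Because this image relies on the GPU mapped in with `--device=/dev/dri`, it can be worth confirming the device is visible inside the container before launching fine-tuning, similar to the `sycl-ls` check shown earlier for the inference image (this assumes the oneAPI environment scripts are present in the container, as the bundled fine-tuning scripts expect):

```bash
# Inside the XPU fine-tuning container: list SYCL devices and confirm the GPU shows up
source /opt/intel/oneapi/setvars.sh
sycl-ls
```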
@@ -545,7 +542,7 @@ However, we do recommend you to handle them manually, because the automatical do Enter the running container: ```bash -docker exec -it bigdl-llm-fintune-qlora-xpu bash +docker exec -it ipex-llm-fintune-qlora-xpu bash ``` Then, start QLoRA fine-tuning: diff --git a/docker/llm/finetune/lora/cpu/README.md b/docker/llm/finetune/lora/cpu/README.md index 47ab9d24..bd11e84b 100644 --- a/docker/llm/finetune/lora/cpu/README.md +++ b/docker/llm/finetune/lora/cpu/README.md @@ -2,13 +2,13 @@ [Alpaca Lora](https://github.com/tloen/alpaca-lora/tree/main) uses [low-rank adaption](https://arxiv.org/pdf/2106.09685.pdf) to speed up the finetuning process of base model [Llama2-7b](https://huggingface.co/meta-llama/Llama-2-7b), and tries to reproduce the standard Alpaca, a general finetuned LLM. This is on top of Hugging Face transformers with Pytorch backend, which natively requires a number of expensive GPU resources and takes significant time. -By constract, BigDL here provides a CPU optimization to accelerate the lora finetuning of Llama2-7b, in the power of mixed-precision and distributed training. Detailedly, [Intel OneCCL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html), an available Hugging Face backend, is able to speed up the Pytorch computation with BF16 datatype on CPUs, as well as parallel processing on Kubernetes enabled by [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html). +By constract, IPEX-LLM here provides a CPU optimization to accelerate the lora finetuning of Llama2-7b, in the power of mixed-precision and distributed training. Detailedly, [Intel OneCCL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html), an available Hugging Face backend, is able to speed up the Pytorch computation with BF16 datatype on CPUs, as well as parallel processing on Kubernetes enabled by [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html). The architecture is illustrated in the following: ![image](https://llm-assets.readthedocs.io/en/latest/_images/llm-finetune-lora-cpu-k8s.png) -As above, BigDL implements its MPI training with [Kubeflow MPI operator](https://github.com/kubeflow/mpi-operator/tree/master), which encapsulates the deployment as MPIJob CRD, and assists users to handle the construction of a MPI worker cluster on Kubernetes, such as public key distribution, SSH connection, and log collection. +As above, IPEX-LLM implements its MPI training with [Kubeflow MPI operator](https://github.com/kubeflow/mpi-operator/tree/master), which encapsulates the deployment as MPIJob CRD, and assists users to handle the construction of a MPI worker cluster on Kubernetes, such as public key distribution, SSH connection, and log collection. Now, let's go to deploy a Lora finetuning to create a LLM from Llama2-7b. @@ -20,7 +20,7 @@ Follow [here](https://github.com/kubeflow/mpi-operator/tree/master#installation) ### 2. Download Image, Base Model and Finetuning Data -Follow [here](https://github.com/intel-analytics/BigDL/tree/main/docker/llm/finetune/lora/docker#prepare-bigdl-image-for-lora-finetuning) to prepare BigDL Lora Finetuning image in your cluster. +Follow [here](https://github.com/intel-analytics/IPEX-LLM/tree/main/docker/llm/finetune/lora/docker#prepare-ipex-llm-image-for-lora-finetuning) to prepare IPEX-LLM Lora Finetuning image in your cluster. 
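If your cluster nodes pull images directly from Docker Hub, preparing the image typically amounts to pulling the tag referenced in `values.yaml` on each node (or pushing it to a registry your cluster can reach):

```bash
# Pull the LoRA fine-tuning image on each Kubernetes node, or push it to your own registry
docker pull intelanalytics/ipex-llm-finetune-lora-cpu:2.5.0-SNAPSHOT
```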
As finetuning is from a base model, first download [Llama2-7b model from the public download site of Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b). Then, download [cleaned alpaca data](https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned_archive.json), which contains all kinds of general knowledge and has already been cleaned. Next, move the downloaded files to a shared directory on your NFS server. @@ -34,12 +34,12 @@ After preparing parameters in `./kubernetes/values.yaml`, submit the job as befl ```bash cd ./kubernetes -helm install bigdl-lora-finetuning . +helm install ipex-llm-lora-finetuning . ``` ### 4. Check Deployment ```bash -kubectl get all -n bigdl-lora-finetuning # you will see launcher and worker pods running +kubectl get all -n ipex-llm-lora-finetuning # you will see launcher and worker pods running ``` ### 5. Check Finetuning Process @@ -47,8 +47,8 @@ kubectl get all -n bigdl-lora-finetuning # you will see launcher and worker pods After deploying successfully, you can find a launcher pod, and then go inside this pod and check the logs collected from all workers. ```bash -kubectl get all -n bigdl-lora-finetuning # you will see a launcher pod -kubectl exec -it bash -n bigdl-ppml-finetuning # enter launcher pod +kubectl get all -n ipex-llm-lora-finetuning # you will see a launcher pod +kubectl exec -it bash -n ipex-llm-lora-finetuning # enter launcher pod cat launcher.log # display logs collected from other workers ``` diff --git a/docker/llm/finetune/lora/cpu/docker/Dockerfile b/docker/llm/finetune/lora/cpu/docker/Dockerfile index 1d6d919d..4b6f51b9 100644 --- a/docker/llm/finetune/lora/cpu/docker/Dockerfile +++ b/docker/llm/finetune/lora/cpu/docker/Dockerfile @@ -12,13 +12,13 @@ FROM mpioperator/intel as builder ARG http_proxy ARG https_proxy ENV PIP_NO_CACHE_DIR=false -COPY ./requirements.txt /bigdl/requirements.txt +COPY ./requirements.txt /ipex_llm/requirements.txt # add public key COPY --from=key-getter /root/intel-oneapi-archive-keyring.gpg /usr/share/keyrings/intel-oneapi-archive-keyring.gpg RUN echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main " > /etc/apt/sources.list.d/oneAPI.list -RUN mkdir /bigdl/data && mkdir /bigdl/model && \ +RUN mkdir /ipex_llm/data && mkdir /ipex_llm/model && \ # install pytorch 2.0.1 apt-get update && \ apt-get install -y python3-pip python3.9-dev python3-wheel git software-properties-common && \ @@ -29,12 +29,12 @@ RUN mkdir /bigdl/data && mkdir /bigdl/model && \ pip install intel_extension_for_pytorch==2.0.100 && \ pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable && \ # install transformers etc. - cd /bigdl && \ + cd /ipex_llm && \ git clone https://github.com/huggingface/transformers.git && \ cd transformers && \ git reset --hard 057e1d74733f52817dc05b673a340b4e3ebea08c && \ pip install . 
&& \ - pip install -r /bigdl/requirements.txt && \ + pip install -r /ipex_llm/requirements.txt && \ # install python add-apt-repository ppa:deadsnakes/ppa -y && \ apt-get install -y python3.9 && \ @@ -56,9 +56,9 @@ RUN mkdir /bigdl/data && mkdir /bigdl/model && \ echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config && \ sed -i 's/#\(StrictModes \).*/\1no/g' /etc/ssh/sshd_config -COPY ./bigdl-lora-finetuing-entrypoint.sh /bigdl/bigdl-lora-finetuing-entrypoint.sh -COPY ./lora_finetune.py /bigdl/lora_finetune.py +COPY ./ipex-llm-lora-finetuing-entrypoint.sh /ipex_llm/ipex-llm-lora-finetuing-entrypoint.sh +COPY ./lora_finetune.py /ipex_llm/lora_finetune.py -RUN chown -R mpiuser /bigdl +RUN chown -R mpiuser /ipex_llm USER mpiuser ENTRYPOINT ["/bin/bash"] diff --git a/docker/llm/finetune/lora/cpu/docker/README.md b/docker/llm/finetune/lora/cpu/docker/README.md index 93013bc0..d93f3f2a 100644 --- a/docker/llm/finetune/lora/cpu/docker/README.md +++ b/docker/llm/finetune/lora/cpu/docker/README.md @@ -1,11 +1,11 @@ ## Fine-tune LLM with One CPU -### 1. Prepare BigDL image for Lora Finetuning +### 1. Prepare IPEX LLM image for Lora Finetuning You can download directly from Dockerhub like: ```bash -docker pull intelanalytics/bigdl-llm-finetune-lora-cpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-lora-cpu:2.5.0-SNAPSHOT ``` Or build the image from source: @@ -17,7 +17,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-lora-cpu:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-lora-cpu:2.5.0-SNAPSHOT \ -f ./Dockerfile . ``` @@ -27,13 +27,13 @@ Here, we try to finetune [Llama2-7b](https://huggingface.co/meta-llama/Llama-2-7 ``` docker run -itd \ - --name=bigdl-llm-fintune-lora-cpu \ + --name=ipex-llm-fintune-lora-cpu \ --cpuset-cpus="your_expected_range_of_cpu_numbers" \ -e STANDALONE_DOCKER=TRUE \ -e WORKER_COUNT_DOCKER=your_worker_count \ - -v your_downloaded_base_model_path:/bigdl/model \ - -v your_downloaded_data_path:/bigdl/data/alpaca_data_cleaned_archive.json \ - intelanalytics/bigdl-llm-finetune-lora-cpu:2.5.0-SNAPSHOT \ + -v your_downloaded_base_model_path:/ipex_llm/model \ + -v your_downloaded_data_path:/ipex_llm/data/alpaca_data_cleaned_archive.json \ + intelanalytics/ipex-llm-finetune-lora-cpu:2.5.0-SNAPSHOT \ bash ``` @@ -44,21 +44,21 @@ You can adjust the configuration according to your own environment. 
After our te Enter the running container: ``` -docker exec -it bigdl-llm-fintune-lora-cpu bash +docker exec -it ipex-llm-fintune-lora-cpu bash ``` Then, run the script to start finetuning: ``` -bash /bigdl/bigdl-lora-finetuing-entrypoint.sh +bash /ipex_llm/ipex-llm-lora-finetuing-entrypoint.sh ``` After minutes, it is expected to get results like: ``` Training Alpaca-LoRA model with params: -base_model: /bigdl/model/ -data_path: /bigdl/data/alpaca_data_cleaned_archive.json +base_model: /ipex_llm/model/ +data_path: /ipex_llm/data/alpaca_data_cleaned_archive.json output_dir: /home/mpiuser/finetuned_model batch_size: 128 micro_batch_size: 8 diff --git a/docker/llm/finetune/lora/cpu/docker/bigdl-lora-finetuing-entrypoint.sh b/docker/llm/finetune/lora/cpu/docker/ipex-llm-lora-finetuing-entrypoint.sh similarity index 86% rename from docker/llm/finetune/lora/cpu/docker/bigdl-lora-finetuing-entrypoint.sh rename to docker/llm/finetune/lora/cpu/docker/ipex-llm-lora-finetuing-entrypoint.sh index 3bd2305a..fab52a47 100644 --- a/docker/llm/finetune/lora/cpu/docker/bigdl-lora-finetuing-entrypoint.sh +++ b/docker/llm/finetune/lora/cpu/docker/ipex-llm-lora-finetuing-entrypoint.sh @@ -15,9 +15,9 @@ then -genv KMP_AFFINITY="granularity=fine,none" \ -genv KMP_BLOCKTIME=1 \ -genv TF_ENABLE_ONEDNN_OPTS=1 \ - python /bigdl/lora_finetune.py \ - --base_model '/bigdl/model/' \ - --data_path "/bigdl/data/alpaca_data_cleaned_archive.json" \ + python /ipex_llm/lora_finetune.py \ + --base_model '/ipex_llm/model/' \ + --data_path "/ipex_llm/data/alpaca_data_cleaned_archive.json" \ --output_dir "/home/mpiuser/finetuned_model" \ --micro_batch_size 8 \ --bf16 @@ -29,7 +29,7 @@ else if [ "$WORKER_ROLE" = "launcher" ] then sed "s/:1/ /g" /etc/mpi/hostfile > /home/mpiuser/hostfile - export DATA_PATH="/bigdl/data/$DATA_SUB_PATH" + export DATA_PATH="/ipex_llm/data/$DATA_SUB_PATH" sleep 10 mpirun \ -n $WORLD_SIZE \ @@ -40,8 +40,8 @@ else -genv KMP_AFFINITY="granularity=fine,none" \ -genv KMP_BLOCKTIME=1 \ -genv TF_ENABLE_ONEDNN_OPTS=1 \ - python /bigdl/lora_finetune.py \ - --base_model '/bigdl/model/' \ + python /ipex_llm/lora_finetune.py \ + --base_model '/ipex_llm/model/' \ --data_path "$DATA_PATH" \ --output_dir "/home/mpiuser/finetuned_model" \ --micro_batch_size $MICRO_BATCH_SIZE \ diff --git a/docker/llm/finetune/lora/cpu/kubernetes/Chart.yaml b/docker/llm/finetune/lora/cpu/kubernetes/Chart.yaml index dead414b..bbb69e9b 100644 --- a/docker/llm/finetune/lora/cpu/kubernetes/Chart.yaml +++ b/docker/llm/finetune/lora/cpu/kubernetes/Chart.yaml @@ -1,6 +1,6 @@ apiVersion: v2 name: trusted-fintune-service -description: A Helm chart for BigDL PPML Trusted BigData Service on Kubernetes +description: A Helm chart for IPEX-LLM Finetuning Service on Kubernetes type: application version: 1.1.27 appVersion: "1.16.0" diff --git a/docker/llm/finetune/lora/cpu/kubernetes/templates/bigdl-lora-finetuning-job.yaml b/docker/llm/finetune/lora/cpu/kubernetes/templates/ipex-llm-lora-finetuning-job.yaml similarity index 75% rename from docker/llm/finetune/lora/cpu/kubernetes/templates/bigdl-lora-finetuning-job.yaml rename to docker/llm/finetune/lora/cpu/kubernetes/templates/ipex-llm-lora-finetuning-job.yaml index 34d7170a..972eda57 100644 --- a/docker/llm/finetune/lora/cpu/kubernetes/templates/bigdl-lora-finetuning-job.yaml +++ b/docker/llm/finetune/lora/cpu/kubernetes/templates/ipex-llm-lora-finetuning-job.yaml @@ -1,8 +1,8 @@ apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: - name: bigdl-lora-finetuning-job - namespace: 
bigdl-lora-finetuning + name: ipex-llm-lora-finetuning-job + namespace: ipex-llm-lora-finetuning spec: slotsPerWorker: 1 runPolicy: @@ -20,10 +20,10 @@ spec: claimName: nfs-pvc containers: - image: {{ .Values.imageName }} - name: bigdl-ppml-finetuning-launcher + name: ipex-llm-lora-finetuning-launcher securityContext: runAsUser: 1000 - command: ['sh' , '-c', 'bash /bigdl/bigdl-lora-finetuing-entrypoint.sh'] + command: ['sh' , '-c', 'bash /ipex_llm/ipex-llm-lora-finetuing-entrypoint.sh'] env: - name: WORKER_ROLE value: "launcher" @@ -34,7 +34,7 @@ spec: - name: MASTER_PORT value: "42679" - name: MASTER_ADDR - value: "bigdl-lora-finetuning-job-worker-0.bigdl-lora-finetuning-job-worker" + value: "ipex-llm-lora-finetuning-job-worker-0.ipex-llm-lora-finetuning-job-worker" - name: DATA_SUB_PATH value: "{{ .Values.dataSubPath }}" - name: OMP_NUM_THREADS @@ -46,20 +46,20 @@ spec: volumeMounts: - name: nfs-storage subPath: {{ .Values.modelSubPath }} - mountPath: /bigdl/model + mountPath: /ipex_llm/model - name: nfs-storage subPath: {{ .Values.dataSubPath }} - mountPath: "/bigdl/data/{{ .Values.dataSubPath }}" + mountPath: "/ipex_llm/data/{{ .Values.dataSubPath }}" Worker: replicas: {{ .Values.trainerNum }} template: spec: containers: - image: {{ .Values.imageName }} - name: bigdl-ppml-finetuning-worker + name: ipex-llm-lora-finetuning-worker securityContext: runAsUser: 1000 - command: ['sh' , '-c', 'bash /bigdl/bigdl-lora-finetuing-entrypoint.sh'] + command: ['sh' , '-c', 'bash /ipex_llm/ipex-llm-lora-finetuing-entrypoint.sh'] env: - name: WORKER_ROLE value: "trainer" @@ -70,7 +70,7 @@ spec: - name: MASTER_PORT value: "42679" - name: MASTER_ADDR - value: "bigdl-lora-finetuning-job-worker-0.bigdl-lora-finetuning-job-worker" + value: "ipex-llm-lora-finetuning-job-worker-0.ipex-llm-lora-finetuning-job-worker" - name: LOCAL_POD_NAME valueFrom: fieldRef: @@ -78,10 +78,10 @@ spec: volumeMounts: - name: nfs-storage subPath: {{ .Values.modelSubPath }} - mountPath: /bigdl/model + mountPath: /ipex_llm/model - name: nfs-storage subPath: {{ .Values.dataSubPath }} - mountPath: "/bigdl/data/{{ .Values.dataSubPath }}" + mountPath: "/ipex_llm/data/{{ .Values.dataSubPath }}" resources: requests: cpu: {{ .Values.cpuPerPod }} diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/templates/bigdl-qlora-finetuning-namespace.yaml b/docker/llm/finetune/lora/cpu/kubernetes/templates/ipex-llm-lora-finetuning-namespace.yaml similarity index 55% rename from docker/llm/finetune/qlora/cpu/kubernetes/templates/bigdl-qlora-finetuning-namespace.yaml rename to docker/llm/finetune/lora/cpu/kubernetes/templates/ipex-llm-lora-finetuning-namespace.yaml index c873aa9f..8ca99ec5 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/templates/bigdl-qlora-finetuning-namespace.yaml +++ b/docker/llm/finetune/lora/cpu/kubernetes/templates/ipex-llm-lora-finetuning-namespace.yaml @@ -1,4 +1,4 @@ apiVersion: v1 kind: Namespace metadata: - name: bigdl-qlora-finetuning + name: ipex-llm-lora-finetuning diff --git a/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pv.yaml b/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pv.yaml index 63e90ba7..38e8e722 100644 --- a/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pv.yaml +++ b/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pv.yaml @@ -1,8 +1,8 @@ apiVersion: v1 kind: PersistentVolume metadata: - name: nfs-pv-bigdl-lora-finetuning - namespace: bigdl-lora-finetuning + name: nfs-pv-ipex-llm-lora-finetuning + namespace: ipex-llm-lora-finetuning spec: capacity: storage: 15Gi diff 
--git a/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pvc.yaml b/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pvc.yaml index 5c2284eb..a65bcdb9 100644 --- a/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pvc.yaml +++ b/docker/llm/finetune/lora/cpu/kubernetes/templates/nfs-pvc.yaml @@ -2,7 +2,7 @@ kind: PersistentVolumeClaim apiVersion: v1 metadata: name: nfs-pvc - namespace: bigdl-lora-finetuning + namespace: ipex-llm-lora-finetuning spec: accessModes: - ReadWriteOnce diff --git a/docker/llm/finetune/lora/cpu/kubernetes/values.yaml b/docker/llm/finetune/lora/cpu/kubernetes/values.yaml index 36e137b8..b082d0aa 100644 --- a/docker/llm/finetune/lora/cpu/kubernetes/values.yaml +++ b/docker/llm/finetune/lora/cpu/kubernetes/values.yaml @@ -1,4 +1,4 @@ -imageName: intelanalytics/bigdl-llm-finetune-lora-cpu:2.5.0-SNAPSHOT +imageName: intelanalytics/ipex-llm-finetune-lora-cpu:2.5.0-SNAPSHOT trainerNum: 8 microBatchSize: 8 nfsServerIp: your_nfs_server_ip diff --git a/docker/llm/finetune/qlora/cpu/docker/Dockerfile b/docker/llm/finetune/qlora/cpu/docker/Dockerfile index 2cfb1645..3cf2adc2 100644 --- a/docker/llm/finetune/qlora/cpu/docker/Dockerfile +++ b/docker/llm/finetune/qlora/cpu/docker/Dockerfile @@ -18,7 +18,7 @@ ENV TRANSFORMERS_COMMIT_ID=95fe0f5 COPY --from=key-getter /root/intel-oneapi-archive-keyring.gpg /usr/share/keyrings/intel-oneapi-archive-keyring.gpg RUN echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main " > /etc/apt/sources.list.d/oneAPI.list -RUN mkdir -p /bigdl/data && mkdir -p /bigdl/model && \ +RUN mkdir -p /ipex_llm/data && mkdir -p /ipex_llm/model && \ # install pytorch 2.1.0 apt-get update && \ apt-get install -y --no-install-recommends python3-pip python3.9-dev python3-wheel python3.9-distutils git software-properties-common && \ @@ -27,8 +27,8 @@ RUN mkdir -p /bigdl/data && mkdir -p /bigdl/model && \ pip3 install --upgrade pip && \ export PIP_DEFAULT_TIMEOUT=100 && \ pip install --upgrade torch==2.1.0 && \ - # install CPU bigdl-llm - pip3 install --pre --upgrade bigdl-llm[all] && \ + # install CPU ipex-llm + pip3 install --pre --upgrade ipex-llm[all] && \ # install ipex and oneccl pip install https://intel-extension-for-pytorch.s3.amazonaws.com/ipex_stable/cpu/intel_extension_for_pytorch-2.1.0%2Bcpu-cp39-cp39-linux_x86_64.whl && \ pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable && \ @@ -41,16 +41,16 @@ RUN mkdir -p /bigdl/data && mkdir -p /bigdl/model && \ apt-get update && apt-get install -y curl wget gpg gpg-agent software-properties-common libunwind8-dev && \ # get qlora example code ln -s /usr/bin/python3 /usr/bin/python && \ - cd /bigdl && \ - git clone https://github.com/intel-analytics/BigDL.git && \ - mv BigDL/python/llm/example/CPU/QLoRA-FineTuning/* . && \ + cd /ipex_llm && \ + git clone https://github.com/intel-analytics/IPEX-LLM.git && \ + mv IPEX-LLM/python/llm/example/CPU/QLoRA-FineTuning/* . 
&& \ mkdir -p /GPU/LLM-Finetuning && \ - mv BigDL/python/llm/example/GPU/LLM-Finetuning/common /GPU/LLM-Finetuning/common && \ - rm -r BigDL && \ - chown -R mpiuser /bigdl + mv IPEX-LLM/python/llm/example/GPU/LLM-Finetuning/common /GPU/LLM-Finetuning/common && \ + rm -r IPEX-LLM && \ + chown -R mpiuser /ipex_llm # for standalone -COPY ./start-qlora-finetuning-on-cpu.sh /bigdl/start-qlora-finetuning-on-cpu.sh +COPY ./start-qlora-finetuning-on-cpu.sh /ipex_llm/start-qlora-finetuning-on-cpu.sh USER mpiuser diff --git a/docker/llm/finetune/qlora/cpu/docker/Dockerfile.k8s b/docker/llm/finetune/qlora/cpu/docker/Dockerfile.k8s index d2991985..71a8a5e1 100644 --- a/docker/llm/finetune/qlora/cpu/docker/Dockerfile.k8s +++ b/docker/llm/finetune/qlora/cpu/docker/Dockerfile.k8s @@ -19,7 +19,7 @@ ENV TRANSFORMERS_COMMIT_ID=95fe0f5 COPY --from=key-getter /root/intel-oneapi-archive-keyring.gpg /usr/share/keyrings/intel-oneapi-archive-keyring.gpg RUN echo "deb [signed-by=/usr/share/keyrings/intel-oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main " > /etc/apt/sources.list.d/oneAPI.list -RUN mkdir -p /bigdl/data && mkdir -p /bigdl/model && \ +RUN mkdir -p /ipex_llm/data && mkdir -p /ipex_llm/model && \ apt-get update && \ apt install -y --no-install-recommends openssh-server openssh-client libcap2-bin gnupg2 ca-certificates \ python3-pip python3.9-dev python3-wheel python3.9-distutils git software-properties-common && \ @@ -40,8 +40,8 @@ RUN mkdir -p /bigdl/data && mkdir -p /bigdl/model && \ pip3 install --upgrade pip && \ export PIP_DEFAULT_TIMEOUT=100 && \ pip install --upgrade torch==2.1.0 --index-url https://download.pytorch.org/whl/cpu && \ - # install CPU bigdl-llm - pip3 install --pre --upgrade bigdl-llm[all] && \ + # install CPU ipex-llm + pip3 install --pre --upgrade ipex-llm[all] && \ # install ipex and oneccl pip install intel_extension_for_pytorch==2.0.100 && \ pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable && \ @@ -59,14 +59,14 @@ RUN mkdir -p /bigdl/data && mkdir -p /bigdl/model && \ rm -rf /var/lib/apt/lists/* && \ # get qlora example code ln -s /usr/bin/python3 /usr/bin/python && \ - cd /bigdl && \ - git clone https://github.com/intel-analytics/BigDL.git && \ - mv BigDL/python/llm/example/CPU/QLoRA-FineTuning/* . && \ - rm -r BigDL && \ - chown -R mpiuser /bigdl + cd /ipex_llm && \ + git clone https://github.com/intel-analytics/IPEX-LLM.git && \ + mv IPEX-LLM/python/llm/example/CPU/QLoRA-FineTuning/* . && \ + rm -r IPEX-LLM && \ + chown -R mpiuser /ipex_llm # for k8s -COPY ./bigdl-qlora-finetuing-entrypoint.sh /bigdl/bigdl-qlora-finetuing-entrypoint.sh +COPY ./ipex-llm-qlora-finetuing-entrypoint.sh /ipex_llm/ipex-llm-qlora-finetuing-entrypoint.sh USER mpiuser diff --git a/docker/llm/finetune/qlora/cpu/docker/README.md b/docker/llm/finetune/qlora/cpu/docker/README.md index 98cb1dfe..16e6e11d 100644 --- a/docker/llm/finetune/qlora/cpu/docker/README.md +++ b/docker/llm/finetune/qlora/cpu/docker/README.md @@ -1,6 +1,6 @@ -## Fine-tune LLM with BigDL LLM Container +## Fine-tune LLM with IPEX LLM Container -The following shows how to fine-tune LLM with Quantization (QLoRA built on BigDL-LLM 4bit optimizations) in a docker environment, which is accelerated by Intel CPU. +The following shows how to fine-tune LLM with Quantization (QLoRA built on IPEX-LLM 4bit optimizations) in a docker environment, which is accelerated by Intel CPU. ### 1. 
Prepare Docker Image @@ -8,10 +8,10 @@ You can download directly from Dockerhub like: ```bash # For standalone -docker pull intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT # For k8s -docker pull intelanalytics/bigdl-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT ``` Or build the image from source: @@ -24,7 +24,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT \ -f ./Dockerfile . # For k8s @@ -34,7 +34,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT \ -f ./Dockerfile.k8s . ``` @@ -50,12 +50,12 @@ export HTTPS_PROXY=your_https_proxy docker run -itd \ --net=host \ - --name=bigdl-llm-fintune-qlora-cpu \ + --name=ipex-llm-fintune-qlora-cpu \ -e http_proxy=${HTTP_PROXY} \ -e https_proxy=${HTTPS_PROXY} \ - -v $BASE_MODE_PATH:/bigdl/model \ - -v $DATA_PATH:/bigdl/data/alpaca-cleaned \ - intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT + -v $BASE_MODE_PATH:/ipex_llm/model \ + -v $DATA_PATH:/ipex_llm/data/alpaca-cleaned \ + intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT ``` The download and mount of base model and data to a docker container demonstrates a standard fine-tuning process. You can skip this step for a quick start, and in this way, the fine-tuning codes will automatically download the needed files: @@ -66,10 +66,10 @@ export HTTPS_PROXY=your_https_proxy docker run -itd \ --net=host \ - --name=bigdl-llm-fintune-qlora-cpu \ + --name=ipex-llm-fintune-qlora-cpu \ -e http_proxy=${HTTP_PROXY} \ -e https_proxy=${HTTPS_PROXY} \ - intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT + intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT ``` However, we do recommend you to handle them manually, because the automatical download can be blocked by Internet access and Huggingface authentication etc. according to different environment, and the manual method allows you to fine-tune in a custom way (with different base model and dataset). @@ -79,14 +79,14 @@ However, we do recommend you to handle them manually, because the automatical do Enter the running container: ```bash -docker exec -it bigdl-llm-fintune-qlora-cpu bash +docker exec -it ipex-llm-fintune-qlora-cpu bash ``` Then, start QLoRA fine-tuning: If the machine memory is not enough, you can try to set `use_gradient_checkpointing=True`. ```bash -cd /bigdl +cd /ipex_llm bash start-qlora-finetuning-on-cpu.sh ``` @@ -120,19 +120,17 @@ Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface trans ### 4. Start Multi-Porcess Fine-Tuning in One Docker -
- -Multi-process parallelism enables higher performance for QLoRA fine-tuning, e.g. Xeon server series with multi-processor-socket architecture is suitable to run one instance on each QLoRA. This can be done by simply invoke >=2 OneCCL instances in BigDL QLoRA docker: +Multi-process parallelism enables higher performance for QLoRA fine-tuning; for example, a Xeon server with a multi-socket architecture is well suited to running one QLoRA instance per socket. This can be done by simply invoking >=2 OneCCL instances in the IPEX-LLM QLoRA docker image: ```bash docker run -itd \ - --name=bigdl-llm-fintune-qlora-cpu \ + --name=ipex-llm-fintune-qlora-cpu \ --cpuset-cpus="your_expected_range_of_cpu_numbers" \ -e STANDALONE_DOCKER=TRUE \ -e WORKER_COUNT_DOCKER=your_worker_count \ - -v your_downloaded_base_model_path:/bigdl/model \ - -v your_downloaded_data_path:/bigdl/data/alpaca_data_cleaned_archive.json \ - intelanalytics/bigdl-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT + -v your_downloaded_base_model_path:/ipex_llm/model \ + -v your_downloaded_data_path:/ipex_llm/data/alpaca_data_cleaned_archive.json \ + intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.5.0-SNAPSHOT ``` Note that `STANDALONE_DOCKER` is set to **TRUE** here. @@ -145,4 +143,4 @@ bash start-qlora-finetuning-on-cpu.sh ### 5. Start Distributed Fine-Tuning on Kubernetes -Besides multi-process mode, you can also run QLoRA on a kubernetes cluster. please refer [here](https://github.com/intel-analytics/BigDL/blob/main/docker/llm/finetune/qlora/cpu/kubernetes/README.md). +Besides multi-process mode, you can also run QLoRA on a Kubernetes cluster. Please refer [here](https://github.com/intel-analytics/IPEX-LLM/blob/main/docker/llm/finetune/qlora/cpu/kubernetes/README.md). diff --git a/docker/llm/finetune/qlora/cpu/docker/bigdl-qlora-finetuing-entrypoint.sh b/docker/llm/finetune/qlora/cpu/docker/ipex-llm-qlora-finetuing-entrypoint.sh similarity index 87% rename from docker/llm/finetune/qlora/cpu/docker/bigdl-qlora-finetuing-entrypoint.sh rename to docker/llm/finetune/qlora/cpu/docker/ipex-llm-qlora-finetuing-entrypoint.sh index 3ed37dbb..8d468bbd 100644 --- a/docker/llm/finetune/qlora/cpu/docker/bigdl-qlora-finetuing-entrypoint.sh +++ b/docker/llm/finetune/qlora/cpu/docker/ipex-llm-qlora-finetuing-entrypoint.sh @@ -3,8 +3,8 @@ set -x source /opt/intel/oneapi/setvars.sh export CCL_WORKER_COUNT=$WORLD_SIZE -source bigdl-llm-init -t -cd /bigdl/alpaca-qlora +source ipex-llm-init -t +cd /ipex_llm/alpaca-qlora if [ "$WORKER_ROLE" = "launcher" ] then sed "s/:1/ /g" /etc/mpi/hostfile > /home/mpiuser/hostfile @@ -24,9 +24,9 @@ then -genv KMP_AFFINITY="granularity=fine,none" \ -genv KMP_BLOCKTIME=1 \ -genv TF_ENABLE_ONEDNN_OPTS=1 \ - python /bigdl/alpaca-qlora/alpaca_qlora_finetuning_cpu.py \ - --base_model '/bigdl/model' \ - --data_path "/bigdl/data" \ + python /ipex_llm/alpaca-qlora/alpaca_qlora_finetuning_cpu.py \ + --base_model '/ipex_llm/model' \ + --data_path "/ipex_llm/data" \ --output_dir "/home/mpiuser/finetuned_model" \ --batch_size 128 \ --micro_batch_size $MICRO_BATCH_SIZE \ diff --git a/docker/llm/finetune/qlora/cpu/docker/start-qlora-finetuning-on-cpu.sh b/docker/llm/finetune/qlora/cpu/docker/start-qlora-finetuning-on-cpu.sh index 0a428334..90e0d885 100644 --- a/docker/llm/finetune/qlora/cpu/docker/start-qlora-finetuning-on-cpu.sh +++ b/docker/llm/finetune/qlora/cpu/docker/start-qlora-finetuning-on-cpu.sh @@ -1,10 +1,10 @@ #!/bin/bash set -x -cd /bigdl +cd /ipex_llm export USE_XETLA=OFF export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 
source /opt/intel/oneapi/setvars.sh -source bigdl-llm-init -t +source ipex-llm-init -t if [ -d "./model" ]; then diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/Chart.yaml b/docker/llm/finetune/qlora/cpu/kubernetes/Chart.yaml index 3606401c..2c750b4d 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/Chart.yaml +++ b/docker/llm/finetune/qlora/cpu/kubernetes/Chart.yaml @@ -1,6 +1,6 @@ apiVersion: v2 -name: bigdl-fintune-service -description: A Helm chart for BigDL Finetune Service on Kubernetes +name: ipex-fintune-service +description: A Helm chart for IPEX-LLM Finetune Service on Kubernetes type: application version: 1.1.27 appVersion: "1.16.0" diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/README.md b/docker/llm/finetune/qlora/cpu/kubernetes/README.md index 73fb6491..4279ea66 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/README.md +++ b/docker/llm/finetune/qlora/cpu/kubernetes/README.md @@ -1,12 +1,10 @@ ## Run Distributed QLoRA Fine-Tuning on Kubernetes with OneCCL -![image](https://github.com/intel-analytics/BigDL/assets/60865256/825f47d9-c864-4f39-a331-adb1e3cb528e) - -BigDL here provides a CPU optimization to accelerate the QLoRA finetuning of Llama2-7b, in the power of mixed-precision and distributed training. Detailedly, [Intel OneCCL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html), an available Hugging Face backend, is able to speed up the Pytorch computation with BF16 datatype on CPUs, as well as parallel processing on Kubernetes enabled by [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html). Moreover, advanaced quantization of BigDL-LLM has been applied to improve memory utilization, which makes CPU large-scale fine-tuning possible with runtime NF4 model storage and BF16 computing types. +IPEX-LLM here provides a CPU optimization to accelerate the QLoRA finetuning of Llama2-7b, in the power of mixed-precision and distributed training. Detailedly, [Intel OneCCL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl.html), an available Hugging Face backend, is able to speed up the Pytorch computation with BF16 datatype on CPUs, as well as parallel processing on Kubernetes enabled by [Intel MPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html). Moreover, advanaced quantization of IPEX-LLM has been applied to improve memory utilization, which makes CPU large-scale fine-tuning possible with runtime NF4 model storage and BF16 computing types. The architecture is illustrated in the following: -As above, BigDL implements its MPI training with [Kubeflow MPI operator](https://github.com/kubeflow/mpi-operator/tree/master), which encapsulates the deployment as MPIJob CRD, and assists users to handle the construction of a MPI worker cluster on Kubernetes, such as public key distribution, SSH connection, and log collection. +As above, IPEX-LLM implements its MPI training with [Kubeflow MPI operator](https://github.com/kubeflow/mpi-operator/tree/master), which encapsulates the deployment as MPIJob CRD, and assists users to handle the construction of a MPI worker cluster on Kubernetes, such as public key distribution, SSH connection, and log collection. Now, let's go to deploy a QLoRA finetuning to create a new LLM from Llama2-7b. @@ -18,7 +16,7 @@ Follow [here](https://github.com/kubeflow/mpi-operator/tree/master#installation) ### 2. 
Download Image, Base Model and Finetuning Data -Follow [here](https://github.com/intel-analytics/BigDL/tree/main/docker/llm/finetune/qlora/cpu/docker#1-prepare-docker-image) to prepare BigDL QLoRA Finetuning image in your cluster. +Follow [here](https://github.com/intel-analytics/IPEX-LLM/tree/main/docker/llm/finetune/qlora/cpu/docker#1-prepare-docker-image) to prepare IPEX-LLM QLoRA Finetuning image in your cluster. As finetuning is from a base model, first download [Llama2-7b model from the public download site of Hugging Face](https://huggingface.co/meta-llama/Llama-2-7b). Then, download [cleaned alpaca data](https://raw.githubusercontent.com/tloen/alpaca-lora/main/alpaca_data_cleaned_archive.json), which contains all kinds of general knowledge and has already been cleaned. Next, move the downloaded files to a shared directory on your NFS server. @@ -32,12 +30,12 @@ After preparing parameters in `./kubernetes/values.yaml`, submit the job as befl ```bash cd ./kubernetes -helm install bigdl-qlora-finetuning . +helm install ipex-llm-qlora-finetuning . ``` ### 4. Check Deployment ```bash -kubectl get all -n bigdl-qlora-finetuning # you will see launcher and worker pods running +kubectl get all -n ipex-llm-qlora-finetuning # you will see launcher and worker pods running ``` ### 5. Check Finetuning Process @@ -45,8 +43,8 @@ kubectl get all -n bigdl-qlora-finetuning # you will see launcher and worker pod After deploying successfully, you can find a launcher pod, and then go inside this pod and check the logs collected from all workers. ```bash -kubectl get all -n bigdl-qlora-finetuning # you will see a launcher pod -kubectl exec -it bash -n bigdl-qlora-finetuning # enter launcher pod +kubectl get all -n ipex-llm-qlora-finetuning # you will see a launcher pod +kubectl exec -it bash -n ipex-llm-qlora-finetuning # enter launcher pod cat launcher.log # display logs collected from other workers ``` diff --git a/docker/llm/finetune/lora/cpu/kubernetes/templates/bigdl-lora-finetuning-namespace.yaml b/docker/llm/finetune/qlora/cpu/kubernetes/templates/ipex-llm-finetuning-namespace.yaml similarity index 54% rename from docker/llm/finetune/lora/cpu/kubernetes/templates/bigdl-lora-finetuning-namespace.yaml rename to docker/llm/finetune/qlora/cpu/kubernetes/templates/ipex-llm-finetuning-namespace.yaml index b521299e..2bb03dc4 100644 --- a/docker/llm/finetune/lora/cpu/kubernetes/templates/bigdl-lora-finetuning-namespace.yaml +++ b/docker/llm/finetune/qlora/cpu/kubernetes/templates/ipex-llm-finetuning-namespace.yaml @@ -1,4 +1,4 @@ apiVersion: v1 kind: Namespace metadata: - name: bigdl-lora-finetuning + name: ipex-llm-qlora-finetuning diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/templates/bigdl-qlora-finetuning-job.yaml b/docker/llm/finetune/qlora/cpu/kubernetes/templates/ipex-llm-qlora-finetuning-job.yaml similarity index 81% rename from docker/llm/finetune/qlora/cpu/kubernetes/templates/bigdl-qlora-finetuning-job.yaml rename to docker/llm/finetune/qlora/cpu/kubernetes/templates/ipex-llm-qlora-finetuning-job.yaml index 71b1cd03..a6fd8477 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/templates/bigdl-qlora-finetuning-job.yaml +++ b/docker/llm/finetune/qlora/cpu/kubernetes/templates/ipex-llm-qlora-finetuning-job.yaml @@ -1,8 +1,8 @@ apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: - name: bigdl-qlora-finetuning-job - namespace: bigdl-qlora-finetuning + name: ipex-llm-qlora-finetuning-job + namespace: ipex-llm-qlora-finetuning spec: slotsPerWorker: 1 runPolicy: @@ -20,10 +20,10 @@ spec: 
claimName: nfs-pvc containers: - image: {{ .Values.imageName }} - name: bigdl-qlora-finetuning-launcher + name: ipex-llm-qlora-finetuning-launcher securityContext: runAsUser: 1000 - command: ['sh' , '-c', 'bash /bigdl/bigdl-qlora-finetuing-entrypoint.sh'] + command: ['sh' , '-c', 'bash /ipex_llm/ipex-llm-qlora-finetuing-entrypoint.sh'] env: - name: WORKER_ROLE value: "launcher" @@ -34,7 +34,7 @@ spec: - name: MASTER_PORT value: "42679" - name: MASTER_ADDR - value: "bigdl-qlora-finetuning-job-worker-0.bigdl-qlora-finetuning-job-worker" + value: "ipex-llm-qlora-finetuning-job-worker-0.ipex-llm-qlora-finetuning-job-worker" - name: DATA_SUB_PATH value: "{{ .Values.dataSubPath }}" - name: ENABLE_GRADIENT_CHECKPOINT @@ -52,10 +52,10 @@ spec: volumeMounts: - name: nfs-storage subPath: {{ .Values.modelSubPath }} - mountPath: /bigdl/model + mountPath: /ipex_llm/model - name: nfs-storage subPath: {{ .Values.dataSubPath }} - mountPath: "/bigdl/data/{{ .Values.dataSubPath }}" + mountPath: "/ipex_llm/data/{{ .Values.dataSubPath }}" Worker: replicas: {{ .Values.trainerNum }} template: @@ -74,10 +74,10 @@ spec: topologyKey: kubernetes.io/hostname containers: - image: {{ .Values.imageName }} - name: bigdl-qlora-finetuning-worker + name: ipex-llm-qlora-finetuning-worker securityContext: runAsUser: 1000 - command: ['sh' , '-c', 'bash /bigdl/bigdl-qlora-finetuing-entrypoint.sh'] + command: ['sh' , '-c', 'bash /ipex_llm/ipex-llm-qlora-finetuing-entrypoint.sh'] env: - name: WORKER_ROLE value: "trainer" @@ -88,7 +88,7 @@ spec: - name: MASTER_PORT value: "42679" - name: MASTER_ADDR - value: "bigdl-qlora-finetuning-job-worker-0.bigdl-qlora-finetuning-job-worker" + value: "ipex-llm-qlora-finetuning-job-worker-0.ipex-llm-qlora-finetuning-job-worker" - name: ENABLE_GRADIENT_CHECKPOINT value: "{{ .Values.enableGradientCheckpoint }}" - name: http_proxy @@ -102,10 +102,10 @@ spec: volumeMounts: - name: nfs-storage subPath: {{ .Values.modelSubPath }} - mountPath: /bigdl/model + mountPath: /ipex_llm/model - name: nfs-storage subPath: {{ .Values.dataSubPath }} - mountPath: "/bigdl/data/{{ .Values.dataSubPath }}" + mountPath: "/ipex_llm/data/{{ .Values.dataSubPath }}" resources: requests: cpu: 48 diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pv.yaml b/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pv.yaml index 14a2126f..fa104391 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pv.yaml +++ b/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pv.yaml @@ -1,8 +1,8 @@ apiVersion: v1 kind: PersistentVolume metadata: - name: nfs-pv-bigdl-qlora-finetuning - namespace: bigdl-qlora-finetuning + name: nfs-pv-ipex-llm-qlora-finetuning + namespace: ipex-llm-qlora-finetuning spec: capacity: storage: 15Gi diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pvc.yaml b/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pvc.yaml index 48ef589d..0a2b0d99 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pvc.yaml +++ b/docker/llm/finetune/qlora/cpu/kubernetes/templates/nfs-pvc.yaml @@ -2,7 +2,7 @@ kind: PersistentVolumeClaim apiVersion: v1 metadata: name: nfs-pvc - namespace: bigdl-qlora-finetuning + namespace: ipex-llm-qlora-finetuning spec: accessModes: - ReadWriteOnce diff --git a/docker/llm/finetune/qlora/cpu/kubernetes/values.yaml b/docker/llm/finetune/qlora/cpu/kubernetes/values.yaml index b195f203..083f6584 100644 --- a/docker/llm/finetune/qlora/cpu/kubernetes/values.yaml +++ b/docker/llm/finetune/qlora/cpu/kubernetes/values.yaml @@ -1,4 +1,4 
@@ -imageName: intelanalytics/bigdl-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT +imageName: intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.5.0-SNAPSHOT trainerNum: 2 microBatchSize: 8 enableGradientCheckpoint: false # true will save more memory but increase latency diff --git a/docker/llm/finetune/qlora/xpu/docker/Dockerfile b/docker/llm/finetune/qlora/xpu/docker/Dockerfile index 415581ad..25bc65d2 100644 --- a/docker/llm/finetune/qlora/xpu/docker/Dockerfile +++ b/docker/llm/finetune/qlora/xpu/docker/Dockerfile @@ -28,15 +28,15 @@ RUN curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-P ln -s /usr/bin/python3 /usr/bin/python && \ apt-get install -y python3-pip python3.9-dev python3-wheel python3.9-distutils && \ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \ - # install XPU bigdl-llm - pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu && \ + # install XPU ipex-llm + pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu && \ # install huggingface dependencies pip install git+https://github.com/huggingface/transformers.git@${TRANSFORMERS_COMMIT_ID} && \ pip install peft==0.5.0 datasets accelerate==0.23.0 && \ pip install bitsandbytes scipy && \ - git clone https://github.com/intel-analytics/BigDL.git && \ - mv BigDL/python/llm/example/GPU/LLM-Finetuning/common /common && \ - rm -r BigDL && \ - wget https://raw.githubusercontent.com/intel-analytics/BigDL/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/qlora_finetuning.py + git clone https://github.com/intel-analytics/IPEX-LLM.git && \ + mv IPEX-LLM/python/llm/example/GPU/LLM-Finetuning/common /common && \ + rm -r IPEX-LLM && \ + wget https://raw.githubusercontent.com/intel-analytics/IPEX-LLM/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/simple-example/qlora_finetuning.py COPY ./start-qlora-finetuning-on-xpu.sh /start-qlora-finetuning-on-xpu.sh diff --git a/docker/llm/finetune/qlora/xpu/docker/README.md b/docker/llm/finetune/qlora/xpu/docker/README.md index 13e5fbab..56926293 100644 --- a/docker/llm/finetune/qlora/xpu/docker/README.md +++ b/docker/llm/finetune/qlora/xpu/docker/README.md @@ -1,13 +1,13 @@ -## Fine-tune LLM with BigDL LLM Container +## Fine-tune LLM with IPEX LLM Container -The following shows how to fine-tune LLM with Quantization (QLoRA built on BigDL-LLM 4bit optimizations) in a docker environment, which is accelerated by Intel XPU. +The following shows how to fine-tune LLM with Quantization (QLoRA built on IPEX-LLM 4bit optimizations) in a docker environment, which is accelerated by Intel XPU. ### 1. Prepare Docker Image You can download directly from Dockerhub like: ```bash -docker pull intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT +docker pull intelanalytics/ipex-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT ``` Or build the image from source: @@ -19,7 +19,7 @@ export HTTPS_PROXY=your_https_proxy docker build \ --build-arg http_proxy=${HTTP_PROXY} \ --build-arg https_proxy=${HTTPS_PROXY} \ - -t intelanalytics/bigdl-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT \ + -t intelanalytics/ipex-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT \ -f ./Dockerfile . 
 ```
@@ -37,13 +37,13 @@ docker run -itd \
    --net=host \
    --device=/dev/dri \
    --memory="32G" \
-   --name=bigdl-llm-fintune-qlora-xpu \
+   --name=ipex-llm-finetune-qlora-xpu \
    -e http_proxy=${HTTP_PROXY} \
    -e https_proxy=${HTTPS_PROXY} \
    -v $BASE_MODE_PATH:/model \
    -v $DATA_PATH:/data/alpaca-cleaned \
    --shm-size="16g" \
-   intelanalytics/bigdl-llm-fintune-qlora-xpu:2.5.0-SNAPSHOT
+   intelanalytics/ipex-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT
 ```
 
 The download and mount of base model and data to a docker container demonstrates a standard fine-tuning process. You can skip this step for a quick start, and in this way, the fine-tuning codes will automatically download the needed files:
@@ -56,11 +56,11 @@ docker run -itd \
    --net=host \
    --device=/dev/dri \
    --memory="32G" \
-   --name=bigdl-llm-fintune-qlora-xpu \
+   --name=ipex-llm-finetune-qlora-xpu \
    -e http_proxy=${HTTP_PROXY} \
    -e https_proxy=${HTTPS_PROXY} \
    --shm-size="16g" \
-   intelanalytics/bigdl-llm-fintune-qlora-xpu:2.5.0-SNAPSHOT
+   intelanalytics/ipex-llm-finetune-qlora-xpu:2.5.0-SNAPSHOT
 ```
 
 However, we do recommend you to handle them manually, because the automatical download can be blocked by Internet access and Huggingface authentication etc. according to different environment, and the manual method allows you to fine-tune in a custom way (with different base model and dataset).
@@ -70,7 +70,7 @@ However, we do recommend you to handle them manually, because the automatical do
 Enter the running container:
 
 ```bash
-docker exec -it bigdl-llm-fintune-qlora-xpu bash
+docker exec -it ipex-llm-finetune-qlora-xpu bash
 ```
 
 Then, start QLoRA fine-tuning:
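The hunk above stops at the fine-tuning step. As a reference, a minimal sketch of launching it non-interactively, assuming the container was started with the name used in the run examples above and relying on the `/start-qlora-finetuning-on-xpu.sh` script that the Dockerfile copies into the image (check the script itself for the exact inputs it expects, e.g. the base model mounted at `/model` and the data at `/data/alpaca-cleaned`):

```bash
# Run the bundled QLoRA start script inside the already-running container.
# The container name matches the docker run example above; adjust it if you changed it.
docker exec -it ipex-llm-finetune-qlora-xpu \
  bash -c "bash /start-qlora-finetuning-on-xpu.sh"
```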
diff --git a/docker/llm/inference/cpu/docker/Dockerfile b/docker/llm/inference/cpu/docker/Dockerfile
index c12189fd..f8a302f7 100644
--- a/docker/llm/inference/cpu/docker/Dockerfile
+++ b/docker/llm/inference/cpu/docker/Dockerfile
@@ -24,18 +24,18 @@ RUN env DEBIAN_FRONTEND=noninteractive apt-get update && \
     rm get-pip.py && \
     pip install --upgrade requests argparse urllib3 && \
     pip3 install --no-cache-dir --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu && \
-    pip install --pre --upgrade bigdl-llm[all] && \
-# Download bigdl-llm-tutorial
+    pip install --pre --upgrade ipex-llm[all] && \
+# Download ipex-llm-tutorial
     cd /llm && \
     pip install --upgrade jupyterlab && \
-    git clone https://github.com/intel-analytics/bigdl-llm-tutorial && \
+    git clone https://github.com/intel-analytics/ipex-llm-tutorial && \
     chmod +x /llm/start-notebook.sh && \
 # Download all-in-one benchmark
-    git clone https://github.com/intel-analytics/BigDL && \
-    cp -r ./BigDL/python/llm/dev/benchmark/ ./benchmark && \
+    git clone https://github.com/intel-analytics/IPEX-LLM && \
+    cp -r ./IPEX-LLM/python/llm/dev/benchmark/ ./benchmark && \
 # Copy chat.py script
     pip install --upgrade colorama && \
-    cp -r ./BigDL/python/llm/portable-zip/ ./portable-zip && \
+    cp -r ./IPEX-LLM/python/llm/portable-zip/ ./portable-zip && \
 # Install all-in-one dependencies
     apt-get install -y numactl && \
     pip install --upgrade omegaconf && \
@@ -46,13 +46,13 @@ RUN env DEBIAN_FRONTEND=noninteractive apt-get update && \
 # Add Qwen support
     pip install --upgrade transformers_stream_generator einops && \
 # Copy vLLM-Serving
-    cp -r ./BigDL/python/llm/example/CPU/vLLM-Serving/ ./vLLM-Serving && \
-    rm -rf ./BigDL && \
+    cp -r ./IPEX-LLM/python/llm/example/CPU/vLLM-Serving/ ./vLLM-Serving && \
+    rm -rf ./IPEX-LLM && \
 # Fix vllm service
     pip install pydantic==1.10.11 && \
-# Install bigdl-llm
+# Install ipex-llm
     cd /llm && \
-    pip install --pre --upgrade bigdl-llm[all] && \
+    pip install --pre --upgrade ipex-llm[all] && \
 # Fix CVE-2024-22195
     pip install Jinja2==3.1.3 && \
     pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cpu && \
diff --git a/docker/llm/inference/cpu/docker/README.md b/docker/llm/inference/cpu/docker/README.md
index cbe6c8ae..ba817747 100644
--- a/docker/llm/inference/cpu/docker/README.md
+++ b/docker/llm/inference/cpu/docker/README.md
@@ -1,4 +1,4 @@
-## Build/Use BigDL-LLM cpu image
+## Build/Use IPEX-LLM cpu image
 
 ### Build Image
 ```bash
@@ -6,7 +6,7 @@ docker build \
   --build-arg http_proxy=.. \
   --build-arg https_proxy=.. \
   --build-arg no_proxy=.. \
-  --rm --no-cache -t intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT .
+  --rm --no-cache -t intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT .
 ```
 
@@ -16,7 +16,7 @@ docker build \
 An example could be:
 ```bash
 #/bin/bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT
 
 sudo docker run -itd \
         --net=host \
@@ -31,7 +31,7 @@ sudo docker run -itd \
 
 After the container is booted, you could get into the container through `docker exec`.
 
-To run inference using `BigDL-LLM` using cpu, you could refer to this [documentation](https://github.com/intel-analytics/BigDL/tree/main/python/llm#cpu-int4).
+To run inference using `IPEX-LLM` on cpu, you could refer to this [documentation](https://github.com/intel-analytics/IPEX-LLM/tree/main/python/llm#cpu-int4).
 
 ### Use chat.py
 
@@ -41,7 +41,7 @@ You can download models and bind the model directory from host machine to contai
 Here is an example:
 ```bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT
 export MODEL_PATH=/home/llm/models
 
 sudo docker run -itd \
@@ -65,4 +65,4 @@ In the example above, it can be:
 ```bash
 cd /llm
 python chat.py --model-path /llm/models/MODEL_NAME
-```
\ No newline at end of file
+```
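For a quick try of the renamed image without attaching an interactive shell first, the same `chat.py` entry point can also be driven from the host. A sketch, assuming the container from the example above is running under the hypothetical name `my_container` and a model directory was mounted under `/llm/models`:

```bash
# One-shot invocation of chat.py inside the running inference container.
# my_container and MODEL_NAME are placeholders; replace them with your container and model.
sudo docker exec -it my_container \
  bash -c "cd /llm && python chat.py --model-path /llm/models/MODEL_NAME"
```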
diff --git a/docker/llm/inference/cpu/docker/start-notebook.sh b/docker/llm/inference/cpu/docker/start-notebook.sh
index f3aedda8..85ebcd55 100644
--- a/docker/llm/inference/cpu/docker/start-notebook.sh
+++ b/docker/llm/inference/cpu/docker/start-notebook.sh
@@ -1,7 +1,7 @@
 #!/bin/bash
 
 #
-# Copyright 2016 The BigDL Authors.
+# Copyright 2016 The IPEX-LLM Authors.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -29,4 +29,4 @@ while [ $# -gt 0 ]; do
   shift
 done
 
-jupyter-lab --notebook-dir=/llm/bigdl-llm-tutorial --ip=0.0.0.0 --port=$port --no-browser --NotebookApp.token=$token --allow-root
\ No newline at end of file
+jupyter-lab --notebook-dir=/llm/ipex-llm-tutorial --ip=0.0.0.0 --port=$port --no-browser --NotebookApp.token=$token --allow-root
diff --git a/docker/llm/inference/xpu/docker/Dockerfile b/docker/llm/inference/xpu/docker/Dockerfile
index a74e00f3..266515db 100644
--- a/docker/llm/inference/xpu/docker/Dockerfile
+++ b/docker/llm/inference/xpu/docker/Dockerfile
@@ -20,7 +20,7 @@ RUN curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-P
     wget -qO - https://repositories.intel.com/graphics/intel-graphics.key | gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg && \
     echo 'deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/graphics/ubuntu jammy arc' | tee /etc/apt/sources.list.d/intel.gpu.jammy.list && \
     rm /etc/apt/sources.list.d/intel-graphics.list && \
-    # Install PYTHON 3.9 and BigDL-LLM[xpu]
+    # Install PYTHON 3.9 and IPEX-LLM[xpu]
     ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone && \
     env DEBIAN_FRONTEND=noninteractive apt-get update && \
     apt install software-properties-common libunwind8-dev vim less -y && \
@@ -35,7 +35,7 @@ RUN curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-P
     python3 get-pip.py && \
     rm get-pip.py && \
     pip install --upgrade requests argparse urllib3 && \
-    pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu && \
+    pip install --pre --upgrade ipex-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu && \
     # Fix Trivy CVE Issues
     pip install transformers==4.36.2 && \
     pip install transformers_stream_generator einops tiktoken && \
@@ -48,6 +48,6 @@ RUN curl -fsSL https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-P
     pip install --upgrade fastapi && \
     pip install --upgrade "uvicorn[standard]" && \
     # Download vLLM-Serving
-    git clone https://github.com/intel-analytics/BigDL && \
-    cp -r ./BigDL/python/llm/example/GPU/vLLM-Serving/ ./vLLM-Serving && \
-    rm -rf ./BigDL
+    git clone https://github.com/intel-analytics/IPEX-LLM && \
+    cp -r ./IPEX-LLM/python/llm/example/GPU/vLLM-Serving/ ./vLLM-Serving && \
+    rm -rf ./IPEX-LLM
diff --git a/docker/llm/inference/xpu/docker/README.md b/docker/llm/inference/xpu/docker/README.md
index e63d751a..c17787e6 100644
--- a/docker/llm/inference/xpu/docker/README.md
+++ b/docker/llm/inference/xpu/docker/README.md
@@ -1,4 +1,4 @@
-## Build/Use BigDL-LLM xpu image
+## Build/Use IPEX-LLM xpu image
 
 ### Build Image
 ```bash
@@ -6,7 +6,7 @@ docker build \
   --build-arg http_proxy=.. \
   --build-arg https_proxy=.. \
   --build-arg no_proxy=.. \
-  --rm --no-cache -t intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT .
+  --rm --no-cache -t intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT .
 ```
 
@@ -17,7 +17,7 @@ To map the `xpu` into the container, you need to specify `--device=/dev/dri` whe
 An example could be:
 ```bash
 #/bin/bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT
 
 sudo docker run -itd \
         --net=host \
@@ -42,4 +42,4 @@ root@arda-arc12:/# sycl-ls
 ```
 
-To run inference using `BigDL-LLM` using xpu, you could refer to this [documentation](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU).
+To run inference using `IPEX-LLM` on xpu, you could refer to this [documentation](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU).
diff --git a/docker/llm/serving/cpu/docker/Dockerfile b/docker/llm/serving/cpu/docker/Dockerfile
index 8346502d..6c5c4684 100644
--- a/docker/llm/serving/cpu/docker/Dockerfile
+++ b/docker/llm/serving/cpu/docker/Dockerfile
@@ -1,4 +1,4 @@
-FROM intelanalytics/bigdl-llm-cpu:2.5.0-SNAPSHOT
+FROM intelanalytics/ipex-llm-cpu:2.5.0-SNAPSHOT
 
 ARG http_proxy
 ARG https_proxy
@@ -12,7 +12,7 @@ COPY ./model_adapter.py.patch /llm/model_adapter.py.patch
 ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /sbin/tini
 # Install Serving Dependencies
 RUN cd /llm && \
-    pip install --pre --upgrade bigdl-llm[serving] && \
+    pip install --pre --upgrade ipex-llm[serving] && \
     # Fix Trivy CVE Issues
     pip install Jinja2==3.1.3 transformers==4.36.2 gradio==4.19.2 cryptography==42.0.4 && \
     # Fix Qwen model adpater in fastchat
diff --git a/docker/llm/serving/cpu/docker/README.md b/docker/llm/serving/cpu/docker/README.md
index bea11636..ec1b011d 100644
--- a/docker/llm/serving/cpu/docker/README.md
+++ b/docker/llm/serving/cpu/docker/README.md
@@ -1,4 +1,4 @@
-## Build/Use BigDL-LLM-serving cpu image
+## Build/Use IPEX-LLM-serving cpu image
 
 ### Build Image
 ```bash
@@ -6,7 +6,7 @@ docker build \
   --build-arg http_proxy=.. \
   --build-arg https_proxy=.. \
   --build-arg no_proxy=.. \
-  --rm --no-cache -t intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT .
+  --rm --no-cache -t intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT .
 ```
 
 ### Use the image for doing cpu serving
@@ -16,7 +16,7 @@ You could use the following bash script to start the container. Please be noted
 
 ```bash
 #/bin/bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
 
 sudo docker run -itd \
         --net=host \
@@ -30,13 +30,13 @@ sudo docker run -itd \
 
 After the container is booted, you could get into the container through `docker exec`.
 
-To run model-serving using `BigDL-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/BigDL/tree/main/python/llm/src/bigdl/llm/serving).
+To run model-serving using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/src/ipex_llm/serving).
 
 Also you can set environment variables and start arguments while running a container to get serving started initially. You may need to boot several containers to support. One controller container and at least one worker container are needed. The api server address(host and port) and controller address are set in controller container, and you need to set the same controller address as above, model path on your machine and worker address in worker container.
 
 To start a controller container:
 ```bash
 #/bin/bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
 controller_host=localhost
 controller_port=23000
 api_host=localhost
@@ -59,7 +59,7 @@ sudo docker run -itd \
 To start a worker container:
 ```bash
 #/bin/bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
 export MODEL_PATH=YOUR_MODEL_PATH
 controller_host=localhost
 controller_port=23000
@@ -94,4 +94,4 @@ curl -X POST -H "Content-Type: application/json" -d '{
   "use_beam_search": false,
   "stream": false
 }' http://localhost:8000/v1/completions
-```
\ No newline at end of file
+```
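Once a controller, a worker and the api server are up, a quick way to confirm the serving stack is wired together is to list the registered models before posting completions. A sketch, assuming the api server from the README above is listening on port 8000 and exposes the standard OpenAI-compatible `/v1/models` route:

```bash
# List models registered with the controller via the OpenAI-compatible api server;
# the port matches the api server settings shown in the README above.
curl http://localhost:8000/v1/models
```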
diff --git a/docker/llm/serving/cpu/docker/entrypoint.sh b/docker/llm/serving/cpu/docker/entrypoint.sh
index 36217dd2..2a137499 100644
--- a/docker/llm/serving/cpu/docker/entrypoint.sh
+++ b/docker/llm/serving/cpu/docker/entrypoint.sh
@@ -196,8 +196,8 @@ else
   else
     # Logic for non-controller(worker) mode
     worker_address="http://$worker_host:$worker_port"
-    # Apply optimizations from bigdl-llm
-    source bigdl-llm-init -t
+    # Apply optimizations from ipex-llm
+    source ipex-llm-init -t
     # First check if user have set OMP_NUM_THREADS by themselves
     if [[ -n "${omp_num_threads}" ]]; then
       echo "Setting OMP_NUM_THREADS to its original value: $omp_num_threads"
diff --git a/docker/llm/serving/cpu/kubernetes/README.md b/docker/llm/serving/cpu/kubernetes/README.md
index 5c08e00d..f8d745d3 100644
--- a/docker/llm/serving/cpu/kubernetes/README.md
+++ b/docker/llm/serving/cpu/kubernetes/README.md
@@ -1,8 +1,8 @@
-## Deployment bigdl-llm serving service in K8S environment
+## Deployment ipex-llm serving service in K8S environment
 
 ## Image
 
-To deploy BigDL-LLM-serving cpu in Kubernetes environment, please use this image: `intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT`
+To deploy IPEX-LLM-serving cpu in Kubernetes environment, please use this image: `intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT`
 
 ## Before deployment
 
@@ -10,12 +10,10 @@ To deploy BigDL-LLM-serving cpu in Kubernetes environment, please use this image
 ### Models
 
 In this document, we will use `vicuna-7b-v1.5` as the deployment model.
 
-After downloading the model, please change name from `vicuna-7b-v1.5` to `vicuna-7b-v1.5-bigdl` to use `bigdl-llm` as the backend. The `bigdl-llm` backend will be used if model path contains `bigdl`. Otherwise, the original transformer-backend will be used.
+After downloading the model, please change the name from `vicuna-7b-v1.5` to `vicuna-7b-v1.5-ipex-llm` to use `ipex-llm` as the backend. The `ipex-llm` backend will be used if the model path contains `ipex-llm`. Otherwise, the original transformer-backend will be used.
 
 You can download the model from [here](https://huggingface.co/lmsys/vicuna-7b-v1.5).
 
-For ChatGLM models, users do not need to add `bigdl` into model path. We have already used the `BigDL-LLM` backend for this model.
-
 ### Kubernetes config
 
 We recommend to setup your kubernetes cluster before deployment. Mostly importantly, please set `cpu-management-policy` to `static` by using this [tutorial](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/). Also, it would be great to also set the `topology management policy` to `single-numa-node`.
@@ -67,7 +65,7 @@ We use the following yaml file for controller deployment:
 apiVersion: v1
 kind: Pod
 metadata:
-  name: bigdl-fschat-a1234bd-controller
+  name: ipex-llm-fschat-a1234bd-controller
   labels:
     fastchat-appid: a1234bd
     fastchat-app-type: controller
@@ -75,7 +73,7 @@ spec:
   dnsPolicy: "ClusterFirst"
   containers:
   - name: fastchat-controller # fixed
-    image: intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+    image: intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
     imagePullPolicy: IfNotPresent
     env:
     - name: CONTROLLER_HOST # fixed
@@ -107,7 +105,7 @@ spec:
 apiVersion: v1
 kind: Service
 metadata:
-  name: bigdl-a1234bd-fschat-controller-service
+  name: ipex-llm-a1234bd-fschat-controller-service
 spec:
   # You may also want to change this to use the cluster's feature
   type: NodePort
@@ -133,7 +131,7 @@ We use the following deployment for worker deployment:
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: bigdl-fschat-a1234bd-worker-deployment
+  name: ipex-llm-fschat-a1234bd-worker-deployment
 spec:
   # Change this to the number you want
   replicas: 1
@@ -148,11 +146,11 @@ spec:
       dnsPolicy: "ClusterFirst"
       containers:
       - name: fastchat-worker # fixed
-        image: intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+        image: intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
         imagePullPolicy: IfNotPresent
         env:
         - name: CONTROLLER_HOST # fixed
-          value: bigdl-a1234bd-fschat-controller-service
+          value: ipex-llm-a1234bd-fschat-controller-service
        - name: CONTROLLER_PORT # fixed
          value: "21005"
        - name: WORKER_HOST # fixed
@@ -162,7 +160,7 @@ spec:
        - name: WORKER_PORT # fixed
          value: "21841"
        - name: MODEL_PATH
-          value: "/llm/models/vicuna-7b-v1.5-bigdl/" # change this to your model
+          value: "/llm/models/vicuna-7b-v1.5-ipex-llm/" # change this to your model
        - name: OMP_NUM_THREADS
          value: "16"
        resources:
@@ -190,7 +188,7 @@ You may want to change the `MODEL_PATH` variable in the yaml. Also, please reme
 We have set port using `GRADIO_PORT` envrionment variable in `deployment.yaml`, you can use this command
 
 ```bash
-k port-forward bigdl-fschat-a1234bd-controller --address 0.0.0.0 8002:8002
+k port-forward ipex-llm-fschat-a1234bd-controller --address 0.0.0.0 8002:8002
 ```
 
 Then visit http://YOUR_HOST_IP:8002 to access ui.
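If the UI does not come up, it usually helps to confirm the controller resources exist before port-forwarding. A short check, assuming the objects were created with the names and labels shown above (add `-n <namespace>` if you deployed into a non-default namespace):

```bash
# Verify the controller Pod and Service created from the yaml above.
kubectl get pod ipex-llm-fschat-a1234bd-controller
kubectl get svc ipex-llm-a1234bd-fschat-controller-service
# Worker pods are matched here on the fastchat-appid label, assuming the Deployment reuses it.
kubectl get pods -l fastchat-appid=a1234bd
```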
@@ -209,14 +207,14 @@ First, install openai-python:
 pip install --upgrade openai
 ```
 
-Then, interact with model vicuna-7b-v1.5-bigdl:
+Then, interact with model vicuna-7b-v1.5-ipex-llm:
 
 ```python
 import openai
 openai.api_key = "EMPTY"
 openai.api_base = "http://localhost:8000/v1"
 
-model = "vicuna-7b-v1.5-bigdl"
+model = "vicuna-7b-v1.5-ipex-llm"
 prompt = "Once upon a time"
 
 # create a completion
diff --git a/docker/llm/serving/cpu/kubernetes/deployment.yaml b/docker/llm/serving/cpu/kubernetes/deployment.yaml
index 1c58f811..71c3ad42 100644
--- a/docker/llm/serving/cpu/kubernetes/deployment.yaml
+++ b/docker/llm/serving/cpu/kubernetes/deployment.yaml
@@ -16,7 +16,7 @@ spec:
 apiVersion: v1
 kind: Pod
 metadata:
-  name: bigdl-fschat-a1234bd-controller
+  name: ipex-llm-fschat-a1234bd-controller
   labels:
     fastchat-appid: a1234bd
     fastchat-app-type: controller
@@ -24,7 +24,7 @@ spec:
   dnsPolicy: "ClusterFirst"
   containers:
   - name: fastchat-controller # fixed
-    image: intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+    image: intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
     imagePullPolicy: IfNotPresent
     env:
     - name: CONTROLLER_HOST # fixed
@@ -56,7 +56,7 @@ spec:
 apiVersion: v1
 kind: Service
 metadata:
-  name: bigdl-a1234bd-fschat-controller-service
+  name: ipex-llm-a1234bd-fschat-controller-service
 spec:
   # You may also want to change this to use the cluster's feature
   type: NodePort
@@ -76,7 +76,7 @@ spec:
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: bigdl-fschat-a1234bd-worker-deployment
+  name: ipex-llm-fschat-a1234bd-worker-deployment
 spec:
   # Change this to the number you want
   replicas: 1
@@ -91,11 +91,11 @@ spec:
       dnsPolicy: "ClusterFirst"
       containers:
       - name: fastchat-worker # fixed
-        image: intelanalytics/bigdl-llm-serving-cpu:2.5.0-SNAPSHOT
+        image: intelanalytics/ipex-llm-serving-cpu:2.5.0-SNAPSHOT
         imagePullPolicy: IfNotPresent
         env:
         - name: CONTROLLER_HOST # fixed
-          value: bigdl-a1234bd-fschat-controller-service
+          value: ipex-llm-a1234bd-fschat-controller-service
        - name: CONTROLLER_PORT # fixed
          value: "21005"
        - name: WORKER_HOST # fixed
@@ -105,7 +105,7 @@ spec:
        - name: WORKER_PORT # fixed
          value: "21841"
        - name: MODEL_PATH
-          value: "/llm/models/vicuna-7b-v1.5-bigdl/" # change this to your model
+          value: "/llm/models/vicuna-7b-v1.5-ipex-llm/" # change this to your model
        - name: OMP_NUM_THREADS
          value: "16"
        resources:
@@ -123,4 +123,4 @@ spec:
       volumes:
       - name: llm-models
         persistentVolumeClaim:
-          claimName: models-pvc
\ No newline at end of file
+          claimName: models-pvc
diff --git a/docker/llm/serving/xpu/docker/Dockerfile b/docker/llm/serving/xpu/docker/Dockerfile
index 42922562..87cb85b8 100644
--- a/docker/llm/serving/xpu/docker/Dockerfile
+++ b/docker/llm/serving/xpu/docker/Dockerfile
@@ -1,4 +1,4 @@
-FROM intelanalytics/bigdl-llm-xpu:2.5.0-SNAPSHOT
+FROM intelanalytics/ipex-llm-xpu:2.5.0-SNAPSHOT
 
 ARG http_proxy
 ARG https_proxy
@@ -10,7 +10,7 @@ COPY ./entrypoint.sh /opt/entrypoint.sh
 
 # Install Serving Dependencies
 RUN cd /llm && \
-    pip install --pre --upgrade bigdl-llm[serving] && \
+    pip install --pre --upgrade ipex-llm[serving] && \
     pip install transformers==4.36.2 gradio==4.19.2 && \
     chmod +x /opt/entrypoint.sh
 
diff --git a/docker/llm/serving/xpu/docker/README.md b/docker/llm/serving/xpu/docker/README.md
index 85b266bc..16822dff 100644
--- a/docker/llm/serving/xpu/docker/README.md
+++ b/docker/llm/serving/xpu/docker/README.md
@@ -1,4 +1,4 @@
-## Build/Use BigDL-LLM-serving xpu image
+## Build/Use IPEX-LLM-serving xpu image
 
 ### Build Image
 ```bash
@@ -6,7 +6,7 @@ docker build \
   --build-arg http_proxy=.. \
   --build-arg https_proxy=.. \
   --build-arg no_proxy=.. \
-  --rm --no-cache -t intelanalytics/bigdl-llm-serving-xpu:2.5.0-SNAPSHOT .
+  --rm --no-cache -t intelanalytics/ipex-llm-serving-xpu:2.5.0-SNAPSHOT .
 ```
 
@@ -18,7 +18,7 @@ To map the `xpu` into the container, you need to specify `--device=/dev/dri` whe
 An example could be:
 ```bash
 #/bin/bash
-export DOCKER_IMAGE=intelanalytics/bigdl-llm-serving-xpu:2.5.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:2.5.0-SNAPSHOT
 
 sudo docker run -itd \
         --net=host \
@@ -43,4 +43,4 @@ root@arda-arc12:/# sycl-ls
 ```
 
 After the container is booted, you could get into the container through `docker exec`.
-To run model-serving using `BigDL-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/BigDL/tree/main/python/llm/src/bigdl/llm/serving).
+To run model-serving using `IPEX-LLM` as backend, you can refer to this [document](https://github.com/intel-analytics/IPEX-LLM/tree/main/python/llm/src/ipex_llm/serving).