refine and verify ipex-inference-cpu docker document (#10565)
* restructure the index
* refine and verify cpu-inference document
* update
This commit is contained in:
parent
0136fad1d4
commit
52f1b541cf
2 changed files with 150 additions and 132 deletions

@@ -1,148 +1,166 @@

# Getting started with IPEX-LLM in Docker

This guide provides step-by-step instructions for installing and using IPEX-LLM in a Docker environment. It covers setups for both CPU and XPU (Intel GPU) on different operating systems.

### Index

- [Docker Installation](#docker-installation-instructions)
- [IPEX-LLM Inference](#ipex-llm-inference-on-cpu)
  - [On CPU](#ipex-llm-inference-on-cpu)
  - [On XPU](#ipex-llm-inference-on-xpu)
- [IPEX-LLM Serving](#ipex-llm-serving-on-cpu)
  - [On CPU](#ipex-llm-serving-on-cpu)
  - [On XPU](#ipex-llm-serving-on-xpu)
- [IPEX-LLM Fine Tuning](#ipex-llm-fine-tuning-on-cpu)
  - [On CPU](#ipex-llm-fine-tuning-on-cpu)
  - [On XPU](#ipex-llm-fine-tuning-on-xpu)

## Docker Installation Instructions

1. **For New Users:**
   - Begin by visiting the [official Docker Get Started page](https://www.docker.com/get-started/) for a comprehensive introduction and installation guide.

2. **Additional Steps for Windows Users:**
   - Follow the step-by-step guide in [Docker Installation Instructions for Windows](https://github.com/intel-analytics/ipex-llm/blob/main/docs/readthedocs/source/doc/LLM/Quickstart/docker_windows_gpu.md), and make sure WSL2 or Hyper-V is enabled on your computer.
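
Before pulling any IPEX-LLM images, it is worth confirming that the Docker engine itself is working. A minimal sanity check from any terminal (standard Docker commands, not specific to IPEX-LLM):

```bash
# Show the client/engine version and confirm the daemon is reachable
docker --version
docker info

# Optional end-to-end test that pulls and runs a tiny test image
docker run --rm hello-world
```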

## IPEX-LLM Inference on CPU

### 1. Prepare ipex-llm-cpu Docker Image

Run the following command to pull the image from Docker Hub:

```bash
docker pull intelanalytics/ipex-llm-cpu:2.1.0-SNAPSHOT
```
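
To check that the image was downloaded successfully, you can list it afterwards (shown with `grep`; on Windows PowerShell, `Select-String` plays the same role):

```bash
docker images | grep "intelanalytics/ipex-llm-cpu"
```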

### 2. Start ipex-llm-cpu Docker Container

To run the image and do inference, create and run a script like the following example (change the model path and the CPU binding to match your machine):

```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.1.0-SNAPSHOT
export CONTAINER_NAME=my_container
export MODEL_PATH=/llm/models[change to your model path]

docker run -itd \
        --privileged \
        -p 12345:12345 \
        --cpuset-cpus="0-47" \
        --cpuset-mems="0" \
        --name=$CONTAINER_NAME \
        -v $MODEL_PATH:/llm/models \
        $DOCKER_IMAGE
```

Access the container:

```bash
docker exec -it $CONTAINER_NAME bash
```
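
When you are finished with the container, it can be stopped and removed from the host with standard Docker commands, using the `CONTAINER_NAME` set in the script above:

```bash
docker stop $CONTAINER_NAME
docker rm $CONTAINER_NAME
```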

### 3. Start Inference and Tutorials

**3.1 Chat Interface**: Use `chat.py` for conversational AI.

For example, if your model is chatglm2-6b and mounted on /llm/models, you can execute the following command to initiate a conversation:

```bash
cd /llm/portable-zip
python chat.py --model-path /llm/models/chatglm2-6b
```

Here is a demonstration:

<a align="left" href="https://llm-assets.readthedocs.io/en/latest/_images/llm-inference-cpu-docker-chatpy-demo.gif">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/llm-inference-cpu-docker-chatpy-demo.gif" width='60%' />
</a><br>
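
Besides `chat.py`, you can run inference directly with the IPEX-LLM Python API inside the container; the CPU INT4 examples in the [repository documentation](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm#cpu-int4) cover this in detail. Below is a minimal sketch run from the container shell, assuming the image's Python environment provides the `ipex_llm` package; the model path and prompt are placeholders:

```bash
python - <<'EOF'
# Minimal CPU INT4 inference sketch with the ipex-llm transformers-style API.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "/llm/models/chatglm2-6b"  # placeholder: any model mounted under /llm/models

# load_in_4bit=True applies IPEX-LLM's symmetric INT4 optimization for CPU inference
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
EOF
```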

**3.2 Jupyter Lab Tutorials**: Start a Jupyter Lab session for IPEX-LLM tutorials.

Run the following command to start the notebook server:

```bash
cd /llm
./start-notebook.sh [--port EXPECTED_PORT]
```

The default port is 12345; you can assign a different port by specifying the `--port` parameter.

If you're using the host network mode when booting the container, once the service is running successfully, you can access the tutorial at http://127.0.0.1:12345/lab. Otherwise, you need to ensure the correct ports are bound between the container and the host.

Here's a demonstration of how to navigate the tutorial in the explorer:

<a align="left" href="https://llm-assets.readthedocs.io/en/latest/_images/llm-inference-cpu-docker-tutorial-demo.gif">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/llm-inference-cpu-docker-tutorial-demo.gif" width='60%' />
</a> <br>
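
If you are not sure whether the notebook is reachable from the host, a quick check (assuming the default port 12345) is:

```bash
# Should return an HTTP response header from Jupyter Lab if the service is up
curl -I http://127.0.0.1:12345/lab
```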

**3.3 Performance Benchmark**: We provide a benchmark tool that helps users test all the benchmarks and record the results in a CSV file.

```bash
cd /llm/benchmark/all-in-one
```

Users can provide models and related information in `config.yaml`.

```yaml
repo_id:
  - 'THUDM/chatglm-6b'
  - 'THUDM/chatglm2-6b'
  - 'meta-llama/Llama-2-7b-chat-hf'
  # - 'liuhaotian/llava-v1.5-7b' # requires a LLAVA_REPO_DIR env variable pointing to the llava dir; added only for gpu win related test_api now
local_model_hub: 'path to your local model hub'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
  - '1024-128'
test_api:
  - "transformer_int4"
  - "native_int4"
  - "optimize_model"
  - "pytorch_autocast_bf16"
  # - "transformer_autocast_bf16"
  # - "bigdl_ipex_bf16"
  # - "bigdl_ipex_int4"
  # - "bigdl_ipex_int8"
  # - "ipex_fp16_gpu" # on Intel GPU
  # - "bigdl_fp16_gpu" # on Intel GPU
  # - "transformer_int4_gpu" # on Intel GPU
  # - "optimize_model_gpu" # on Intel GPU
  # - "deepspeed_transformer_int4_cpu" # on Intel SPR Server
  # - "transformer_int4_gpu_win" # on Intel GPU for Windows
  # - "transformer_int4_fp16_gpu_win" # on Intel GPU for Windows, use fp16 for non-linear layer
  # - "transformer_int4_loadlowbit_gpu_win" # on Intel GPU for Windows using load_low_bit API. Please make sure you have used the save.py to save the converted low bit model
  # - "deepspeed_optimize_model_gpu" # deepspeed autotp on Intel GPU
  # - "speculative_cpu"
  # - "speculative_gpu"
cpu_embedding: False # whether put embedding to CPU (only available now for gpu win related test_api)
streaming: False # whether output in streaming way (only available now for gpu win related test_api)
```

This benchmark tool offers various test APIs, including `transformer_int4`, `speculative_cpu`, and more.

For instance, if you wish to benchmark **inference with speculative decoding**, enable the `speculative_cpu` test API in the `config.yaml` file.

Then, execute `bash run-spr.sh`, which will generate the output results in `results.csv`.

```bash
bash run-spr.sh
```

For further details and comprehensive functionality of the benchmark tool, please refer to the [all-in-one benchmark tool](https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one).

Additionally, for examples related to Inference with Speculative Decoding, you can explore [Speculative-Decoding Examples](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Speculative-Decoding).
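
After `run-spr.sh` finishes, you can take a quick look at the recorded metrics directly in the shell (`results.csv` is written in the same `all-in-one` directory; column names may vary by version):

```bash
# Render the CSV as an aligned table for a quick scan
column -s, -t < results.csv | less -S
```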

## IPEX-LLM Inference on XPU

First, pull the docker image from Docker Hub:

```bash
docker pull intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT
```

To map the xpu into the container, you need to specify `--device=/dev/dri` when booting the container.

An example could be:

```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT
export CONTAINER_NAME=my_container
export MODEL_PATH=/llm/models[change to your model path]
```

@@ -170,18 +188,18 @@ root@arda-arc12:/# sycl-ls

To run inference with `IPEX-LLM` on XPU, you could refer to this [documentation](https://github.com/intel-analytics/IPEX/tree/main/python/llm/example/GPU).
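
The elided part of this section shows `sycl-ls` being run inside the container; you can use the same check yourself to confirm the GPU has been mapped in before starting any workload (device names will differ by machine):

```bash
# List SYCL devices; Intel GPU entries (level_zero / opencl) should appear
sycl-ls
```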

## IPEX-LLM Serving on CPU

### Boot container

Pull image:

```bash
docker pull intelanalytics/ipex-llm-serving-cpu:2.1.0-SNAPSHOT
```

You could use the following bash script to start the container. Please note that the CPU config is specified for Xeon CPUs; change it accordingly if you are not using a Xeon CPU.

```bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-cpu:2.1.0-SNAPSHOT
export CONTAINER_NAME=my_container
export MODEL_PATH=/llm/models[change to your model path]
```

@@ -259,13 +277,13 @@ python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
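
The elided part of this section launches a FastChat OpenAI-compatible API server on port 8000. Once it is running, you can exercise it from the host with a standard OpenAI-style request; the model name and prompt below are placeholders:

```bash
# Query the OpenAI-compatible endpoint exposed by fastchat.serve.openai_api_server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "chatglm2-6b", "messages": [{"role": "user", "content": "Hello!"}]}'
```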

## IPEX-LLM Serving on XPU

### Boot container

Pull image:

```bash
docker pull intelanalytics/ipex-llm-serving-xpu:2.1.0-SNAPSHOT
```

To map the `xpu` into the container, you need to specify `--device=/dev/dri` when booting the container.

An example could be:

```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-serving-xpu:2.1.0-SNAPSHOT
export CONTAINER_NAME=my_container
export MODEL_PATH=/llm/models[change to your model path]
export SERVICE_MODEL_PATH=/llm/models/chatglm2-6b[a specified model path for running service]
```

@@ -352,7 +370,7 @@ Finally, launch the RESTful API server

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
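
As with the CPU serving setup, once the RESTful API server above is running you can list the registered models from the host to confirm the service is reachable (a standard OpenAI-style endpoint served by FastChat):

```bash
curl http://localhost:8000/v1/models
```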

## IPEX-LLM Fine Tuning on CPU

### 1. Prepare Docker Image

@@ -360,10 +378,10 @@ You can download directly from Dockerhub like:

```bash
# For standalone
docker pull intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.1.0-SNAPSHOT

# For k8s
docker pull intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.1.0-SNAPSHOT
```

Or build the image from source:

@@ -376,7 +394,7 @@ export HTTPS_PROXY=your_https_proxy

```bash
docker build \
  --build-arg http_proxy=${HTTP_PROXY} \
  --build-arg https_proxy=${HTTPS_PROXY} \
  -t intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.1.0-SNAPSHOT \
  -f ./Dockerfile .

# For k8s
```

@@ -386,7 +404,7 @@ export HTTPS_PROXY=your_https_proxy

```bash
docker build \
  --build-arg http_proxy=${HTTP_PROXY} \
  --build-arg https_proxy=${HTTPS_PROXY} \
  -t intelanalytics/ipex-llm-finetune-qlora-cpu-k8s:2.1.0-SNAPSHOT \
  -f ./Dockerfile.k8s .
```

@@ -407,7 +425,7 @@ docker run -itd \

```bash
  -e https_proxy=${HTTPS_PROXY} \
  -v $BASE_MODE_PATH:/ipex_llm/model \
  -v $DATA_PATH:/ipex_llm/data/alpaca-cleaned \
  intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.1.0-SNAPSHOT
```

The download and mount of base model and data to a docker container demonstrates a standard fine-tuning process. You can skip this step for a quick start; in that case, the fine-tuning code will automatically download the needed files:

@@ -421,7 +439,7 @@ docker run -itd \

```bash
  --name=ipex-llm-fintune-qlora-cpu \
  -e http_proxy=${HTTP_PROXY} \
  -e https_proxy=${HTTPS_PROXY} \
  intelanalytics/ipex-llm-finetune-qlora-cpu-standalone:2.1.0-SNAPSHOT
```

However, we do recommend that you handle them manually, because the automatic download can be blocked by Internet access or Hugging Face authentication depending on your environment, and the manual method allows you to fine-tune in a custom way (with a different base model and dataset).
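
One way to prepare the base model and dataset on the host before mounting them is `huggingface-cli` from the `huggingface_hub` package. The repository ids below are assumptions for illustration: substitute the base model you actually want to fine-tune, and `yahma/alpaca-cleaned` is taken to be the cleaned Alpaca dataset referred to above:

```bash
pip install -U huggingface_hub

# Example base model (gated on Hugging Face, so authentication may be required)
huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir $BASE_MODE_PATH

# Assumed Hugging Face id for the alpaca-cleaned dataset
huggingface-cli download --repo-type dataset yahma/alpaca-cleaned --local-dir $DATA_PATH
```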

@@ -470,7 +488,7 @@ python ./export_merged_model.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH --

Then you can use `./outputs/checkpoint-200-merged` as a normal huggingface transformer model to do inference.
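
For example, a quick check that the merged checkpoint loads like any other Hugging Face model (plain `transformers` here, no IPEX-LLM-specific API assumed):

```bash
python - <<'EOF'
# Load the merged checkpoint as a regular Hugging Face model and run one short generation
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./outputs/checkpoint-200-merged"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
EOF
```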

## IPEX-LLM Fine Tuning on XPU

The following shows how to fine-tune an LLM with quantization (QLoRA built on IPEX-LLM 4-bit optimizations) in a docker environment, which is accelerated by Intel XPU.

@@ -479,7 +497,7 @@ The following shows how to fine-tune LLM with Quantization (QLoRA built on IPEX-

You can download directly from Dockerhub like:

```bash
docker pull intelanalytics/ipex-llm-finetune-qlora-xpu:2.1.0-SNAPSHOT
```

Or build the image from source:

@@ -491,7 +509,7 @@ export HTTPS_PROXY=your_https_proxy

```bash
docker build \
  --build-arg http_proxy=${HTTP_PROXY} \
  --build-arg https_proxy=${HTTPS_PROXY} \
  -t intelanalytics/ipex-llm-finetune-qlora-xpu:2.1.0-SNAPSHOT \
  -f ./Dockerfile .
```

@@ -515,7 +533,7 @@ docker run -itd \

```bash
  -v $BASE_MODE_PATH:/model \
  -v $DATA_PATH:/data/alpaca-cleaned \
  --shm-size="16g" \
  intelanalytics/ipex-llm-finetune-qlora-xpu:2.1.0-SNAPSHOT
```

The download and mount of base model and data to a docker container demonstrates a standard fine-tuning process. You can skip this step for a quick start; in that case, the fine-tuning code will automatically download the needed files:

@@ -532,7 +550,7 @@ docker run -itd \

```bash
  -e http_proxy=${HTTP_PROXY} \
  -e https_proxy=${HTTPS_PROXY} \
  --shm-size="16g" \
  intelanalytics/ipex-llm-finetune-qlora-xpu:2.1.0-SNAPSHOT
```

However, we do recommend that you handle them manually, because the automatic download can be blocked by Internet access or Hugging Face authentication depending on your environment, and the manual method allows you to fine-tune in a custom way (with a different base model and dataset).
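
Inside the XPU fine-tuning container, a quick way to confirm the Intel GPU is usable from PyTorch before launching a run is the check below. This assumes the image ships Intel Extension for PyTorch with XPU support, which provides the `torch.xpu` device:

```bash
python - <<'EOF'
# Verify that an XPU device is visible to PyTorch (assumes an IPEX XPU build in the image)
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 - registers the 'xpu' device

print("XPU available:", torch.xpu.is_available())
print("XPU device count:", torch.xpu.device_count())
EOF
```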

---

@@ -6,7 +6,7 @@ docker build \

```bash
  --build-arg http_proxy=.. \
  --build-arg https_proxy=.. \
  --build-arg no_proxy=.. \
  --rm --no-cache -t intelanalytics/ipex-llm-cpu:2.1.0-SNAPSHOT .
```
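
After the build finishes, the freshly tagged image should show up locally:

```bash
docker images intelanalytics/ipex-llm-cpu
```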

@@ -16,7 +16,7 @@ docker build \

An example could be:

```bash
#!/bin/bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.1.0-SNAPSHOT

sudo docker run -itd \
        --net=host \
```

@@ -41,7 +41,7 @@ You can download models and bind the model directory from host machine to contai

Here is an example:

```bash
export DOCKER_IMAGE=intelanalytics/ipex-llm-cpu:2.1.0-SNAPSHOT
export MODEL_PATH=/home/llm/models

sudo docker run -itd \
```