Update k8s doc - remove env from yaml (#7670)

Kai Huang committed on 2023-02-23 18:29:22 +08:00 (committed by GitHub)
commit 51b8ff3728
parent 48f5144a34


@@ -138,16 +138,15 @@ def train_data_creator(config, batch_size):
### 2 Pull Docker Image
Please pull the BigDL [`bigdl-k8s`](https://hub.docker.com/r/intelanalytics/bigdl-k8s/tags) image (built on top of Spark 3.1.3) from Docker Hub beforehand as follows:
```bash
-# For the release version, e.g. 2.2.0
-sudo docker pull intelanalytics/bigdl-k8s:version
# For the latest nightly build version
sudo docker pull intelanalytics/bigdl-k8s:latest
+# For the release version, e.g. 2.2.0
+sudo docker pull intelanalytics/bigdl-k8s:2.2.0
```
-In the BigDL K8s Docker image:
-- Spark is located at `/opt/spark`. Spark version is 3.1.3.
-- BigDL is located at `/opt/bigdl-VERSION`. For the latest nightly build image, BigDL version would be `xxx-SNAPSHOT` (e.g. 2.3.0-SNAPSHOT).
+* The environment for Spark (including SPARK_VERSION and SPARK_HOME) and BigDL (including BIGDL_VERSION and BIGDL_HOME) are already configured in the BigDL K8s Docker image.
+* Spark executor containers are scheduled by K8s at runtime and you don't need to create them manually.
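Since the Spark and BigDL environment variables are baked into the image, a quick sanity check is to print them from a throwaway container. This is a hedged sketch: the variable names come from the tutorial above, but the printed values depend on the tag you pulled.

```bash
# Print the preconfigured Spark/BigDL variables from the image (values vary by tag).
sudo docker run --rm intelanalytics/bigdl-k8s:latest \
    bash -c 'echo "SPARK_HOME=$SPARK_HOME  SPARK_VERSION=$SPARK_VERSION"; \
             echo "BIGDL_HOME=$BIGDL_HOME  BIGDL_VERSION=$BIGDL_VERSION"'
```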
---
## 3. Create BigDL K8s Container
@@ -168,29 +167,25 @@ sudo docker run -itd --net=host \
-e https_proxy=https://your-proxy-host:your-proxy-port \
-e RUNTIME_SPARK_MASTER=k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
-e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
--e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:latest \
+-e RUNTIME_K8S_SPARK_IMAGE=intelanalytics/bigdl-k8s:version \
-e RUNTIME_PERSISTENT_VOLUME_CLAIM=nfsvolumeclaim \
-e RUNTIME_DRIVER_HOST=${RUNTIME_DRIVER_HOST} \
-intelanalytics/bigdl-k8s:latest bash
+intelanalytics/bigdl-k8s:version bash
```
In the script:
-* **Please switch the version tag according to the BigDL K8s Docker image you pull.**
+* **Please modify the version tag according to the BigDL K8s Docker image you pull.**
* **Please make sure you are mounting the correct Volume path (e.g. NFS) into the container.**
* `--net=host`: use the host network stack for the Docker container.
* `-v /etc/kubernetes:/etc/kubernetes`: specify the path of Kubernetes configurations to mount into the Docker container.
* `-v /root/.kube:/root/.kube`: specify the path of Kubernetes installation to mount into the Docker container.
* `-v /path/to/nfsdata:/bigdl/nfsdata`: mount NFS path on the host into the Docker container as the specified path (e.g. "/bigdl/nfsdata").
* `RUNTIME_SPARK_MASTER`: a URL format that specifies the Spark master: `k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>`.
-* `RUNTIME_K8S_SERVICE_ACCOUNT`: a string that specifies the service account for the driver pod.
+* `RUNTIME_K8S_SERVICE_ACCOUNT`: the service account for the driver pod.
* `RUNTIME_K8S_SPARK_IMAGE`: the name of the BigDL K8s Docker image. Note that you need to change the version accordingly.
-* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: a string that specifies the Kubernetes volumeName (e.g. "nfsvolumeclaim").
+* `RUNTIME_PERSISTENT_VOLUME_CLAIM`: the Kubernetes volumeName (e.g. "nfsvolumeclaim").
* `RUNTIME_DRIVER_HOST`: a URL format that specifies the driver localhost (only required if you use k8s-client mode).
-__Notes:__
-* The __Client Container__ already contains all the required environment configurations for Spark and BigDL Orca.
-* Spark executor containers are scheduled by K8s at runtime and you don't need to create them manually.
### 3.2 Launch the K8s Client Container
Once the container is created, a `containerID` will be returned, with which you can enter the container using the command below:
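A minimal sketch of that step, assuming `<containerID>` is the ID printed by the `docker run` command above (a container name or a unique ID prefix also works):

```bash
# Get an interactive shell inside the launched client container.
sudo docker exec -it <containerID> bash
```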
@@ -512,7 +507,7 @@ We define a Kubernetes Deployment in a YAML file. Some fields of the YAML are ex
#### 7.3.1 K8s Client
BigDL has provided an example [orca-tutorial-k8s-client.yaml](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/docker/orca-tutorial-client.yaml) to directly run the Fashion-MNIST example for k8s-client mode.
-Note that you need to change the configurations in the YAML file accordingly, including the version of the Docker image, RUNTIME_SPARK_MASTER, BIGDL_VERSION and BIGDL_HOME.
+The environment variables for Spark (including SPARK_VERSION and SPARK_HOME) and BigDL (including BIGDL_VERSION and BIGDL_HOME) are already configured in the BigDL K8s Docker image.
You need to uncompress the conda archive in NFS before submitting the job:
```bash
@@ -521,7 +516,7 @@ mkdir environment
tar -xzvf environment.tar.gz --directory environment
```
-orca-tutorial-k8s-client.yaml
+*orca-tutorial-k8s-client.yaml*
```bash
apiVersion: batch/v1
@@ -543,7 +538,7 @@ spec:
export RUNTIME_DRIVER_HOST=$( hostname -I | awk '{print $1}' );
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
---deploy-mode ${SPARK_MODE} \
+--deploy-mode client \
--name orca-k8s-client-tutorial \
--conf spark.driver.host=${RUNTIME_DRIVER_HOST} \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
@@ -557,9 +552,9 @@ spec:
--conf spark.pyspark.python=/bigdl/nfsdata/environment/bin/python \
--properties-file ${BIGDL_HOME}/conf/spark-bigdl.conf \
--py-files ${BIGDL_HOME}/python/bigdl-spark_${SPARK_VERSION}-${BIGDL_VERSION}-python-api.zip,/bigdl/nfsdata/model.py \
---conf spark.kubernetes.executor.deleteOnTermination=True \
--conf spark.driver.extraClassPath=${BIGDL_HOME}/jars/* \
--conf spark.executor.extraClassPath=${BIGDL_HOME}/jars/* \
+--conf spark.kubernetes.executor.deleteOnTermination=True \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl/nfsdata/ \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
@@ -575,16 +570,6 @@ spec:
value: intelanalytics/bigdl-k8s:latest
- name: RUNTIME_SPARK_MASTER
value: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>
-- name: SPARK_MODE
-value: client
-- name: SPARK_VERSION
-value: 3.1.3
-- name: SPARK_HOME
-value: /opt/spark
-- name: BIGDL_VERSION
-value: 2.2.0-SNAPSHOT
-- name: BIGDL_HOME
-value: /opt/bigdl-2.2.0-SNAPSHOT
volumeMounts:
- name: nfs-storage
mountPath: /bigdl/nfsdata
@@ -626,9 +611,9 @@ kubectl delete job orca-pytorch-job
#### 7.3.2 K8s Cluster
BigDL has provided an example [orca-tutorial-k8s-cluster.yaml](https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/pytorch/docker/orca-tutorial-cluster.yaml) to run the Fashion-MNIST example for k8s-cluster mode.
-Note that you need to change the configurations in the YAML file accordingly, including the version of the Docker image, RUNTIME_SPARK_MASTER, BIGDL_VERSION and BIGDL_HOME.
+The environment variables for Spark (including SPARK_VERSION and SPARK_HOME) and BigDL (including BIGDL_VERSION and BIGDL_HOME) are already configured in the BigDL K8s Docker image.
-orca-tutorial-k8s-cluster.yaml
+*orca-tutorial-k8s-cluster.yaml*
```bash
apiVersion: batch/v1
@@ -650,7 +635,7 @@ spec:
${SPARK_HOME}/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--name orca-k8s-cluster-tutorial \
---deploy-mode ${SPARK_MODE} \
+--deploy-mode cluster \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=${RUNTIME_K8S_SERVICE_ACCOUNT} \
--num-executors 2 \
@@ -685,16 +670,6 @@ spec:
value: k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port>
- name: RUNTIME_K8S_SERVICE_ACCOUNT
value: spark
-- name: SPARK_MODE
-value: cluster
-- name: SPARK_VERSION
-value: 3.1.3
-- name: SPARK_HOME
-value: /opt/spark
-- name: BIGDL_VERSION
-value: 2.2.0-SNAPSHOT
-- name: BIGDL_HOME
-value: /opt/bigdl-2.2.0-SNAPSHOT
volumeMounts:
- name: nfs-storage
mountPath: /bigdl/nfsdata
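The cluster-mode Job defined in the YAML above can then be submitted and monitored with standard kubectl commands. A hedged sketch; the job name `orca-pytorch-job` follows the one used elsewhere in this tutorial, so adjust it to the `metadata.name` in your own YAML:

```bash
# Submit the cluster-mode Job and follow its progress.
kubectl apply -f orca-tutorial-k8s-cluster.yaml
kubectl get pods                                 # driver/executor pods are created by Spark on K8s
kubectl logs -l job-name=orca-pytorch-job -f     # adjust the job name to your YAML's metadata.name
```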