[PPML] Update documentation of PPML Occlum on Azure (#6372)
* update with Azure Marketplace
* update
* update
This commit is contained in:
parent 2629bb0645
commit 4dac556c68
2 changed files with 36 additions and 26 deletions

#### 2.2.2 Create Linux client with SGX support

Create a Linux VM through the Azure [CLI](https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/nodejs-virtual-machine-vm/create-linux-virtual-machine-azure-cli)/[Portal](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal)/PowerShell.

For the size of the VM, please choose a DCsv3-series VM with more than 4 vCPU cores.

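As a concrete illustration, such a VM can be created with a single Azure CLI command. This is a sketch, not part of the original guide: the resource group, VM name, and admin user are placeholders, and the image alias may vary across Azure CLI versions.

```shell
# Sketch only: create a DCsv3-series Ubuntu VM with SGX support.
# Resource group, VM name, admin user, and image alias are placeholders.
az vm create \
    --resource-group myResourceGroup \
    --name myPPMLClient \
    --size Standard_DC4s_v3 \
    --image Ubuntu2204 \
    --admin-username azureuser \
    --generate-ssh-keys
```

`Standard_DC4s_v3` provides 4 vCPUs, matching the minimum size recommended above.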
#### 2.2.3 Pull BigDL PPML image and run on Linux client

* Go to Azure Marketplace, search "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click the "Create" button, which will lead you to the `Subscribe` page.

  On the `Subscribe` page, input your subscription, your Azure container registry, your resource group and your location. Then click `Subscribe` to subscribe BigDL PPML to your container registry.

* Go to your Azure container registry, check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene`.

* Log in to the created VM. Then log in to your Azure container registry and pull the BigDL PPML image using this command:

  ```bash
  docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene
  ```

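Before the pull can succeed, Docker must be authenticated to the registry. A minimal sketch with the Azure CLI, assuming the placeholder registry name `myContainerRegistry` used throughout this guide:

```shell
# Authenticate the local Docker client to the Azure container registry.
# myContainerRegistry is the placeholder registry name from this guide.
az acr login --name myContainerRegistry
```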
* Start a container of this image:

  ```bash
  #!/bin/bash

  export LOCAL_IP=YOUR_LOCAL_IP
  export DOCKER_IMAGE=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene

  sudo docker run -itd \
      --privileged \
      ...
  ```

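The diff shows only the first lines of this start script. Purely as an illustrative sketch, a completed version might look like the following; every flag after `--privileged` is an assumption modeled on the SGX device mappings used elsewhere in this guide, not taken from the original script:

```shell
#!/bin/bash
# Hypothetical completion of the start script excerpt above; the flags
# after --privileged are assumptions, not the documented original.
export LOCAL_IP=YOUR_LOCAL_IP
export DOCKER_IMAGE=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene

sudo docker run -itd \
    --privileged \
    --net=host \
    --name=bigdl-ppml-client \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    -e LOCAL_IP=$LOCAL_IP \
    $DOCKER_IMAGE bash
```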
```bash
bash bigdl-ppml-submit.sh \
    ...
    --num-executors 2 \
    --conf spark.cores.max=8 \
    --name spark-decrypt-sgx \
    --conf spark.kubernetes.container.image=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene:$BIGDL_VERSION \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
    --jars local://$SPARK_EXTRA_JAR_PATH \
    ...
```

```bash
bash bigdl-ppml-submit.sh \
    ...
    --num-executors 2 \
    --conf spark.cores.max=8 \
    --name spark-tpch-sgx \
    --conf spark.kubernetes.container.image=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene:$BIGDL_VERSION \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
    --conf spark.sql.auto.repartition=true \
    ...
```

## Prerequisites

* Set up an Azure VM:
    * Create a [DCsv3](https://docs.microsoft.com/en-us/azure/virtual-machines/dcv3-series) VM for the [single node Spark examples](#single-node-spark-examples-on-azure).
    * Prepare an image of Spark (required for distributed Spark examples only):
      * Log in to the created VM, then download [Spark 3.1.2](https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz) and extract the Spark binary. Install OpenJDK-8, and `export SPARK_HOME=${Spark_Binary_dir}`.
    * Go to Azure Marketplace, search "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX (experimental and reference only, Occlum Edition)` product. Click the "Create" button, which will lead you to the `Subscribe` page.

      On the `Subscribe` page, input your subscription, your Azure container registry, your resource group and your location. Then click `Subscribe` to subscribe BigDL PPML Occlum to your container registry.
    * On the created VM, log in to your Azure container registry, then pull the BigDL PPML Occlum image using this command:

      ```bash
      docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest
      ```
* Set up [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) for the [distributed Spark examples](#distributed-spark-example-on-aks).
  * Follow the [guide](https://learn.microsoft.com/en-us/azure/confidential-computing/confidential-enclave-nodes-aks-get-started) to deploy an AKS cluster with confidential computing Intel SGX nodes.
  * Install the Azure CLI on the created VM or your local machine according to the [Azure CLI guide](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).
  * Log in to AKS with this command:

    ```bash
    az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
    ```
  * Create the RBAC for Spark on AKS:

    ```bash
    kubectl create serviceaccount spark
    kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
    ```

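Before submitting jobs, it can help to confirm that the confidential-computing node pool actually advertises SGX resources. A sketch, assuming the SGX device plugin installed by the AKS confidential-computing add-on (it exposes `sgx.intel.com/epc` as an allocatable node resource):

```shell
# Check that at least one AKS node advertises SGX EPC memory.
kubectl describe nodes | grep -i "sgx.intel.com/epc" || echo "No SGX EPC resource found"
```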
## Single Node Spark Examples on Azure

### SparkPi example

On the VM, run the SparkPi example with `run_spark_on_occlum_glibc.sh`:

```bash
docker run --rm -it \
    --name=azure-ppml-example-with-occlum \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest bash
cd /opt
bash run_spark_on_occlum_glibc.sh pi
```

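If the container fails to start, the SGX device nodes on the VM may differ by driver version; a quick check (the paths below are the common out-of-tree vs in-tree driver variants, not specific to this guide):

```shell
# List whichever SGX device nodes the VM exposes: out-of-tree DCAP
# drivers use /dev/sgx/enclave, in-tree kernels use /dev/sgx_enclave.
ls -l /dev/sgx* 2>/dev/null || echo "No SGX device nodes found"
```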
### Nytaxi example with Azure NYTaxi

On the VM, run the Nytaxi example with `run_azure_nytaxi.sh`:

```bash
docker run --rm -it \
    --name=azure-ppml-example-with-occlum \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest bash
bash run_azure_nytaxi.sh
```

You should get the Nytaxi dataframe count and the aggregation duration when the job succeeds.

## Distributed Spark Examples on AKS

Clone the repository to the VM:

```bash
git clone https://github.com/intel-analytics/BigDL-PPML-Azure-Occlum-Example.git
```

### SparkPi on AKS

In the `run_spark_pi.sh` script, update the `IMAGE` variable to `myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest` and configure your AKS address. Also configure the environment variables in `driver.yaml` and `executor.yaml`. Then you can submit the SparkPi task with `run_spark_pi.sh`:

```bash
bash run_spark_pi.sh
```

### Nytaxi on AKS

In the `run_nytaxi_k8s.sh` script, update the `IMAGE` variable to `myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-azure-occlum:latest` and configure your AKS address. Also configure the environment variables in `driver.yaml` and `executor.yaml`. Then you can submit the Nytaxi query task with `run_nytaxi_k8s.sh`:

```bash
bash run_nytaxi_k8s.sh
```