[PPML] Azure Gramine documentation (#6627)

* update az gramine doc
* add python example
parent bd112d6cf3, commit 81a9f8147c. 1 changed file with 128 additions and 102 deletions.

Before you set up your environment, please install Azure CLI on your machine according to the Azure CLI installation guide.
Then run `az login` to log in to Azure before you run the following Azure commands.
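
For instance, a minimal login sketch (the subscription ID is a placeholder for your own):

```bash
# opens a browser for interactive authentication
az login
# optionally pin the subscription that later commands should target
az account set --subscription "<your-subscription-id>"
```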

### 2.2 Create Azure Linux VM for hosting BigDL PPML image

#### 2.2.1 Create Resource Group
On your machine, create a resource group or use an existing one. Example code to create a resource group with the Azure CLI:
```bash
# resource group name and location below are placeholders
az group create \
    --name myResourceGroup \
    --location eastus \
    --output none
```

#### 2.2.2 Create Linux VM with SGX support
Create a Linux VM through the Azure [CLI](https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/nodejs-virtual-machine-vm/create-linux-virtual-machine-azure-cli)/[Portal](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal)/PowerShell.
For the VM size, please choose a DCSv3-series VM with more than 4 vCPU cores.
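
For instance, a sketch of creating such a VM with the Azure CLI (resource group, VM name, and image are placeholder assumptions; any DCSv3 size such as `Standard_DC4s_v3` meets the vCPU requirement):

```bash
az vm create \
    --resource-group myResourceGroup \
    --name myPPMLHost \
    --image Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest \
    --size Standard_DC4s_v3 \
    --admin-username azureuser \
    --generate-ssh-keys
```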

#### 2.2.3 Start AESM service on Linux VM

* Ubuntu 20.04

```bash
echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/intelsgx.list
wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -
sudo apt update
sudo apt-get install libsgx-dcap-ql
sudo apt install sgx-aesm-service
```
* Ubuntu 18.04

```bash
echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu bionic main' | sudo tee /etc/apt/sources.list.d/intelsgx.list
wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -
sudo apt update
sudo apt-get install libsgx-dcap-ql
sudo apt install sgx-aesm-service
```
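After installation you can check that AESM came up; a quick sketch (the `sgx-aesm-service` package installs the `aesmd` systemd unit, and the socket below is the one mounted into the PPML container later):

```bash
systemctl status aesmd
ls -l /var/run/aesmd/aesm.socket
```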

#### 2.2.4 Pull BigDL PPML image and run on Linux VM
* Go to Azure Marketplace, search "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click the "Create" button, which will lead you to the `Subscribe` page.
On the `Subscribe` page, input your subscription, your Azure container registry, your resource group, and your location. Then click `Subscribe` to subscribe BigDL PPML to your container registry.

* Go to your Azure container registry (e.g. myContainerRegistry), check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine`.
* Log in to the created VM, then log in to your Azure container registry and pull the BigDL PPML image as needed (see the login sketch after this list).
  * If you want to run with 16G SGX memory, you can pull the image as below:

    ```bash
    docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-16g
    ```

  * If you want to run with 32G SGX memory, you can pull the image as below:

    ```bash
    docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-32g
    ```

  * If you want to run with 64G SGX memory, you can pull the image as below:

    ```bash
    docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-64g
    ```
* Start a container from this image

  The example script to start the container is as below:
  ```bash
  #!/bin/bash

  export LOCAL_IP=YOUR_LOCAL_IP
  export DOCKER_IMAGE=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-16g

  sudo docker run -itd \
      --privileged \
      -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
      --name=spark-local \
      -e LOCAL_IP=$LOCAL_IP \
      $DOCKER_IMAGE bash
  ```
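
  If Docker is not yet authenticated against your registry, a login sketch (registry name is a placeholder; `az acr login` configures the local Docker client for your Azure container registry):

  ```bash
  az acr login --name myContainerRegistry
  # confirm the container is up
  sudo docker ps | grep spark-local
  ```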

### 2.3 Create AKS (Azure Kubernetes Service) or use an existing AKS cluster
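
As a sketch, an SGX-enabled cluster can be created with the Azure CLI (names are placeholders; the `confcom` add-on installs the SGX device plugin, and the node size should again be a DCSv3-series VM):

```bash
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-count 3 \
    --node-vm-size Standard_DC4s_v3 \
    --enable-addons confcom \
    --generate-ssh-keys
```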

Take note of the following properties for use in the next section:

  Example command:
  ```bash
  az keyvault set-policy --name myKeyVault --object-id <mySystemAssignedIdentity> --secret-permissions all --key-permissions all unwrapKey wrapKey
  ```
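
  The `<mySystemAssignedIdentity>` above can be read from the VM, for example (VM and resource group names are placeholders):

  ```bash
  az vm identity show --resource-group myResourceGroup --name myPPMLHost --query principalId -o tsv
  ```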

#### 2.5.3 AKS access Key Vault

Take note of the principalId of the first line as the System Managed Identity of your VM.

###### b. Set access policy for AKS VM ScaleSet
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <systemManagedIdentityOfVMSS> --secret-permissions get --key-permissions get unwrapKey
```

## 3. Run Spark PPML jobs

### 3.1 Save kubeconfig to secret

Run the following script to save the kubeconfig to a secret:
```bash
/ppml/trusted-big-data-ml/azure/kubeconfig-secret.sh
```
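
If the client container does not yet hold credentials for your AKS cluster, you can fetch the kubeconfig first (a sketch; resource group and cluster names are placeholders):

```bash
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
```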

### 3.2 Generate keys
Run the following scripts to generate keys:
```bash
/ppml/trusted-big-data-ml/azure/generate-keys-az.sh
```

After generating keys, run the following command to save them in Kubernetes:
```bash
kubectl apply -f /ppml/trusted-big-data-ml/work/keys/keys.yaml
```

### 3.3 Generate password
Run the following script to save the password to Azure Key Vault:
```bash
/ppml/trusted-big-data-ml/azure/generate-password-az.sh myKeyVault used_password_when_generate_keys
```

### 3.4 Create the RBAC

```bash
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```

### 3.5 Create image pull secret from your Azure container registry
* If you are already logged in to your Azure container registry, find your Docker config JSON file (e.g. `~/.docker/config.json`) and create a secret for your registry credential as below:
  ```bash
  kubectl create secret generic regcred \
      --from-file=.dockerconfigjson=<path/to/.docker/config.json> \
      --type=kubernetes.io/dockerconfigjson
  ```

* Otherwise, create the secret with your registry username and password like below:
  ```bash
  kubectl create secret docker-registry regcred --docker-server=myContainerRegistry.azurecr.io --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
  ```

### 3.6 Add image pull secret to service account
```bash
kubectl patch serviceaccount spark -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```
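
To confirm the secret is attached, a quick check (the expected output is `regcred`):

```bash
kubectl get serviceaccount spark -o jsonpath='{.imagePullSecrets[0].name}'
```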

### 3.7 Run PPML Spark job
The example script to run a PPML Spark job on AKS is as below. You can also refer to `/ppml/trusted-big-data-ml/azure/submit-spark-sgx-az.sh`.
```bash
export RUNTIME_DRIVER_MEMORY=8g
export RUNTIME_DRIVER_PORT=54321

RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_EXTRA_JAR_PATH=
SPARK_JOB_MAIN_CLASS=
ARGS=

DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=

export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`

bash bigdl-ppml-submit.sh \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --sgx-enabled true \
    --sgx-driver-jvm-memory 2g \
    --sgx-executor-jvm-memory 7g \
    --driver-memory 8g \
    --driver-cores 4 \
    --num-executors 2 \
    --conf spark.cores.max=8 \
    --name spark-decrypt-sgx \
    --conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
    --driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
    --executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
    --jars local://$SPARK_EXTRA_JAR_PATH \
    --conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \
    --conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \
    $SPARK_EXTRA_JAR_PATH \
    $ARGS
```

### 3.8 Run the simple query Python example

This is an example script to run the simple query Python example job on AKS, with data stored in Azure Data Lake Storage.
```bash
export RUNTIME_DRIVER_MEMORY=6g
export RUNTIME_DRIVER_PORT=54321

RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_VERSION=3.1.3

DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
INPUT_DIR_PATH=xxx@$DATA_LAKE_NAME.dfs.core.windows.net/xxx
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=

export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`

bash bigdl-ppml-submit.sh \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --sgx-enabled true \
    --sgx-driver-jvm-memory 2g \
    --sgx-executor-jvm-memory 7g \
    --driver-memory 6g \
    --driver-cores 4 \
    --executor-memory 24g \
    --executor-cores 2 \
    --num-executors 1 \
    --name simple-query-sgx \
    --conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
    --driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
    --executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
    --conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \
    --conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \
    --conf spark.hadoop.fs.azure.enable.append.support=true \
    --properties-file /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/conf/spark-bigdl.conf \
    --conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
    --conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
    --py-files /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-dllib-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip \
    /ppml/trusted-big-data-ml/work/examples/simple_query_example.py \
    --kms_type AzureKeyManagementService \
    --azure_vault $KEY_VAULT_NAME \
    --primary_key_path $PRIMARY_KEY_PATH \
    --data_key_path $DATA_KEY_PATH \
    --input_encrypt_mode aes/cbc/pkcs5padding \
    --output_encrypt_mode plain_text \
    --input_path $INPUT_DIR_PATH/people.csv \
    --output_path $INPUT_DIR_PATH/simple-query-result.csv
```
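
After submission, the executor pods can be watched from the client container; a sketch (pod names derive from the `--name` value above, and in client mode the driver runs inside the client container itself):

```bash
kubectl get pods | grep simple-query-sgx
```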

## 4. Run TPC-H example

TPC-H queries are implemented using the Spark DataFrames API, running with BigDL PPML.

Generate the primary key and data key, then save them to the file system.

The example code for generating the primary key and data key is as below:

```bash
BIGDL_VERSION=2.2.0-SNAPSHOT
SPARK_VERSION=3.1.3
java -cp /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
    -Xmx10g \
    com.intel.analytics.bigdl.ppml.examples.GenerateKeys \
    --kmsType AzureKeyManagementService \
```

Encrypt data with the specified BigDL `AzureKeyManagementService`.

The example code for encrypting data is as below:

```bash
BIGDL_VERSION=2.2.0-SNAPSHOT
SPARK_VERSION=3.1.3
java -cp /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
    -Xmx10g \
    com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \
    --kmsType AzureKeyManagementService \
```

The example script to run a query is as below:

```bash
export RUNTIME_DRIVER_MEMORY=8g
export RUNTIME_DRIVER_PORT=54321

export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`

RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_VERSION=3.1.3

DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
INPUT_DIR=xxx/dbgen-encrypted
OUTPUT_DIR=xxx/output

bash bigdl-ppml-submit.sh \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --sgx-enabled true \
    --sgx-driver-jvm-memory 2g \
    --sgx-executor-jvm-memory 7g \
    --driver-memory 8g \
    --driver-cores 4 \
    --num-executors 2 \
    --conf spark.cores.max=8 \
    --name spark-tpch-sgx \
    --conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
    --driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
    --executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
    --conf spark.sql.auto.repartition=true \
    --conf spark.default.parallelism=400 \
    --conf spark.sql.shuffle.partitions=400 \
    --conf spark.bigdl.kms.azure.vault=$KEY_VAULT_NAME \
    --conf spark.bigdl.kms.key.primary=$PRIMARY_KEY_PATH \
    --conf spark.bigdl.kms.key.data=$DATA_KEY_PATH \
    --class com.intel.analytics.bigdl.ppml.examples.tpch.TpchQuery \
    --verbose \
    local:///ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION.jar \
    $INPUT_DIR $OUTPUT_DIR aes/cbc/pkcs5padding plain_text [QUERY]
```

`INPUT_DIR` is the TPC-H data directory.