[PPML] Azure Gramine documentation (#6627)

* update az gramine doc

* add python example

* update

* update

* update
This commit is contained in:
Jiao Wang 2022-11-29 14:48:57 -08:00 committed by GitHub
parent bd112d6cf3
commit 81a9f8147c


@ -17,7 +17,7 @@ Before you setup your environment, please install Azure CLI on your machine acco
Then run `az login` to login to Azure system before you run the following Azure commands.
### 2.2 Create Azure Linux VM for hosting BigDL PPML image
#### 2.2.1 Create Resource Group
On your machine, create a resource group or use an existing one. Example code to create a resource group with the Azure CLI:
```
@ -27,26 +27,53 @@ az group create \
--output none
```
#### 2.2.2 Create Linux VM with SGX support
Create a Linux VM through the Azure [CLI](https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/nodejs-virtual-machine-vm/create-linux-virtual-machine-azure-cli)/[Portal](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal)/PowerShell.
For the VM size, please choose a DCsv3-series VM with more than 4 vCPU cores.
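For reference, a minimal sketch of creating such a VM with the Azure CLI (the VM name, admin user and image URN below are placeholder assumptions; `Standard_DC4s_v3` is one DCsv3 size with 4 vCPUs):
```bash
az vm create \
--resource-group myResourceGroup \
--name myPPMLVM \
--size Standard_DC4s_v3 \
--image Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest \
--admin-username azureuser \
--generate-ssh-keys \
--output none
```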
#### 2.2.3 Start AESM service on Linux VM
* Ubuntu 20.04
```bash
echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/intelsgx.list
wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -
sudo apt update
sudo apt install libsgx-dcap-ql
sudo apt install sgx-aesm-service
```
* Ubuntu 18.04
```bash
echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu bionic main' | sudo tee /etc/apt/sources.list.d/intelsgx.list
wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -
sudo apt update
sudo apt install libsgx-dcap-ql
sudo apt install sgx-aesm-service
```
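After installation, you can check that the AESM daemon came up; `aesmd` is the service installed by `sgx-aesm-service`, and its socket is mounted into the PPML container later:
```bash
sudo systemctl status aesmd --no-pager
ls /var/run/aesmd/aesm.socket
```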
#### 2.2.4 Pull BigDL PPML image and run on Linux VM
* Go to Azure Marketplace, search "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click the "Create" button, which will lead you to the `Subscribe` page.
On the `Subscribe` page, input your subscription, your Azure container registry, your resource group and your location. Then click `Subscribe` to subscribe BigDL PPML to your container registry.
* Go to your Azure container registry (i.e. myContainerRegistry), check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine`
* Login to the created VM. Then login to your Azure container registry and pull the BigDL PPML image as needed.
* If you want to run with 16G SGX memory, you can pull the image as below:
```bash
docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-16g
```
* If you want to run with 32G SGX memory, you can pull the image as below:
```bash
docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-32g
```
* If you want to run with 64G SGX memory, you can pull the image as below:
```bash
docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-64g
```
* Start a container from this image
An example script to start the container is as below (a command to attach to the running container follows the script):
```bash
#!/bin/bash
export LOCAL_IP=YOUR_LOCAL_IP
export DOCKER_IMAGE=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-16g
sudo docker run -itd \
--privileged \
@ -59,9 +86,7 @@ On `Subscribe` page, input your subscription, your Azure container registry, you
-v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
--name=spark-local \
-e LOCAL_IP=$LOCAL_IP \
$DOCKER_IMAGE bash
```
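If the container started successfully, you can attach to it (the container name `spark-local` comes from the script above):
```bash
sudo docker exec -it spark-local bash
```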
### 2.3 Create AKS (Azure Kubernetes Service) or use existing AKS
@ -163,7 +188,7 @@ Take note of the following properties for use in the next section:
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <mySystemAssignedIdentity> --secret-permissions all --key-permissions all unwrapKey wrapKey
```
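You can read the policy back to confirm it was applied:
```bash
az keyvault show --name myKeyVault --query properties.accessPolicies
```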
#### 2.5.3 AKS access Key Vault
@ -186,55 +211,7 @@ Take note of principalId of the first line as System Managed Identity of your VM
###### b. Set access policy for AKS VM ScaleSet
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <systemManagedIdentityOfVMSS> --secret-permissions get --key-permissions get unwrapKey
```
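If you still need the `<systemManagedIdentityOfVMSS>` value, it can be read from the AKS node resource group; a sketch, assuming a single VM scale set in that group:
```bash
NODE_RG=$(az aks show -g myResourceGroup -n myAKSCluster --query nodeResourceGroup -o tsv)
VMSS_NAME=$(az vmss list -g $NODE_RG --query "[0].name" -o tsv)
az vmss identity show -g $NODE_RG -n $VMSS_NAME --query principalId -o tsv
```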
## 3. Run Spark PPML jobs
@ -252,12 +229,7 @@ Run such script to save kubeconfig to secret
```bash
/ppml/trusted-big-data-ml/azure/kubeconfig-secret.sh
```
### 3.2 Generate keys
Run this script to generate keys:
```bash
/ppml/trusted-big-data-ml/azure/generate-keys-az.sh
@ -268,11 +240,16 @@ After generate keys, run such command to save keys in Kubernetes.
```
kubectl apply -f /ppml/trusted-big-data-ml/work/keys/keys.yaml
```
### 3.3 Generate password
Run this script to save the password to Azure Key Vault:
```bash
/ppml/trusted-big-data-ml/azure/generate-password-az.sh myKeyVault used_password_when_generate_keys
```
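You can verify the password reached the vault; the submit scripts later in this guide read it back as the `key-pass` secret:
```bash
az keyvault secret show --name "key-pass" --vault-name myKeyVault --query "value"
```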
### 3.4 Create the RBAC
```bash
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```
### 3.5 Create image pull secret from your Azure container registry
* If you have already logged in to your Azure container registry, find your docker config json file (e.g. ~/.docker/config.json), and create a secret for your registry credential like below:
```bash
@ -284,23 +261,21 @@ Run such script to save the password to Azure Key Vault
```bash
kubectl create secret docker-registry regcred --docker-server=myContainerRegistry.azurecr.io --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
```
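If you need the username and password for the command above, the ACR admin account exposes them (assuming the admin user is enabled on your registry):
```bash
az acr credential show --name myContainerRegistry --query "{user:username, pass:passwords[0].value}"
```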
### 3.6 Add image pull secret to service account
```bash
kubectl patch serviceaccount spark -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```
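A quick check that the patch took effect:
```bash
kubectl get serviceaccount spark -o jsonpath='{.imagePullSecrets}'
```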
### 3.7 Run PPML spark job
An example script to run a PPML Spark job on AKS is shown below. You can also refer to `/ppml/trusted-big-data-ml/azure/submit-spark-sgx-az.sh`
```bash
export RUNTIME_DRIVER_MEMORY=8g
export RUNTIME_DRIVER_PORT=54321
RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_EXTRA_JAR_PATH=
SPARK_JOB_MAIN_CLASS=
ARGS=
@ -310,16 +285,13 @@ KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`
bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-driver-jvm-memory 2g \
--sgx-executor-jvm-memory 7g \
--driver-memory 8g \
--driver-cores 4 \
@ -328,9 +300,9 @@ bash bigdl-ppml-submit.sh \
--num-executors 2 \
--conf spark.cores.max=8 \
--name spark-decrypt-sgx \
--conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
--driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
--executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
--jars local://$SPARK_EXTRA_JAR_PATH \
--conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \
--conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \
@ -344,7 +316,59 @@ bash bigdl-ppml-submit.sh \
$SPARK_EXTRA_JAR_PATH \
$ARGS
```
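While the job runs, you can watch the executor pods from the client container (pod names derive from the `--name spark-decrypt-sgx` setting above and vary per run):
```bash
kubectl get pods -o wide | grep spark-decrypt-sgx
kubectl logs -f <name-of-an-executor-pod>
```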
### 3.8 Run simple query Python example
This is an example script to run the simple query Python example job on AKS, with data stored in Azure Data Lake Storage.
```bash
export RUNTIME_DRIVER_MEMORY=6g
export RUNTIME_DRIVER_PORT=54321
RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_VERSION=3.1.3
DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
INPUT_DIR_PATH=xxx@$DATA_LAKE_NAME.dfs.core.windows.net/xxx
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`
bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-driver-jvm-memory 2g \
--sgx-executor-jvm-memory 7g \
--driver-memory 6g \
--driver-cores 4 \
--executor-memory 24g \
--executor-cores 2 \
--num-executors 1 \
--name simple-query-sgx \
--conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
--driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
--executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
--conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \
--conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \
--conf spark.hadoop.fs.azure.enable.append.support=true \
--properties-file /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/conf/spark-bigdl.conf \
--conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
--conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
--py-files /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-dllib-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip \
/ppml/trusted-big-data-ml/work/examples/simple_query_example.py \
--kms_type AzureKeyManagementService \
--azure_vault $KEY_VAULT_NAME \
--primary_key_path $PRIMARY_KEY_PATH \
--data_key_path $DATA_KEY_PATH \
--input_encrypt_mode aes/cbc/pkcs5padding \
--output_encrypt_mode plain_text \
--input_path $INPUT_DIR_PATH/people.csv \
--output_path $INPUT_DIR_PATH/simple-query-result.csv
```
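After the job completes, you can list the result files in the Data Lake file system (a sketch; replace `xxx` with the container and path used in `INPUT_DIR_PATH`):
```bash
az storage fs file list \
--file-system xxx \
--path xxx/simple-query-result.csv \
--account-name $DATA_LAKE_NAME \
--account-key $DATA_LAKE_ACCESS_KEY \
--output table
```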
## 4. Run TPC-H example
TPC-H queries are implemented using Spark DataFrames API running with BigDL PPML.
@ -376,8 +400,9 @@ Generate primary key and data key, then save to file system.
The example code for generating the primary key and data key is shown below:
```bash
BIGDL_VERSION=2.2.0-SNAPSHOT
SPARK_VERSION=3.1.3
java -cp "/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/*" \
-Xmx10g \
com.intel.analytics.bigdl.ppml.examples.GenerateKeys \
--kmsType AzureKeyManagementService \
@ -392,8 +417,9 @@ Encrypt data with specified BigDL `AzureKeyManagementService`
The example code for encrypting data is shown below:
```bash
BIGDL_VERSION=2.2.0-SNAPSHOT
SPARK_VERSION=3.1.3
java -cp "/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/*" \
-Xmx10g \
com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \
--kmsType AzureKeyManagementService \
@ -422,16 +448,19 @@ The example script to run a query is like:
export RUNTIME_DRIVER_MEMORY=8g
export RUNTIME_DRIVER_PORT=54321
export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`
RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_VERSION=3.1.3
DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
INPUT_DIR=xxx/dbgen-encrypted
OUTPUT_DIR=xxx/output
@ -439,10 +468,7 @@ bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-driver-jvm-memory 2g \
--sgx-executor-jvm-memory 7g \
--driver-memory 8g \
--driver-cores 4 \
@ -451,9 +477,9 @@ bash bigdl-ppml-submit.sh \
--num-executors 2 \
--conf spark.cores.max=8 \
--name spark-tpch-sgx \
--conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
--driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
--executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
--conf spark.sql.auto.repartition=true \
--conf spark.default.parallelism=400 \
--conf spark.sql.shuffle.partitions=400 \
@ -464,10 +490,10 @@ bash bigdl-ppml-submit.sh \
--conf spark.bigdl.kms.azure.vault=$KEY_VAULT_NAME \
--conf spark.bigdl.kms.key.primary=$PRIMARY_KEY_PATH \
--conf spark.bigdl.kms.key.data=$DATA_KEY_PATH \
--class com.intel.analytics.bigdl.ppml.examples.tpch.TpchQuery \
--verbose \
local:///ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION.jar \
$INPUT_DIR $OUTPUT_DIR aes/cbc/pkcs5padding plain_text [QUERY]
```
`INPUT_DIR` is the directory of the encrypted TPC-H data, and `OUTPUT_DIR` is where the query results are written.