[PPML] Azure Gramine documentation (#6627)

* update az gramine doc

* add python example

* update

* update

* update
This commit is contained in:
Jiao Wang 2022-11-29 14:48:57 -08:00 committed by GitHub
parent bd112d6cf3
commit 81a9f8147c


@ -17,7 +17,7 @@ Before you setup your environment, please install Azure CLI on your machine acco
Then run `az login` to login to Azure system before you run the following Azure commands.
### 2.2 Create Azure Linux VM for hosting BigDL PPML image
#### 2.2.1 Create Resource Group
On your machine, create a resource group or use an existing one. Example code to create a resource group with the Azure CLI:
```
@ -27,26 +27,53 @@ az group create \
--output none
```
#### 2.2.2 Create Linux VM with SGX support
Create a Linux VM through the Azure [CLI](https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/nodejs-virtual-machine-vm/create-linux-virtual-machine-azure-cli)/[Portal](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal)/PowerShell.
For the VM size, please choose a DCsv3-series VM with more than 4 vCPU cores.
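For reference, a minimal sketch of creating such a VM with the Azure CLI (the VM name, admin user and image URN below are placeholder assumptions; `Standard_DC4s_v3` is one DCsv3 size with 4 vCPUs):
```bash
az vm create \
--resource-group myResourceGroup \
--name myPPMLVM \
--size Standard_DC4s_v3 \
--image Canonical:0001-com-ubuntu-server-focal:20_04-lts-gen2:latest \
--admin-username azureuser \
--generate-ssh-keys \
--output none
```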
#### 2.2.3 Start AESM service on Linux VM
* Ubuntu 20.04
```bash
echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu focal main' | sudo tee /etc/apt/sources.list.d/intelsgx.list
wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -
sudo apt update
sudo apt install libsgx-dcap-ql
sudo apt install sgx-aesm-service
```
* Ubuntu 18.04
```bash
echo 'deb [arch=amd64] https://download.01.org/intel-sgx/sgx_repo/ubuntu bionic main' | sudo tee /etc/apt/sources.list.d/intelsgx.list
wget -qO - https://download.01.org/intel-sgx/sgx_repo/ubuntu/intel-sgx-deb.key | sudo apt-key add -
sudo apt update
sudo apt install libsgx-dcap-ql
sudo apt install sgx-aesm-service
```
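After installation, you can check that the AESM daemon came up; `aesmd` is the service installed by `sgx-aesm-service`, and its socket is mounted into the PPML container later:
```bash
sudo systemctl status aesmd --no-pager
ls /var/run/aesmd/aesm.socket
```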
#### 2.2.4 Pull BigDL PPML image and run on Linux VM
* Go to Azure Marketplace, search "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click the "Create" button, which will lead you to the `Subscribe` page.
On the `Subscribe` page, input your subscription, your Azure container registry, your resource group and your location. Then click `Subscribe` to subscribe BigDL PPML to your container registry.
* Go to your Azure container registry (i.e. myContainerRegistry), check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine`
* Login to the created VM. Then login to your Azure container registry and pull the BigDL PPML image as needed.
* If you want to run with 16G SGX memory, you can pull the image as below:
```bash
docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-16g
```
* If you want to run with 32G SGX memory, you can pull the image as below:
```bash
docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-32g
```
* If you want to run with 64G SGX memory, you can pull the image as below:
```bash
docker pull myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-64g
```
* Start a container from this image
An example script to start the container is as below (a command to attach to the running container follows the script):
```bash
#!/bin/bash
export LOCAL_IP=YOUR_LOCAL_IP
export DOCKER_IMAGE=myContainerRegistry.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:2.2.0-SNAPSHOT-16g
sudo docker run -itd \
--privileged \
@ -59,9 +86,7 @@ On `Subscribe` page, input your subscription, your Azure container registry, you
-v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
--name=spark-local \
-e LOCAL_IP=$LOCAL_IP \
$DOCKER_IMAGE bash
```
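If the container started successfully, you can attach to it (the container name `spark-local` comes from the script above):
```bash
sudo docker exec -it spark-local bash
```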
### 2.3 Create AKS (Azure Kubernetes Service) or use existing AKS
@ -163,7 +188,7 @@ Take note of the following properties for use in the next section:
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <mySystemAssignedIdentity> --secret-permissions all --key-permissions all unwrapKey wrapKey
```
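You can read the policy back to confirm it was applied:
```bash
az keyvault show --name myKeyVault --query properties.accessPolicies
```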
#### 2.5.3 AKS access Key Vault
@ -186,55 +211,7 @@ Take note of principalId of the first line as System Managed Identity of your VM
###### b. Set access policy for AKS VM ScaleSet
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <systemManagedIdentityOfVMSS> --secret-permissions get --key-permissions get unwrapKey
```
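If you still need the `<systemManagedIdentityOfVMSS>` value, it can be read from the AKS node resource group; a sketch, assuming a single VM scale set in that group:
```bash
NODE_RG=$(az aks show -g myResourceGroup -n myAKSCluster --query nodeResourceGroup -o tsv)
VMSS_NAME=$(az vmss list -g $NODE_RG --query "[0].name" -o tsv)
az vmss identity show -g $NODE_RG -n $VMSS_NAME --query principalId -o tsv
```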
## 3. Run Spark PPML jobs
@ -252,12 +229,7 @@ Run such script to save kubeconfig to secret
```bash
/ppml/trusted-big-data-ml/azure/kubeconfig-secret.sh
```
### 3.2 Generate keys
Run this script to generate keys:
```bash
/ppml/trusted-big-data-ml/azure/generate-keys-az.sh
@ -268,11 +240,16 @@ After generate keys, run such command to save keys in Kubernetes.
```
kubectl apply -f /ppml/trusted-big-data-ml/work/keys/keys.yaml
```
### 3.3 Generate password
Run this script to save the password to Azure Key Vault:
```bash
/ppml/trusted-big-data-ml/azure/generate-password-az.sh myKeyVault used_password_when_generate_keys
```
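You can verify the password reached the vault; the submit scripts later in this guide read it back as the `key-pass` secret:
```bash
az keyvault secret show --name "key-pass" --vault-name myKeyVault --query "value"
```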
### 3.4 Create the RBAC
```bash
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```
### 3.5 Create image pull secret from your Azure container registry
* If you have already logged in to your Azure container registry, find your docker config json file (e.g. ~/.docker/config.json), and create a secret for your registry credential like below:
```bash
@ -284,23 +261,21 @@ Run such script to save the password to Azure Key Vault
```bash
kubectl create secret docker-registry regcred --docker-server=myContainerRegistry.azurecr.io --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
```
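If you need the username and password for the command above, the ACR admin account exposes them (assuming the admin user is enabled on your registry):
```bash
az acr credential show --name myContainerRegistry --query "{user:username, pass:passwords[0].value}"
```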
### 3.6 Add image pull secret to service account
```bash
kubectl patch serviceaccount spark -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```
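A quick check that the patch took effect:
```bash
kubectl get serviceaccount spark -o jsonpath='{.imagePullSecrets}'
```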
### 3.7 Run PPML spark job
An example script to run a PPML Spark job on AKS is shown below. You can also refer to `/ppml/trusted-big-data-ml/azure/submit-spark-sgx-az.sh`
```bash
export RUNTIME_DRIVER_MEMORY=8g
export RUNTIME_DRIVER_PORT=54321
RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_EXTRA_JAR_PATH=
SPARK_JOB_MAIN_CLASS=
ARGS=
@ -310,16 +285,13 @@ KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`
bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-driver-jvm-memory 2g \
--sgx-executor-jvm-memory 7g \
--driver-memory 8g \
--driver-cores 4 \
@ -328,9 +300,9 @@ bash bigdl-ppml-submit.sh \
--num-executors 2 \
--conf spark.cores.max=8 \
--name spark-decrypt-sgx \
--conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
--driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
--executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
--jars local://$SPARK_EXTRA_JAR_PATH \
--conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \
--conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \
@ -344,7 +316,59 @@ bash bigdl-ppml-submit.sh \
$SPARK_EXTRA_JAR_PATH \
$ARGS
```
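While the job runs, you can watch the executor pods from the client container (pod names derive from the `--name spark-decrypt-sgx` setting above and vary per run):
```bash
kubectl get pods -o wide | grep spark-decrypt-sgx
kubectl logs -f <name-of-an-executor-pod>
```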
### 3.8 Run simple query Python example
This is an example script to run the simple query Python example job on AKS, with data stored in Azure Data Lake Storage.
```bash
export RUNTIME_DRIVER_MEMORY=6g
export RUNTIME_DRIVER_PORT=54321
RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_VERSION=3.1.3
DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
INPUT_DIR_PATH=xxx@$DATA_LAKE_NAME.dfs.core.windows.net/xxx
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`
bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-driver-jvm-memory 2g \
--sgx-executor-jvm-memory 7g \
--driver-memory 6g \
--driver-cores 4 \
--executor-memory 24g \
--executor-cores 2 \
--num-executors 1 \
--name simple-query-sgx \
--conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
--driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
--executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
--conf spark.hadoop.fs.azure.account.auth.type.${DATA_LAKE_NAME}.dfs.core.windows.net=SharedKey \
--conf spark.hadoop.fs.azure.account.key.${DATA_LAKE_NAME}.dfs.core.windows.net=${DATA_LAKE_ACCESS_KEY} \
--conf spark.hadoop.fs.azure.enable.append.support=true \
--properties-file /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/conf/spark-bigdl.conf \
--conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
--conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/* \
--py-files /ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip,/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/python/bigdl-dllib-spark_$SPARK_VERSION-$BIGDL_VERSION-python-api.zip \
/ppml/trusted-big-data-ml/work/examples/simple_query_example.py \
--kms_type AzureKeyManagementService \
--azure_vault $KEY_VAULT_NAME \
--primary_key_path $PRIMARY_KEY_PATH \
--data_key_path $DATA_KEY_PATH \
--input_encrypt_mode aes/cbc/pkcs5padding \
--output_encrypt_mode plain_text \
--input_path $INPUT_DIR_PATH/people.csv \
--output_path $INPUT_DIR_PATH/simple-query-result.csv
```
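After the job completes, you can list the result files in the Data Lake file system (a sketch; replace `xxx` with the container and path used in `INPUT_DIR_PATH`):
```bash
az storage fs file list \
--file-system xxx \
--path xxx/simple-query-result.csv \
--account-name $DATA_LAKE_NAME \
--account-key $DATA_LAKE_ACCESS_KEY \
--output table
```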
## 4. Run TPC-H example
TPC-H queries are implemented using Spark DataFrames API running with BigDL PPML.
@ -376,8 +400,9 @@ Generate primary key and data key, then save to file system.
The example code for generating the primary key and data key is shown below:
```bash
BIGDL_VERSION=2.2.0-SNAPSHOT
SPARK_VERSION=3.1.3
java -cp "/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/*" \
-Xmx10g \
com.intel.analytics.bigdl.ppml.examples.GenerateKeys \
--kmsType AzureKeyManagementService \
@ -392,8 +417,9 @@ Encrypt data with specified BigDL `AzureKeyManagementService`
The example code for encrypting data is shown below:
```bash
BIGDL_VERSION=2.2.0-SNAPSHOT
SPARK_VERSION=3.1.3
java -cp "/ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/*:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/conf/:/ppml/trusted-big-data-ml/work/spark-$SPARK_VERSION/jars/*" \
-Xmx10g \
com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \
--kmsType AzureKeyManagementService \
@ -422,16 +448,19 @@ The example script to run a query is like:
export RUNTIME_DRIVER_MEMORY=8g
export RUNTIME_DRIVER_PORT=54321
export secure_password=`az keyvault secret show --name "key-pass" --vault-name $KEY_VAULT_NAME --query "value" | sed -e 's/^"//' -e 's/"$//'`
RUNTIME_SPARK_MASTER=
AZ_CONTAINER_REGISTRY=myContainerRegistry
BIGDL_VERSION=2.2.0-SNAPSHOT
SGX_MEM=16g
SPARK_VERSION=3.1.3
DATA_LAKE_NAME=
DATA_LAKE_ACCESS_KEY=
KEY_VAULT_NAME=
PRIMARY_KEY_PATH=
DATA_KEY_PATH=
INPUT_DIR=xxx/dbgen-encrypted
OUTPUT_DIR=xxx/output
@ -439,10 +468,7 @@ bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-driver-jvm-memory 2g \
--sgx-executor-jvm-memory 7g \
--driver-memory 8g \
--driver-cores 4 \
@ -451,9 +477,9 @@ bash bigdl-ppml-submit.sh \
--num-executors 2 \
--conf spark.cores.max=8 \
--name spark-tpch-sgx \
--conf spark.kubernetes.container.image=$AZ_CONTAINER_REGISTRY.azurecr.io/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-gramine:$BIGDL_VERSION-$SGX_MEM \
--driver-template /ppml/trusted-big-data-ml/azure/spark-driver-template-az.yaml \
--executor-template /ppml/trusted-big-data-ml/azure/spark-executor-template-az.yaml \
--conf spark.sql.auto.repartition=true \
--conf spark.default.parallelism=400 \
--conf spark.sql.shuffle.partitions=400 \
@ -464,10 +490,10 @@ bash bigdl-ppml-submit.sh \
--conf spark.bigdl.kms.azure.vault=$KEY_VAULT_NAME \
--conf spark.bigdl.kms.key.primary=$PRIMARY_KEY_PATH \
--conf spark.bigdl.kms.key.data=$DATA_KEY_PATH \
--class com.intel.analytics.bigdl.ppml.examples.tpch.TpchQuery \
--verbose \
local:///ppml/trusted-big-data-ml/work/bigdl-$BIGDL_VERSION/jars/bigdl-ppml-spark_$SPARK_VERSION-$BIGDL_VERSION.jar \
$INPUT_DIR $OUTPUT_DIR aes/cbc/pkcs5padding plain_text [QUERY]
```
`INPUT_DIR` is the directory of the encrypted TPC-H data, and `OUTPUT_DIR` is where the query results are written.