diff --git a/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md b/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md
index 7f641660..f411534b 100644
--- a/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md
+++ b/docs/readthedocs/source/doc/PPML/Overview/azure_ppml.md
@@ -32,18 +32,20 @@ Create Linux VM through Azure [CLI](https://docs.microsoft.com/en-us/azure/devel
 For size of the VM, please choose DC-V3 Series VM with more than 4 vCPU cores.
 
 #### 2.2.3 Pull BigDL PPML image and run on Linux client
-* Go to Azure Marketplace, search "BigDL PPML" and find `BigDL PPML` product. Click "Create" button which will lead you to `Subscribe` page.
+* Go to Azure Marketplace, search "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click the "Create" button, which will lead you to the `Subscribe` page.
 On `Subscribe` page, input your subscription, your Azure container registry, your resource group, location. Then click `Subscribe` to subscribe BigDL PPML to your container registry.
+
+* Go to your Azure container registry, check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene`.
 * Login to the created VM. Then login to your Azure container registry, pull BigDL PPML image using such command:
 ```bash
-docker pull myContainerRegistry/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
+docker pull myContainerRegistry/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene
 ```
 * Start container of this image
 ```bash
 #!/bin/bash
 export LOCAL_IP=YOUR_LOCAL_IP
-export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
+export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene
 
 sudo docker run -itd \
     --privileged \
@@ -110,8 +112,8 @@ az storage fs directory upload -f myFS --account-name myDataLakeAccount -s "path
 ### 2.4.2 Access data in Hadoop through ABFS(Azure Blob Filesystem) driver
 You can access Data Lake Storage in Hadoop filesytem by such URI: ```abfs[s]://file_system@account_name.dfs.core.windows.net/<path>/<path>/<file_name>```
 #### Authentication
-The ABFS driver supports two forms of authentication so that the Hadoop application may securely access resources contained within a Data Lake Storage Gen2 capable account. 
-  - Shared Key: This permits users access to ALL resources in the account. The key is encrypted and stored in Hadoop configuration.
+The ABFS driver supports two forms of authentication so that the Hadoop application may securely access resources contained within a Data Lake Storage Gen2 capable account.
+- Shared Key: This permits users access to ALL resources in the account. The key is encrypted and stored in Hadoop configuration.
 - Azure Active Directory OAuth Bearer Token: Azure AD bearer tokens are acquired and refreshed by the driver using either the identity of the end user or a configured Service Principal. Using this authentication model, all access is authorized on a per-call basis using the identity associated with the supplied token and evaluated against the assigned POSIX Access Control List (ACL).
 
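A side note on the Shared Key option above (illustrative only, not taken from the BigDL guide): with a plain Hadoop client, the key is handed to the ABFS driver through the standard `fs.azure.account.*` configuration properties. In the sketch below, `myFS`, `myDataLakeAccount` and the key value are placeholders.

```bash
# Hypothetical example: list a Data Lake Storage Gen2 container over ABFS
# using Shared Key authentication. Account name, container and key are placeholders.
export ACCOUNT=myDataLakeAccount
export ACCOUNT_KEY="<your-storage-account-access-key>"

hadoop fs \
  -D fs.azure.account.auth.type.${ACCOUNT}.dfs.core.windows.net=SharedKey \
  -D fs.azure.account.key.${ACCOUNT}.dfs.core.windows.net=${ACCOUNT_KEY} \
  -ls "abfs://myFS@${ACCOUNT}.dfs.core.windows.net/"
```

The Azure AD OAuth route works the same way, with `fs.azure.account.auth.type` set to `OAuth` and a token provider configured through the corresponding `fs.azure.account.oauth*` properties.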
@@ -175,54 +177,8 @@ Take note of principalId of the first line as System Managed Identity of your VM
 ##### b. Set access policy for AKS VM ScaleSet
 Example command:
 ```bash
-az keyvault set-policy --name myKeyVault --object-id <principalId> --secret-permissions get --key-permissions all --certificate-permissions all
+az keyvault set-policy --name myKeyVault --object-id <principalId> --secret-permissions get --key-permissions all
 ```
-#### 2.5.3.2 Set access for AKS
-##### a. Enable Azure Key Vault Provider for Secrets Store CSI Driver support
-Example command:
-```bash
-az aks enable-addons --addons azure-keyvault-secrets-provider --name myAKSCluster --resource-group myResourceGroup
-```
-* Verify the Azure Key Vault Provider for Secrets Store CSI Driver installation
-Example command:
-```bash
-kubectl get pods -n kube-system -l 'app in (secrets-store-csi-driver, secrets-store-provider-azure)'
-```
-Be sure that a Secrets Store CSI Driver pod and an Azure Key Vault Provider pod are running on each node in your cluster's node pools.
-* Enable Azure Key Vault Provider for Secrets Store CSI Driver to track of secret update in key vault
-```bash
-az aks update -g myResourceGroup -n myAKSCluster --enable-secret-rotation
-```
-#### b. Provide an identity to access the Azure Key Vault
-There are several ways to provide identity for Azure Key Vault Provider for Secrets Store CSI Driver to access Azure Key Vault: `An Azure Active Directory pod identity`, `user-assigned identity` or `system-assigned managed identity`. In our solution, we use user-assigned managed identity.
-* Enable managed identity in AKS
-```bash
-az aks update -g myResourceGroup -n myAKSCluster --enable-managed-identity
-```
-* Get user-assigned managed identity that you created when you enabled a managed identity on your AKS cluster
-Run:
-```bash
-az aks show -g myResourceGroup -n myAKSCluster --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId -o tsv
-```
-The output would be like:
-```bash
-f95519c1-3fe8-441b-a7b9-368d5e13b534
-```
-Take note of this output as your user-assigned managed identity of Azure KeyVault Secrets Provider
-* Grant your user-assigned managed identity permissions that enable it to read your key vault and view its contents
-Example command:
-```bash
-az keyvault set-policy -n myKeyVault --key-permissions get --spn f95519c1-3fe8-441b-a7b9-368d5e13b534
-az keyvault set-policy -n myKeyVault --secret-permissions get --spn f95519c1-3fe8-441b-a7b9-368d5e13b534
-```
-#### c. Create a SecretProviderClass to access your Key Vault
-On your client docker container, edit `/ppml/trusted-big-data-ml/azure/secretProviderClass.yaml` file, modify `<client-id>` to your user-assigned managed identity of Azure KeyVault Secrets Provider, and modify `<key-vault-name>` and `<tenant-id>` to your real key vault name and tenant id.
-
-Then run:
-```bash
-kubectl apply -f /ppml/trusted-big-data-ml/azure/secretProviderClass.yaml
-```
-to create secretProviderClass in your AKS.
 
 ## 3. Run Spark PPML jobs
 Login to your client VM and enter your BigDL PPML container: