For the VM size, please choose a DC-V3 Series VM with more than 4 vCPU cores.
#### 2.2.3 Pull BigDL PPML image and run on Linux client
* Go to Azure Marketplace, search for "BigDL PPML" and find the `BigDL PPML: Secure Big Data AI on Intel SGX` product. Click the "Create" button, which will lead you to the `Subscribe` page.
On the `Subscribe` page, enter your subscription, your Azure container registry, your resource group, and your location. Then click `Subscribe` to subscribe BigDL PPML to your container registry.
* Go to your Azure container registry, check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene`.
* Log in to the created VM. Then log in to your Azure container registry and pull the BigDL PPML image with a command like:
```bash
docker pull myContainerRegistry/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene
```
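If the registry requires authentication first, a minimal sketch of logging in with the Azure CLI (assuming the Azure CLI is installed on the VM and `myContainerRegistry` is the registry name used above):
```bash
# Log in to the Azure container registry before pulling the image
az acr login --name myContainerRegistry
```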
* Start a container from this image:
```bash
#!/bin/bash
export LOCAL_IP=YOUR_LOCAL_IP
export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene
sudo docker run -itd \
--privileged \
    # ... (remaining options of this command are omitted in this excerpt)
```
### 2.4.2 Access data in Hadoop through the ABFS (Azure Blob Filesystem) driver
You can access Data Lake Storage through the Hadoop filesystem using a URI of the form: `abfs[s]://file_system@account_name.dfs.core.windows.net/<path>/<path>/<file_name>`
#### Authentication
The ABFS driver supports two forms of authentication so that the Hadoop application may securely access resources contained within a Data Lake Storage Gen2 capable account.
- Shared Key: This permits users access to ALL resources in the account. The key is encrypted and stored in Hadoop configuration.
- Azure Active Directory OAuth Bearer Token: Azure AD bearer tokens are acquired and refreshed by the driver using either the identity of the end user or a configured Service Principal. Using this authentication model, all access is authorized on a per-call basis using the identity associated with the supplied token and evaluated against the assigned POSIX Access Control List (ACL).
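For example, with Shared Key authentication the account key can be supplied through the standard `fs.azure.*` Hadoop properties of the ABFS driver. A minimal sketch, assuming a storage account named `myDataLakeAccount`, a filesystem (container) named `myFS`, and the `hadoop-azure` module on the classpath:
```bash
# List a Data Lake Storage Gen2 path over ABFS using Shared Key authentication
hadoop fs \
  -Dfs.azure.account.auth.type.myDataLakeAccount.dfs.core.windows.net=SharedKey \
  -Dfs.azure.account.key.myDataLakeAccount.dfs.core.windows.net=<your-storage-account-key> \
  -ls abfss://myFS@myDataLakeAccount.dfs.core.windows.net/
```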
Take note of the `principalId` in the first line as the System Managed Identity of your VM.
##### b. Set access policy for AKS VM ScaleSet
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <systemManagedIdentityOfVMSS> --secret-permissions get --key-permissions all
```
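Here `<systemManagedIdentityOfVMSS>` is the `principalId` noted above. If you need to look it up again, a minimal sketch (the scale set lives in the AKS node resource group; `<nodeResourceGroup>` and `<vmssName>` are placeholders for your own values):
```bash
# Query the system-assigned managed identity of the AKS node pool's VM scale set
az vmss identity show \
  --resource-group <nodeResourceGroup> \
  --name <vmssName> \
  --query principalId -o tsv
```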
#### 2.5.3.2 Set access for AKS
##### a. Enable Azure Key Vault Provider for Secrets Store CSI Driver support
Example command:
```bash
az aks enable-addons --addons azure-keyvault-secrets-provider --name myAKSCluster --resource-group myResourceGroup
```
* Verify the Azure Key Vault Provider for Secrets Store CSI Driver installation
Example command:
```bash
kubectl get pods -n kube-system -l 'app in (secrets-store-csi-driver, secrets-store-provider-azure)'
```
Be sure that a Secrets Store CSI Driver pod and an Azure Key Vault Provider pod are running on each node in your cluster's node pools.
* Enable the Azure Key Vault Provider for Secrets Store CSI Driver to track secret updates in the key vault
```bash
az aks update -g myResourceGroup -n myAKSCluster --enable-secret-rotation
```
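The driver polls the key vault at a fixed interval by default. Depending on your Azure CLI version, the interval may also be adjustable with the `--rotation-poll-interval` option (a hedged sketch; verify the flag with `az aks update --help`):
```bash
# Optionally adjust the secret rotation polling interval (flag availability depends on the Azure CLI version)
az aks update -g myResourceGroup -n myAKSCluster --enable-secret-rotation --rotation-poll-interval 5m
```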
##### b. Provide an identity to access the Azure Key Vault
There are several ways to provide an identity for the Azure Key Vault Provider for Secrets Store CSI Driver to access Azure Key Vault: an Azure Active Directory pod identity, a user-assigned managed identity, or a system-assigned managed identity. In our solution, we use a user-assigned managed identity.
* Enable managed identity in AKS
```bash
az aks update -g myResourceGroup -n myAKSCluster --enable-managed-identity
```
* Get the user-assigned managed identity that was created when you enabled the managed identity on your AKS cluster
Run:
```bash
az aks show -g myResourceGroup -n myAKSCluster --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId -o tsv
```
The output will look like:
```bash
f95519c1-3fe8-441b-a7b9-368d5e13b534
```
Take note of this output as the user-assigned managed identity of your Azure Key Vault Secrets Provider.
* Grant your user-assigned managed identity permissions that enable it to read your key vault and view its contents
Example command:
```bash
az keyvault set-policy -n myKeyVault --key-permissions get --spn f95519c1-3fe8-441b-a7b9-368d5e13b534
az keyvault set-policy -n myKeyVault --secret-permissions get --spn f95519c1-3fe8-441b-a7b9-368d5e13b534
```
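Instead of copying the GUID by hand, the client id from the previous step can be captured in a shell variable and both permissions granted in a single call (a sketch reusing the commands above):
```bash
# Capture the addon identity's client id once, then grant key and secret read access in one call
SECRET_PROVIDER_CLIENT_ID=$(az aks show -g myResourceGroup -n myAKSCluster \
  --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId -o tsv)
az keyvault set-policy -n myKeyVault --key-permissions get --secret-permissions get \
  --spn "$SECRET_PROVIDER_CLIENT_ID"
```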
##### c. Create a SecretProviderClass to access your Key Vault
In your client docker container, edit the `/ppml/trusted-big-data-ml/azure/secretProviderClass.yaml` file: set `<client-id>` to the user-assigned managed identity of the Azure Key Vault Secrets Provider, and set `<key-vault-name>` and `<tenant-id>` to your actual key vault name and tenant ID.
Then run:
```bash
kubectl apply -f /ppml/trusted-big-data-ml/azure/secretProviderClass.yaml
```
to create the SecretProviderClass in your AKS cluster.
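To confirm the resource was created, you can list SecretProviderClass objects (assuming it was applied to the default namespace):
```bash
# List SecretProviderClass resources created by the apply above
kubectl get secretproviderclass
```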
## 3. Run Spark PPML jobs
Log in to your client VM and enter your BigDL PPML container:
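A minimal sketch of entering the container, assuming it was started with a name such as `bigdl-ppml-client` (hypothetical; use the actual name or id shown by `sudo docker ps`):
```bash
# Attach an interactive shell inside the running BigDL PPML client container
sudo docker exec -it bigdl-ppml-client bash
```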