Trusted Cluster Serving with Graphene on Kubernetes
Prerequisites
Prior to deploying PPML Cluster Serving, please make sure the following are set up:
- Hardware that supports SGX
- A fully configured Kubernetes cluster
- Intel SGX Device Plugin, so that SGX can be used in the Kubernetes cluster (install it following the instructions here)
- Java
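A quick, optional sanity check for the first and last items; note that the `sgx` CPU flag is only reported on reasonably recent kernels, so its absence is not conclusive:

```bash
# Check whether the CPU advertises SGX support (the flag may be hidden on older kernels).
$ grep -o sgx /proc/cpuinfo | sort -u
# Check that a Java runtime is available (needed for keytool and the serving components).
$ java -version
```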
 
Deploy Trusted Realtime ML for Kubernetes
1. Pull the Docker image from Docker Hub

   ```bash
   $ docker pull intelanalytics/analytics-zoo-ppml-trusted-realtime-ml-scala-graphene:0.12.0-SNAPSHOT
   ```

2. Pull the source code of Analytics Zoo and enter the PPML graphene Kubernetes directory

   ```bash
   $ git clone https://github.com/intel-analytics/analytics-zoo.git
   $ cd analytics-zoo/ppml/trusted-realtime-ml/scala/docker-graphene/kubernetes
   ```
3. Generate secure keys and passwords, and deploy them as Kubernetes secrets (refer here for details)

   - Generate keys and passwords

     Note: make sure `${JAVA_HOME}/bin` is added to `$PATH` to avoid a `keytool: command not found` error.

     ```bash
     $ sudo ../../../../scripts/generate-keys.sh
     $ openssl genrsa -3 -out enclave-key.pem 3072
     $ ../../../../scripts/generate-password.sh <used_password_when_generate_keys>
     ```

   - Deploy them as secrets for Kubernetes

     ```bash
     $ kubectl apply -f keys/keys.yaml
     $ kubectl apply -f password/password.yaml
     ```
4. In `values.yaml`, configure the pulled image name, the path of `enclave-key.pem` generated in step 3, and the path of the `start-all-but-flink.sh` script.
5. If the kernel version is 5.11+ with built-in SGX support, create soft links for the SGX device nodes

   ```bash
   $ sudo ln -s /dev/sgx_enclave /dev/sgx/enclave
   $ sudo ln -s /dev/sgx_provision /dev/sgx/provision
   ```
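After finishing the steps above, a couple of generic sanity checks can confirm the prerequisites are in place; the exact secret names depend on what `keys.yaml` and `password.yaml` define, so treat this as a sketch:

```bash
# List the secrets created from keys.yaml and password.yaml.
$ kubectl get secrets
# Confirm the SGX device nodes (and the soft links from step 5, if applicable) exist.
$ ls -l /dev/sgx/
```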
Configure SGX mode
In `templates/flink-configuration-configmap.yaml`, set `sgx.mode` to `sgx` or `nonsgx` to determine whether the workload runs with SGX.
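If you script your deployments, the same switch can be flipped from the command line. The example below assumes the value is stored as a plain `sgx.mode: <value>` line in that file, which may not match your copy exactly:

```bash
# Switch the workload to non-SGX mode; use "sgx" instead of "nonsgx" to re-enable SGX.
$ sed -i 's/^\(\s*sgx\.mode:\).*/\1 nonsgx/' templates/flink-configuration-configmap.yaml
```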
Configure Resources for Components
- Configure jobmanager resource allocation in `templates/jobmanager-deployment.yaml`

  ```yaml
  ...
  env:
    - name: SGX_MEM_SIZE
      value: "16G"
  ...
  resources:
    requests:
      cpu: 2
      memory: 16Gi
      sgx.intel.com/enclave: "1"
      sgx.intel.com/epc: 16Gi
    limits:
      cpu: 2
      memory: 16Gi
      sgx.intel.com/enclave: "1"
      sgx.intel.com/epc: 16Gi
  ...
  ```
- Configure taskmanager resource allocation

  - Memory allocation in `templates/flink-configuration-configmap.yaml`

    ```yaml
    taskmanager.memory.managed.size: 4gb
    taskmanager.memory.task.heap.size: 5gb
    xmx.size: 5g
    ```

  - Pod resource allocation

    Use `taskmanager-deployment.yaml` instead of `taskmanager-statefulset.yaml` for a functionality test:

    ```bash
    $ mv templates/taskmanager-statefulset.yaml ./
    $ mv taskmanager-deployment.yaml.back templates/taskmanager-deployment.yaml
    ```

    Configure resources in `templates/taskmanager-deployment.yaml` (16 cores are allocated in this example; adjust according to your scenario):

    ```yaml
    ...
    env:
      - name: CORE_NUM
        value: "16"
      - name: SGX_MEM_SIZE
        value: "32G"
    ...
    resources:
      requests:
        cpu: 16
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
      limits:
        cpu: 16
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
    ...
    ```
- Configure Redis and client resource allocation

  - SGX memory allocation in `start-all-but-flink.sh`

    ```bash
    cd /ppml/trusted-realtime-ml/java
    export SGX_MEM_SIZE=16G
    test "$SGX_MODE" = sgx && ./init.sh
    echo "java initiated"
    ```

  - Pod resource allocation in `templates/master-deployment.yaml`

    ```yaml
    ...
    env:
      - name: CORE_NUM  # batch size per instance
        value: "16"
    ...
    resources:
      requests:
        cpu: 12
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
      limits:
        cpu: 12
        memory: 32Gi
        sgx.intel.com/enclave: "1"
        sgx.intel.com/epc: 32Gi
    ...
    ```
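All of the resource requests above rely on the Intel SGX Device Plugin exposing `sgx.intel.com/enclave` and `sgx.intel.com/epc` as extended node resources. A quick, generic way to confirm a node actually advertises them (replace the node name with one of yours):

```bash
# Show the SGX extended resources registered by the device plugin on a node.
$ kubectl describe node <k8s-node-name> | grep sgx.intel.com
```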
 
Deploy Cluster Serving
Deploy all components and start the job

- Download Helm from the release page and install it
- Deploy cluster serving

  ```bash
  $ helm install ppml ./
  ```
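Once the chart is installed, a generic way to watch the components come up (pod names are assigned by the chart and will differ per deployment):

```bash
# Wait for the jobmanager, taskmanager, and master pods to reach the Running state.
$ kubectl get pods -o wide --watch
```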
Port forwarding

Set up port forwarding of the jobmanager REST port to access the Flink Web UI from the host:

- Run `kubectl port-forward <flink-jobmanager-pod> --address 0.0.0.0 8081:8081` to forward the jobmanager's web UI port to 8081 on the host.
- Navigate to `http://<host-IP>:8081` in a web browser to check the status of the Flink cluster and job.
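If a browser is not convenient, the forwarded port can also be queried directly; the `/jobs` endpoint below is part of Flink's standard REST API, and `<host-IP>` is the same host address as above:

```bash
# List Flink jobs and their status through the forwarded REST port.
$ curl http://<host-IP>:8081/jobs
```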
Performance benchmark

```bash
$ kubectl exec <master-deployment-pod> -it -- bash
$ cd /ppml/trusted-realtime-ml/java/work/benchmark/
$ bash init-benchmark.sh
$ python3 e2e_throughput.py -n <image_num> -i ../data/ILSVRC2012_val_00000001.JPEG
```

The `e2e_throughput.py` script pushes the test image `-n` times (1000 by default if not set manually) and times the process from pushing the images (enqueue) to retrieving all inference results (dequeue), in order to calculate the end-to-end throughput of Cluster Serving. The output should look like:

```
Served xxx images in xxx sec, e2e throughput is xxx images/sec
```
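To see how throughput behaves at different request counts, one option is to repeat the benchmark with several `-n` values; this is just a convenience loop around the script invoked above, run from the same benchmark directory inside the master pod:

```bash
# Run the end-to-end benchmark with increasing image counts.
for n in 100 500 1000; do
    python3 e2e_throughput.py -n "$n" -i ../data/ILSVRC2012_val_00000001.JPEG
done
```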