Key features:

* Training Worker in SGX

## Example

### Prepare environment

#### SGX

TO ADD

#### Get jar ready

##### Build from source

```bash
git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/scala
./make-dist.sh
```

The jar will be generated at `BigDL/scala/ppml/target/bigdl-ppml...jar-with-dependencies.jar`.

##### Download pre-built

```bash
wget
```

#### Config

If you deploy PPML on a cluster, you need to overwrite the config file `./ppml-conf.yaml`. If no `ppml-conf.yaml` exists in the working directory, the default config (localhost:8980) is used.

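Elsewhere in this guide the config carries the server port, the client number, and the TLS key and certificate paths. A hypothetical sketch of such a `ppml-conf.yaml` (the field names are the ones referenced in this document; the exact schema may differ):

```yaml
# Hypothetical ppml-conf.yaml sketch; the default deployment is localhost:8980.
serverPort: 8980
clientNum: 2
# Only needed for a TLS channel; delete both lines to run without certificates.
privateKeyFilePath: /ppml/trusted-big-data-ml/work/keys/server.pem
certChainFilePath: /ppml/trusted-big-data-ml/work/keys/server.crt
```
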
#### Start FL Server

```bash
java -cp bigdl-ppml-spark_3.1.2-0.14.0-SNAPSHOT-jar-with-dependencies.jar com.intel.analytics.bigdl.ppml.FLServer
```

### HFL Logistic Regression

We provide an example demo in `BigDL/scala/ppml/demo`:

```bash
# client 1
java -cp bigdl-ppml-spark_3.1.2-0.14.0-SNAPSHOT-jar-with-dependencies.jar com.intel.analytics.bigdl.ppml.example.HflLogisticRegression -d data/diabetes-hfl-1.csv

# client 2
java -cp bigdl-ppml-spark_3.1.2-0.14.0-SNAPSHOT-jar-with-dependencies.jar com.intel.analytics.bigdl.ppml.example.HflLogisticRegression -d data/diabetes-hfl-2.csv
```

### VFL Logistic Regression

```bash
# client 1
java -cp bigdl-ppml-spark_3.1.2-0.14.0-SNAPSHOT-jar-with-dependencies.jar com.intel.analytics.bigdl.ppml.example.VflLogisticRegression -d data/diabetes-vfl-1.csv

# client 2
java -cp bigdl-ppml-spark_3.1.2-0.14.0-SNAPSHOT-jar-with-dependencies.jar com.intel.analytics.bigdl.ppml.example.VflLogisticRegression -d data/diabetes-vfl-2.csv
```

#### **Prepare Docker Image**

##### **Build jar from Source**

```bash
cd BigDL/scala && bash make-dist.sh -DskipTests -Pspark_3.x
mv ppml/target/bigdl-ppml-spark_3.1.2-0.14.0-SNAPSHOT-jar-with-dependencies.jar ppml/demo
cd ppml/demo
```

##### **Build Image**

Modify your `http_proxy` in `build-image.sh`, then run:

```bash
./build-image.sh
```

#### **Enclave key**

You need to generate your enclave key using the command below; keep it safe, as it is used for future remote attestation and for starting SGX enclaves more securely. The command generates a file `enclave-key.pem` in your present working directory, which is your enclave key. To store the key elsewhere, modify the output file path.

```bash
openssl genrsa -3 -out enclave-key.pem 3072
```

Then modify `ENCLAVE_KEY_PATH` in `deploy_fl_container.sh` with your path to `enclave-key.pem`.

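As a quick sanity check (a sketch using only `openssl`; the file name matches the command above), you can confirm the generated key is a well-formed 3072-bit RSA key:

```shell
# Generate the enclave signing key as above (RSA-3072 with public exponent 3)
openssl genrsa -3 -out enclave-key.pem 3072

# Verify the key is well-formed and report its size
openssl rsa -in enclave-key.pem -noout -check
openssl rsa -in enclave-key.pem -noout -text | head -n 1
```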
#### **TLS certificate**

If you want to build a TLS channel with a certificate, you need to prepare the secure keys. In this tutorial, you can generate the keys with root permission (for test only; you need to input a security password for the keys).

**Note: you must enter `localhost` in the `Common Name` step for test purposes.**

```bash
sudo bash ../../../ppml/scripts/generate-keys.sh
```

If you run in a container, modify `KEYS_PATH` in `deploy_fl_container.sh` to the `keys/` directory you generated in the last step. This directory will be mounted to the container's `/ppml/trusted-big-data-ml/work/keys`; then set `privateKeyFilePath` and `certChainFilePath` in `ppml-conf.yaml` to the container's absolute paths.

If you do not run in a container, just set `privateKeyFilePath` and `certChainFilePath` in `ppml-conf.yaml` to your local paths.

If you do not want to build a TLS channel with a certificate, just delete `privateKeyFilePath` and `certChainFilePath` from `ppml-conf.yaml`.

Then modify `DATA_PATH` in `deploy_fl_container.sh` to the absolute path of `./data` on your machine, and set your local IP there. The `./data` path will be mounted to the container's `/ppml/trusted-big-data-ml/work/data`, so if you do not run in a container, you need to modify the data path in `runH_VflClient1_2.sh`.

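The `Common Name` requirement above can be checked with plain `openssl`. As a self-contained sketch (the file names here are hypothetical, not the ones `generate-keys.sh` produces), generate a throwaway certificate with `CN=localhost` and inspect its subject:

```shell
# Create a throwaway self-signed certificate whose Common Name is localhost
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout demo.key -out demo.crt -subj "/CN=localhost"

# Print the subject; it should report CN = localhost
openssl x509 -in demo.crt -noout -subject
```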
### **Start container**

Running the commands below will start a docker container and initialize the SGX environment.

```bash
bash deploy_fl_container.sh
sudo docker exec -it flDemo bash
./init.sh
```

### **Start FLServer**

In the container, run:

```bash
./runFlServer.sh
```

The FL server starts and listens on port 8980. Both the horizontal and the vertical FL demo need two clients. You can change the listening port and the client number by editing `serverPort` and `clientNum` in `BigDL/scala/ppml/demo/ppml-conf.yaml`.

### **HFL Logistic Regression**

Open two new terminals and run:

```bash
sudo docker exec -it flDemo bash
```

to enter the container. Then, in one terminal, run:

```bash
./runHflClient1.sh
```

and in the other terminal run:

```bash
./runHflClient2.sh
```

The two horizontal FL clients will then cooperate in training a model.

### **VFL Logistic Regression**

Open two new terminals and run:

```bash
sudo docker exec -it flDemo bash
```

to enter the container. Then, in one terminal, run:

```bash
./runVflClient1.sh
```

and in the other terminal run:

```bash
./runVflClient2.sh
```

The two vertical FL clients will then cooperate in training a model.

## References

1. [Intel SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html)

## TPC-H with Trusted SparkSQL on Kubernetes ##

### Prerequisites ###

- Hardware that supports SGX
- A fully configured Kubernetes cluster
- Intel SGX Device Plugin to use SGX in the K8S cluster (install following the instructions [here](https://bigdl.readthedocs.io/en/latest/doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.html "here"))

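On Linux, a quick way to see whether the CPU advertises SGX (a sketch; SGX must also be enabled in the BIOS, which this does not detect):

```shell
# Look for the sgx CPU flag; absence may also mean SGX is disabled in firmware
if grep -qw sgx /proc/cpuinfo; then
  echo "SGX flag present"
else
  echo "SGX flag not reported"
fi
```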
### Prepare TPC-H kit and data ###

1. Download and compile TPC-H

```
git clone https://github.com/intel-analytics/zoo-tutorials.git
cd zoo-tutorials/tpch-spark

sed -i 's/2.11.7/2.12.1/g' tpch.sbt
sed -i 's/2.4.0/3.1.2/g' tpch.sbt
sbt package

cd dbgen
make
```

2. Generate data

Generate input data of size ~100GB (adjust the data size to your needs):

```
./dbgen -s 100
```

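The `-s` argument is the TPC-H scale factor, which corresponds to roughly 1 GB of raw `.tbl` data per unit, so `-s 100` yields about 100 GB. A trivial helper to estimate the footprint for other scale factors (pure shell arithmetic, no `dbgen` required):

```shell
# Approximate raw TPC-H data size: about 1 GB per scale-factor unit
scale=100
echo "dbgen -s ${scale} produces roughly ${scale} GB of .tbl files"
```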
### Deploy PPML TPC-H on Kubernetes ###

1. Pull the docker image

```
sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:0.14.0-SNAPSHOT
```

2. Prepare the SGX keys, and make sure the keys and `tpch-spark` can be accessed on each K8S node

3. Start a BigDL-PPML-enabled Spark K8S client container, with the local IP and the key, TPC-H and kubeconfig paths configured

```
export ENCLAVE_KEY=/root/keys/enclave-key.pem
export DATA_PATH=/root/zoo-tutorials/tpch-spark
export KEYS_PATH=/root/keys
export KUBERCONFIG_PATH=/root/kuberconfig
export LOCAL_IP=$local_ip
export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:0.14.0-SNAPSHOT
sudo docker run -itd \
    --privileged \
    --net=host \
    --name=spark-local-k8s-client \
    --oom-kill-disable \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
    -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \
    -v $DATA_PATH:/ppml/trusted-big-data-ml/work/tpch-spark \
    -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \
    -v $KUBERCONFIG_PATH:/root/.kube/config \
    -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \
    -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
    -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \
    -e RUNTIME_DRIVER_HOST=$LOCAL_IP \
    -e RUNTIME_DRIVER_PORT=54321 \
    -e RUNTIME_EXECUTOR_INSTANCES=1 \
    -e RUNTIME_EXECUTOR_CORES=4 \
    -e RUNTIME_EXECUTOR_MEMORY=20g \
    -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
    -e RUNTIME_DRIVER_CORES=4 \
    -e RUNTIME_DRIVER_MEMORY=10g \
    -e SGX_MEM_SIZE=64G \
    -e SGX_LOG_LEVEL=error \
    -e LOCAL_IP=$LOCAL_IP \
    $DOCKER_IMAGE bash
```

4. Attach to the client container

```
sudo docker exec -it spark-local-k8s-client bash
```

5. Modify `spark-executor-template.yaml`, adding the host paths of `enclave-key`, `tpch-spark` and the kubeconfig

```
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: spark-executor
    securityContext:
      privileged: true
    volumeMounts:
      ...
      - name: tpch
        mountPath: /ppml/trusted-big-data-ml/work/tpch-spark
      - name: kubeconf
        mountPath: /root/.kube/config
  volumes:
  - name: enclave-key
    hostPath:
      path: /root/keys/enclave-key.pem
  ...
  - name: tpch
    hostPath:
      path: /path/to/tpch-spark
  - name: kubeconf
    hostPath:
      path: /path/to/kuberconfig
```

6. Run PPML TPC-H

```
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
export HDFS_HOST=$hdfs_host_ip && \
export HDFS_PORT=$hdfs_port && \
export TPCH_DIR=/ppml/trusted-big-data-ml/work/tpch-spark && \
export INPUT_DIR=$TPCH_DIR/dbgen && \
export OUTPUT_DIR=hdfs://$HDFS_HOST:$HDFS_PORT/tpc-h/output && \
/opt/jdk8/bin/java \
    -cp "$TPCH_DIR/target/scala-2.12/spark-tpc-h-queries_2.12-1.0.jar:$TPCH_DIR/dbgen/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*" \
    -Xmx10g \
    -Dbigdl.mklNumThreads=1 \
    org.apache.spark.deploy.SparkSubmit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --name spark-tpch-sgx \
    --conf spark.driver.host=$LOCAL_IP \
    --conf spark.driver.port=54321 \
    --conf spark.driver.memory=10g \
    --conf spark.driver.blockManager.port=10026 \
    --conf spark.blockManager.port=10025 \
    --conf spark.scheduler.maxRegisteredResourcesWaitingTime=5000000 \
    --conf spark.worker.timeout=600 \
    --conf spark.python.use.daemon=false \
    --conf spark.python.worker.reuse=false \
    --conf spark.network.timeout=10000000 \
    --conf spark.starvation.timeout=250000 \
    --conf spark.rpc.askTimeout=600 \
    --conf spark.sql.autoBroadcastJoinThreshold=-1 \
    --conf spark.io.compression.codec=lz4 \
    --conf spark.sql.shuffle.partitions=8 \
    --conf spark.speculation=false \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.executor.instances=24 \
    --executor-cores 8 \
    --total-executor-cores 192 \
    --executor-memory 16G \
    --properties-file /ppml/trusted-big-data-ml/work/bigdl-0.14.0-SNAPSHOT/conf/spark-bigdl.conf \
    --conf spark.kubernetes.authenticate.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.kubernetes.executor.podNamePrefix=spark-tpch-sgx \
    --conf spark.kubernetes.sgx.enabled=true \
    --conf spark.kubernetes.sgx.mem=32g \
    --conf spark.kubernetes.sgx.jvm.mem=10g \
    --class main.scala.TpchQuery \
    --verbose \
    $TPCH_DIR/target/scala-2.12/spark-tpc-h-queries_2.12-1.0.jar \
    $INPUT_DIR $OUTPUT_DIR
```

   doc/PPML/QuickStart/build_kernel_with_sgx.md
   doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.md
   doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md
   doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s.md

.. toctree::
   :maxdepth: 1