diff --git a/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md b/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md
index 0cf8d6b3..b043703b 100644
--- a/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md
+++ b/docs/readthedocs/source/doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s.md
@@ -10,7 +10,7 @@
 1. Download and compile tpc-ds

-```
+```bash
 git clone --recursive https://github.com/intel-analytics/zoo-tutorials.git
 cd /path/to/zoo-tutorials
 git clone https://github.com/databricks/tpcds-kit.git
@@ -20,7 +20,7 @@ make OS=LINUX

 2. Generate data

-```
+```bash
 cd /path/to/zoo-tutorials
 cd tpcds-spark/spark-sql-perf
 sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData -d <dsdgenDir> -s <scaleFactor> -l <dataDir> -f parquet"
@@ -32,14 +32,14 @@ sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData -d \
@@ -53,18 +53,19 @@ $SPARK_HOME/bin/spark-submit \

 3. Pull docker image

-```
+```bash
 sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
 ```

-4. Prepare SGX keys, make sure keys and tpcds-spark can be accessed on each K8S node
+4. Prepare SGX keys (following the instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#11-prepare-the-keyspassworddataenclave-keypem)); make sure the keys and `tpcds-spark` can be accessed on each K8S node

 5. Start a bigdl-ppml enabled Spark K8S client container with the local IP, key, tpc-ds and kuberconfig paths configured

-```
-export ENCLAVE_KEY=/root/keys/enclave-key.pem
-export DATA_PATH=/root/zoo-tutorials/tpcds-spark
-export KEYS_PATH=/root/keys
-export KUBERCONFIG_PATH=/root/kuberconfig
+```bash
+export ENCLAVE_KEY=/YOUR_DIR/keys/enclave-key.pem
+export DATA_PATH=/YOUR_DIR/zoo-tutorials/tpcds-spark
+export KEYS_PATH=/YOUR_DIR/keys
+export SECURE_PASSWORD_PATH=/YOUR_DIR/password
+export KUBERCONFIG_PATH=/YOUR_DIR/kuberconfig
 export LOCAL_IP=$local_ip
 export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
 sudo docker run -itd \
@@ -78,6 +79,7 @@ sudo docker run -itd \
  -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \
  -v $DATA_PATH:/ppml/trusted-big-data-ml/work/tpcds-spark \
  -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \
+ -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \
  -v $KUBERCONFIG_PATH:/root/.kube/config \
  -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \
  -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
@@ -98,13 +100,13 @@ sudo docker run -itd \

 6. Attach to the client container

-```
+```bash
 sudo docker exec -it spark-local-k8s-client bash
 ```

 7. Modify `spark-executor-template.yaml`, adding the host paths of `enclave-key`, `tpcds-spark` and `kuberconfig`

-```
+```yaml
 apiVersion: v1
 kind: Pod
 spec:
@@ -135,7 +137,8 @@ spec:
 Optional argument `QUERY` is the query number to run. Multiple query numbers should be separated by spaces, e.g. `1 2 3`. If no query number is specified, all 99 queries (1-99) will be executed.

-```
+```bash
+secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt
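A note for readers following this change: the new `SECURE_PASSWORD_PATH` mount expects a host directory containing the RSA key (`key.txt`, the name referenced by the decryption command in the final hunk) plus an encrypted password file. The sketch below shows one minimal way to prepare such a directory with plain `openssl`; the file name `output.bin` and the exact commands are illustrative assumptions, not BigDL's official tooling, so prefer the key-preparation instructions linked in step 4.

```bash
# Minimal sketch (assumptions: key.txt as the RSA key name, matching the
# decryption command in the doc; output.bin as a hypothetical name for the
# encrypted password file).
mkdir -p /YOUR_DIR/password && cd /YOUR_DIR/password

# Generate the RSA key used to encrypt and later decrypt the password.
openssl genrsa -out key.txt 2048

# Encrypt a password of your choice with the key's public half.
echo "YOUR_SECURE_PASSWORD" | openssl rsautl -encrypt -inkey key.txt -out output.bin

# Round-trip check on the host; the in-container step decrypts the same way,
# reading the key from /ppml/trusted-big-data-ml/work/password/key.txt.
openssl rsautl -decrypt -inkey key.txt -in output.bin
```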
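Because step 5 now adds an extra `-v` mount, it is worth verifying the mount before submitting any job; the check below uses only the container name that already appears in step 6.

```bash
# Optional sanity check: the mounted password directory should be visible
# inside the client container and contain key.txt.
sudo docker exec -it spark-local-k8s-client ls -l /ppml/trusted-big-data-ml/work/password
```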
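For step 7, the hunk shows only the head of `spark-executor-template.yaml`. As orientation, host paths reach executor pods via standard Kubernetes `hostPath` volumes; the sketch below illustrates that shape using assumed volume names and the paths from earlier in this guide, and is not the repository's full template.

```yaml
# Hedged sketch of the hostPath wiring in spark-executor-template.yaml;
# volume names and container layout here are illustrative assumptions.
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: spark-executor
    volumeMounts:
    - name: enclave-key
      mountPath: /graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem
    - name: tpcds-spark
      mountPath: /ppml/trusted-big-data-ml/work/tpcds-spark
  volumes:
  - name: enclave-key
    hostPath:
      path: /YOUR_DIR/keys/enclave-key.pem
  - name: tpcds-spark
    hostPath:
      path: /YOUR_DIR/zoo-tutorials/tpcds-spark
```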
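Separately, step 2's `GenTPCDSData` flags each take an argument: `-d` the dsdgen tools directory, `-s` the scale factor (roughly GB), `-l` the output location, and `-f` the format. A concrete invocation with illustrative values, assuming the tpcds-kit built in step 1, might look like:

```bash
cd /path/to/zoo-tutorials/tpcds-spark/spark-sql-perf
# Illustrative values only: dsdgen tools dir from step 1, scale factor 1,
# and a local directory for the generated parquet data.
sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData -d /path/to/zoo-tutorials/tpcds-kit/tools -s 1 -l /path/to/tpcds-data -f parquet"
```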