[PPML]doc: fix TPCH markdown list number render (#5992)

* Fix list number rendering wrongly

* position of the ellipsis
Shaojie Cui, 2022-09-28 11:07:48 +08:00, commit 27587b5db5 (parent a703ae4eba)

### Prepare TPC-H kit and data ###
1. Generate data

    Go to the [TPC Download](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp) site, choose the `TPC-H` source code, and download the TPC-H toolkit. **Follow the download instructions carefully.**

    After downloading and uncompressing the tpc-h tools zip file, go to the `dbgen` directory, create a `makefile` based on `makefile.suite`, modify it according to the prompts inside, and run `make`, as sketched below.
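    A minimal build sketch, assuming a Linux/gcc toolchain; the variable values below follow the prompts in `makefile.suite` and may differ on your system:
    ```bash
    cd dbgen
    cp makefile.suite makefile
    # Edit makefile as its prompts describe, e.g.:
    #   CC       = gcc
    #   DATABASE = ORACLE   # any supported dialect works for data generation
    #   MACHINE  = LINUX
    #   WORKLOAD = TPCH
    make
    ```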
    This should generate an executable called `dbgen`. Running
    ```
    ./dbgen -h
    ```
    gives you the various options for generating the tables. The simplest case is running:
    ```
    ./dbgen
    ```
    which generates tables with the extension `.tbl` at scale 1 (the default), for a total of roughly 1GB across all tables. For different table sizes you can use the `-s` option:
    ```
    ./dbgen -s 10
    ```
    which will generate roughly 10GB of input data.

    Move all the `.tbl` files to a new directory as raw data. You can then either upload the data to a remote file system or read it locally, e.g. as shown below.
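    A minimal staging sketch; the local directory and the HDFS destination are illustrative assumptions:
    ```bash
    # Collect the generated tables as raw input data
    mkdir -p /path/to/dbgen-input
    mv ./*.tbl /path/to/dbgen-input/
    # Optional: upload to a remote file system such as HDFS
    hdfs dfs -mkdir -p /tpch/dbgen-input
    hdfs dfs -put /path/to/dbgen-input/*.tbl /tpch/dbgen-input/
    ```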
2. Encrypt Data

    Encrypt the data with a specified Key Management Service (`SimpleKeyManagementService`, `EHSMKeyManagementService`, or `AzureKeyManagementService`). Details can be found here: https://github.com/intel-analytics/BigDL/tree/main/ppml/services/kms-utils/docker

    Example code for encrypting data with `SimpleKeyManagementService` is shown below:
    ```
    java -cp "$BIGDL_HOME/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar:$SPARK_HOME/conf/:$SPARK_HOME/jars/*:$BIGDL_HOME/jars/*" \
        -Xmx10g \
        com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \
        --inputPath xxx/dbgen-input \
        --outputPath xxx/dbgen-encrypted \
        --kmsType SimpleKeyManagementService \
        --simpleAPPID xxxxxxxxxxxx \
        --simpleAPPKEY xxxxxxxxxxxx \
        --primaryKeyPath /path/to/simple_encrypted_primary_key \
        --dataKeyPath /path/to/simple_encrypted_data_key
    ```
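    The command above assumes `BIGDL_HOME` and `SPARK_HOME` point at your local installations, e.g. (the paths are illustrative):
    ```bash
    export BIGDL_HOME=/path/to/bigdl-2.1.0-SNAPSHOT
    export SPARK_HOME=/path/to/spark-3.1.2
    # After the job finishes, the encrypted tables should appear under the output path
    ls xxx/dbgen-encrypted
    ```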
### Deploy PPML TPC-H on Kubernetes ###
1. Pull the docker image
    ```
    sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
    ```
2. Prepare the SGX keys (following the instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#11-prepare-the-keyspassworddataenclave-keypem "here")), and make sure the keys and tpch-spark can be accessed on each K8S node
3. Start a bigdl-ppml enabled Spark K8S client container with the configured local IP, key, tpch and kuberconfig paths:
    ```
    export ENCLAVE_KEY=/path/to/enclave-key.pem
    export SECURE_PASSWORD_PATH=/path/to/password
    export DATA_PATH=/path/to/data
    export KEYS_PATH=/path/to/keys
    export KUBERCONFIG_PATH=/path/to/kuberconfig
    export LOCAL_IP=$local_ip
    export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
    sudo docker run -itd \
        --privileged \
        --net=host \
        --name=spark-local-k8s-client \
        --oom-kill-disable \
        --device=/dev/sgx/enclave \
        --device=/dev/sgx/provision \
        -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
        -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \
        -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \
        -v $DATA_PATH:/ppml/trusted-big-data-ml/work/data \
        -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \
        -v $KUBERCONFIG_PATH:/root/.kube/config \
        -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \
        -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
        -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \
        -e RUNTIME_DRIVER_HOST=$LOCAL_IP \
        -e RUNTIME_DRIVER_PORT=54321 \
        -e RUNTIME_EXECUTOR_INSTANCES=1 \
        -e RUNTIME_EXECUTOR_CORES=4 \
        -e RUNTIME_EXECUTOR_MEMORY=20g \
        -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
        -e RUNTIME_DRIVER_CORES=4 \
        -e RUNTIME_DRIVER_MEMORY=10g \
        -e SGX_MEM_SIZE=64G \
        -e SGX_LOG_LEVEL=error \
        -e LOCAL_IP=$LOCAL_IP \
        $DOCKER_IMAGE bash
    ```
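    A quick, optional sanity check (assuming the container name above) that the client container is up and the SGX devices are visible inside it:
    ```bash
    sudo docker ps --filter name=spark-local-k8s-client
    sudo docker exec spark-local-k8s-client ls /dev/sgx
    ```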
4. Attach to the client container
    ```
    sudo docker exec -it spark-local-k8s-client bash
    ```
5. Modify `spark-executor-template.yaml` to add the host paths of `enclave-key`, `tpch-spark` and `kuberconfig`
    ```
    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: spark-executor
        securityContext:
          privileged: true
        volumeMounts:
          ...
          - name: tpch
            mountPath: /ppml/trusted-big-data-ml/work/tpch-spark
          - name: kubeconf
            mountPath: /root/.kube/config
      volumes:
        - name: enclave-key
          hostPath:
            path: /root/keys/enclave-key.pem
        ...
        - name: tpch
          hostPath:
            path: /path/to/tpch-spark
        - name: kubeconf
          hostPath:
            path: /path/to/kuberconfig
    ```
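    A hedged sanity check: each `hostPath` source must exist on every K8S node that may schedule executors (the paths below are the examples used above):
    ```bash
    ls /root/keys/enclave-key.pem /path/to/tpch-spark /path/to/kuberconfig
    ```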
6. Run PPML TPC-H
    ```bash
    secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
    export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
    export SPARK_LOCAL_IP=$LOCAL_IP && \
    export INPUT_DIR=xxx/dbgen-encrypted && \
    export OUTPUT_DIR=xxx/dbgen-output && \
    /opt/jdk8/bin/java \
        -cp '/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/lib/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*' \
        -Xmx10g \
        -Dbigdl.mklNumThreads=1 \
        org.apache.spark.deploy.SparkSubmit \
        --master $RUNTIME_SPARK_MASTER \
        --deploy-mode client \
        --name spark-tpch-sgx \
        --conf spark.driver.host=$LOCAL_IP \
        --conf spark.driver.port=54321 \
        --conf spark.driver.memory=10g \
        --conf spark.driver.blockManager.port=10026 \
        --conf spark.blockManager.port=10025 \
        --conf spark.scheduler.maxRegisteredResourcesWaitingTime=5000000 \
        --conf spark.worker.timeout=600 \
        --conf spark.python.use.daemon=false \
        --conf spark.python.worker.reuse=false \
        --conf spark.network.timeout=10000000 \
        --conf spark.starvation.timeout=250000 \
        --conf spark.rpc.askTimeout=600 \
        --conf spark.sql.autoBroadcastJoinThreshold=-1 \
        --conf spark.io.compression.codec=lz4 \
        --conf spark.sql.shuffle.partitions=8 \
        --conf spark.speculation=false \
        --conf spark.executor.heartbeatInterval=10000000 \
        --conf spark.executor.instances=24 \
        --executor-cores 8 \
        --total-executor-cores 192 \
        --executor-memory 16G \
        --properties-file /ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/conf/spark-bigdl.conf \
        --conf spark.kubernetes.authenticate.serviceAccountName=spark \
        --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
        --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
        --conf spark.kubernetes.executor.deleteOnTermination=false \
        --conf spark.kubernetes.executor.podNamePrefix=spark-tpch-sgx \
        --conf spark.kubernetes.sgx.enabled=true \
        --conf spark.kubernetes.sgx.executor.mem=32g \
        --conf spark.kubernetes.sgx.executor.jvm.mem=10g \
        --conf spark.kubernetes.sgx.log.level=$SGX_LOG_LEVEL \
        --conf spark.authenticate=true \
        --conf spark.authenticate.secret=$secure_password \
        --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
        --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
        --conf spark.authenticate.enableSaslEncryption=true \
        --conf spark.network.crypto.enabled=true \
        --conf spark.network.crypto.keyLength=128 \
        --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
        --conf spark.io.encryption.enabled=true \
        --conf spark.io.encryption.keySizeBits=128 \
        --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
        --conf spark.ssl.enabled=true \
        --conf spark.ssl.port=8043 \
        --conf spark.ssl.keyPassword=$secure_password \
        --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
        --conf spark.ssl.keyStorePassword=$secure_password \
        --conf spark.ssl.keyStoreType=JKS \
        --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
        --conf spark.ssl.trustStorePassword=$secure_password \
        --conf spark.ssl.trustStoreType=JKS \
        --conf spark.bigdl.kms.type=SimpleKeyManagementService \
        --conf spark.bigdl.kms.simple.id=simpleAPPID \
        --conf spark.bigdl.kms.simple.key=simpleAPIKEY \
        --conf spark.bigdl.kms.key.primary=xxxx/primaryKey \
        --conf spark.bigdl.kms.key.data=xxxx/dataKey \
        --class com.intel.analytics.bigdl.ppml.examples.tpch.TpchQuery \
        --verbose \
        /ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/lib/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
        $INPUT_DIR $OUTPUT_DIR aes/cbc/pkcs5padding plain_text [QUERY]
    ```
    The optional parameter `[QUERY]` is the number of the query to run, e.g. 1, 2, ..., 22.

    The result is written to `OUTPUT_DIR`. There should be a file called `TIMES.TXT` with content formatted like:
    >Q01 39.80204010
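    For example (a sketch; `$OUTPUT_DIR` is the directory exported before the submit command above), to inspect the timing file after a run:
    ```bash
    cat $OUTPUT_DIR/TIMES.TXT
    # or, if OUTPUT_DIR is on a remote file system such as HDFS:
    hdfs dfs -cat $OUTPUT_DIR/TIMES.TXT
    ```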