[PPML] Refine PPML readthedocs (#3355)

* Link ppml docs to readthedocs page * Copy trusted_fl, build_kernel from narwhal * Copy trusted_big_bata_analytics_and_ml.md from zoo
2021-11-01 12:54:46 +08:00 · 2021-11-01 12:54:46 +08:00 · 8b610482b7
commit 8b610482b7
parent ac96d9a35a
5 changed files with 138 additions and 0 deletions
--- a/docs/readthedocs/source/doc/PPML/Overview/trusted_big_bata_analytics_and_ml.md
+++ b/docs/readthedocs/source/doc/PPML/Overview/trusted_big_bata_analytics_and_ml.md
@ -0,0 +1,30 @@
+# Trusted Big Data Analytics and ML
+
+Artificial intelligence on big data is increasingly important to many real-world applications. Many machine learning and data analytics applications benefit from private data in different domains, and most of them leverage that data to offer valuable services to users. But private data could be repurposed to infer sensitive information, which would jeopardize the privacy of individuals. Privacy-Preserving Machine Learning (PPML) helps address these risks. Using techniques such as cryptography, differential privacy, and hardware technologies, PPML aims to protect the privacy of sensitive user data and of the trained model as it performs ML tasks.
+
+BigDL helps to build PPML applications (including big data analytics, machine learning, cluster serving, etc.) on top of Intel® Software Guard Extensions (Intel® SGX) and library OSes such as Graphene and Occlum. In the current release, two types of trusted Big Data AI applications are supported:
+
+1. Big Data analytics and ML/DL (supporting [Apache Spark](https://spark.apache.org/) and [BigDL](https://github.com/intel-analytics/BigDL))
+2. Realtime compute and ML/DL (supporting [Apache Flink](https://flink.apache.org/) and BigDL [Cluster Serving](https://www.usenix.org/conference/opml20/presentation/song))
+
+## [1. Trusted Big Data ML](https://github.com/intel-analytics/BigDL/tree/branch-2.0/ppml/trusted-big-data-ml)
+
+With the trusted Big Data analytics and ML/DL support, users can run standard Spark data analysis (such as Spark SQL, DataFrame, MLlib, etc.) and distributed deep learning (using BigDL) in a secure and trusted fashion.
+
+## [2. Trusted Real Time ML](https://github.com/intel-analytics/BigDL/tree/branch-2.0/ppml/trusted-realtime-ml/scala)
+
+With the trusted realtime compute and ML/DL support, users can run standard Flink stream processing and distributed DL model inference (using Cluster Serving) in a secure and trusted fashion.
+
+## 3. Intel SGX and LibOS
+
+### [Intel® SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html)
+
+Intel® SGX is Intel’s Trusted Execution Environment (TEE) technology, offering hardware-based memory encryption that isolates specific application code and data in memory. Intel® SGX enables user-level code to allocate private regions of memory, called enclaves, which are designed to be protected from processes running at higher privilege levels.
+
+### [Graphene-SGX](https://github.com/oscarlab/graphene)
+
+Graphene is a lightweight guest OS, designed to run a single application with minimal host requirements. Graphene can run applications in an isolated environment with benefits comparable to running a complete OS in a virtual machine -- including guest customization, ease of porting to different OSes, and process migration. Graphene supports native, unmodified Linux applications on any platform. Currently, Graphene runs on Linux and Intel SGX enclaves on Linux platforms. With Intel SGX support, Graphene can secure a critical application in a hardware-encrypted memory region. Graphene can protect applications from a malicious system stack with minimal porting effort.
+
+### [Occlum](https://github.com/occlum/occlum)
+
+Occlum is a memory-safe, multi-process library OS (LibOS) for Intel SGX. As a LibOS, it enables legacy applications to run on SGX with little or even no modifications of source code, thus protecting the confidentiality and integrity of user workloads transparently.
--- a/docs/readthedocs/source/doc/PPML/Overview/trusted_fl.md
+++ b/docs/readthedocs/source/doc/PPML/Overview/trusted_fl.md
@ -0,0 +1,36 @@
+# Trusted FL (Federated Learning)
+
+SGX-based End-to-end Trusted FL platform
+
+## ID & Feature align
+
+Before starting Federated Learning, we need to align IDs and features, and determine which portions of the local data will participate in the later training stage.
+
+Let RID1 and RID2 be the randomized IDs from party 1 and party 2.
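
The alignment step can be pictured as a set intersection over the two parties' randomized IDs. The following is only a minimal sketch under stated assumptions, not the protocol used by the platform: it assumes each party has already written its randomized IDs to a text file (one per line), that both parties derive RIDs the same way so matching records get matching RIDs, and the function name `align_ids` and the file names are hypothetical.

```shell
#!/bin/sh
# Print the randomized IDs that appear in both parties' ID files
# (one ID per line). Only these records would take part in training.
# Usage: align_ids rid1.txt rid2.txt   (hypothetical file names)
align_ids() {
    sorted1=$(mktemp) && sorted2=$(mktemp) || return 1
    # comm requires sorted input; -u also drops duplicate IDs.
    LC_ALL=C sort -u "$1" > "$sorted1"
    LC_ALL=C sort -u "$2" > "$sorted2"
    # comm -12 keeps only the lines common to both sorted inputs.
    LC_ALL=C comm -12 "$sorted1" "$sorted2"
    rm -f "$sorted1" "$sorted2"
}
```

For example, `align_ids rid1.txt rid2.txt > aligned_ids.txt` would leave the shared IDs in `aligned_ids.txt`. In the trusted FL setting described here, this comparison would run on the FL server inside SGX rather than on either party's machine.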
+
+## Vertical FL
+
+Vertical FL training across multiple parties with different features.
+
+Key features:
+
+* FL Server in SGX
+    * ID & feature align
+    * Forward & backward aggregation
+* Training node in SGX
+
+## Horizontal FL
+
+Horizontal FL training across multiple parties.
+
+Key features:
+
+* FL Server in SGX
+   * ID & feature align (optional)
+   * Weight/Gradient Aggregation in SGX
+* Training Worker in SGX
+
+## References
+
+1. [Intel SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html)
+2. Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 10, 2, Article 12 (February 2019), 19 pages. DOI:https://doi.org/10.1145/3298981
--- a/docs/readthedocs/source/doc/PPML/QuickStart/build_kernel_with_sgx.md
+++ b/docs/readthedocs/source/doc/PPML/QuickStart/build_kernel_with_sgx.md
@ -0,0 +1,68 @@
+# Building Linux Kernel from Source with SGX Enabled
+
+The SGX driver has been merged into the Linux kernel since 5.11. Once the SGX feature is enabled when building the kernel, a separate SGX driver no longer needs to be installed.
+
+This guide shows how to build kernel 5.13 from source with the SGX feature enabled on Ubuntu 18.04.
+
+
+## Prerequisite
+
+Install the prerequisites for the kernel build. Follow your distro's instructions or your preferred way of building the kernel.
+
+```
+sudo apt-get install flex bison git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache
+```
+
+## Main steps
+
+Clone Linux Kernel source code.
+
+```
+# Obtain Linux kernel source tree
+mkdir kernel && cd kernel
+git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
+cd linux
+# You can change this version
+git checkout v5.13
+```
+
+Build Kernel from source code with SGX enabled.
+
+```
+cp /boot/config-`uname -r` .config
+yes '' | make oldconfig
+# Enable SGX and SGX KVM
+/bin/sed -i 's/^# CONFIG_X86_SGX is not set/CONFIG_X86_SGX=y/g' .config
+echo 'CONFIG_X86_SGX_KVM=y' >> .config
+make -j `getconf _NPROCESSORS_ONLN` deb-pkg
+```
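Before starting a long build it can help to confirm that the options really ended up in `.config`: the `sed` line changes nothing if the "is not set" line is absent, and an appended option may be dropped when the kernel configuration is regenerated if its dependencies are not met. The following is a minimal sketch, not part of the original guide; the function name `check_sgx_config` is hypothetical.

```shell
#!/bin/sh
# Report whether the given kernel config file enables SGX.
# Usage: check_sgx_config [path-to-config]   (defaults to ./.config)
check_sgx_config() {
    config_file="${1:-.config}"
    if grep -q '^CONFIG_X86_SGX=y$' "$config_file"; then
        echo "SGX enabled"
    else
        echo "SGX not enabled"
        return 1
    fi
    # SGX KVM support is optional; report it but do not fail without it.
    if grep -q '^CONFIG_X86_SGX_KVM=y$' "$config_file"; then
        echo "SGX KVM enabled"
    else
        echo "SGX KVM not enabled"
    fi
}
```

Running `check_sgx_config .config` in the kernel source directory prints the status of both options and returns a non-zero status when SGX itself is not enabled.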
+
+Install the kernel from the deb packages and reboot.
+
+```
+cd ..
+sudo dpkg -i linux-headers-5.13.0_5.13.0-1_amd64.deb linux-image-5.13.0_5.13.0-1_amd64.deb
+sudo reboot
+```
+
+Check that the kernel was installed correctly and that the SGX driver is working.
+
+```bash
+$ uname -r
+$ ls -l /dev/ | grep sgx
+```
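The same check can be scripted. The following is a minimal sketch, not part of the original guide: it assumes the in-kernel driver exposes its device nodes as `sgx_enclave` and `sgx_provision` (plus `sgx_vepc` when SGX KVM is enabled), which may differ on kernels using an out-of-tree driver. The function name `list_sgx_devices` is hypothetical.

```shell
#!/bin/sh
# List SGX device nodes found under a /dev-style directory.
# Returns non-zero when none of the expected nodes exist.
# Usage: list_sgx_devices [dev-dir]   (defaults to /dev)
list_sgx_devices() {
    dev_dir="${1:-/dev}"
    found=1
    # Device names are an assumption based on the in-kernel SGX driver.
    for name in sgx_enclave sgx_provision sgx_vepc; do
        if [ -e "$dev_dir/$name" ]; then
            echo "$dev_dir/$name"
            found=0
        fi
    done
    return "$found"
}
```

Calling `list_sgx_devices` with no argument inspects `/dev`. An empty result after rebooting into the new kernel suggests that SGX is disabled in the BIOS or was not enabled in the kernel configuration.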
+
+## Uninstall this kernel
+
+Uninstall the kernel with dpkg (if you want to switch back to the previous kernel).
+
+```bash
+sudo dpkg --purge linux-image-5.13.0 linux-headers-5.13.0
+sudo reboot
+```
+
+### Troubleshooting
+
+* Building on an Ubuntu 5.4.X kernel may fail with "make[2]: *** No rule to make target 'debian/certs/benh@debian.org.cert.pem', needed by 'certs/x509_certificate_list'.  Stop.". Please refer to [CONFIG_SYSTEM_TRUSTED_KEYS](https://askubuntu.com/questions/1329538/compiling-the-kernel-5-11-11).
+* In some kernels, the SGX option is named `CONFIG_INTEL_SGX` instead.
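
To cope with either option name, the enabling step can try both. The following is a minimal sketch, not part of the original guide; the function name `enable_sgx_option` is hypothetical, and it only handles the case where the option is present in the config but disabled.

```shell
#!/bin/sh
# Enable the SGX option in a kernel config file, whichever name it uses.
# Usage: enable_sgx_option [path-to-config]   (defaults to ./.config)
enable_sgx_option() {
    config_file="${1:-.config}"
    for opt in CONFIG_X86_SGX CONFIG_INTEL_SGX; do
        if grep -q "^# $opt is not set\$" "$config_file"; then
            # Rewrite through a temporary file instead of relying on sed -i.
            sed "s/^# $opt is not set\$/$opt=y/" "$config_file" > "$config_file.tmp" &&
                mv "$config_file.tmp" "$config_file"
            echo "enabled $opt"
            return 0
        fi
    done
    echo "no disabled SGX option found" >&2
    return 1
}
```

Running `enable_sgx_option .config` in the kernel source directory prints which option was switched on, or returns a non-zero status if neither name appears as a disabled option.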
--- a/docs/readthedocs/source/doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md
+++ b/docs/readthedocs/source/doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@ -79,6 +79,10 @@ BigDL Documentation
   :caption: PPML Overview

   doc/PPML/Overview/ppml.md
+   doc/PPML/Overview/trusted_big_bata_analytics_and_ml.md
+   doc/PPML/Overview/trusted_fl.md
+   doc/PPML/QuickStart/build_kernel_with_sgx.md
+   doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md

 .. toctree::
   :maxdepth: 1