[PPML] Refine PPML readthedocs (#3355)

* Link ppml docs to readthedocs page
* Copy trusted_fl, build_kernel from narwhal
* Copy trusted_big_bata_analytics_and_ml.md from zoo
Qiyuan Gong 2021-11-01 12:54:46 +08:00 committed by GitHub
parent ac96d9a35a
commit 8b610482b7
5 changed files with 138 additions and 0 deletions

@ -0,0 +1,30 @@
# Trusted Big Data Analytics and ML
Artificial intelligence on big data is increasingly important to many real-world applications. Many machine learning and data analytics applications benefit from private data in different domains, and most of them use that data to offer valuable services to users. However, private data can be repurposed to infer sensitive information, which jeopardizes the privacy of individuals. Privacy-Preserving Machine Learning (PPML) helps address these risks. Using techniques such as cryptography, differential privacy, and hardware technologies, PPML aims to protect the privacy of sensitive user data and of the trained model as it performs ML tasks.
BigDL helps to build PPML applications (including big data analytics, machine learning, cluster serving, etc.) on top of Intel® Software Guard Extensions (Intel® SGX) and library OSes such as Graphene and Occlum. In the current release, two types of trusted Big Data AI applications are supported:
1. Big Data analytics and ML/DL (supporting [Apache Spark](https://spark.apache.org/) and [BigDL](https://github.com/intel-analytics/BigDL))
2. Realtime compute and ML/DL (supporting [Apache Flink](https://flink.apache.org/) and BigDL [Cluster Serving](https://www.usenix.org/conference/opml20/presentation/song))
## [1. Trusted Big Data ML](https://github.com/intel-analytics/BigDL/tree/branch-2.0/ppml/trusted-big-data-ml)
With the trusted Big Data analytics and ML/DL support, users can run standard Spark data analysis (such as Spark SQL, Dataframe, MLlib, etc.) and distributed deep learning (using BigDL) in a secure and trusted fashion.
## [2. Trusted Real Time ML](https://github.com/intel-analytics/BigDL/tree/branch-2.0/ppml/trusted-realtime-ml/scala)
With the trusted realtime compute and ML/DL support, users can run standard Flink stream processing and distributed DL model inference (using Cluster Serving) in a secure and trusted fashion.
## 3. Intel SGX and LibOS
### [Intel® SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html)
Intel® SGX is Intel's Trusted Execution Environment (TEE), offering hardware-based memory encryption that isolates specific application code and data in memory. Intel® SGX enables user-level code to allocate private regions of memory, called enclaves, which are designed to be protected from processes running at higher privilege levels.
### [Graphene-SGX](https://github.com/oscarlab/graphene)
Graphene is a lightweight guest OS, designed to run a single application with minimal host requirements. Graphene can run applications in an isolated environment with benefits comparable to running a complete OS in a virtual machine -- including guest customization, ease of porting to different OSes, and process migration. Graphene supports native, unmodified Linux applications on any platform. Currently, Graphene runs on Linux and Intel SGX enclaves on Linux platforms. With Intel SGX support, Graphene can secure a critical application in a hardware-encrypted memory region. Graphene can protect applications from a malicious system stack with minimal porting effort.
### [Occlum](https://github.com/occlum/occlum)
Occlum is a memory-safe, multi-process library OS (LibOS) for Intel SGX. As a LibOS, it enables legacy applications to run on SGX with little or even no modifications of source code, thus protecting the confidentiality and integrity of user workloads transparently.

@ -0,0 +1,36 @@
# Trusted FL (Federated Learning)
SGX-based End-to-end Trusted FL platform
## ID & Feature align
Before we start Federated Learning, we need to align IDs and features, and figure out which portions of the local data will participate in the later training stage.
Let RID1 and RID2 be the randomized IDs from party 1 and party 2.
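The alignment step can be sketched as follows. This is only an illustration, assuming that the randomized IDs are salted SHA-256 digests and that alignment is a plain set intersection; the platform's actual protocol is not described here. `randomize_ids` and `align_ids` are hypothetical helper names.
```bash
# Hypothetical helper: read raw IDs (one per line) on stdin and print their
# salted SHA-256 digests, sorted. The shared salt is an assumption.
randomize_ids() {
    local salt="$1"
    local id
    while IFS= read -r id; do
        printf '%s%s' "$salt" "$id" | sha256sum | cut -d' ' -f1
    done | LC_ALL=C sort
}

# Hypothetical helper: print the digests present in both sorted files,
# i.e. the IDs shared by RID1 and RID2.
align_ids() {
    LC_ALL=C comm -12 "$1" "$2"
}
```
Each party would run `randomize_ids` on its own ID list with the same salt, and `align_ids rid1.txt rid2.txt` would then list the digests of the samples that both parties hold.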
## Vertical FL
Vertical FL training across multiple parties that hold different features.
Key features:
* FL Server in SGX
* ID & feature align
* Forward & backward aggregation
* Training node in SGX
## Horizontal FL
Horizontal FL training across multiple parties.
Key features:
* FL Server in SGX
* ID & feature align (optional)
* Weight/Gradient Aggregation in SGX
* Training Worker in SGX
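
The weight aggregation step can be illustrated with a sample-weighted average, as in federated averaging. This is a sketch under assumptions: the input format (one line per worker, sample count followed by weights) and the name `aggregate_weights` are hypothetical, and the aggregation rule actually used by the FL server is not stated here.
```bash
# Hypothetical helper: read "<num_samples> <w1> <w2> ..." lines on stdin,
# one line per worker, and print the sample-weighted mean of each weight.
aggregate_weights() {
    awk '{
        total += $1
        for (i = 2; i <= NF; i++) sum[i] += $1 * $i
        if (NF > n) n = NF
    } END {
        for (i = 2; i <= n; i++) printf "%s%.4f", (i > 2 ? " " : ""), sum[i] / total
        printf "\n"
    }'
}
```
For example, `printf '1 1.0 2.0\n3 3.0 4.0\n' | aggregate_weights` prints `2.5000 3.5000`, because the second worker contributes three of the four samples.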
## References
1. [Intel SGX](https://software.intel.com/content/www/us/en/develop/topics/software-guard-extensions.html)
2. Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 10, 2, Article 12 (February 2019), 19 pages. DOI:https://doi.org/10.1145/3298981

@ -0,0 +1,68 @@
# Building Linux Kernel from Source with SGX Enabled
The SGX driver has been part of the Linux kernel since version 5.11. Once the SGX feature is enabled at kernel build time, we no longer have to install a separate SGX driver.
In this guide, we show how to build kernel 5.13 from source and enable the SGX feature on Ubuntu 18.04.
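To see whether a kernel is new enough to carry the in-kernel driver, a version comparison can help. A minimal sketch, assuming GNU `sort -V` is available; `kernel_at_least` is a hypothetical helper name.
```bash
# Hypothetical helper: succeed if version "$1" is at least version "$2".
# Relies on GNU `sort -V` (version sort) to order the two strings.
kernel_at_least() {
    local have="$1"
    local want="$2"
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}
```
A call such as `kernel_at_least "$(uname -r)" 5.11` only checks the version; the driver is still absent if the kernel was built without `CONFIG_X86_SGX`.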
## Prerequisite
Install the prerequisites for the kernel build. Please follow your distro's instructions or your preferred way of building the kernel.
```bash
sudo apt-get install flex bison git build-essential kernel-package fakeroot libncurses5-dev libssl-dev ccache
```
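Before starting a long build, it can be useful to confirm that the build tools ended up on `PATH`. A small sketch; `missing_commands` is a hypothetical helper, and it checks commands rather than packages, so libraries such as `libssl-dev` are not covered.
```bash
# Hypothetical helper: print every named command that is not found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```
For example, `missing_commands flex bison git gcc make fakeroot ccache` prints nothing when all of these tools are installed.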
## Main steps
Clone the Linux kernel source code.
```bash
# Obtain Linux kernel source tree
mkdir kernel && cd kernel
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
# You can change this version
git checkout v5.13
```
Build the kernel from source with SGX enabled.
```bash
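# Start from the configuration of the running kernel
cp "/boot/config-$(uname -r)" .config
# Accept the default answer for every option that is new in this kernel
yes '' | make oldconfig
# Enable SGX and SGX KVM
/bin/sed -i 's/^# CONFIG_X86_SGX is not set/CONFIG_X86_SGX=y/g' .config
echo 'CONFIG_X86_SGX_KVM=y' >> .config
# Build Debian packages using all online CPUs
make -j "$(getconf _NPROCESSORS_ONLN)" deb-pkg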
```
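The `sed` call above only rewrites an existing `# CONFIG_X86_SGX is not set` line, and the `echo` appends another entry each time it is rerun. A minimal sketch of an idempotent alternative, assuming GNU `sed` and `grep`; `enable_kconfig_option` is a hypothetical helper name, not part of the kernel tree.
```bash
# Hypothetical helper: set a kconfig option to "y" in the given config file.
# Rewrites an existing "is not set" or "OPTION=..." line, otherwise appends.
enable_kconfig_option() {
    local config_file="$1"
    local option="$2"
    if grep -q "^${option}=y$" "$config_file"; then
        return 0    # already enabled, nothing to do
    fi
    if grep -q -E "^(# ${option} is not set|${option}=.*)$" "$config_file"; then
        sed -i -E "s/^(# ${option} is not set|${option}=.*)$/${option}=y/" "$config_file"
    else
        echo "${option}=y" >> "$config_file"
    fi
}
```
For example, run `enable_kconfig_option .config CONFIG_X86_SGX` and then `enable_kconfig_option .config CONFIG_X86_SGX_KVM`. The kernel tree's own `scripts/config --enable X86_SGX` is another way to make the same change.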
Install the kernel from the generated deb packages and reboot.
```bash
cd ..
sudo dpkg -i linux-headers-5.13.0_5.13.0-1_amd64.deb linux-image-5.13.0_5.13.0-1_amd64.deb
sudo reboot
```
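Check that the kernel was installed correctly and that the SGX driver is working. `uname -r` should report the 5.13.0 kernel, and the listing should contain SGX device nodes.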
```bash
$ uname -r
$ ls -l /dev/ | grep sgx
```
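The same check can be scripted. A sketch under assumptions: the in-kernel driver is expected to expose device nodes whose names start with `sgx` (for example `/dev/sgx_enclave`), and the CPU is expected to advertise an `sgx` flag in `/proc/cpuinfo`. `list_sgx_nodes` and `cpu_has_sgx_flag` are hypothetical helper names.
```bash
# Hypothetical helper: list SGX-related device nodes under a directory
# (default: /dev), including one level of subdirectories.
list_sgx_nodes() {
    local dev_dir="${1:-/dev}"
    find "$dev_dir" -maxdepth 2 -name 'sgx*' 2>/dev/null | sort
}

# Hypothetical helper: succeed if the first "flags" line of a cpuinfo-style
# file (default: /proc/cpuinfo) contains the whole word "sgx".
cpu_has_sgx_flag() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" | grep -qw 'sgx'
}
```
If `list_sgx_nodes` prints nothing while `cpu_has_sgx_flag` succeeds, SGX may be disabled in the BIOS or in the kernel configuration.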
## Uninstall this kernel
Uninstall the kernel with dpkg (if you want to switch back to the previous kernel).
```bash
sudo dpkg --purge linux-image-5.13.0 linux-headers-5.13.0
sudo reboot
```
## Troubleshooting
* Building on Ubuntu (kernel 5.4.X) may fail with "make[2]: *** No rule to make target 'debian/certs/benh@debian.org.cert.pem', needed by 'certs/x509_certificate_list'. Stop.". Please refer to [CONFIG_SYSTEM_TRUSTED_KEYS](https://askubuntu.com/questions/1329538/compiling-the-kernel-5-11-11).
* In some kernels, the SGX option is named `CONFIG_INTEL_SGX`.
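
For the certificate error listed above, a commonly used workaround is to clear `CONFIG_SYSTEM_TRUSTED_KEYS` in `.config` before building. A minimal sketch, assuming GNU `sed`; `clear_trusted_keys` is a hypothetical helper name, and whether this workaround suits your setup should be checked against the linked discussion.
```bash
# Hypothetical helper: blank out CONFIG_SYSTEM_TRUSTED_KEYS in a kconfig file
# (default: .config) so the build stops looking for the distro's cert file.
clear_trusted_keys() {
    local config_file="${1:-.config}"
    sed -i 's/^CONFIG_SYSTEM_TRUSTED_KEYS=.*/CONFIG_SYSTEM_TRUSTED_KEYS=""/' "$config_file"
}
```
Run `clear_trusted_keys .config` in the kernel source directory after `make oldconfig` and before `make deb-pkg`.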

@ -79,6 +79,10 @@ BigDL Documentation
:caption: PPML Overview
doc/PPML/Overview/ppml.md
doc/PPML/Overview/trusted_big_bata_analytics_and_ml.md
doc/PPML/Overview/trusted_fl.md
doc/PPML/QuickStart/build_kernel_with_sgx.md
doc/PPML/QuickStart/trusted-serving-on-k8s-guide.md
.. toctree::
:maxdepth: 1