	Develop your own Big Data & AI applications with BigDL PPML
First you need to create a `PPMLContext`, which wraps `SparkSession` and provides methods to read an encrypted data file into a plain-text RDD/DataFrame and to write a DataFrame to an encrypted data file. Then you can read and write data through the `PPMLContext`.
If you are familiar with Spark, you will find that the usage of `PPMLContext` is very similar to that of Spark.
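Putting the two steps together, a typical job looks like the following minimal sketch (Python). The ids, keys and paths are placeholders, and it assumes a `primaryKey` and `dataKey` have already been generated with a KMS as described below:

```python
from bigdl.ppml.ppml_context import *

# placeholder KMS arguments -- substitute your own ids, keys and key paths
ppml_args = {"kms_type": "SimpleKeyManagementService",
             "simple_app_id": "your_app_id",
             "simple_app_key": "your_app_key",
             "primary_key_path": "/your/primary/key/path/primaryKey",
             "data_key_path": "/your/data/key/path/dataKey"}

# create the PPMLContext (wraps SparkSession)
sc = PPMLContext("MyApp", ppml_args)

# read an encrypted csv file into a plain-text DataFrame
df = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).option("header", "true").csv("/encrypted/csv/path")

# ... transform df with ordinary Spark operations ...

# write the result back as an encrypted csv file
sc.write(df, CryptoMode.AES_CBC_PKCS5PADDING) \
  .mode("overwrite") \
  .option("header", True) \
  .csv("/encrypted/output/path")
```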
1. Create PPMLContext
- create a PPMLContext with `appName`

  This is the simplest way to create a `PPMLContext`. When you don't need to read/write encrypted files, you can use this way to create a `PPMLContext`.

  ```scala
  import com.intel.analytics.bigdl.ppml.PPMLContext

  val sc = PPMLContext.initPPMLContext("MyApp")
  ```

  ```python
  from bigdl.ppml.ppml_context import *

  sc = PPMLContext("MyApp")
  ```

  If you want to read/write encrypted files, then you need to provide more information.
- create a PPMLContext with `appName` & `ppmlArgs`

  `ppmlArgs` is the PPML arguments in a Map, and it varies according to the kind of Key Management Service (KMS) you are using. A KMS is used to generate the `primaryKey` and `dataKey` that encrypt/decrypt data. We provide 3 types of KMS: SimpleKeyManagementService, EHSMKeyManagementService, and AzureKeyManagementService.

  Refer to KMS Utils to use KMS to generate a `primaryKey` and `dataKey`; then you are ready to create a `PPMLContext` with `ppmlArgs`.

  - For `SimpleKeyManagementService`:

    ```scala
    import com.intel.analytics.bigdl.ppml.PPMLContext

    val ppmlArgs: Map[String, String] = Map(
      "spark.bigdl.kms.type" -> "SimpleKeyManagementService",
      "spark.bigdl.kms.simple.id" -> "your_app_id",
      "spark.bigdl.kms.simple.key" -> "your_app_key",
      "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
      "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
    )

    val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs)
    ```

    ```python
    from bigdl.ppml.ppml_context import *

    ppml_args = {"kms_type": "SimpleKeyManagementService",
                 "simple_app_id": "your_app_id",
                 "simple_app_key": "your_app_key",
                 "primary_key_path": "/your/primary/key/path/primaryKey",
                 "data_key_path": "/your/data/key/path/dataKey"}

    sc = PPMLContext("MyApp", ppml_args)
    ```
  - For `EHSMKeyManagementService`:

    ```scala
    import com.intel.analytics.bigdl.ppml.PPMLContext

    val ppmlArgs: Map[String, String] = Map(
      "spark.bigdl.kms.type" -> "EHSMKeyManagementService",
      "spark.bigdl.kms.ehs.ip" -> "your_server_ip",
      "spark.bigdl.kms.ehs.port" -> "your_server_port",
      "spark.bigdl.kms.ehs.id" -> "your_app_id",
      "spark.bigdl.kms.ehs.key" -> "your_app_key",
      "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
      "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
    )

    val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs)
    ```

    ```python
    from bigdl.ppml.ppml_context import *

    ppml_args = {"kms_type": "EHSMKeyManagementService",
                 "kms_server_ip": "your_server_ip",
                 "kms_server_port": "your_server_port",
                 "ehsm_app_id": "your_app_id",
                 "ehsm_app_key": "your_app_key",
                 "primary_key_path": "/your/primary/key/path/primaryKey",
                 "data_key_path": "/your/data/key/path/dataKey"}

    sc = PPMLContext("MyApp", ppml_args)
    ```
  - For `AzureKeyManagementService`:

    The parameter `clientId` is optional; you don't have to provide it.

    ```scala
    import com.intel.analytics.bigdl.ppml.PPMLContext

    val ppmlArgs: Map[String, String] = Map(
      "spark.bigdl.kms.type" -> "AzureKeyManagementService",
      "spark.bigdl.kms.azure.vault" -> "key_vault_name",
      "spark.bigdl.kms.azure.clientId" -> "client_id",
      "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
      "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
    )

    val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs)
    ```

    ```python
    from bigdl.ppml.ppml_context import *

    ppml_args = {"kms_type": "AzureKeyManagementService",
                 "azure_vault": "your_azure_vault",
                 "azure_client_id": "your_azure_client_id",
                 "primary_key_path": "/your/primary/key/path/primaryKey",
                 "data_key_path": "/your/data/key/path/dataKey"}

    sc = PPMLContext("MyApp", ppml_args)
    ```
- create a PPMLContext with `sparkConf` & `appName` & `ppmlArgs`

  If you need to set Spark configurations, you can provide a `SparkConf` with Spark configurations to create a `PPMLContext`.

  ```scala
  import com.intel.analytics.bigdl.ppml.PPMLContext
  import org.apache.spark.SparkConf

  val ppmlArgs: Map[String, String] = Map(
    "spark.bigdl.kms.type" -> "SimpleKeyManagementService",
    "spark.bigdl.kms.simple.id" -> "your_app_id",
    "spark.bigdl.kms.simple.key" -> "your_app_key",
    "spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
    "spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
  )

  val conf: SparkConf = new SparkConf().setMaster("local[4]")

  val sc = PPMLContext.initPPMLContext(conf, "MyApp", ppmlArgs)
  ```

  ```python
  from bigdl.ppml.ppml_context import *
  from pyspark import SparkConf

  ppml_args = {"kms_type": "SimpleKeyManagementService",
               "simple_app_id": "your_app_id",
               "simple_app_key": "your_app_key",
               "primary_key_path": "/your/primary/key/path/primaryKey",
               "data_key_path": "/your/data/key/path/dataKey"}

  conf = SparkConf()
  conf.setMaster("local[4]")

  sc = PPMLContext("MyApp", ppml_args, conf)
  ```
2. Read and Write Files
To read/write data, you should set the `CryptoMode`:

- `plain_text`: no encryption
- `AES/CBC/PKCS5Padding`: for CSV, JSON and text files
- `AES_GCM_V1`: for PARQUET only
- `AES_GCM_CTR_V1`: for PARQUET only

To write data, you should set the write mode:

- `overwrite`: Overwrite existing data with the content of the dataframe.
- `append`: Append the content of the dataframe to existing data or table.
- `ignore`: Ignore the current write operation if data/table already exists, without any error.
- `error`: Throw an exception if data or table already exists.
- `errorifexists`: Throw an exception if data or table already exists.
```scala
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}

// read data
val df = sc.read(cryptoMode = PLAIN_TEXT)
  ...

// write data
sc.write(dataFrame = df, cryptoMode = AES_CBC_PKCS5PADDING)
  .mode("overwrite")
  ...
```
```python
from bigdl.ppml.ppml_context import *

# read data
df = sc.read(crypto_mode = CryptoMode.PLAIN_TEXT)
  ...

# write data
sc.write(dataframe = df, crypto_mode = CryptoMode.AES_CBC_PKCS5PADDING)
  .mode("overwrite")
  ...
```
The following examples show how to read/write CSV, PARQUET, JSON and text files. They use `sc` to represent an initialized `PPMLContext`.
read/write CSV file
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}

// read a plain csv file and return a DataFrame
val plainCsvPath = "/plain/csv/path"
val df1 = sc.read(cryptoMode = PLAIN_TEXT).option("header", "true").csv(plainCsvPath)

// write a DataFrame as a plain csv file
val plainOutputPath = "/plain/output/path"
sc.write(df1, PLAIN_TEXT)
  .mode("overwrite")
  .option("header", "true")
  .csv(plainOutputPath)

// read an encrypted csv file and return a DataFrame
val encryptedCsvPath = "/encrypted/csv/path"
val df2 = sc.read(cryptoMode = AES_CBC_PKCS5PADDING).option("header", "true").csv(encryptedCsvPath)

// write a DataFrame as an encrypted csv file
val encryptedOutputPath = "/encrypted/output/path"
sc.write(df2, AES_CBC_PKCS5PADDING)
  .mode("overwrite")
  .option("header", "true")
  .csv(encryptedOutputPath)
```
```python
from bigdl.ppml.ppml_context import *

# read a plain csv file and return a DataFrame
plain_csv_path = "/plain/csv/path"
df1 = sc.read(CryptoMode.PLAIN_TEXT).option("header", "true").csv(plain_csv_path)

# write a DataFrame as a plain csv file
plain_output_path = "/plain/output/path"
sc.write(df1, CryptoMode.PLAIN_TEXT) \
  .mode('overwrite') \
  .option("header", True) \
  .csv(plain_output_path)

# read an encrypted csv file and return a DataFrame
encrypted_csv_path = "/encrypted/csv/path"
df2 = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).option("header", "true").csv(encrypted_csv_path)

# write a DataFrame as an encrypted csv file
encrypted_output_path = "/encrypted/output/path"
sc.write(df2, CryptoMode.AES_CBC_PKCS5PADDING) \
  .mode('overwrite') \
  .option("header", True) \
  .csv(encrypted_output_path)
```
read/write PARQUET file
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_GCM_CTR_V1, PLAIN_TEXT}

// read a plain parquet file and return a DataFrame
val plainParquetPath = "/plain/parquet/path"
val df1 = sc.read(PLAIN_TEXT).parquet(plainParquetPath)

// write a DataFrame as a plain parquet file
val plainOutputPath = "/plain/output/path"
sc.write(df1, PLAIN_TEXT)
  .mode("overwrite")
  .parquet(plainOutputPath)

// read an encrypted parquet file and return a DataFrame
val encryptedParquetPath = "/encrypted/parquet/path"
val df2 = sc.read(AES_GCM_CTR_V1).parquet(encryptedParquetPath)

// write a DataFrame as an encrypted parquet file
val encryptedOutputPath = "/encrypted/output/path"
sc.write(df2, AES_GCM_CTR_V1)
  .mode("overwrite")
  .parquet(encryptedOutputPath)
```
```python
from bigdl.ppml.ppml_context import *

# read a plain parquet file and return a DataFrame
plain_parquet_path = "/plain/parquet/path"
df1 = sc.read(CryptoMode.PLAIN_TEXT).parquet(plain_parquet_path)

# write a DataFrame as a plain parquet file
plain_output_path = "/plain/output/path"
sc.write(df1, CryptoMode.PLAIN_TEXT) \
  .mode('overwrite') \
  .parquet(plain_output_path)

# read an encrypted parquet file and return a DataFrame
encrypted_parquet_path = "/encrypted/parquet/path"
df2 = sc.read(CryptoMode.AES_GCM_CTR_V1).parquet(encrypted_parquet_path)

# write a DataFrame as an encrypted parquet file
encrypted_output_path = "/encrypted/output/path"
sc.write(df2, CryptoMode.AES_GCM_CTR_V1) \
  .mode('overwrite') \
  .parquet(encrypted_output_path)
```
read/write JSON file
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}

// read a plain json file and return a DataFrame
val plainJsonPath = "/plain/json/path"
val df1 = sc.read(PLAIN_TEXT).json(plainJsonPath)

// write a DataFrame as a plain json file
val plainOutputPath = "/plain/output/path"
sc.write(df1, PLAIN_TEXT)
  .mode("overwrite")
  .json(plainOutputPath)

// read an encrypted json file and return a DataFrame
val encryptedJsonPath = "/encrypted/json/path"
val df2 = sc.read(AES_CBC_PKCS5PADDING).json(encryptedJsonPath)

// write a DataFrame as an encrypted json file
val encryptedOutputPath = "/encrypted/output/path"
sc.write(df2, AES_CBC_PKCS5PADDING)
  .mode("overwrite")
  .json(encryptedOutputPath)
```
```python
from bigdl.ppml.ppml_context import *

# read a plain json file and return a DataFrame
plain_json_path = "/plain/json/path"
df1 = sc.read(CryptoMode.PLAIN_TEXT).json(plain_json_path)

# write a DataFrame as a plain json file
plain_output_path = "/plain/output/path"
sc.write(df1, CryptoMode.PLAIN_TEXT) \
  .mode('overwrite') \
  .json(plain_output_path)

# read an encrypted json file and return a DataFrame
encrypted_json_path = "/encrypted/json/path"
df2 = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).json(encrypted_json_path)

# write a DataFrame as an encrypted json file
encrypted_output_path = "/encrypted/output/path"
sc.write(df2, CryptoMode.AES_CBC_PKCS5PADDING) \
  .mode('overwrite') \
  .json(encrypted_output_path)
```
read textfile
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}

// read from a plain csv file and return an RDD
val plainCsvPath = "/plain/csv/path"
val rdd1 = sc.textfile(plainCsvPath) // the default cryptoMode is PLAIN_TEXT

// read from an encrypted csv file and return an RDD
val encryptedCsvPath = "/encrypted/csv/path"
val rdd2 = sc.textfile(path=encryptedCsvPath, cryptoMode=AES_CBC_PKCS5PADDING)
```
```python
from bigdl.ppml.ppml_context import *

# read from a plain csv file and return an RDD
plain_csv_path = "/plain/csv/path"
rdd1 = sc.textfile(plain_csv_path)  # the default crypto_mode is "plain_text"

# read from an encrypted csv file and return an RDD
encrypted_csv_path = "/encrypted/csv/path"
rdd2 = sc.textfile(path=encrypted_csv_path, crypto_mode=CryptoMode.AES_CBC_PKCS5PADDING)
```
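The returned RDD contains the decrypted lines and can be processed with ordinary Spark RDD operations. For example, a minimal sketch of parsing the decrypted CSV lines (the column layout is hypothetical):

```python
# split each decrypted csv line into fields and count the rows
rows = rdd2.map(lambda line: line.split(","))
print(rows.count())
```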
For more usage of the `PPMLContext` Python API, please refer to the PPMLContext Python API documentation.