change the readthedocs theme and reorg the sections (#6056)

* refactor toc

* refactor toc

* Change to pydata-sphinx-theme and update packages requirement list for ReadtheDocs

* Remove customized css for old theme

* Add index page to each top bar section and limit dropdown maximum to be 4

* Use js to change 'More' to 'Libraries'

* Add custom.css to conf.py for further css changes

* Add BigDL logo and search bar

* refactor toc

* refactor toc and add overview

* refactor toc and add overview

* refactor toc and add overview

* refactor get started

* add paper and video section

* add videos

* add grid columns in landing page

* add document roadmap to index

* reapply search bar and github icon commit

* reorg orca and chronos sections

* Test: weaken ads by js

* update: change left attribute

* update: add comments

* update: change opacity to 0.7

* Remove useless theme template override for old theme

* Add sidebar releases component in the home page

* Remove sidebar search and restore top nav search button

* Add BigDL handouts

* Add back-to-homepage button to every page except the home page

* Update releases contents & styles in left sidebar

* Add version badge to the top bar

* Test: weaken ads by js

* update: add comments

* remove landing page contents

* fix chronos install

* refactor install

* refactor chronos section titles

* refactor nano index

* change chronos landing

* revise chronos landing page

* add document navigator to nano landing page

* revise install landing page

* Improve css of versions in sidebar

* Make handouts image point to a page in a new tab

* add win guide to install

* add dllib installation

* revise title bar

* rename index files

* add index page for user guide

* add dllib and orca API

* update user guide landing page

* refactor side bar

* Remove extra style configuration of card components & make different card usage consistent

* Remove extra styles for Nano how-to guides

* Remove extra styles for Chronos how-to guides

* Remove dark mode for now

* Update index page description

* Add decision tree for choosing BigDL libraries in index page

* add dllib models api, revise core layers formats

* Change primary & info color in light mode

* Restyle card components

* Restructure Chronos landing page

* Update card style

* Update BigDL library selection decision tree

* Fix failed Chronos tutorials filter

* refactor PPML documents

* refactor and add friesian documents

* add friesian arch diagram

* update landing pages and fill key features guide index page

* Restyle link card component

* Style video frames in PPML sections

* Adjust Nano landing page

* put api docs last in the index for convenience

* Make badge horizontal padding smaller & small changes

* Change the second letter of all header titles to be lowercase

* Small changes on Chronos index page

* Revise decision tree to make it smaller

* Update: try to change the position of ads.

* Bugfix: delete config for a nonexistent file

* Update: update ad JS/CSS/config

* Update: change ad.

* Update: delete my template and change files.

* Update: change chronos installation table color.

* Update: change table font color to --pst-color-primary-text

* Remove old contents in landing page sidebar

* Restyle badge for usage in card footer again

* Add quicklinks template on landing page sidebar

* add quick links

* Add scala logo

* move tf, pytorch out of the link

* change orca key features cards

* fix typo

* fix a mistake in wording

* Restyle badge for card footer

* Update decision tree

* Remove useless html templates

* add more api docs and update tutorials in dllib

* update chronos install using new style

* merge changes in nano doc from master

* fix quickstart links in sidebar quicklinks

* Make tables responsive

* Fix overflow in api doc

* Fix list indent problems in [User Guide] section

* Further fixes to nested bullet contents in [User Guide] section

* Fix strange title in Nano 5-min doc

* Fix list indent problems in [DLlib] section

* Fix misnumbered list problems and other small fixes for [Chronos] section

* Fix list indent problems and other small fixes for [Friesian] section

* Fix list indent problem and other small fixes for [PPML] section

* Fix list indent problem for developer guide

* Fix list indent problem for [Cluster Serving] section

* fix dllib links

* Fix wrong relative link in section landing page

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Co-authored-by: Juntao Luo <1072087358@qq.com>
Shengsheng Huang 2022-10-18 15:35:31 +08:00 committed by GitHub
parent 8e0d589845
commit f2e4c40cee
107 changed files with 7662 additions and 4587 deletions

(Two new binary image files added; contents not shown.)

@@ -11,9 +11,9 @@ cloudpickle==2.1.0
 ray[tune]==1.9.2
 ray==1.9.2
 torch==1.9.0
-Pygments==2.6.1
+Pygments==2.7
 setuptools==41.0.1
-docutils==0.17
+docutils==0.17.1
 mock==1.0.1
 pillow==5.4.1
 sphinx==4.5.0
@@ -21,7 +21,6 @@ alabaster>=0.7,<0.8,!=0.7.5
 commonmark==0.8.1
 recommonmark==0.5.0
 readthedocs-sphinx-ext<2.2
-sphinx_rtd_theme==1.0.0
 scikit-learn==1.0.2
 pystan==2.19.1.1
 prophet==1.0.1
@@ -40,4 +39,5 @@ sphinx-external-toc==0.3.0
 nbsphinx==0.8.9
 ipython==7.34.0
 sphinx-design==0.2.0
 nbsphinx-link==1.3.0
+pydata-sphinx-theme==0.11.0


@@ -11,7 +11,7 @@
 }
 #table-1 tr, td{
-  background-color: rgb(240, 241, 245);
+  background-color: var(--pst-color-on-surface);
   height: 30px;
   border-width: 2px;
   border-style: solid;
@@ -26,7 +26,7 @@
 #table-1 td{
   font-size: 16px;
   font-family: Verdana;
-  color: rgb(15, 24, 33);
+  color: var(--pst-color-text-base);
   text-align: center;
   /* height: 56px;
   line-height: 56px; */


@ -1,65 +1,63 @@
/*Extends the docstring signature box.*/ /* change primary & info color for light mode*/
.rst-content dl:not(.docutils) dt { html[data-theme="light"] {
display: block; --pst-color-primary: rgb(1, 113, 195);
padding: 10px; --pst-color-info: rgb(1, 113, 195);
word-wrap: break-word;
padding-right: 100px;
}
/*Lists in an admonition note do not have awkward whitespace below.*/
.rst-content .admonition-note .section ul {
margin-bottom: 0px;
}
/*Properties become blue (classmethod, staticmethod, property)*/
.rst-content dl dt em.property {
color: #2980b9;
text-transform: uppercase;
} }
.rst-content .section ol p, /* ectra css variables */
.rst-content .section ul p { :root {
margin-bottom: 0px; --pst-color-info-tiny-opacity: rgba(1, 113, 195, 0.1);
--pst-color-info-low-opacity: rgba(1, 113, 195, 0.25);
} }
div.sphx-glr-bigcontainer {
display: inline-block; /* align items in the left part of header to the ground*/
width: 100%; .bd-header #navbar-start {
align-items: end;
} }
td.tune-colab, /* for version badge, possible for other badges*/
th.tune-colab { .version-badge{
border: 1px solid #dddddd; border: 1px solid var(--pst-color-primary);
text-align: left; border-radius: 0.25rem;
padding: 8px; color: var(--pst-color-primary);
padding: 0.1rem 0.25rem;
font-size: var(--pst-font-size-milli);
} }
/* Adjustment to Sphinx Book Theme */ /* for card components */
.table td { .bd-content .sd-card {
/* Remove row spacing */ border: none;
padding: 0; border-left: .2rem solid var(--pst-color-info-low-opacity);
} }
table { .bd-content .sd-card .sd-card-header{
/* Force full width for all table */ background-color: var(--pst-color-info-tiny-opacity);
width: 136% !important; border: none;
} }
img.inline-figure { .bigdl-link-card:hover{
/* Override the display: block for img */ border-left: .2rem solid var(--pst-color-info);
display: inherit !important;
} }
#version-warning-banner { /* for sphinx-design badge components (customized for usage in card footer)*/
/* Make version warning clickable */ .sd-badge{
z-index: 1; padding: .35em 0em;
font-size: 0.9em;
} }
span.rst-current-version > span.fa.fa-book { /* for landing page side bar */
/* Move the book icon away from the top right .bigdl-quicklinks-section-nav{
* corner of the version flyout menu */ padding-bottom: 0.5rem;
margin: 10px 0px 0px 5px; padding-left: 1rem;
} }
/* Adjustment to Version block */ .bigdl-quicklinks-section-title{
.rst-versions { color: var(--pst-color-primary);
z-index: 1200 !important;
} }
/* force long parameter definition (which occupy a whole line)
to break in api documents for class/method */
.sig-object{
overflow-wrap: break-word;
}


@@ -232,8 +232,8 @@ function refresh_cmd(){
 //set the color of selected buttons
 function set_color(id){
-    $("#"+id).parent().css("background-color","rgb(74, 106, 237)");
-    $("#"+id).css("color","white");
+    $("#"+id).parent().css("background-color","var(--pst-color-primary)");
+    $("#"+id).css("color","var(--pst-color-primary-text)");
     $("#"+id).addClass("isset");
 }
@@ -241,7 +241,7 @@ function set_color(id){
 function reset_color(list){
     for (btn in list){
         $("#"+list[btn]).parent().css("background-color","transparent");
-        $("#"+list[btn]).css("color","black");
+        $("#"+list[btn]).css("color","var(--pst-color-text-base)");
         $("#"+list[btn]).removeClass("isset");
     }
 }
@@ -254,7 +254,7 @@ function disable(list){
     }
     reset_color(list);
     for(btn in list){
-        $("#"+list[btn]).parent().css("background-color","rgb(133, 133, 133)");
+        $("#"+list[btn]).parent().css("background-color","var(--pst-color-muted)");
     }
 }
@@ -303,14 +303,14 @@ $(document).on('click',"button",function(){
 $(document).on({
     mouseenter: function () {
         if($(this).prop("disabled")!=true){
-            $(this).parent().css("background-color","rgb(74, 106, 237)");
-            $(this).css("color","white");
+            $(this).parent().css("background-color","var(--pst-color-primary)");
+            $(this).css("color","var(--pst-color-primary-text)");
         }
     },
     mouseleave: function () {
         if(!$(this).hasClass("isset") && $(this).prop("disabled")!=true){
             $(this).parent().css("background-color","transparent");
-            $(this).css("color","black");
+            $(this).css("color","var(--pst-color-text-base)");
         }
     }
 }, "button");


@@ -24,8 +24,9 @@ function disCheck(ids){
 //event when click the checkboxes
 $(".checkboxes").click(function(){
     //get all checked values
+    //class checkboxes is specified to avoid selecting toctree checkboxes (arrows)
     var vals = [];
-    $('input:checkbox:checked').each(function (index, item) {
+    $('.checkboxes:input:checkbox:checked').each(function (index, item) {
         vals.push($(this).val());
     });


@ -0,0 +1,26 @@
$(document).ready(function(){
// $('.btn.dropdown-toggle.nav-item').text('Libraries'); // change text for dropdown menu in header from More to Libraries
// hide the original left sidebar ads display
$('#ethical-ad-placement').css({
"display":"none"
});
// manually add the ads to the end of content
$(".bd-article").append(
"<br />\
<div style='display:flex;justify-content:center;'\
<div\
id='ethical-ad-placement'\
class='horizontal'\
data-ea-publisher='readthedocs'\
data-ea-type='image'\
></div>\
</div>"
);
// make tables responsive
$("table").wrap(
"<div style='overflow-x:auto;'></div>"
);
})


@ -1,60 +0,0 @@
<!--
Copyright 2016 The BigDL Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
the following code is adapted from https://github.com/readthedocs/sphinx_rtd_theme/
The MIT License (MIT)
Copyright (c) 2013-2018 Dave Snider, Read the Docs, Inc. & contributors
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-->
{%- extends "sphinx_rtd_theme/breadcrumbs.html" %}
<!--Change "Edit on Github" button on top-right corner to "Edit this page" in every page-->
{%- block breadcrumbs_aside %}
<li class="wy-breadcrumbs-aside">
{%- if hasdoc(pagename) and display_vcs_links %}
{%- if display_github %}
{%- if check_meta and 'github_url' in meta %}
<!-- User defined GitHub URL -->
<a href="{{ meta['github_url'] }}" class="fa fa-github"> {{ _('Edit this page') }}</a>
{%- else %}
<a href="https://{{ github_host|default("github.com") }}/{{ github_user }}/{{ github_repo }}/{{ theme_vcs_pageview_mode or "blob" }}/{{ github_version }}{{ conf_py_path }}{{ pagename }}{{ page_source_suffix }}" class="fa fa-github"> {{ _('Edit this page') }}</a>
{%- endif %}
{%- elif show_source and source_url_prefix %}
<a href="{{ source_url_prefix }}{{ pagename }}{{ page_source_suffix }}">{{ _('View page source') }}</a>
{%- elif show_source and has_source and sourcename %}
<a href="{{ pathto('_sources/' + sourcename, true)|e }}" rel="nofollow"> {{ _('View page source') }}</a>
{%- endif %}
{%- endif %}
</li>
{%- endblock %}


@ -0,0 +1,6 @@
{% set home_href = pathto(master_doc) %}
<div>
<a href={{ home_href }}>
<strong>Back to Homepage ↵</strong>
</a>
</div>


@ -0,0 +1,68 @@
<nav class="bd-links">
<p class="bd-links__title">Quick Links</p>
<div class="navbar-nav">
<strong class="bigdl-quicklinks-section-title">Orca QuickStart</Q></strong>
<ul class="nav bigdl-quicklinks-section-nav">
<li>
<a href="doc/UseCase/spark-dataframe.html">Use Spark Dataframe for Deep Learning</a>
</li>
<li>
<a href="doc/Orca/QuickStart/orca-pytorch-distributed-quickstart.html">Distributed PyTorch using Orca</a>
</li>
<li>
<a href="doc/Orca/QuickStart/orca-autoxgboost-quickstart.html">Use AutoXGBoost to tune XGBoost parameters automatically</a>
</li>
</ul>
<strong class="bigdl-quicklinks-section-title">Nano QuickStart</strong>
<ul class="nav bigdl-quicklinks-section-nav" >
<li>
<a href="doc/Nano/QuickStart/pytorch_train_quickstart.html">PyTorch Training Acceleration</a>
</li>
<li>
<a href="doc/Nano/QuickStart/pytorch_quantization_inc_onnx.html">PyTorch Inference Quantization with ONNXRuntime Acceleration </a>
</li>
<li>
<a href="doc/Nano/QuickStart/pytorch_openvino.html">PyTorch Inference Acceleration using OpenVINO</a>
</li>
<li>
<a href="doc/Nano/QuickStart/tensorflow_train_quickstart.html">Tensorflow Training Acceleration</a>
</li>
<li>
<a href="doc/Nano/QuickStart/tensorflow_quantization_quickstart.html">Tensorflow Quantization Acceleration</a>
</li>
</ul>
<strong class="bigdl-quicklinks-section-title">DLlib QuickStart</strong>
<ul class="nav bigdl-quicklinks-section-nav" >
<li>
<a href="doc/DLlib/QuickStart/python-getting-started.html">Python QuickStart</a>
</li>
<li>
<a href="doc/DLlib/QuickStart/scala-getting-started.html">Scala QuickStart</a>
</li>
</ul>
<strong class="bigdl-quicklinks-section-title">Chronos QuickStart</strong>
<ul class="nav bigdl-quicklinks-section-nav" >
<li>
<a href="doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart.html">Basic Forecasting</a>
</li>
<li>
<a href="doc/Chronos/QuickStart/chronos-autotsest-quickstart.html">Forecasting using AutoML</a>
</li>
<li>
<a href="doc/Chronos/QuickStart/chronos-anomaly-detector.html">Anomaly Detection</a>
</li>
</ul>
<strong class="bigdl-quicklinks-section-title">PPML QuickStart</strong>
<ul class="nav bigdl-quicklinks-section-nav" >
<li>
<a href="doc/PPML/Overview/quicktour.html">Hello World Example</a>
</li>
<li>
<a href="doc/PPML/QuickStart/end-to-end.html">End-to-End Example</a>
</li>
</ul>
</div>
</nav>


@ -0,0 +1,3 @@
<div class="version-badge" style="margin-bottom: 2px;">
{{ release }}
</div>


@ -1,105 +1,224 @@
root: index root: index
subtrees: subtrees:
- caption: Quick Start - entries:
entries: - file: doc/UserGuide/index
- file: doc/Orca/QuickStart/orca-tf-quickstart title: 'User guide'
- file: doc/Orca/QuickStart/orca-keras-quickstart subtrees:
- file: doc/Orca/QuickStart/orca-tf2keras-quickstart - entries:
- file: doc/Orca/QuickStart/orca-pytorch-quickstart - file: doc/UserGuide/python
- file: doc/Ray/QuickStart/ray-quickstart - file: doc/UserGuide/scala
- file: doc/UserGuide/win
- file: doc/UserGuide/docker
- file: doc/UserGuide/colab
- file: doc/UserGuide/hadoop
- file: doc/UserGuide/k8s
- file: doc/UserGuide/databricks
- caption: User Guide
entries:
- file: doc/UserGuide/python
- file: doc/UserGuide/scala
- file: doc/UserGuide/win
- file: doc/UserGuide/colab
- file: doc/UserGuide/docker
- file: doc/UserGuide/hadoop
- file: doc/UserGuide/k8s
- file: doc/UserGuide/databricks
- file: doc/UserGuide/develop
- file: doc/UserGuide/known_issues
- caption: Nano - entries:
entries: - file: doc/Application/powered-by
- file: doc/Nano/Overview/nano title: "Powered by"
- file: doc/Nano/QuickStart/pytorch_train
- file: doc/Nano/QuickStart/pytorch_inference
- file: doc/Nano/QuickStart/tensorflow_train
- file: doc/Nano/QuickStart/tensorflow_inference
- file: doc/Nano/QuickStart/hpo
- file: doc/Nano/QuickStart/index
- file: doc/Nano/Howto/index
- file: doc/Nano/Overview/known_issues
- caption: DLlib
entries:
- file: doc/DLlib/Overview/dllib
- file: doc/DLlib/Overview/keras-api
- file: doc/DLlib/Overview/nnframes
- caption: Orca - entries:
entries: - file: doc/Orca/index
- file: doc/Orca/Overview/orca title: "Orca"
title: "Orca User Guide" subtrees:
- file: doc/Orca/Overview/orca-context - entries:
- file: doc/Orca/Overview/data-parallel-processing - file: doc/Orca/Overview/orca
- file: doc/Orca/Overview/distributed-training-inference title: "Orca in 5 miniutes"
- file: doc/Orca/Overview/distributed-tuning - file: doc/Orca/Overview/install
- file: doc/Ray/Overview/ray title: "Installation"
- file: doc/Orca/Overview/known_issues - file: doc/Orca/Overview/index
title: "Key Features"
subtrees:
- entries:
- file: doc/Orca/Overview/orca-context
- file: doc/Orca/Overview/data-parallel-processing
- file: doc/Orca/Overview/distributed-training-inference
- file: doc/Orca/Overview/distributed-tuning
- file: doc/Orca/Overview/ray
- file: doc/Orca/QuickStart/index
title: "Tutorials"
subtrees:
- entries:
- file: doc/UseCase/spark-dataframe
- file: doc/UseCase/xshards-pandas
- file: doc/Orca/QuickStart/ray-quickstart
- file: doc/Orca/QuickStart/orca-pytorch-distributed-quickstart
- file: doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart
- file: doc/Orca/QuickStart/orca-autoxgboost-quickstart
- file: doc/Orca/Overview/known_issues
title: "Tips and Known Issues"
- file: doc/PythonAPI/Orca/index
title: "API Reference"
- caption: Chronos
entries:
- file: doc/Chronos/Overview/chronos
- file: doc/Chronos/Overview/quick-tour
- file: doc/Chronos/Howto/index
- file: doc/Chronos/QuickStart/index
- file: doc/Chronos/Overview/deep_dive
- file: doc/Chronos/Overview/chronos_known_issue
- caption: PPML
entries:
- file: doc/PPML/Overview/ppml
- file: doc/PPML/Overview/trusted_big_data_analytics_and_ml
- file: doc/PPML/Overview/trusted_fl
- file: doc/PPML/QuickStart/secure_your_services
- file: doc/PPML/QuickStart/build_kernel_with_sgx
- file: doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes
- file: doc/PPML/QuickStart/trusted-serving-on-k8s-guide
- file: doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s
- file: doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s
- file: doc/PPML/Overview/azure_ppml
- caption: Serving - entries:
entries: - file: doc/Nano/index
- file: doc/Serving/Overview/serving.md title: "Nano"
- file: doc/Serving/QuickStart/serving-quickstart subtrees:
- file: doc/Serving/ProgrammingGuide/serving-installation - entries:
- file: doc/Serving/ProgrammingGuide/serving-start - file: doc/Nano/Overview/nano
- file: doc/Serving/ProgrammingGuide/serving-inference title: "Nano in 5 minutes"
- file: doc/Serving/Example/example - file: doc/Nano/Overview/install
- file: doc/Serving/FAQ/faq title: "Installation"
- file: doc/Serving/FAQ/contribute-guide - file: doc/Nano/Overview/index
title: "Key Features"
subtrees:
- entries:
- file: doc/Nano/Overview/pytorch_train
- file: doc/Nano/Overview/pytorch_inference
- file: doc/Nano/Overview/tensorflow_train
- file: doc/Nano/Overview/tensorflow_inference
- file: doc/Nano/Overview/hpo
- file: doc/Nano/QuickStart/index
title: "Tutorials"
subtrees:
- entries:
- file: doc/Nano/QuickStart/pytorch_train_quickstart
- file: doc/Nano/QuickStart/pytorch_onnxruntime
- file: doc/Nano/QuickStart/pytorch_openvino
- file: doc/Nano/QuickStart/pytorch_quantization_inc_onnx
- file: doc/Nano/QuickStart/pytorch_quantization_inc
- file: doc/Nano/QuickStart/pytorch_quantization_openvino
- file: doc/Nano/QuickStart/tensorflow_train_quickstart
- file: doc/Nano/QuickStart/tensorflow_embedding
- file: doc/Nano/QuickStart/tensorflow_quantization_quickstart
- file: doc/Nano/Howto/index
title: "How-to Guides"
- file: doc/Nano/Overview/known_issues
title: "Tips and Known Issues"
- file: doc/PythonAPI/Nano/index
title: "API Reference"
- caption: Common Use Case
entries:
- file: doc/Orca/QuickStart/orca-pytorch-distributed-quickstart
- file: doc/UseCase/spark-dataframe
- file: doc/UseCase/xshards-pandas
- file: doc/Orca/QuickStart/orca-autoestimator-pytorch-quickstart
- file: doc/Orca/QuickStart/orca-autoxgboost-quickstart
- caption: Python API
entries:
- file: doc/PythonAPI/Orca/orca
- file: doc/PythonAPI/Friesian/feature
- file: doc/PythonAPI/Chronos/index
- file: doc/PythonAPI/Nano/index
- caption: Real-World Application - entries:
entries: - file: doc/DLlib/index
- file: doc/Application/presentations title: "DLlib"
- file: doc/Application/blogs subtrees:
- file: doc/Application/powered-by - entries:
- file: doc/DLlib/Overview/dllib
title: "DLLib in 5 minutes"
- file: doc/DLlib/Overview/install
title: "Installation"
- file: doc/DLlib/Overview/index
title: "Key Features"
subtrees:
- entries:
- file: doc/DLlib/Overview/keras-api
- file: doc/DLlib/Overview/nnframes
- file: doc/DLlib/Overview/visualization
title: "Visualization"
- file: doc/DLlib/QuickStart/index
title: "Tutorials"
subtrees:
- entries:
- file: doc/DLlib/QuickStart/python-getting-started
title: "Python Quick Start"
- file: doc/DLlib/QuickStart/scala-getting-started
title: "Scala Quick Start"
- file: doc/PythonAPI/DLlib/index
title: "API Reference"
- entries:
- file: doc/Chronos/index
title: "Chronos"
subtrees:
- entries:
- file: doc/Chronos/Overview/quick-tour
title: "Chronos in 5 minutes"
- file: doc/Chronos/Overview/install
title: "Installation"
- file: doc/Chronos/Overview/deep_dive
title: "Key Features"
- file: doc/Chronos/Howto/index
title: "How-to Guides"
- file: doc/Chronos/QuickStart/index
title: "Tutorials"
subtrees:
- entries:
- file: doc/Chronos/QuickStart/chronos-tsdataset-forecaster-quickstart
- file: doc/Chronos/QuickStart/chronos-autotsest-quickstart
- file: doc/Chronos/QuickStart/chronos-anomaly-detector
- file: doc/Chronos/Overview/chronos_known_issue
title: "Tips and Known Issues"
- file: doc/PythonAPI/Chronos/index
title: "API Reference"
- entries:
- file: doc/Friesian/index
title: "Friesian"
subtrees:
- entries:
- file: doc/Friesian/intro
title: "Introduction"
- file: doc/Friesian/serving
title: "Serving"
- file: doc/Friesian/examples
title: "Use Cases"
- file: doc/PythonAPI/Friesian/index
title: "API Reference"
- entries:
- file: doc/PPML/index
title: "PPML"
subtrees:
- entries:
- file: doc/PPML/Overview/intro
title: "PPML Introduction"
- file: doc/PPML/Overview/userguide
title: 'User Guide'
- file: doc/PPML/Overview/examples
title: "Tutorials"
subtrees:
- entries:
- file: doc/PPML/Overview/quicktour
- file: doc/PPML/QuickStart/end-to-end
- file: doc/PPML/Overview/misc
title: "Advanced Topics"
subtrees:
- entries:
- file: doc/PPML/Overview/ppml
- file: doc/PPML/Overview/trusted_big_data_analytics_and_ml
- file: doc/PPML/Overview/trusted_fl
- file: doc/PPML/QuickStart/secure_your_services
- file: doc/PPML/QuickStart/build_kernel_with_sgx
- file: doc/PPML/QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes
- file: doc/PPML/QuickStart/trusted-serving-on-k8s-guide
- file: doc/PPML/QuickStart/tpc-h_with_sparksql_on_k8s
- file: doc/PPML/QuickStart/tpc-ds_with_sparksql_on_k8s
- file: doc/PPML/Overview/azure_ppml
- entries:
- file: doc/UserGuide/develop
title: "Developer guide"
- entries:
- file: doc/Serving/index
title: "Cluster serving"
subtrees:
- entries:
- file: doc/Serving/Overview/serving.md
title: "User Guide"
- file: doc/Serving/QuickStart/serving-quickstart
title: "Serving in 5 miniutes"
- file: doc/Serving/ProgrammingGuide/serving-installation
- file: doc/Serving/ProgrammingGuide/serving-start
- file: doc/Serving/ProgrammingGuide/serving-inference
- file: doc/Serving/Example/example
title: "Examples"
- file: doc/Serving/FAQ/faq
- file: doc/Serving/FAQ/contribute-guide
- entries:
- file: doc/Application/presentations
title: "Presentations"
- entries:
- file: doc/Application/blogs
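
For reference, the restructured tree above is consumed by the sphinx-external-toc extension (listed in the conf.py diff below). A minimal sketch of how it is wired up, assuming the `external_toc_path` option and the default `_toc.yml` file name from the sphinx-external-toc documentation (neither line is part of this diff):

```python
# conf.py (sketch, not part of this commit)
extensions = [
    "sphinx_external_toc",  # builds the site structure from a YAML table of contents
    # ... plus the rest of the extension list shown in the conf.py diff below ...
]
external_toc_path = "_toc.yml"  # assumed default, relative to the documentation source dir
```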


@ -31,19 +31,39 @@ sys.path.insert(0, os.path.abspath("../../../python/serving/src/"))
sys.path.insert(0, os.path.abspath("../../../python/nano/src/")) sys.path.insert(0, os.path.abspath("../../../python/nano/src/"))
# -- Project information ----------------------------------------------------- # -- Project information -----------------------------------------------------
import sphinx_rtd_theme html_theme = "pydata_sphinx_theme"
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
#html_theme = "sphinx_book_theme"
html_theme_options = { html_theme_options = {
"repository_url": "https://github.com/intel-analytics/BigDL", "header_links_before_dropdown": 8,
"use_repository_button": True, "icon_links": [
"use_issues_button": True, {
"use_edit_page_button": True, "name": "GitHub Repository for BigDL",
"path_to_docs": "doc/source", "url": "https://github.com/intel-analytics/BigDL",
"home_page_in_toc": True, "icon": "fa-brands fa-square-github",
"type": "fontawesome",
}
],
"navbar_start": ["navbar-logo.html", "version_badge.html"],
"navbar_end": ["navbar-icon-links.html"], # remove dark mode for now
} }
# add search bar to side bar
html_sidebars = {
"index": [
"sidebar_quicklinks.html"
],
"**": ["sidebar_backbutton.html","sidebar-nav-bs.html"]
}
# remove dark mode for now
html_context = {
"default_mode": "light"
}
html_logo = "../image/bigdl_logo.png"
# hard code it for now, may change it to read from installed bigdl
release = "latest"
# The suffix of source filenames. # The suffix of source filenames.
from recommonmark.parser import CommonMarkParser from recommonmark.parser import CommonMarkParser
source_suffix = {'.rst': 'restructuredtext', source_suffix = {'.rst': 'restructuredtext',
@ -92,7 +112,8 @@ extensions = [
'sphinx_external_toc', 'sphinx_external_toc',
'sphinx_design', 'sphinx_design',
'nbsphinx', 'nbsphinx',
'nbsphinx_link' 'nbsphinx_link',
'sphinx.ext.graphviz' # for embedded graphviz diagram
] ]
# Add any paths that contain templates here, relative to this directory. # Add any paths that contain templates here, relative to this directory.
@ -136,6 +157,13 @@ exclude_patterns = ['_build']
# relative to this directory. They are copied after the builtin static files, # relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css". # so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static'] html_static_path = ['_static']
# add js/css for customizing each page
html_js_files = [
'js/custom.js',
]
html_css_files = [
'css/custom.css',
]
# Custom sidebar templates, must be a dictionary that maps document namesan # Custom sidebar templates, must be a dictionary that maps document namesan
# to template names. # to template names.
@ -246,4 +274,7 @@ def setup(app):
app.add_transform(AutoStructify) app.add_transform(AutoStructify)
# disable notebook execution # disable notebook execution
nbsphinx_execute = 'never' nbsphinx_execute = 'never'
# make output of graphviz diagram to svg
graphviz_output_format = 'svg'
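
Since the side-by-side hunks above are hard to follow, here is the new theme configuration pieced together from them as a single sketch (a readability aid only, not the verbatim conf.py):

```python
# Sketch of the pydata-sphinx-theme setup assembled from the conf.py hunks above.
html_theme = "pydata_sphinx_theme"

html_theme_options = {
    "header_links_before_dropdown": 8,
    "icon_links": [
        {
            "name": "GitHub Repository for BigDL",
            "url": "https://github.com/intel-analytics/BigDL",
            "icon": "fa-brands fa-square-github",
            "type": "fontawesome",
        }
    ],
    # navbar: logo + release badge on the left, icon links on the right
    "navbar_start": ["navbar-logo.html", "version_badge.html"],
    "navbar_end": ["navbar-icon-links.html"],  # remove dark mode for now
}

# quick links on the landing page sidebar, a back-to-homepage button elsewhere
html_sidebars = {
    "index": ["sidebar_quicklinks.html"],
    "**": ["sidebar_backbutton.html", "sidebar-nav-bs.html"],
}

html_context = {"default_mode": "light"}  # remove dark mode for now
html_logo = "../image/bigdl_logo.png"
release = "latest"  # hard-coded for now; rendered by version_badge.html

# per-page JS/CSS customizations
html_js_files = ["js/custom.js"]
html_css_files = ["css/custom.css"]

# embedded graphviz diagrams are emitted as SVG
graphviz_output_format = "svg"
```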


@ -0,0 +1,2 @@
Real-World Application
=========================


@ -97,15 +97,15 @@ After the Jupyter Notebook service is successfully started, you can connect to t
You should shut down the BigDL Docker container after using it. You should shut down the BigDL Docker container after using it.
1. First, use `ctrl+p+q` to quit the container when you are still in it. 1. First, use `ctrl+p+q` to quit the container when you are still in it.
2. Then, you can list all the active Docker containers by command line: 2. Then, you can list all the active Docker containers by command line:
```bash ```bash
sudo docker ps sudo docker ps
``` ```
You will see your docker containers: You will see your docker containers:
```bash ```bash
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
40de2cdad025 chronos-nightly:b1 "/opt/work/" 3 hours ago Up 3 hours upbeat_al 40de2cdad025 chronos-nightly:b1 "/opt/work/" 3 hours ago Up 3 hours upbeat_al
``` ```
3. Shut down the corresponding docker container by its ID: 3. Shut down the corresponding docker container by its ID:
```bash ```bash
sudo docker rm -f 40de2cdad025 sudo docker rm -f 40de2cdad025
``` ```


@ -1 +1 @@
<svg width="535" height="368" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><rect x="1771" y="750" width="535" height="368"/></clipPath></defs><g clip-path="url(#clip0)" transform="translate(-1771 -750)"><path d="M0 0 76.8928 246.699" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 1792 1097.7)"/><path d="M1868 846 1948.1 1102.31" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M1771 1097.5C1771 1086.18 1780.18 1077 1791.5 1077 1802.82 1077 1812 1086.18 1812 1097.5 1812 1108.82 1802.82 1118 1791.5 1118 1780.18 1118 1771 1108.82 1771 1097.5Z" fill="#70AD47" fill-rule="evenodd"/><path d="M1848 848C1848 836.402 1857.18 827 1868.5 827 1879.82 827 1889 836.402 1889 848 1889 859.598 1879.82 869 1868.5 869 1857.18 869 1848 859.598 1848 848Z" fill="#70AD47" fill-rule="evenodd"/><path d="M0 0 76.8928 246.699" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 1948 1097.7)"/><path d="M2025 846 2105.1 1102.31" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M1928 1097.5C1928 1086.18 1937.18 1077 1948.5 1077 1959.82 1077 1969 1086.18 1969 1097.5 1969 1108.82 1959.82 1118 1948.5 1118 1937.18 1118 1928 1108.82 1928 1097.5Z" fill="#70AD47" fill-rule="evenodd"/><path d="M2005 848C2005 836.402 2014.18 827 2025.5 827 2036.82 827 2046 836.402 2046 848 2046 859.598 2036.82 869 2025.5 869 2014.18 869 2005 859.598 2005 848Z" fill="#70AD47" fill-rule="evenodd"/><path d="M0 0 75.2197 328.131" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 2107 1092.13)"/><path d="M2187 762 2284.72 1094.67" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M2086 1092.5C2086 1081.18 2095.4 1072 2107 1072 2118.6 1072 2128 1081.18 2128 1092.5 2128 1103.82 2118.6 1113 2107 1113 2095.4 1113 2086 1103.82 2086 1092.5Z" fill="#70AD47" fill-rule="evenodd"/><path d="M2264 1092.5C2264 1081.18 2273.4 1072 2285 1072 2296.6 1072 2306 1081.18 2306 1092.5 2306 1103.82 2296.6 1113 2285 1113 2273.4 1113 2264 1103.82 2264 1092.5Z" fill="#70AD47" fill-rule="evenodd"/><path d="M2166 770.5C2166 759.178 2175.4 750 2187 750 2198.6 750 2208 759.178 2208 770.5 2208 781.822 2198.6 791 2187 791 2175.4 791 2166 781.822 2166 770.5Z" fill="#FF0000" fill-rule="evenodd"/></g></svg> <svg width="1320" height="990" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><rect x="1907" y="139" width="1320" height="990"/></clipPath></defs><g clip-path="url(#clip0)" transform="translate(-1907 -139)"><rect x="1907" y="139" width="1320" height="990" fill="#FFFFFF"/><path d="M0.00756648 0.0364743 153.008 492.037" stroke="#0171C3" stroke-width="20.5406" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 2090.5 932.5)"/><path d="M2241.86 430.229 2400.86 941.229" stroke="#0171C3" stroke-width="20.5406" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M2047.36 930.729C2047.36 908.089 2065.72 889.728 2088.36 889.728 2111 889.728 2129.36 908.089 2129.36 930.729 2129.36 953.369 2111 971.729 2088.36 971.729 2065.72 971.729 2047.36 953.369 2047.36 930.729Z" fill="#28A745" fill-rule="evenodd"/><path d="M2201.36 
433.729C2201.36 410.532 2219.72 391.728 2242.36 391.728 2265 391.728 2283.36 410.532 2283.36 433.729 2283.36 456.924 2265 475.729 2242.36 475.729 2219.72 475.729 2201.36 456.924 2201.36 433.729Z" fill="#28A745" fill-rule="evenodd"/><path d="M0.0637745 0.0364743 153.064 492.037" stroke="#0171C3" stroke-width="20.5406" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 2401.5 932.5)"/><path d="M2554.86 430.229 2713.86 941.229" stroke="#0171C3" stroke-width="20.5406" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M2360.36 930.729C2360.36 908.089 2378.72 889.728 2401.36 889.728 2424 889.728 2442.36 908.089 2442.36 930.729 2442.36 953.369 2424 971.729 2401.36 971.729 2378.72 971.729 2360.36 953.369 2360.36 930.729Z" fill="#28A745" fill-rule="evenodd"/><path d="M2514.37 433.729C2514.37 410.532 2532.5 391.728 2554.87 391.728 2577.23 391.728 2595.37 410.532 2595.37 433.729 2595.37 456.924 2577.23 475.729 2554.87 475.729 2532.5 475.729 2514.37 456.924 2514.37 433.729Z" fill="#28A745" fill-rule="evenodd"/><path d="M0.12133 0.005045 150.122 653.006" stroke="#0171C3" stroke-width="20.5406" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 2718.5 920.5)"/><path d="M2876.86 263.229 3071.86 926.23" stroke="#0171C3" stroke-width="20.5406" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M2675.36 920.729C2675.36 898.089 2694.16 879.728 2717.36 879.728 2740.56 879.728 2759.36 898.089 2759.36 920.729 2759.36 943.369 2740.56 961.729 2717.36 961.729 2694.16 961.729 2675.36 943.369 2675.36 920.729Z" fill="#28A745" fill-rule="evenodd"/><path d="M3030.36 920.729C3030.36 898.089 3049.16 879.728 3072.36 879.728 3095.56 879.728 3114.36 898.089 3114.36 920.729 3114.36 943.369 3095.56 961.729 3072.36 961.729 3049.16 961.729 3030.36 943.369 3030.36 920.729Z" fill="#28A745" fill-rule="evenodd"/><path d="M2835.37 279.232C2835.37 256.864 2853.94 238.732 2876.87 238.732 2899.79 238.732 2918.37 256.864 2918.37 279.232 2918.37 301.6 2899.79 319.732 2876.87 319.732 2853.94 319.732 2835.37 301.6 2835.37 279.232Z" fill="#DC3545" fill-rule="evenodd"/></g></svg>


@ -1 +1 @@
<svg width="551" height="416" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><rect x="681" y="729" width="551" height="416"/></clipPath></defs><g clip-path="url(#clip0)" transform="translate(-681 -729)"><path d="M692 996 813.747 1124.16" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M0 0 76.8928 246.699" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 814 1124.7)"/><path d="M891 873 1012.75 1001.16" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M0 0 76.8928 246.699" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" stroke-dasharray="41.25 30.9375" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 1012 1001.7)"/><path d="M1089 750 1210.75 878.155" stroke="#4472C4" stroke-width="10.3125" stroke-miterlimit="8" stroke-dasharray="41.25 30.9375" fill="none" fill-rule="evenodd"/><path d="M681 1007.5C681 996.178 690.178 987 701.5 987 712.822 987 722 996.178 722 1007.5 722 1018.82 712.822 1028 701.5 1028 690.178 1028 681 1018.82 681 1007.5Z" fill="#FFC000" fill-rule="evenodd"/><path d="M793 1124.5C793 1113.18 802.402 1104 814 1104 825.598 1104 835 1113.18 835 1124.5 835 1135.82 825.598 1145 814 1145 802.402 1145 793 1135.82 793 1124.5Z" fill="#FFC000" fill-rule="evenodd"/><path d="M870 875.5C870 864.178 879.402 855 891 855 902.598 855 912 864.178 912 875.5 912 886.822 902.598 896 891 896 879.402 896 870 886.822 870 875.5Z" fill="#FFC000" fill-rule="evenodd"/><path d="M992 996.5C992 985.178 1001.18 976 1012.5 976 1023.82 976 1033 985.178 1033 996.5 1033 1007.82 1023.82 1017 1012.5 1017 1001.18 1017 992 1007.82 992 996.5Z" fill="#FFC000" fill-rule="evenodd"/><path d="M1068 750C1068 738.402 1077.4 729 1089 729 1100.6 729 1110 738.402 1110 750 1110 761.598 1100.6 771 1089 771 1077.4 771 1068 761.598 1068 750Z" fill="#FFC000" fill-rule="evenodd"/><path d="M1191 873C1191 861.402 1200.18 852 1211.5 852 1222.82 852 1232 861.402 1232 873 1232 884.598 1222.82 894 1211.5 894 1200.18 894 1191 884.598 1191 873Z" fill="#FFC000" fill-rule="evenodd"/></g></svg> <svg width="1320" height="990" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><rect x="192" y="139" width="1320" height="990"/></clipPath></defs><g clip-path="url(#clip0)" transform="translate(-192 -139)"><rect x="192" y="139" width="1320" height="990" fill="#FFFFFF"/><path d="M316.254 753.236 563.254 1013.24" stroke="#0171C3" stroke-width="20.8448" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M0.048095 0.0538267 156.049 500.054" stroke="#0171C3" stroke-width="20.8448" stroke-miterlimit="8" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 563.5 1014.5)"/><path d="M719.254 504.237 966.254 764.236" stroke="#0171C3" stroke-width="20.8448" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M0.119695 0.00939258 156.12 500.01" stroke="#0171C3" stroke-width="20.8448" stroke-miterlimit="8" stroke-dasharray="83.3791 62.5344" fill="none" fill-rule="evenodd" transform="matrix(1 0 0 -1 964.5 765.5)"/><path d="M1120.25 255.236 1367.25 515.237" stroke="#0171C3" stroke-width="20.8448" stroke-miterlimit="8" stroke-dasharray="83.3791 62.5344" fill="none" fill-rule="evenodd"/><path d="M293.754 776.237C293.754 753.317 312.334 734.737 335.254 734.737 358.175 734.737 376.755 
753.317 376.755 776.237 376.755 799.154 358.175 817.737 335.254 817.737 312.334 817.737 293.754 799.154 293.754 776.237Z" fill="#EE9040" fill-rule="evenodd"/><path d="M520.754 1013.24C520.754 990.321 539.782 971.737 563.255 971.737 586.727 971.737 605.754 990.321 605.754 1013.24 605.754 1036.15 586.727 1054.74 563.255 1054.74 539.782 1054.74 520.754 1036.15 520.754 1013.24Z" fill="#EE9040" fill-rule="evenodd"/><path d="M676.754 509.237C676.754 486.317 695.782 467.737 719.254 467.737 742.727 467.737 761.754 486.317 761.754 509.237 761.754 532.157 742.727 550.737 719.254 550.737 695.782 550.737 676.754 532.157 676.754 509.237Z" fill="#EE9040" fill-rule="evenodd"/><path d="M923.754 754.237C923.754 731.317 942.338 712.737 965.254 712.737 988.171 712.737 1006.75 731.317 1006.75 754.237 1006.75 777.154 988.171 795.737 965.254 795.737 942.338 795.737 923.754 777.154 923.754 754.237Z" fill="#EE9040" fill-rule="evenodd"/><path d="M1077.75 255.237C1077.75 231.765 1096.78 212.737 1120.25 212.737 1143.73 212.737 1162.75 231.765 1162.75 255.237 1162.75 278.709 1143.73 297.737 1120.25 297.737 1096.78 297.737 1077.75 278.709 1077.75 255.237Z" fill="#EE9040" fill-rule="evenodd"/><path d="M1326.75 504.237C1326.75 480.765 1345.34 461.737 1368.25 461.737 1391.17 461.737 1409.75 480.765 1409.75 504.237 1409.75 527.709 1391.17 546.737 1368.25 546.737 1345.34 546.737 1326.75 527.709 1326.75 504.237Z" fill="#EE9040" fill-rule="evenodd"/></g></svg>


File diff suppressed because one or more lines are too long


@ -1,29 +1,29 @@
# Time Series Anomaly Detection Overview # Anomaly Detection
Anomaly Detection detects abnormal samples in a given time series. _Chronos_ provides a set of unsupervised anomaly detectors. Anomaly Detection detects abnormal samples in a given time series. _Chronos_ provides a set of unsupervised anomaly detectors.
View some examples notebooks for [Datacenter AIOps][AIOps]. View some examples notebooks for [Datacenter AIOps][AIOps].
## **1. ThresholdDetector** ## **1. ThresholdDetector**
ThresholdDetector detects anomaly based on threshold. It can be used to detect anomaly on a given time series ([notebook][AIOps_anomaly_detect_unsupervised]), or used together with [Forecasters](#forecasting) to detect anomaly on new coming samples ([notebook][AIOps_anomaly_detect_unsupervised_forecast_based]). ThresholdDetector detects anomaly based on threshold. It can be used to detect anomaly on a given time series ([notebook][AIOps_anomaly_detect_unsupervised]), or used together with [Forecasters](#forecasting) to detect anomaly on new coming samples ([notebook][AIOps_anomaly_detect_unsupervised_forecast_based]).
View [ThresholdDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-th-detector) for more details. View [ThresholdDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-th-detector) for more details.
## **2. AEDetector** ## **2. AEDetector**
AEDetector detects anomaly based on the reconstruction error of an autoencoder network. AEDetector detects anomaly based on the reconstruction error of an autoencoder network.
View anomaly detection [notebook][AIOps_anomaly_detect_unsupervised] and [AEDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-ae-detector) for more details. View anomaly detection [notebook][AIOps_anomaly_detect_unsupervised] and [AEDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-ae-detector) for more details.
## **3. DBScanDetector** ## **3. DBScanDetector**
DBScanDetector uses DBSCAN clustering algortihm for anomaly detection. DBScanDetector uses DBSCAN clustering algortihm for anomaly detection.
```eval_rst ```eval_rst
.. note:: .. note::
Users may install `scikit-learn-intelex` to accelerate this detector. Chronos will detect if `scikit-learn-intelex` is installed to decide if using it. More details please refer to: https://intel.github.io/scikit-learn-intelex/installation.html Users may install ``scikit-learn-intelex`` to accelerate this detector. Chronos will detect if ``scikit-learn-intelex`` is installed to decide if using it. More details please refer to: https://intel.github.io/scikit-learn-intelex/installation.html
``` ```
View anomaly detection [notebook][AIOps_anomaly_detect_unsupervised] and [DBScanDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-dbscan-detector) for more details. View anomaly detection [notebook][AIOps_anomaly_detect_unsupervised] and [DBScanDetector API Doc](../../PythonAPI/Chronos/anomaly_detectors.html#chronos-model-anomaly-dbscan-detector) for more details.
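
As a usage illustration for the detectors described above, a rough sketch is given below; the import path and constructor argument are assumptions based on the linked API reference, not part of this diff:

```python
import numpy as np
# assumed module path -- check the Chronos anomaly detector API doc for the exact location
from bigdl.chronos.detector.anomaly import AEDetector

y = np.load("cpu_usage.npy")   # hypothetical 1-D array of metric values over time

ad = AEDetector(roll_len=10)   # reconstruction window length (assumed argument name)
ad.fit(y)                      # train the autoencoder on the series
anomaly_indexes = ad.anomaly_indexes()   # indexes of samples flagged as abnormal
```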


@ -1,4 +1,4 @@
# Time Series Processing and Feature Engineering Overview # Data Processing and Feature Engineering
Time series data is a special data formulation with its specific operations. _Chronos_ provides [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) as a time series dataset abstract for data processing (e.g. impute, deduplicate, resample, scale/unscale, roll sampling) and auto feature engineering (e.g. datetime feature, aggregation feature). Chronos also provides [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) with same(or similar) API for distributed and parallelized data preprocessing on large data. Time series data is a special data formulation with its specific operations. _Chronos_ provides [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) as a time series dataset abstract for data processing (e.g. impute, deduplicate, resample, scale/unscale, roll sampling) and auto feature engineering (e.g. datetime feature, aggregation feature). Chronos also provides [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) with same(or similar) API for distributed and parallelized data preprocessing on large data.
@ -6,7 +6,7 @@ Users can create a [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) quickly
## **1. Basic concepts** ## **1. Basic concepts**
A time series can be interpreted as a sequence of real value whose order is timestamp. While a time series dataset can be a combination of one or a huge amount of time series. It may contain multiple time series since users may collect different time series in the same/different period of time (e.g. An AIops dataset may have CPU usage ratio and memory usage ratio data for two servers at a period of time. This dataset contains four time series). A time series can be interpreted as a sequence of real value whose order is timestamp. While a time series dataset can be a combination of one or a huge amount of time series. It may contain multiple time series since users may collect different time series in the same/different period of time (e.g. An AIops dataset may have CPU usage ratio and memory usage ratio data for two servers at a period of time. This dataset contains four time series).
In [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) and [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset), we provide **2** possible dimensions to construct a high dimension time series dataset (i.e. **feature dimension** and **id dimension**). In [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) and [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset), we provide **2** possible dimensions to construct a high dimension time series dataset (i.e. **feature dimension** and **id dimension**).
@ -16,10 +16,10 @@ In [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) and [`XShardsTSDataset`
All the preprocessing operations will be done on each independent time series(i.e on both feature dimension and id dimension), while feature scaling will be only carried out on the feature dimension. All the preprocessing operations will be done on each independent time series(i.e on both feature dimension and id dimension), while feature scaling will be only carried out on the feature dimension.
```eval_rst ```eval_rst
.. note:: .. note::
``XShardsTSDataset`` will perform the data processing in parallel(based on spark) to support large dataset. While the parallelization will only be performed on "id dimension". This means, in previous example, ``XShardsTSDataset`` will only utilize multiple workers to process data for different servers at the same time. If a dataset only has 1 id, ``XShardsTSDataset`` will be even slower than ``TSDataset`` because of the overhead. ``XShardsTSDataset`` will perform the data processing in parallel(based on spark) to support large dataset. While the parallelization will only be performed on "id dimension". This means, in previous example, ``XShardsTSDataset`` will only utilize multiple workers to process data for different servers at the same time. If a dataset only has 1 id, ``XShardsTSDataset`` will be even slower than ``TSDataset`` because of the overhead.
``` ```
## **2. Create a TSDataset** ## **2. Create a TSDataset**
@ -40,13 +40,13 @@ You can initialize a [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html
.. code-block:: python .. code-block:: python
# Server id Datetime CPU usage Mem usage # Server id Datetime CPU usage Mem usage
# 0 08:39 2021/7/9 93 24 # 0 08:39 2021/7/9 93 24
# 0 08:40 2021/7/9 91 24 # 0 08:40 2021/7/9 91 24
# 0 08:41 2021/7/9 93 25 # 0 08:41 2021/7/9 93 25
# 0 ... ... ... # 0 ... ... ...
# 1 08:39 2021/7/9 73 79 # 1 08:39 2021/7/9 73 79
# 1 08:40 2021/7/9 72 80 # 1 08:40 2021/7/9 72 80
# 1 08:41 2021/7/9 79 80 # 1 08:41 2021/7/9 79 80
# 1 ... ... ... # 1 ... ... ...
from bigdl.chronos.data import TSDataset from bigdl.chronos.data import TSDataset
@ -74,14 +74,14 @@ You can initialize a [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html
target_col="value", id_col="id", target_col="value", id_col="id",
extra_feature_col=["extra feature 1", extra_feature_col=["extra feature 1",
"extra feature 2"]) "extra feature 2"])
``` ```
`target_col` is a list of all elements along feature dimension, while `id_col` is the identifier that distinguishes the id dimension. `dt_col` is the datetime column. For `extra_feature_col`(not shown in this case), you should list those features that you are not interested for your task (e.g. you will **not** perform forecasting or anomaly detection task on this col). `target_col` is a list of all elements along feature dimension, while `id_col` is the identifier that distinguishes the id dimension. `dt_col` is the datetime column. For `extra_feature_col`(not shown in this case), you should list those features that you are not interested for your task (e.g. you will **not** perform forecasting or anomaly detection task on this col).
If you are building a prototype for your forecasting/anomaly detection task and you need to split you TSDataset to train/valid/test set, you can use `with_split` parameter.[`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) or [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) supports split with ratio by `val_ratio` and `test_ratio`. If you are building a prototype for your forecasting/anomaly detection task and you need to split you TSDataset to train/valid/test set, you can use `with_split` parameter.[`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) or [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) supports split with ratio by `val_ratio` and `test_ratio`.
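
Putting the column roles and `with_split` together, a minimal sketch (column names follow the server-monitoring example above; see the `TSDataset.from_pandas` API doc for the exact keyword defaults):

```python
from bigdl.chronos.data import TSDataset

# df is a pandas DataFrame shaped like the server-monitoring example above
tsdata_train, tsdata_valid, tsdata_test = TSDataset.from_pandas(
    df,
    dt_col="Datetime",                      # datetime column
    target_col=["CPU usage", "Mem usage"],  # feature dimension
    id_col="Server id",                     # id dimension
    with_split=True, val_ratio=0.1, test_ratio=0.1,
)
```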
## **3. Time series dataset preprocessing** ## **3. Time series dataset preprocessing**
[`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) supports [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.impute), [`deduplicate`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.deduplicate) and [`resample`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.resample). You may fill the missing point by [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.impute) in different modes. You may remove the records that are totally the same by [`deduplicate`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.deduplicate). You may change the sample frequency by [`resample`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.resample). [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) only supports [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.impute) for now. [`TSDataset`](../../PythonAPI/Chronos/tsdataset.html) supports [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.impute), [`deduplicate`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.deduplicate) and [`resample`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.resample). You may fill the missing point by [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.impute) in different modes. You may remove the records that are totally the same by [`deduplicate`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.deduplicate). You may change the sample frequency by [`resample`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.resample). [`XShardsTSDataset`](../../PythonAPI/Chronos/tsdataset.html#xshardstsdataset) only supports [`impute`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.impute) for now.
A typical cascade call for preprocessing is: A typical cascade call for preprocessing is:
```eval_rst ```eval_rst
@ -92,7 +92,7 @@ A typical cascade call for preprocessing is:
.. code-block:: python .. code-block:: python
tsdata.deduplicate().resample(interval="2s").impute() tsdata.deduplicate().resample(interval="2s").impute()
.. tab:: XShardsTSDataset .. tab:: XShardsTSDataset
.. code-block:: python .. code-block:: python
@ -109,7 +109,7 @@ Since a scaler should not fit, a typical call for scaling operations is is:
.. tabs:: .. tabs::
.. tab:: TSDataset .. tab:: TSDataset
.. code-block:: python .. code-block:: python
from sklearn.preprocessing import StandardScaler from sklearn.preprocessing import StandardScaler
@ -139,14 +139,14 @@ Since a scaler should not fit, a typical call for scaling operations is is:
for tsdata in [tsdata_train, tsdata_valid, tsdata_test]: for tsdata in [tsdata_train, tsdata_valid, tsdata_test]:
tsdata.unscale() tsdata.unscale()
``` ```
[`unscale_numpy`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.unscale_numpy) in TSDataset or [`unscale_xshards`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.unscale_xshards) in XShardsTSDataset is specially designed for forecasters. Users may unscale the output of a forecaster by this operation. [`unscale_numpy`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.unscale_numpy) in TSDataset or [`unscale_xshards`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.unscale_xshards) in XShardsTSDataset is specially designed for forecasters. Users may unscale the output of a forecaster by this operation.
A typical call is: A typical call is:
```eval_rst ```eval_rst
.. tabs:: .. tabs::
.. tab:: TSDataset .. tab:: TSDataset
.. code-block:: python .. code-block:: python
x, y = tsdata_test.scale(scaler)\ x, y = tsdata_test.scale(scaler)\
@ -156,9 +156,9 @@ A typical call is:
unscaled_yhat = tsdata_test.unscale_numpy(yhat) unscaled_yhat = tsdata_test.unscale_numpy(yhat)
unscaled_y = tsdata_test.unscale_numpy(y) unscaled_y = tsdata_test.unscale_numpy(y)
# calculate metric by unscaled_yhat and unscaled_y # calculate metric by unscaled_yhat and unscaled_y
.. tab:: XShardsTSDataset .. tab:: XShardsTSDataset
.. code-block:: python .. code-block:: python
x, y = tsdata_test.scale(scaler)\ x, y = tsdata_test.scale(scaler)\
@ -176,28 +176,28 @@ Other than historical target data and other extra feature provided by users, som
A time series dataset needs to be sampled and exported as a numpy ndarray/dataloader before being used in machine learning and deep learning models (e.g. forecasters, anomaly detectors, auto models, etc.).
```eval_rst
.. warning::
You don't need to call any sampling or exporting methods introduced in this section when using ``AutoTSEstimator``.
```
### **6.1 Roll sampling**
Roll sampling (or sliding window sampling) is useful when you want to train an RR type supervised deep learning forecasting model. It works as the [diagram](#RR-forecast-image) shows.
Please refer to the API doc [`roll`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.roll) for detailed behavior. Users can simply export the sampling result as a numpy ndarray by [`to_numpy`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.to_numpy), a pytorch dataloader by [`to_torch_data_loader`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.to_torch_data_loader), a tensorflow dataset by [to_tf_dataset](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.tsdataset.TSDataset.to_tf_dataset) or an xshards object by [to_xshards](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Chronos/tsdataset.html#bigdl.chronos.data.experimental.xshards_tsdataset.XShardsTSDataset.to_xshards).
```eval_rst
.. note::
**Difference between** ``roll`` **and** ``to_torch_data_loader``:
``.roll(...)`` performs the rolling before RR forecasters/auto models training while ``.to_torch_data_loader(...)`` performs rolling during the training.
It is fine to use either of them when you have a relatively small dataset (less than 1G). ``.to_torch_data_loader(...)`` is recommended when you have a large dataset (larger than 1G) to save memory usage.
```
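As a rough side-by-side sketch of the two approaches (the `lookback`/`horizon` values are illustrative, and passing a dataloader straight to `fit` is an assumption based on the note above):
```python
# option 1: roll once before training, then export numpy arrays
x, y = tsdata.roll(lookback=24, horizon=1).to_numpy()
forecaster.fit((x, y))

# option 2: roll on the fly inside a pytorch dataloader (saves memory on >1G datasets)
loader = tsdata.to_torch_data_loader(lookback=24, horizon=1)
forecaster.fit(loader)
```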
```eval_rst
.. note::
**Roll sampling format**:
As described in the RR style forecasting concept, the sampling result will have the following shape requirement.
| x: (sample_num, lookback, input_feature_num)
@ -218,7 +218,7 @@ A typical call of [`roll`](../../PythonAPI/Chronos/tsdataset.html#bigdl.chronos.
# forecaster
x, y = tsdata.roll(lookback=..., horizon=...).to_numpy()
forecaster.fit((x, y))
.. tab:: XShardsTSDataset
.. code-block:: python
@ -235,7 +235,7 @@ Now we support pandas dataframe exporting through `to_pandas()` for users to car
x = tsdata.to_pandas()["target"].to_numpy()
anomaly_detector.fit(x)
```
View [TSDataset API Doc](../../PythonAPI/Chronos/tsdataset.html#) for more details.
## **7. Built-in Dataset**


@ -1,4 +1,4 @@
# Time Series Forecasting
_Chronos_ provides both deep learning/machine learning models and traditional statistical models for forecasting.
@ -67,16 +67,16 @@ For AutoTS Pipeline, we will leverage `AutoTSEstimator`, `TSPipeline` and prefer
3. Use the returned `TSPipeline` for further development.
```eval_rst
.. warning::
``AutoTSTrainer`` workflow has been deprecated, no feature updates or performance improvement will be carried out. Users of ``AutoTSTrainer`` may refer to `Chronos API doc <https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Chronos/autots.html>`_.
```
```eval_rst
.. note::
``AutoTSEstimator`` currently only supports the pytorch backend.
```
View [Quick Start](../QuickStart/chronos-autotsest-quickstart.html) for a more detailed example.
##### **2.1 Prepare dataset**
`AutoTSEstimator` supports 2 types of data input.
You can easily prepare your data in `TSDataset` (recommended). You may refer to [here](#TSDataset) for the detailed information to prepare your `TSDataset` with proper data processing and feature generation. Here is a typical `TSDataset` preparation.
```python
@ -107,7 +107,7 @@ auto_estimator = AutoTSEstimator(model='lstm',
search_space='normal',
past_seq_len=hp.randint(1, 10),
future_seq_len=1,
selected_features="auto")
```
We prebuild three default search spaces for each built-in model, which you can use by setting `search_space` to "minimal", "normal", or "large", or you can define your own search space in a dictionary. The larger the search space, the better accuracy you will get, but the more time it will cost.
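For instance, a minimal sketch of both options (the dictionary keys below are illustrative; the exact tunable names depend on the chosen model):
```python
import bigdl.orca.automl.hp as hp

# use a prebuilt search space by name
auto_estimator = AutoTSEstimator(model='lstm', search_space="normal")

# or define your own search space in a dictionary (keys are illustrative)
auto_estimator = AutoTSEstimator(model='lstm',
                                 search_space={"hidden_dim": hp.choice([32, 64]),
                                               "lr": hp.uniform(0.001, 0.01)})
```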
@ -147,7 +147,7 @@ Detailed information please refer to [TSPipeline API doc](../../PythonAPI/Chrono
```eval_rst
.. note::
``init_orca_context`` is not needed if you just use the trained TSPipeline for inference, evaluation or incremental fitting.
```
```eval_rst
.. note::
@ -160,7 +160,7 @@ _Chronos_ provides a set of standalone time series forecasters without AutoML su
View some examples notebooks for [Network Traffic Prediction][network_traffic]
The common process of using a Forecaster looks like below.
```python
# set fixed hyperparameters, loss, metric...
f = Forecaster(...)
@ -197,9 +197,9 @@ View Network Traffic multivariate multistep Prediction [notebook][network_traffi
##### **3.4 MTNetForecaster**
```eval_rst
.. note::
**Additional Dependencies**:
You need to install ``bigdl-nano[tensorflow]`` to enable this built-in model.
``pip install bigdl-nano[tensorflow]``
```
@ -219,9 +219,9 @@ View High-dimensional Electricity Data Forecasting [example][run_electricity] an
##### **3.6 ARIMAForecaster**
```eval_rst
.. note::
**Additional Dependencies**:
You need to install ``pmdarima`` to enable this built-in model.
``pip install pmdarima==1.8.5``
```
@ -234,7 +234,7 @@ View [ARIMAForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#arimafor
##### **3.7 ProphetForecaster**
```eval_rst
.. note::
**Additional Dependencies**:
You need to install `prophet` to enable this built-in model.
@ -242,7 +242,7 @@ View [ARIMAForecaster API Doc](../../PythonAPI/Chronos/forecasters.html#arimafor
```
```eval_rst
.. note::
**Acceleration Note**:
Intel® Distribution for Python may improve the speed of prophet's training and inference. You may install it by referring to https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html.
```


@ -1,59 +1,30 @@
# Chronos Installation
### **1. Overview**
_BigDL-Chronos_ (_Chronos_ for short) is an application framework for building a fast, accurate and scalable time series analysis application.
You can use _Chronos_ to:
```eval_rst
.. grid:: 3
:gutter: 1
.. grid-item-card::
:class-footer: sd-bg-light
**Forecasting**
^^^
.. image:: ../Image/forecasting.svg
:width: 200
:alt: Alternative text
+++
Predict future using history data.
.. grid-item-card::
:class-footer: sd-bg-light
**Anomaly Detection**
^^^
.. image:: ../Image/anomaly_detection.svg
:width: 200
:alt: Alternative text
+++
Discover unexpected items in data.
.. grid-item-card::
:class-footer: sd-bg-light
**Simulation**
^^^
.. image:: ../Image/simulation.svg
:width: 200
:alt: Alternative text
+++
Generate similar data as history data.
```
---
### **2. Install**
#### **OS and Python version requirement**
```eval_rst
.. note::
**Supported OS**:
Chronos is thoroughly tested on Ubuntu (16.04/18.04/20.04), and should work fine on CentOS. If you are a Windows user, the most convenient way to use Chronos on a Windows laptop might be using WSL2; you may refer to https://docs.microsoft.com/en-us/windows/wsl/setup/environment or just install an Ubuntu virtual machine.
```
```eval_rst
.. note::
**Supported Python Version**:
Chronos only supports Python 3.7.2 ~ latest 3.7.x. We are validating more Python versions.
```
#### **Install using Conda**
We recommend using conda to manage the Chronos python environment. For more information about Conda, refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
Select your preferences in the panel below to find the proper install command. Then run the install command as shown in the example below.
```eval_rst
.. raw:: html
@ -61,7 +32,7 @@ You can use _Chronos_ to:
<link rel="stylesheet" type="text/css" href="../../../_static/css/chronos_installation_guide.css" />
<div class="displayed">
<table id="table-1">
<tbody>
<tr>
@ -131,97 +102,22 @@ You can use _Chronos_ to:
</table>
</div>
<script src="../../../_static/js/chronos_installation_guide.js"></script>
```
</br>
#### **2.1 Pypi**
When you install `bigdl-chronos` from PyPI, we recommend installing it within a conda virtual environment. To install Conda, please refer to [here](https://docs.conda.io/en/latest/miniconda.html#).
```bash
# create a conda environment for chronos
conda create -n my_env python=3.7 setuptools=58.0.4
conda activate my_env
# select your preference in the panel above to find the proper command to replace the command below, e.g.
pip install --pre --upgrade bigdl-chronos[pytorch]
# init bigdl-nano to enable local accelerations
source bigdl-nano-init # accelerate the conda env
```
---
### **3. Which document to see?**
```eval_rst
.. grid:: 2
:gutter: 1
.. grid-item-card::
:class-footer: sd-bg-light
**Quick Tour**
^^^
You may understand the basic usage of Chronos' components and learn to write the first runnable application in this quick tour page.
+++
`Quick Tour <./quick-tour.html>`_
.. grid-item-card::
:class-footer: sd-bg-light
**User Guides**
^^^
Our user guides provide you with in-depth information, concepts and knowledge about Chronos.
+++
`Data <./data_processing_feature_engineering.html>`_ /
`Forecast <./forecasting.html>`_ /
`Detect <./anomaly_detection.html>`_ /
`Simulate <./simulation.html>`_
.. grid:: 2
:gutter: 1
.. grid-item-card::
:class-footer: sd-bg-light
**How-to-Guide** / **Example**
^^^
If you are meeting specific problems during usage, the how-to guides are a good place to check.
Examples provide short, high quality use cases that users can emulate in their own work.
+++
`How-to-Guide <../Howto/index.html>`_ / `Example <../QuickStart/index.html>`_
.. grid-item-card::
:class-footer: sd-bg-light
**API Document**
^^^
API Document provides you with a detailed description of the Chronos APIs.
+++
`API Document <../../PythonAPI/Chronos/index.html>`_
```


@ -1,15 +1,11 @@
Chronos Quick Tour
=================================
Welcome to Chronos for building a fast, accurate and scalable time series analysis application🎉! Start with our quick tour to understand some critical concepts and how to use them to tackle your tasks.
.. grid:: 1 1 1 1
.. grid-item-card::
:text-align: center
:shadow: none
:class-header: sd-bg-light
:class-footer: sd-bg-light
:class-card: sd-mb-2
**Data processing**
^^^
@ -22,13 +18,11 @@ Welcome to Chronos for building a fast, accurate and scalable time series analys
Get Started
.. grid:: 1 3 3 3
:gutter: 2
.. grid-item-card::
:text-align: center
:shadow: none
:class-header: sd-bg-light
:class-footer: sd-bg-light
:class-card: sd-mb-2
**Forecasting**
@ -42,11 +36,8 @@ Welcome to Chronos for building a fast, accurate and scalable time series analys
Get Started
.. grid-item-card::
:text-align: center
:shadow: none
:class-header: sd-bg-light
:class-footer: sd-bg-light
:class-card: sd-mb-2
**Anomaly Detection**
@ -60,11 +51,8 @@ Welcome to Chronos for building a fast, accurate and scalable time series analys
Get Started
.. grid-item-card::
:text-align: center
:shadow: none
:class-header: sd-bg-light
:class-footer: sd-bg-light
:class-card: sd-mb-2
**Simulation**
@ -104,7 +92,7 @@ In Chronos, we provide a ``TSDataset`` (and a ``XShardsTSDataset`` to handle lar
.. grid:: 2
:gutter: 2
.. grid-item-card::
@ -192,7 +180,7 @@ For time series forecasting, we also provide an ``AutoTSEstimator`` for distribu
stop_orca_context()
.. grid:: 3
:gutter: 2
.. grid-item-card::
@ -246,7 +234,7 @@ To import a specific detector, you may use {algorithm name} + "Detector", and ca
anomaly_indexes = detector.anomaly_indexes()
.. grid:: 3
:gutter: 2
.. grid-item-card::
@ -280,7 +268,7 @@ Simulator(experimental)
Simulator is still under active development with an unstable API.
.. grid:: 2
:gutter: 2
.. grid-item-card::


@ -1,17 +1,17 @@
# Synthetic Data Generation
Chronos provides simulators to generate synthetic time series data for users who want to overcome limited data access in a deep learning/machine learning project or only want to generate some synthetic data to play with.
```eval_rst
.. note::
``DPGANSimulator`` is the only simulator Chronos provides at the moment; more simulators are on their way.
```
## **1. DPGANSimulator**
`DPGANSimulator` adopts DoppelGANger, proposed in [Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions](http://arxiv.org/abs/1909.13403). The method is a data-driven, unsupervised method based on a deep learning model with a GAN (Generative Adversarial Networks) structure. The model features a pair of separate attribute and feature generators with their corresponding discriminators. `DPGANSimulator` also supports a rich and comprehensive input data (training data) format and outperforms other algorithms in many evaluation metrics.
```eval_rst
.. note::
We reimplement this model in pytorch (the original implementation was based on tf1) for better performance (both speed and memory).
```
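As a rough sketch of the intended workflow (the constructor, argument and method names below are illustrative assumptions; please check the simulator API doc for the exact signatures):
```python
from bigdl.chronos.simulator import DPGANSimulator

# hyperparameter values are purely illustrative
simulator = DPGANSimulator(L_max=24, sample_len=6)

# train on prepared feature/attribute arrays (names and shapes are illustrative)
simulator.fit(data_feature=train_feature,
              data_attribute=train_attribute,
              data_gen_flag=train_gen_flag)

synthetic = simulator.generate(sample_num=100)  # generate synthetic sequences
```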


@ -1,4 +1,4 @@
# Accelerated Training and Inference
Chronos provides transparent acceleration for Chronos built-in models and customized time-series models. In this deep-dive page, we will introduce how to enable/disable them.
@ -16,7 +16,7 @@ Time series model, especially those deep learning models, often suffers slow tra
### **2. Training Acceleration**
Training acceleration is transparent in Chronos's API. Transparent means that Chronos users will enjoy the acceleration without changing their code (unless some expert users want to set some advanced settings).
```eval_rst
.. note::
**Write your script under** ``if __name__=="__main__":``:
Chronos will automatically utilize the computation resources on the hardware. This may include multi-process training on a single node. Using this header will prevent many strange behaviors.
@ -65,7 +65,7 @@ We have examples adapted from `pytorch-forecasting`'s examples to show the signi
We are working on the acceleration of `AutoModel` and `AutoTSEstimator`. Please unset the environment by:
```bash
source bigdl-nano-unset-env
```
### **3. Inference Acceleration**
Inference has become a critical part of a time series model's performance. This may be divided into two parts:
@ -77,10 +77,10 @@ Typically, throughput and latency is a trade-off pair. We have three optimizatio
- **ONNX Runtime**: Users may export their trained (w/wo auto tuning) model to an ONNX file and deploy it on other services. Chronos also provides internal onnxruntime inference support for those users who pursue low latency and higher throughput during inference on a single node.
- **Quantization**: Quantization refers to processes that enable lower precision inference. In Chronos, post-training quantization is supported, relying on [Intel® Neural Compressor](https://intel.github.io/neural-compressor/README.html).
```eval_rst
.. note::
**Additional Dependencies**:
You need to install ``neural-compressor`` to enable quantization related methods.
``pip install neural-compressor==1.8.1``
```
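For example, a minimal sketch of the onnxruntime path on a trained forecaster (`predict_with_onnx` is the same method used in the distributed processing section below; the data variables here are illustrative):
```python
# train as usual, then switch to the onnxruntime backend for low-latency inference
f = Forecaster(...)               # any built-in forecaster with onnx support
f.fit((x_train, y_train))
yhat = f.predict_with_onnx(x_test)
```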


@ -1,56 +1,7 @@
# Distributed Processing
#### **Distributed training**
LSTM, TCN and Seq2seq users can easily train their forecasters in a distributed fashion to **handle extra large dataset and utilize a cluster**. The functionality is powered by Project Orca.
```python
f = Forecaster(..., distributed=True)
@ -59,10 +10,10 @@ f.predict(...)
f.to_local() # collect the forecaster to single node
f.predict_with_onnx(...) # onnxruntime only supports single node
```
#### **Distributed Data processing: XShardsTSDataset**
```eval_rst
.. warning::
``XShardsTSDataset`` is still experimental.
```
`TSDataset` is a single-threaded lib with reasonable speed on large datasets (~10G). When you handle an extra large dataset or have limited memory on a single node, `XShardsTSDataset` can be used to provide the exact same functionality and usage as `TSDataset` in a distributed fashion.
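A minimal sketch of the intended usage (the `from_xshards` constructor and the column names below are illustrative assumptions; the processing call mirrors the `TSDataset` API described above):
```python
from bigdl.chronos.data.experimental import XShardsTSDataset

# `shards` is assumed to be an Orca XShards of pandas dataframes
tsdata = XShardsTSDataset.from_xshards(shards,
                                       dt_col="datetime",
                                       target_col="value",
                                       id_col="id")
tsdata.impute()  # same style as TSDataset (impute only, for now)
```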


@ -0,0 +1,49 @@
# AutoML Visualization
AutoML visualization provides two kinds of visualization. You may use them while fitting on auto models or AutoTS pipeline.
* During the searching process, the visualizations of each trial are shown and updated every 30 seconds. (Monitor view)
* After the searching process, a leaderboard of each trial's configs and metrics is shown. (Leaderboard view)
**Note**: AutoML visualization is based on tensorboard and tensorboardx. They should be installed properly before the training starts.
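For example, assuming a pip-managed environment:
```bash
pip install tensorboard tensorboardX
```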
<span id="monitor_view">**Monitor view**</span>
Before training, start the tensorboard server through
```bash
tensorboard --logdir=<logs_dir>/<name>
```
`logs_dir` is the log directory you set for your predictor (e.g. `AutoTSEstimator`, `AutoTCN`, etc.). `name` is the name parameter you set for your predictor.
The data in the SCALARS tag will be updated every 30 seconds for users to see the training progress.
![](../Image/automl_monitor.png)
After training, start the tensorboard server through
```bash
tensorboard --logdir=<logs_dir>/<name>_leaderboard/
```
where `logs_dir` and `name` are the same as stated in [Monitor view](#monitor_view).
A dashboard of each trial's configs and metrics is shown in the SCALARS tag.
![](../Image/automl_scalars.png)
A leaderboard of each trial's configs and metrics is shown in the HPARAMS tag.
![](../Image/automl_hparams.png)
**Use visualization in Jupyter Notebook**
You can enable a tensorboard view in jupyter notebook by the following code.
```python
%load_ext tensorboard
# for scalar view
%tensorboard --logdir <logs_dir>/<name>/
# for leaderboard view
%tensorboard --logdir <logs_dir>/<name>_leaderboard/
```


@ -8,7 +8,7 @@
**In this guide we will demonstrate how to use _Chronos TSDataset_ and _Chronos Forecaster_ for time series processing and forecasting in 4 simple steps.**
### Step 0: Prepare Environment
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the environment. Please refer to the [install guide](../Overview/chronos.html#install) for more details.
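For example, the commands below follow the install guide (pick the Chronos install option that matches your setup):
```bash
conda create -n my_env python=3.7 setuptools=58.0.4
conda activate my_env
pip install --pre --upgrade bigdl-chronos[pytorch]   # or another install option
source bigdl-nano-init                               # enable local accelerations
```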


@ -0,0 +1,89 @@
BigDL-Chronos
========================
**BigDL-Chronos** (**Chronos** for short) is an application framework for building a fast, accurate and scalable time series analysis application.
You can use **Chronos** for:
.. grid:: 1 3 3 3
.. grid-item::
.. image:: ./Image/forecasting.svg
:alt: Forecasting example diagram
**Forecasting:** Predict future using history data.
.. grid-item::
.. image:: ./Image/anomaly_detection.svg
:alt: Anomaly Detection example diagram
**Anomaly Detection:** Discover unexpected items in data.
.. grid-item::
.. image:: ./Image/simulation.svg
:alt: Simulation example diagram
**Simulation:** Generate similar data as history data.
-------
.. grid:: 1 2 2 2
:gutter: 2
.. grid-item-card::
**Get Started**
^^^
You may understand the basic usage of Chronos' components and learn to write the first runnable application in this quick tour page.
+++
:bdg-link:`Chronos in 5 minutes <./Overview/quick-tour.html>` |
:bdg-link:`Installation <./Overview/install.html>`
.. grid-item-card::
**Key Features Guide**
^^^
Our user guides provide you with in-depth information, concepts and knowledge about Chronos.
+++
:bdg-link:`Data <./Overview/data_processing_feature_engineering.html>` |
:bdg-link:`Forecast <./Overview/forecasting.html>` |
:bdg-link:`Detect <./Overview/anomaly_detection.html>` |
:bdg-link:`Simulate <./Overview/simulation.html>`
.. grid-item-card::
**How-to-Guide** / **Tutorials**
^^^
If you are meeting specific problems during usage, the how-to guides are a good place to check.
Examples provide short, high quality use cases that users can emulate in their own work.
+++
:bdg-link:`How-to-Guide <./Howto/index.html>` | :bdg-link:`Example <./QuickStart/index.html>`
.. grid-item-card::
**API Document**
^^^
API Document provides you with a detailed description of the Chronos APIs.
+++
:bdg-link:`API Document <../PythonAPI/Chronos/index.html>`
.. toctree::
:hidden:
BigDL-Chronos Document <self>


@ -1,140 +1,81 @@
# DLlib in 5 minutes
## Overview
DLlib is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs).
It includes the functionalities of the [original BigDL](https://github.com/intel-analytics/BigDL/tree/branch-0.14) project, and provides the following high-level APIs for distributed deep learning on Spark:
* [Keras-like API](keras-api.md)
* [Spark ML pipeline support](nnframes.md)
## 2. Scala user guide
### 2.1 Install and Run
Please refer to the [scala guide](../../UserGuide/scala.md) for details.
---
## Scala Example
This section shows a single example of how to use dllib to build a deep learning application on Spark, using Keras APIs.
#### **LeNet Model on MNIST using Keras-Style API**
This tutorial is an explanation of what is happening in the [lenet](https://github.com/intel-analytics/BigDL/tree/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras) example.
A bigdl-dllib program starts with initialization as follows.
````scala
val conf = Engine.createSparkConf()
.setAppName("Train Lenet on MNIST")
.set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)
Engine.init
````
After the initialization, we need to:
1. Load train and validation data by _**creating the [```DataSet```](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/feature/dataset/DataSet.scala)**_ (e.g., ````SampleToGreyImg````, ````GreyImgNormalizer```` and ````GreyImgToBatch````):
````scala
val trainSet = (if (sc.isDefined) {
DataSet.array(load(trainData, trainLabel), sc.get, param.nodeNumber)
} else {
DataSet.array(load(trainData, trainLabel))
}) -> SampleToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(
param.batchSize)

val validationSet = DataSet.array(load(validationData, validationLabel), sc) ->
BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(
param.batchSize)
````
2. We then define the LeNet model using the Keras-style API
````scala
val input = Input(inputShape = Shape(28, 28, 1))
val reshape = Reshape(Array(1, 28, 28)).inputs(input)
val conv1 = Convolution2D(6, 5, 5, activation = "tanh").setName("conv1_5x5").inputs(reshape)
val pool1 = MaxPooling2D().inputs(conv1)
val conv2 = Convolution2D(12, 5, 5, activation = "tanh").setName("conv2_5x5").inputs(pool1)
val pool2 = MaxPooling2D().inputs(conv2)
val flatten = Flatten().inputs(pool2)
val fc1 = Dense(100, activation = "tanh").setName("fc1").inputs(flatten)
val fc2 = Dense(classNum, activation = "softmax").setName("fc2").inputs(fc1)
Model(input, fc2)
````
3. After that, we configure the learning process. Set the ````optimization method```` and the ````Criterion```` (which, given input and target, computes gradient per given loss function):
````scala
model.compile(optimizer = optimMethod,
loss = ClassNLLCriterion[Float](logProbAsInput = false),
metrics = Array(new Top1Accuracy[Float](), new Top5Accuracy[Float](), new Loss[Float]))
````
Finally we _**train the model**_ by calling ````model.fit````:
````scala
model.fit(trainSet, nbEpoch = param.maxEpoch, validationData = validationSet)
````
---
## Python Example
#### **Initialize NN Context**
#### 3.1.1 Official Release
Run below command to install _bigdl-dllib_.
```bash
conda create -n my_env python=3.7
conda activate my_env
pip install bigdl-dllib
```
#### 3.1.2 Nightly build
You can install the latest nightly build of bigdl-dllib as follows:
```bash
pip install --pre --upgrade bigdl-dllib
```
### 3.2 Run
#### **3.2.1 Interactive Shell**
You may test if the installation is successful using the interactive Python shell as follows:
* Type `python` in the command line to start a REPL.
* Try to run the example code below to verify the installation:
```python
from bigdl.dllib.utils.nncontext import *
sc = init_nncontext() # Initiation of bigdl-dllib on the underlying cluster.
```
#### **3.2.2 Jupyter Notebook**
You can start the Jupyter notebook as you normally do using the following command and run bigdl-dllib programs directly in a Jupyter notebook:
```bash
jupyter notebook --notebook-dir=./ --ip=* --no-browser
```
#### **3.2.3 Python Script**
You can directly write bigdl-dllib programs in a Python file (e.g. script.py) and run in the command line as a normal Python program:
```bash
python script.py
```
---
### 3.3 Get started
#### **NN Context**
`NNContext` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop.
@ -158,15 +99,15 @@ This tutorial describes the [Autograd](https://github.com/intel-analytics/BigDL/
The example first does the initialization using `init_nncontext()`:
```python
sc = init_nncontext()
```
It then generates the input data X_, Y_
```python
data_len = 1000
X_ = np.random.uniform(0, 1, (1000, 2))
Y_ = ((2 * X_).sum(1) + 0.4).reshape([data_len, 1])
```
It then defines the custom loss
@ -179,20 +120,20 @@ def mean_absolute_error(y_true, y_pred):
After that, the example creates the model as follows and sets the criterion as the custom loss:
```python
a = Input(shape=(2,))
b = Dense(1)(a)
c = Lambda(function=add_one_func)(b)
model = Model(input=a, output=c)
model.compile(optimizer=SGD(learningrate=1e-2),
loss=mean_absolute_error)
```
Finally the example trains the model by calling `model.fit`:
```python
model.fit(x=X_,
y=Y_,
batch_size=32,
nb_epoch=int(options.nb_epoch),
distributed=False)
```


@ -0,0 +1,6 @@
DLLib Key Features
================================
* `Keras-like API <keras-api.html>`_
* `Spark ML Pipeline Support <nnframes.html>`_
* `Visualization <visualization.html>`_


@ -0,0 +1,41 @@
# Installation
## Scala
Refer to the [BigDL Install guide for Scala](../../UserGuide/scala.md).
## Python
### Install a Stable Release
Run the command below to install _bigdl-dllib_.
```bash
conda create -n my_env python=3.7
conda activate my_env
pip install bigdl-dllib
```
### Install Nightly build version
You can install the latest nightly build of bigdl-dllib as follows:
```bash
pip install --pre --upgrade bigdl-dllib
```
### Verify your install
You may verify if the installation is successful using the interactive Python shell as follows:
* Type `python` in the command line to start a REPL.
* Try to run the example code below to verify the installation:
```python
from bigdl.dllib.utils.nncontext import *
sc = init_nncontext() # Initiation of bigdl-dllib on the underlying cluster.
```


@ -1,216 +0,0 @@
# Python DLLib Getting Start Guide
## 1. Code initialization
```nncontext``` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop.
It is recommended to initialize `nncontext` at the beginning of your program:
```
from bigdl.dllib.nncontext import *
sc = init_nncontext()
```
For more information about ```nncontext```, please refer to [nncontext](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/dllib.html#nn-context)
## 3. Distributed Data Loading
#### Using Spark Dataframe APIs
DLlib supports Spark Dataframes as the input to the distributed training, and as
the input/output of the distributed inference. Consequently, the user can easily
process large-scale datasets using Apache Spark, and directly apply AI models on
the distributed (and possibly in-memory) Dataframes without data conversion or serialization.
We create a Spark session so we can use the Spark API to load and process the data:
```
spark = SQLContext(sc)
```
1. We can use the Spark API to load the data into a Spark DataFrame, e.g. read a csv file into a Spark DataFrame:
```
path = "pima-indians-diabetes.data.csv"
spark.read.csv(path)
```
If the feature column for the model is a Spark ML Vector, please assemble the related columns into a Vector and pass it to the model, e.g.:
```
from pyspark.ml.feature import VectorAssembler
vecAssembler = VectorAssembler(outputCol="features")
vecAssembler.setInputCols(["num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"])
assemble_df = vecAssembler.transform(df)
assemble_df.withColumn("label", col("class").cast(DoubleType) + lit(1))
```
2. If the training data is images, we can use the DLLib API to load images into a Spark DataFrame, e.g.:
```
imgPath = "cats_dogs/"
imageDF = NNImageReader.readImages(imgPath, sc)
```
It will load the images and generate feature tensors automatically. Also we need to generate labels ourselves, e.g.:
```
labelDF = imageDF.withColumn("name", getName(col("image"))) \
.withColumn("label", getLabel(col('name')))
```
Then split the Spark DataFrame into a training part and a validation part:
```
(trainingDF, validationDF) = labelDF.randomSplit([0.9, 0.1])
```
## 4. Model Definition
#### Using Keras-like APIs
To define a model, you can use the [Keras Style API](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/keras-api.html).
```
x1 = Input(shape=[8])
dense1 = Dense(12, activation="relu")(x1)
dense2 = Dense(8, activation="relu")(dense1)
dense3 = Dense(2)(dense2)
dmodel = Model(input=x1, output=dense3)
```
After creating the model, you will have to decide which loss function to use in training.
Now you can use the `compile` function of the model to set the loss function and optimization method.
```
dmodel.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy")
```
Now the model is built and ready to train.
## 5. Distributed Model Training
Now you can use `fit` to begin the training; please set the label columns. Model evaluation can be performed periodically during training.
1. If the dataframe is generated using Spark APIs, you also need to set the feature columns, e.g.:
```
model.fit(df, feature_cols=["features"], label_cols=["label"], batch_size=4, nb_epoch=1)
```
Note: The above model accepts a single input (column `features`) and a single output (column `label`).
If your model accepts multiple inputs (e.g. columns `f1`, `f2`, `f3`), please set the features as below:
```
model.fit(df, feature_cols=["f1", "f2"], label_cols=["label"], batch_size=4, nb_epoch=1)
```
Similarly, if the model accepts multiple outputs (e.g. columns `label1`, `label2`), please set the label columns as below:
```
model.fit(df, feature_cols=["features"], label_cols=["l1", "l2"], batch_size=4, nb_epoch=1)
```
2. If the dataframe is generated using DLLib `NNImageReader`, we don't need to set `feature_cols`; we can set `transform` to configure how to process the images before training, e.g.:
```
from bigdl.dllib.feature.image import transforms
transformers = transforms.Compose([ImageResize(50, 50), ImageMirror()])
model.fit(image_df, label_cols=["label"], batch_size=1, nb_epoch=1, transform=transformers)
```
For more details about how to use the DLLib keras API to train image data, you may want to refer to [ImageClassification](https://github.com/intel-analytics/BigDL/tree/main/python/dllib/examples/keras/image_classification.py)
## 6. Model saving and loading
When training is finished, you may need to save the final model for later use.
BigDL allows you to save your BigDL model on the local filesystem, HDFS, or Amazon S3.
- **save**
```
modelPath = "/tmp/demo/keras.model"
dmodel.saveModel(modelPath)
```
- **load**
```
loadModel = Model.loadModel(modelPath)
preDF = loadModel.predict(df, feature_cols=["features"], prediction_col="predict")
```
You may want to refer to [Save/Load](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/keras-api.html#save)
## 7. Distributed evaluation and inference
After training finishes, you can then use the trained model for prediction or evaluation.
- **inference**
1. For dataframe generated by Spark API, please set `feature_cols` and `prediction_col`
```
dmodel.predict(df, feature_cols=["features"], prediction_col="predict")
```
2. For dataframe generated by `NNImageReader`, please set `prediction_col` and you can set `transform` if needed
```
model.predict(df, prediction_col="predict", transform=transformers)
```
- **evaluation**
Similarly, for a dataframe generated by the Spark API, the code is as below:
```
dmodel.evaluate(df, batch_size=4, feature_cols=["features"], label_cols=["label"])
```
For dataframe generated by `NNImageReader`:
```
model.evaluate(image_df, batch_size=1, label_cols=["label"], transform=transformers)
```
## 8. Checkpointing and resuming training
You can configure the training to periodically take snapshots of the model.
```
cpPath = "/tmp/demo/cp"
dmodel.set_checkpoint(cpPath)
```
You can also set ```over_write``` to ```true``` to enable overwriting any existing snapshot files.
After training stops, you can resume from any saved point. Choose one of the model snapshots to resume (saved in the checkpoint path; see Checkpointing for details). Use `Model.loadModel` to load the model snapshot into a model object.
```
loadModel = Model.loadModel(path)
```
## 9. Monitor your training
- **Tensorboard**
BigDL provides a convenient way to monitor/visualize your training progress. It writes the statistics collected during training/validation. The saved summary can be viewed via TensorBoard.
In order to take effect, it needs to be called before fit.
```
dmodel.set_tensorboard("./", "dllib_demo")
```
For more details, please refer to [visualization](visualization.md)
## 10. Transfer learning and finetuning
- **freeze and trainable**
BigDL DLLib supports excluding some layers of the model from training.
```
dmodel.freeze(layer_names)
```
Layers that match the given names will be frozen. If a layer is frozen, its parameters (weight/bias, if they exist) are not changed in the training process.
BigDL DLLib also supports unFreeze operations. The parameters of the layers that match the given names will be trained (updated) in the training process:
```
dmodel.unFreeze(layer_names)
```
For more information, you may refer to [freeze](freeze.md)
## 11. Hyperparameter tuning
- **optimizer**
DLLib supports a list of optimization methods.
For more details, please refer to [optimization](optim-Methods.md)
- **learning rate scheduler**
DLLib supports a list of learning rate schedulers.
For more details, please refer to [lr_scheduler](learningrate-Scheduler.md)
- **batch size**
DLLib supports setting the batch size during training and prediction. We can adjust the batch size to tune the model's accuracy.
- **regularizer**
DLLib supports a list of regularizers.
For more details, please refer to [regularizer](regularizers.md)
- **clipping**
DLLib supports gradient clipping operations.
For more details, please refer to [gradient_clip](clipping.md)
## 12. Running program
```
python your_app_code.py
```


@ -1,301 +0,0 @@
# DLLib Getting Start Guide
## 1. Creating dev environment
#### Scala project (maven & sbt)
- **Maven**
To use BigDL DLLib to build your own deep learning application, you can use maven to create your project and add bigdl-dllib as a dependency. Please add the code below to your pom.xml to add BigDL DLLib as your dependency:
```
<dependency>
<groupId>com.intel.analytics.bigdl</groupId>
<artifactId>bigdl-dllib-spark_2.4.6</artifactId>
<version>0.14.0</version>
</dependency>
```
- **SBT**
```
libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_2.4.6" % "0.14.0"
```
For more information about how to add the BigDL dependency, please refer to https://bigdl.readthedocs.io/en/latest/doc/UserGuide/scala.html#build-a-scala-project
#### IDE (IntelliJ)
Open up IntelliJ and click File => Open
Navigate to your project. If you have added BigDL DLLib as a dependency in your pom.xml,
the IDE will automatically download it from maven and you will be able to run your application.
For more details about how to set up an IDE for a BigDL project, please refer to https://bigdl-project.github.io/master/#ScalaUserGuide/install-build-src/#setup-ide
## 2. Code initialization
```NNContext``` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop.
It is recommended to initialize `NNContext` at the beginning of your program:
```
import com.intel.analytics.bigdl.dllib.NNContext
import com.intel.analytics.bigdl.dllib.keras.Model
import com.intel.analytics.bigdl.dllib.keras.models.Models
import com.intel.analytics.bigdl.dllib.keras.optimizers.Adam
import com.intel.analytics.bigdl.dllib.nn.ClassNLLCriterion
import com.intel.analytics.bigdl.dllib.utils.Shape
import com.intel.analytics.bigdl.dllib.keras.layers._
import com.intel.analytics.bigdl.numeric.NumericFloat
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DoubleType
val sc = NNContext.initNNContext("dllib_demo")
```
For more information about ```NNContext```, please refer to [NNContext](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/dllib.html#nn-context)
## 3. Distributed Data Loading
#### Using Spark Dataframe APIs
DLlib supports Spark Dataframes as the input to the distributed training, and as
the input/output of the distributed inference. Consequently, the user can easily
process large-scale datasets using Apache Spark, and directly apply AI models on
the distributed (and possibly in-memory) Dataframes without data conversion or serialization.
We create a Spark session so we can use the Spark API to load and process the data:
```
val spark = new SQLContext(sc)
```
1. We can use the Spark API to load the data into a Spark DataFrame, e.g. read a csv file into a Spark DataFrame:
```
val path = "pima-indians-diabetes.data.csv"
val df = spark.read.options(Map("inferSchema"->"true","delimiter"->",")).csv(path)
.toDF("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age", "class")
```
If the feature column for the model is a Spark ML Vector, please assemble the related columns into a Vector and pass it to the model, e.g.:
```
val assembler = new VectorAssembler()
.setInputCols(Array("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"))
.setOutputCol("features")
val assembleredDF = assembler.transform(df)
val df2 = assembleredDF.withColumn("label",col("class").cast(DoubleType) + lit(1))
```
2. If the training data is images, we can use the DLLib API to load images into a Spark DataFrame, e.g.:
```
val createLabel = udf { row: Row =>
if (new Path(row.getString(0)).getName.contains("cat")) 1 else 2
}
val imagePath = "cats_dogs/"
val imgDF = NNImageReader.readImages(imagePath, sc)
```
It will load the images and generate feature tensors automatically. Also we need to generate labels ourselves, e.g.:
```
val df = imgDF.withColumn("label", createLabel(col("image")))
```
Then split the Spark DataFrame into a training part and a validation part:
```
val Array(trainDF, valDF) = df.randomSplit(Array(0.8, 0.2))
```
## 4. Model Definition
#### Using Keras-like APIs
To define a model, you can use the [Keras Style API](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/keras-api.html).
```
val x1 = Input(Shape(8))
val dense1 = Dense(12, activation="relu").inputs(x1)
val dense2 = Dense(8, activation="relu").inputs(dense1)
val dense3 = Dense(2).inputs(dense2)
val dmodel = Model(x1, dense3)
```
After creating the model, you will have to decide which loss function to use in training.
Now you can use the `compile` function of the model to set the loss function and optimization method.
```
dmodel.compile(optimizer = new Adam(), loss = ClassNLLCriterion())
```
Now the model is built and ready to train.
## 5. Distributed Model Training
Now you can use `fit` to begin the training; please set the label columns. Model evaluation can be performed periodically during training.
1. If the dataframe is generated using Spark APIs, you also need to set the feature columns, e.g.:
```
model.fit(x=trainDF, batchSize=4, nbEpoch = 2,
featureCols = Array("feature1"), labelCols = Array("label"), valX=valDF)
```
Note: the above model accepts a single input (column `feature1`) and a single output (column `label`).
If your model accepts multiple inputs (e.g. columns `f1`, `f2`, `f3`), please set the features as below:
```
model.fit(x=dataframe, batchSize=4, nbEpoch = 2,
featureCols = Array("f1", "f2", "f3"), labelCols = Array("label"))
```
Similarly, if the model accepts multiple outputs (e.g. columns `label1`, `label2`), please set the label columns as below:
```
model.fit(x=dataframe, batchSize=4, nbEpoch = 2,
featureCols = Array("f1", "f2", "f3"), labelCols = Array("label1", "label2"))
```
2. If the dataframe is generated using the DLLib `NNImageReader`, we don't need to set `featureCols`; we can set `transform` to configure how to process the images before training, e.g.:
```
val transformers = transforms.Compose(Array(ImageResize(50, 50),
ImageMirror()))
model.fit(x=dataframe, batchSize=4, nbEpoch = 2,
labelCols = Array("label"), transform = transformers)
```
For more details about how to use the DLLib Keras API to train image data, you may want to refer to [ImageClassification](https://github.com/intel-analytics/BigDL/blob/main/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras/ImageClassification.scala).
## 6. Model saving and loading
When training is finished, you may need to save the final model for later use.
BigDL allows you to save your BigDL model on the local filesystem, HDFS, or Amazon S3.
- **save**
```
val modelPath = "/tmp/demo/keras.model"
dmodel.saveModel(modelPath)
```
- **load**
```
val loadModel = Models.loadModel(modelPath)
val preDF2 = loadModel.predict(valDF, featureCols = Array("features"), predictionCol = "predict")
```
You may want to refer to [Save/Load](https://bigdl.readthedocs.io/en/latest/doc/DLlib/Overview/keras-api.html#save).
## 7. Distributed evaluation and inference
After training finishes, you can then use the trained model for prediction or evaluation.
- **inference**
1. For a dataframe generated by the Spark API, please set `featureCols`:
```
dmodel.predict(trainDF, featureCols = Array("features"), predictionCol = "predict")
```
2. For a dataframe generated by `NNImageReader`, there is no need to set `featureCols`, and you can set `transform` if needed:
```
model.predict(imgDF, predictionCol = "predict", transform = transformers)
```
- **evaluation**
Similarly, for a dataframe generated by the Spark API, the code is as below:
```
dmodel.evaluate(trainDF, batchSize = 4, featureCols = Array("features"),
labelCols = Array("label"))
```
For a dataframe generated by `NNImageReader`:
```
model.evaluate(imgDF, batchSize = 1, labelCols = Array("label"), transform = transformers)
```
## 8. Checkpointing and resuming training
You can configure the training to take snapshots of the model periodically:
```
val cpPath = "/tmp/demo/cp"
dmodel.setCheckpoint(cpPath, overWrite=false)
```
You can also set ```overWrite``` to ```true``` to enable overwriting any existing snapshot files.
After training stops, you can resume from any saved point. Choose one of the model snapshots to resume from (saved in the checkpoint path; see Checkpointing for details). Use `Models.loadModel` to load the model snapshot into a model object:
```
val loadModel = Models.loadModel(path)
```
## 9. Monitor your training
- **Tensorboard**
BigDL provides a convenient way to monitor/visualize your training progress. It writes the statistics collected during training/validation, and the saved summary can be viewed via TensorBoard.
To take effect, it needs to be called before `fit`:
```
dmodel.setTensorBoard("./", "dllib_demo")
```
For more details, please refer to [visualization](visualization.md).
## 10. Transfer learning and finetuning
- **freeze and trainable**
BigDL DLLib supports excluding some layers of the model from training:
```
dmodel.freeze(layer_names)
```
Layers that match the given names will be frozen. If a layer is frozen, its parameters (weight/bias, if they exist) are not changed during training.
BigDL DLLib also supports the unFreeze operation. The parameters of the layers that match the given names will be trained (updated) during training:
```
dmodel.unFreeze(layer_names)
```
For more information, you may refer to [freeze](freeze.md).
## 11. Hyperparameter tuning
- **optimizer**
DLLib supports a list of optimization methods.
For more details, please refer to [optimization](optim-Methods.md).
- **learning rate scheduler**
DLLib supports a list of learning rate schedulers.
For more details, please refer to [lr_scheduler](learningrate-Scheduler.md).
- **batch size**
DLLib supports setting the batch size during training and prediction. You can adjust the batch size to tune the model's accuracy.
- **regularizer**
DLLib supports a list of regularizers.
For more details, please refer to [regularizer](regularizers.md).
- **clipping**
DLLib supports gradient clipping operations.
For more details, please refer to [gradient_clip](clipping.md).
## 12. Running program
You can run a bigdl-dllib program as a standard Spark program (running on either a local machine or a distributed cluster) as follows:
```
# Spark local mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master local[2] \
--class class_name \
jar_path
# Spark standalone mode
## ${SPARK_HOME}/sbin/start-master.sh
## check master URL from http://localhost:8080
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master spark://... \
--executor-cores cores_per_executor \
--total-executor-cores total_cores_for_the_job \
--class class_name \
jar_path
# Spark yarn client mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master yarn \
--deploy-mode client \
--executor-cores cores_per_executor \
--num-executors executors_number \
--class class_name \
jar_path
# Spark yarn cluster mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master yarn \
--deploy-mode cluster \
--executor-cores cores_per_executor \
--num-executors executors_number \
 --class class_name \
jar_path
```
For more details about how to run a BigDL Scala application, please refer to https://bigdl.readthedocs.io/en/latest/doc/UserGuide/scala.html

View file

@ -1,5 +1,5 @@
## **Visualizing training with TensorBoard**
With the summary info generated, we can then use [TensorBoard](https://pypi.python.org/pypi/tensorboard) to visualize the behaviors of the BigDL program.
* **Installing TensorBoard**
@ -31,10 +31,10 @@ After that, navigate to the TensorBoard dashboard using a browser. You can find
* **Visualizations in TensorBoard**
Within the TensorBoard dashboard, you will be able to read the visualizations of each run, including the “Loss” and “Throughput” curves under the SCALARS tab (as illustrated below):
![](../Image/tensorboard-scalar.png)
And “weights”, “bias”, “gradientWeights” and “gradientBias” under the DISTRIBUTIONS and HISTOGRAMS tabs (as illustrated below):
![](../Image/tensorboard-histo1.png)
![](../Image/tensorboard-histo2.png)
---

View file

@ -0,0 +1,9 @@
# DLlib Tutorial
- [**Python Quickstart Notebook**](./python-getting-started.html)
> ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/dllib/colab-notebook/dllib_keras_api.ipynb) &nbsp;![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/dllib/colab-notebook/dllib_keras_api.ipynb)
In this guide we will demonstrate how to use _DLlib keras style api_ and _DLlib NNClassifier_ for classification.

View file

@ -0,0 +1,218 @@
# DLLib Python Getting Start Guide
## 1. Code initialization
```nncontext``` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop.
It is recommended to initialize `nncontext` at the beginning of your program:
```
from bigdl.dllib.nncontext import *
sc = init_nncontext()
```
For more information about ```nncontext```, please refer to [nncontext](../Overview/dllib.md#initialize-nn-context)
## 2. Distributed Data Loading
#### Using Spark Dataframe APIs
DLlib supports Spark Dataframes as the input to the distributed training, and as
the input/output of the distributed inference. Consequently, the user can easily
process large-scale datasets using Apache Spark, and directly apply AI models on
the distributed (and possibly in-memory) Dataframes without data conversion or serialization.
We create a Spark session so that we can use the Spark API to load and process the data:
```
spark = SQLContext(sc)
```
1. We can use the Spark API to load the data into a Spark DataFrame, e.g. read a CSV file into a Spark DataFrame:
```
path = "pima-indians-diabetes.data.csv"
spark.read.csv(path)
```
If the model expects its feature column to be a Spark ML Vector, please assemble the related columns into a Vector and pass it to the model, e.g.:
```
from pyspark.ml.feature import VectorAssembler
from pyspark.sql.functions import col, lit
from pyspark.sql.types import DoubleType
vecAssembler = VectorAssembler(outputCol="features")
vecAssembler.setInputCols(["num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"])
assemble_df = vecAssembler.transform(df)
df2 = assemble_df.withColumn("label", col("class").cast(DoubleType()) + lit(1))
```
2. If the training data is images, we can use the DLLib API to load the images into a Spark DataFrame, e.g.:
```
imgPath = "cats_dogs/"
imageDF = NNImageReader.readImages(imgPath, sc)
```
It will load the images and generate the feature tensors automatically. We also need to generate the labels ourselves using user-defined functions (`getName` and `getLabel` below; a sketch of these UDFs follows the snippet), e.g.:
```
labelDF = imageDF.withColumn("name", getName(col("image"))) \
.withColumn("label", getLabel(col('name')))
```
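`getName` and `getLabel` above are user-defined functions that are not shown in this guide. Below is a minimal, hypothetical sketch of how they could be defined with PySpark UDFs, assuming the label is derived from the image file name (as in the Scala example):
```
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType, DoubleType

# the image column is a struct whose first field is the origin path of the file
getName = udf(lambda row: row[0].split("/")[-1], StringType())
# derive the label from the file name, e.g. cat images -> 1.0, others -> 2.0
getLabel = udf(lambda name: 1.0 if "cat" in name else 2.0, DoubleType())
```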
Then split the Spark DataFrame into a training part and a validation part:
```
(trainingDF, validationDF) = labelDF.randomSplit([0.9, 0.1])
```
## 3. Model Definition
#### Using Keras-like APIs
To define a model, you can use the [Keras Style API](../Overview/keras-api.md).
```
x1 = Input(shape=[8])
dense1 = Dense(12, activation="relu")(x1)
dense2 = Dense(8, activation="relu")(dense1)
dense3 = Dense(2)(dense2)
dmodel = Model(input=x1, output=dense3)
```
After creating the model, you will have to decide which loss function to use in training.
You can then use the `compile` function of the model to set the loss function and optimization method:
```
dmodel.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy")
```
Now the model is built and ready to train.
## 4. Distributed Model Training
Now you can use `fit` to begin the training; please set the label columns. Model evaluation can be performed periodically during training.
1. If the dataframe is generated using Spark APIs, you also need to set the feature columns, e.g.:
```
model.fit(df, feature_cols=["features"], label_cols=["label"], batch_size=4, nb_epoch=1)
```
Note: the above model accepts a single input (column `features`) and a single output (column `label`).
If your model accepts multiple inputs (e.g. columns `f1`, `f2`, `f3`), please set the features as below:
```
model.fit(df, feature_cols=["f1", "f2"], label_cols=["label"], batch_size=4, nb_epoch=1)
```
Similarly, if the model accepts multiple outputs (e.g. columns `label1`, `label2`), please set the label columns as below:
```
model.fit(df, feature_cols=["features"], label_cols=["l1", "l2"], batch_size=4, nb_epoch=1)
```
2. If the dataframe is generated using the DLLib `NNImageReader`, we don't need to set `feature_cols`; we can set `transform` to configure how to process the images before training, e.g.:
```
from bigdl.dllib.feature.image import transforms
transformers = transforms.Compose([ImageResize(50, 50), ImageMirror()])
model.fit(image_df, label_cols=["label"], batch_size=1, nb_epoch=1, transform=transformers)
```
For more details about how to use the DLLib Keras API to train image data, you may want to refer to [ImageClassification](https://github.com/intel-analytics/BigDL/tree/main/python/dllib/examples/keras/image_classification.py).
## 5. Model saving and loading
When training is finished, you may need to save the final model for later use.
BigDL allows you to save your BigDL model on the local filesystem, HDFS, or Amazon S3.
- **save**
```
modelPath = "/tmp/demo/keras.model"
dmodel.saveModel(modelPath)
```
- **load**
```
loadModel = Model.loadModel(modelPath)
preDF = loadModel.predict(df, feature_cols=["features"], prediction_col="predict")
```
You may want to refer to [Save/Load](../Overview/keras-api.html#save).
## 6. Distributed evaluation and inference
After training finishes, you can then use the trained model for prediction or evaluation.
- **inference**
1. For a dataframe generated by the Spark API, please set `feature_cols` and `prediction_col`:
```
dmodel.predict(df, feature_cols=["features"], prediction_col="predict")
```
2. For a dataframe generated by `NNImageReader`, please set `prediction_col`, and you can set `transform` if needed:
```
model.predict(df, prediction_col="predict", transform=transformers)
```
- **evaluation**
Similarly, for a dataframe generated by the Spark API, the code is as below:
```
dmodel.evaluate(df, batch_size=4, feature_cols=["features"], label_cols=["label"])
```
For a dataframe generated by `NNImageReader`:
```
model.evaluate(image_df, batch_size=1, label_cols=["label"], transform=transformers)
```
## 7. Checkpointing and resuming training
You can configure the training to take snapshots of the model periodically:
```
cpPath = "/tmp/demo/cp"
dmodel.set_checkpoint(cpPath)
```
You can also set ```over_write``` to ```true``` to enable overwriting any existing snapshot files.
After training stops, you can resume from any saved point. Choose one of the model snapshots to resume from (saved in the checkpoint path; see Checkpointing for details). Use `Model.loadModel` to load the model snapshot into a model object:
```
loadModel = Model.loadModel(path)
```
## 8. Monitor your training
- **Tensorboard**
BigDL provides a convenient way to monitor/visualize your training progress. It writes the statistics collected during training/validation, and the saved summary can be viewed via TensorBoard.
To take effect, it needs to be called before `fit`:
```
dmodel.set_tensorboard("./", "dllib_demo")
```
For more details, please refer to [visualization](../Overview/visualization.md).
## 9. Transfer learning and finetuning
- **freeze and trainable**
BigDL DLLib supports excluding some layers of the model from training:
```
dmodel.freeze(layer_names)
```
Layers that match the given names will be frozen. If a layer is frozen, its parameters (weight/bias, if they exist) are not changed during training.
BigDL DLLib also supports the unFreeze operation. The parameters of the layers that match the given names will be trained (updated) during training:
```
dmodel.unFreeze(layer_names)
```
For more information, you may refer to [freeze](../../PythonAPI/DLlib/freeze.md).
## 10. Hyperparameter tuning
- **optimizer**
DLLib supports a list of optimization methods.
For more details, please refer to [optimization](../../PythonAPI/DLlib/optim-Methods.md).
- **learning rate scheduler**
DLLib supports a list of learning rate schedulers.
For more details, please refer to [lr_scheduler](../../PythonAPI/DLlib/learningrate-Scheduler.md).
- **batch size**
DLLib supports setting the batch size during training and prediction. You can adjust the batch size to tune the model's accuracy (see the sketch below).
- **regularizer**
DLLib supports a list of regularizers.
For more details, please refer to [regularizer](../../PythonAPI/DLlib/regularizers.md).
- **clipping**
DLLib supports gradient clipping operations.
For more details, please refer to [gradient_clip](../../PythonAPI/DLlib/clipping.md).
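As an illustrative sketch of how these knobs fit together, you could replace the string optimizer with an explicitly configured one and adjust the batch size in `fit`. This assumes the `Adam` optimizer class is importable from `bigdl.dllib.optim.optimizer` and accepts a `learningrate` argument; please check the optimization docs linked above for the exact API:
```
from bigdl.dllib.optim.optimizer import Adam

# an explicitly configured optimizer instead of the "adam" string default (assumed API)
dmodel.compile(optimizer=Adam(learningrate=1e-3), loss="sparse_categorical_crossentropy")

# a larger batch size typically speeds up each epoch; tune it together with the
# learning rate to balance training speed and accuracy
dmodel.fit(df, feature_cols=["features"], label_cols=["label"], batch_size=32, nb_epoch=2)
```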
## 11. Running program
```
python your_app_code.py
```

View file

@ -0,0 +1,303 @@
# DLLib Scala Getting Start Guide
## 1. Creating dev environment
#### Scala project (maven & sbt)
- **Maven**
To use BigDL DLLib to build your own deep learning application, you can use Maven to create your project and add bigdl-dllib as a dependency. Please add the code below to your pom.xml:
```
<dependency>
<groupId>com.intel.analytics.bigdl</groupId>
<artifactId>bigdl-dllib-spark_2.4.6</artifactId>
<version>0.14.0</version>
</dependency>
```
- **SBT**
```
libraryDependencies += "com.intel.analytics.bigdl" % "bigdl-dllib-spark_2.4.6" % "0.14.0"
```
For more information about how to add the BigDL dependency, please refer to the [scala docs](../../UserGuide/scala.md#build-a-scala-project).
#### IDE (IntelliJ)
Open up IntelliJ and click File => Open.
Navigate to your project. If you have added BigDL DLLib as a dependency in your pom.xml, the IDE will automatically download it from Maven and you will be able to run your application.
For more details about how to set up the IDE for a BigDL project, please refer to the [IDE Setup Guide](../../UserGuide/develop.html#id2).
## 2. Code initialization
```NNContext``` is the main entry for provisioning the dllib program on the underlying cluster (such as K8s or Hadoop cluster), or just on a single laptop.
It is recommended to initialize `NNContext` at the beginning of your program:
```
import com.intel.analytics.bigdl.dllib.NNContext
import com.intel.analytics.bigdl.dllib.keras.Model
import com.intel.analytics.bigdl.dllib.keras.models.Models
import com.intel.analytics.bigdl.dllib.keras.optimizers.Adam
import com.intel.analytics.bigdl.dllib.nn.ClassNLLCriterion
import com.intel.analytics.bigdl.dllib.utils.Shape
import com.intel.analytics.bigdl.dllib.keras.layers._
import com.intel.analytics.bigdl.numeric.NumericFloat
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DoubleType
val sc = NNContext.initNNContext("dllib_demo")
```
For more information about ```NNContext```, please refer to [NNContext](../Overview/dllib.md#initialize-nn-context)
## 3. Distributed Data Loading
#### Using Spark Dataframe APIs
DLlib supports Spark Dataframes as the input to the distributed training, and as
the input/output of the distributed inference. Consequently, the user can easily
process large-scale datasets using Apache Spark, and directly apply AI models on
the distributed (and possibly in-memory) Dataframes without data conversion or serialization.
We create a Spark session so that we can use the Spark API to load and process the data:
```
val spark = new SQLContext(sc)
```
1. We can use the Spark API to load the data into a Spark DataFrame, e.g. read a CSV file into a Spark DataFrame:
```
val path = "pima-indians-diabetes.data.csv"
val df = spark.read.options(Map("inferSchema"->"true","delimiter"->",")).csv(path)
.toDF("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age", "class")
```
If the model expects its feature column to be a Spark ML Vector, please assemble the related columns into a Vector and pass it to the model, e.g.:
```
val assembler = new VectorAssembler()
.setInputCols(Array("num_times_pregrant", "plasma_glucose", "blood_pressure", "skin_fold_thickness", "2-hour_insulin", "body_mass_index", "diabetes_pedigree_function", "age"))
.setOutputCol("features")
val assembleredDF = assembler.transform(df)
val df2 = assembleredDF.withColumn("label",col("class").cast(DoubleType) + lit(1))
```
2. If the training data is images, we can use the DLLib API to load the images into a Spark DataFrame, e.g.:
```
val createLabel = udf { row: Row =>
if (new Path(row.getString(0)).getName.contains("cat")) 1 else 2
}
val imagePath = "cats_dogs/"
val imgDF = NNImageReader.readImages(imagePath, sc)
```
It will load the images and generate the feature tensors automatically. We also need to generate the labels ourselves, e.g.:
```
val df = imgDF.withColumn("label", createLabel(col("image")))
```
Then split the Spark DataFrame into a training part and a validation part:
```
val Array(trainDF, valDF) = df.randomSplit(Array(0.8, 0.2))
```
## 4. Model Definition
#### Using Keras-like APIs
To define a model, you can use the [Keras Style API](../Overview/keras-api.md).
```
val x1 = Input(Shape(8))
val dense1 = Dense(12, activation="relu").inputs(x1)
val dense2 = Dense(8, activation="relu").inputs(dense1)
val dense3 = Dense(2).inputs(dense2)
val dmodel = Model(x1, dense3)
```
After creating the model, you will have to decide which loss function to use in training.
You can then use the `compile` function of the model to set the loss function and optimization method:
```
dmodel.compile(optimizer = new Adam(), loss = ClassNLLCriterion())
```
Now the model is built and ready to train.
## 5. Distributed Model Training
Now you can use `fit` to begin the training; please set the label columns. Model evaluation can be performed periodically during training.
1. If the dataframe is generated using Spark APIs, you also need to set the feature columns, e.g.:
```
model.fit(x=trainDF, batchSize=4, nbEpoch = 2,
featureCols = Array("feature1"), labelCols = Array("label"), valX=valDF)
```
Note: the above model accepts a single input (column `feature1`) and a single output (column `label`).
If your model accepts multiple inputs (e.g. columns `f1`, `f2`, `f3`), please set the features as below:
```
model.fit(x=dataframe, batchSize=4, nbEpoch = 2,
featureCols = Array("f1", "f2", "f3"), labelCols = Array("label"))
```
Similarly, if the model accepts multiple outputs (e.g. columns `label1`, `label2`), please set the label columns as below:
```
model.fit(x=dataframe, batchSize=4, nbEpoch = 2,
featureCols = Array("f1", "f2", "f3"), labelCols = Array("label1", "label2"))
```
2. If the dataframe is generated using the DLLib `NNImageReader`, we don't need to set `featureCols`; we can set `transform` to configure how to process the images before training, e.g.:
```
val transformers = transforms.Compose(Array(ImageResize(50, 50),
ImageMirror()))
model.fit(x=dataframe, batchSize=4, nbEpoch = 2,
labelCols = Array("label"), transform = transformers)
```
For more details about how to use the DLLib Keras API to train image data, you may want to refer to [ImageClassification](https://github.com/intel-analytics/BigDL/blob/main/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/example/keras/ImageClassification.scala).
## 6. Model saving and loading
When training is finished, you may need to save the final model for later use.
BigDL allows you to save your BigDL model on the local filesystem, HDFS, or Amazon S3.
- **save**
```
val modelPath = "/tmp/demo/keras.model"
dmodel.saveModel(modelPath)
```
- **load**
```
val loadModel = Models.loadModel(modelPath)
val preDF2 = loadModel.predict(valDF, featureCols = Array("features"), predictionCol = "predict")
```
You may want to refer to [Save/Load](../Overview/keras-api.html#save).
## 7. Distributed evaluation and inference
After training finishes, you can then use the trained model for prediction or evaluation.
- **inference**
1. For a dataframe generated by the Spark API, please set `featureCols`:
```
dmodel.predict(trainDF, featureCols = Array("features"), predictionCol = "predict")
```
2. For a dataframe generated by `NNImageReader`, there is no need to set `featureCols`, and you can set `transform` if needed:
```
model.predict(imgDF, predictionCol = "predict", transform = transformers)
```
- **evaluation**
Similarly, for a dataframe generated by the Spark API, the code is as below:
```
dmodel.evaluate(trainDF, batchSize = 4, featureCols = Array("features"),
labelCols = Array("label"))
```
For a dataframe generated by `NNImageReader`:
```
model.evaluate(imgDF, batchSize = 1, labelCols = Array("label"), transform = transformers)
```
## 8. Checkpointing and resuming training
You can configure the training to take snapshots of the model periodically:
```
val cpPath = "/tmp/demo/cp"
dmodel.setCheckpoint(cpPath, overWrite=false)
```
You can also set ```overWrite``` to ```true``` to enable overwriting any existing snapshot files.
After training stops, you can resume from any saved point. Choose one of the model snapshots to resume from (saved in the checkpoint path; see Checkpointing for details). Use `Models.loadModel` to load the model snapshot into a model object:
```
val loadModel = Models.loadModel(path)
```
## 9. Monitor your training
- **Tensorboard**
BigDL provides a convenient way to monitor/visualize your training progress. It writes the statistics collected during training/validation, and the saved summary can be viewed via TensorBoard.
To take effect, it needs to be called before `fit`:
```
dmodel.setTensorBoard("./", "dllib_demo")
```
For more details, please refer to [visualization](../Overview/visualization.md).
## 10. Transfer learning and finetuning
- **freeze and trainable**
BigDL DLLib supports excluding some layers of the model from training:
```
dmodel.freeze(layer_names)
```
Layers that match the given names will be frozen. If a layer is frozen, its parameters (weight/bias, if they exist) are not changed during training.
BigDL DLLib also supports the unFreeze operation. The parameters of the layers that match the given names will be trained (updated) during training:
```
dmodel.unFreeze(layer_names)
```
For more information, you may refer to [freeze](../../PythonAPI/DLlib/freeze.md).
## 11. Hyperparameter tuning
- **optimizer**
DLLib supports a list of optimization methods.
For more details, please refer to [optimization](../../PythonAPI/DLlib/optim-Methods.md).
- **learning rate scheduler**
DLLib supports a list of learning rate schedulers.
For more details, please refer to [lr_scheduler](../../PythonAPI/DLlib/learningrate-Scheduler.md).
- **batch size**
DLLib supports setting the batch size during training and prediction. You can adjust the batch size to tune the model's accuracy.
- **regularizer**
DLLib supports a list of regularizers.
For more details, please refer to [regularizer](../../PythonAPI/DLlib/regularizers.md).
- **clipping**
DLLib supports gradient clipping operations.
For more details, please refer to [gradient_clip](../../PythonAPI/DLlib/clipping.md).
## 12. Running program
You can run a bigdl-dllib program as a standard Spark program (running on either a local machine or a distributed cluster) as follows:
```
# Spark local mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master local[2] \
--class class_name \
jar_path
# Spark standalone mode
## ${SPARK_HOME}/sbin/start-master.sh
## check master URL from http://localhost:8080
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master spark://... \
--executor-cores cores_per_executor \
--total-executor-cores total_cores_for_the_job \
--class class_name \
jar_path
# Spark yarn client mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master yarn \
--deploy-mode client \
--executor-cores cores_per_executor \
--num-executors executors_number \
--class class_name \
jar_path
# Spark yarn cluster mode
${BIGDL_HOME}/bin/spark-submit-with-dllib.sh \
--master yarn \
--deploy-mode cluster \
--executor-cores cores_per_executor \
--num-executors executors_number \
 --class class_name \
jar_path
```
For more details about how to run a BigDL Scala application, please refer to the [Scala UserGuide](../../UserGuide/scala.md).

View file

@ -0,0 +1,62 @@
BigDL-DLlib
=========================
**BigDL-DLlib** (or **DLlib** for short) is a distributed deep learning library for Apache Spark; with DLlib, users can write their deep learning applications as standard Spark programs (using either Scala or Python APIs).
-------
.. grid:: 1 2 2 2
:gutter: 2
.. grid-item-card::
**Get Started**
^^^
Documents in this section help you get started quickly with DLLib.
+++
:bdg-link:`DLlib in 5 minutes <./Overview/dllib.html>` |
:bdg-link:`Installation <./Overview/install.html>`
.. grid-item-card::
**Key Features Guide**
^^^
Each guide in this section provides you with in-depth information, concepts and knowledge about DLLib key features.
+++
:bdg-link:`Keras-Like API <./Overview/keras-api.html>` |
:bdg-link:`Spark ML Pipeline <./Overview/nnframes.html>`
.. grid-item-card::
**Examples**
^^^
DLLib Examples and Tutorials.
+++
:bdg-link:`Tutorials <./QuickStart/index.html>`
.. grid-item-card::
**API Document**
^^^
API Document provides detailed description of DLLib APIs.
+++
:bdg-link:`API Document <../PythonAPI/DLlib/index.html>`
.. toctree::
:hidden:
BigDL-DLlib Document <self>

View file

@ -0,0 +1,70 @@
### Use Cases
- **Train a DeepFM model using recsys data**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/deep_fm)
---------------------------
- **Run DeepRec with BigDL**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/deeprec)
---------------------------
- **Train DIEN using the Amazon Book Reviews dataset**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/dien)
---------------------------
- **Preprocess the Criteo dataset for DLRM Model**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/dlrm)
---------------------------
- **Train a LightGBM model using the Twitter dataset**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/lightGBM)
---------------------------
- **Running Friesian listwise example**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/listwise_ranking)
---------------------------
- **Multi-task Recommendation with BigDL**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/multi_task)
---------------------------
- **Train an NCF model on MovieLens**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/ncf)
---------------------------
- **Offline Recall with Faiss on Spark**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/recall)
---------------------------
- **Recommend items using Friesian-Serving Framework**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/serving)
---------------------------
- **Train a two tower model using recsys data**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/two_tower)
---------------------------
- **Preprocess the Criteo dataset for WideAndDeep Model**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/wnd)
---------------------------
- **Train an XGBoost model using Twitter dataset**
>![](../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/friesian/example/xgb)

View file

@ -0,0 +1,66 @@
BigDL-Friesian
=========================
BigDL Friesian is an application framework for building optimized large-scale recommender solutions. The recommending workflows built on top of Friesian can seamlessly scale out to distributed big data clusters in the production environment.
Friesian provides end-to-end support for three typical stages in a modern recommendation system:
- Offline stage: distributed feature engineering and model training.
- Nearline stage: Feature and model updates.
- Online stage: Recall and ranking.
-------
.. grid:: 1 2 2 2
:gutter: 2
.. grid-item-card::
**Get Started**
^^^
Documents in this section help you get started quickly with Friesian.
+++
:bdg-link:`Introduction <./intro.html>`
.. grid-item-card::
**Key Features Guide**
^^^
Each guide in this section provides you with in-depth information, concepts and knowledge about Friesian key features.
+++
:bdg-link:`Serving <./serving.html>`
.. grid-item-card::
**Use Cases**
^^^
Use Cases and Examples.
+++
:bdg-link:`Use Cases <./examples.html>`
.. grid-item-card::
**API Document**
^^^
API Document provides detailed description of Friesian APIs.
+++
:bdg-link:`API Document <../PythonAPI/Friesian/index.html>`
.. toctree::
:hidden:
BigDL-Friesian Document <self>

View file

@ -0,0 +1,17 @@
Friesian Introduction
==========================
BigDL Friesian is an application framework for building optimized large-scale recommender solutions. The recommending workflows built on top of Friesian can seamlessly scale out to distributed big data clusters in the production environment.
Friesian provides end-to-end support for three typical stages in a modern recommendation system:
- Offline stage: distributed feature engineering and model training.
- Nearline stage: Feature and model updates.
- Online stage: Recall and ranking.
The overall architecture of Friesian is shown in the following diagram:
.. image:: ../../../image/friesian_architecture.png

View file

@ -0,0 +1,600 @@
## Serving Recommendation Framework
### Architecture of the serving pipelines
The diagram below demonstrates the components of the Friesian serving system, which typically consists of three stages:
- Offline: Preprocess the data to get user/item DNN features and user/item embedding features. Then use the embedding features and the embedding model to get the embedding vectors.
- Nearline: Retrieve user/item profiles and keep them in the key-value store. Retrieve the item embedding vectors and build the faiss index. Make updates to the profiles from time to time.
- Online: Trigger the recommendation process whenever a user comes. The recall service generates candidates from millions of items based on embeddings, and the deep learning model ranks the candidates for the final recommendation results.
![](../../../image/friesian_architecture.png)
### Services and APIs
The Friesian serving system consists of 4 types of services:
- Ranking Service: performs model inference and returns the results.
- `rpc doPredict(Content) returns (Prediction) {}`
- Input: The `encodeStr` is a Base64 string encoded from a bigdl [Activity](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/nn/abstractnn/Activity.scala) serialized byte array.
```bash
message Content {
string encodedStr = 1;
}
```
- Output: The `predictStr` is a Base64 string encoded from a bigdl [Activity](https://github.com/intel-analytics/BigDL/blob/branch-2.0/scala/dllib/src/main/scala/com/intel/analytics/bigdl/dllib/nn/abstractnn/Activity.scala) (the inference result) serialized byte array.
```bash
message Prediction {
string predictStr = 1;
}
```
- Feature Service: searches user embeddings, user features or item features in Redis, and returns the features.
- `rpc getUserFeatures(IDs) returns (Features) {}` and `rpc getItemFeatures(IDs) returns (Features) {}`
- Input: The user/item id list for searching.
```bash
message IDs {
repeated int32 ID = 1;
}
```
- Output: `colNames` is a string list of the column names. `b64Feature` is a list of Base64 strings; each string is encoded from a Java-serialized array of objects. `ID` is a list of ids corresponding to `b64Feature`.
```bash
message Features {
repeated string colNames = 1;
repeated string b64Feature = 2;
repeated int32 ID = 3;
}
```
- Recall Service: searches item candidates in the built faiss index and returns the candidate id list.
- `rpc searchCandidates(Query) returns (Candidates) {}`
- Input: `userID` is the id of the user to search similar item candidates. `k` is the number of candidates.
```bash
message Query {
int32 userID = 1;
int32 k = 2;
}
```
- Output: `candidate` is the list of ids of item candidates.
```bash
message Candidates {
repeated int32 candidate = 1;
}
```
- Recommender Service: gets candidates from the recall service, calls the feature service to get the user features and the candidate items' features, then sorts the inference results from the ranking service and returns the top `recommendNum` items.
- `rpc getRecommendIDs(RecommendRequest) returns (RecommendIDProbs) {}`
- Input: `ID` is a list of user ids to recommend. `recommendNum` is the number of items to recommend. `candidateNum` is the number of generated candidates to inference in ranking service.
```bash
message RecommendRequest {
int32 recommendNum = 1;
int32 candidateNum = 2;
repeated int32 ID = 3;
}
```
- Output: `IDProbList` is a list of results corresponding to user `ID` in input. Each `IDProbs` consists of `ID` and `prob`, `ID` is the list of item ids, and `prob` is the corresponding probability.
```bash
message RecommendIDProbs {
repeated IDProbs IDProbList = 1;
}
message IDProbs {
repeated int32 ID = 1;
repeated float prob = 2;
}
```
### Quick Start
You can run the Friesian serving recommendation framework using the official Docker images.
Follow the steps below to run the WideAndDeep (WnD) demo.
1. Pull docker image from dockerhub
```bash
docker pull intelanalytics/friesian-grpc:0.0.2
```
2. Run & enter docker container
```bash
docker run -itd --name friesian --net=host intelanalytics/friesian-grpc:0.0.2
docker exec -it friesian bash
```
3. Add vec_feature_user_prediction.parquet, vec_feature_item_prediction.parquet, wnd model,
wnd_item.parquet and wnd_user.parquet (You can check [the schema of the parquet files](#schema-of-the-parquet-files))
4. Start ranking service
```bash
export OMP_NUM_THREADS=1
java -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.ranking.RankingServer -c config_ranking.yaml > logs/inf.log 2>&1 &
```
5. Start feature service for recommender service
```bash
./redis-5.0.5/src/redis-server &
java -Dspark.master=local[*] -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.feature.FeatureServer -c config_feature.yaml > logs/feature.log 2>&1 &
```
6. Start feature service for recall service
```bash
java -Dspark.master=local[*] -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.feature.FeatureServer -c config_feature_vec.yaml > logs/fea_recall.log 2>&1 &
```
7. Start recall service
```bash
java -Dspark.master=local[*] -Dspark.driver.maxResultSize=2G -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recall.RecallServer -c config_recall.yaml > logs/vec.log 2>&1 &
```
8. Start recommender service
```bash
java -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recommender.RecommenderServer -c config_recommender.yaml > logs/rec.log 2>&1 &
```
9. Check if the services are running
```bash
ps aux|grep friesian
```
You will see 5 processes starting with 'java'.
10. Run client to test
```bash
java -Dspark.master=local[*] -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recommender.RecommenderMultiThreadClient -target localhost:8980 -dataDir wnd_user.parquet -k 50 -clientNum 4 -testNum 2
```
11. Close services
```bash
ps aux|grep friesian (find the service pid)
kill xxx (pid of the service which should be closed)
```
### Schema of the parquet files
#### The schema of the user and item embedding files
The embedding parquet files should contain at least 2 columns: an id column and a prediction column.
The id column should be IntegerType, and the column name should be specified in the config files.
The prediction column should be DenseVector type, and you can convert your existing embedding vectors using pyspark:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.ml.linalg import VectorUDT, DenseVector
spark = SparkSession.builder \
.master("local[*]") \
.config("spark.driver.memory", "2g") \
.getOrCreate()
df = spark.read.parquet("data_path")
def trans_densevector(data):
return DenseVector(data)
vector_udf = udf(lambda x: trans_densevector(x), VectorUDT())
# suppose the embedding column (ArrayType(FloatType,true)) is the existing user/item embedding.
df = df.withColumn("prediction", vector_udf(col("embedding")))
df.write.parquet("output_file_path", mode="overwrite")
```
#### The schema of the recommendation model feature files
The feature parquet files should contain at least 2 columns: the id column and the feature columns.
The feature columns can be int, float, double, long, or arrays of int, float, double and long.
Here is an example of the WideAndDeep model features.
```bash
+-------------+--------+--------+----------+--------------------------------+---------------------------------+------------+-----------+---------+----------------------+-----------------------------+
|present_media|language|tweet_id|tweet_type|engaged_with_user_follower_count|engaged_with_user_following_count|len_hashtags|len_domains|len_links|present_media_language|engaged_with_user_is_verified|
+-------------+--------+--------+----------+--------------------------------+---------------------------------+------------+-----------+---------+----------------------+-----------------------------+
| 9| 43| 924| 2| 6| 3| 0.0| 0.1| 0.1| 45| 1|
| 0| 6| 4741724| 2| 3| 3| 0.0| 0.0| 0.0| 527| 0|
+-------------+--------+--------+----------+--------------------------------+---------------------------------+------------+-----------+---------+----------------------+-----------------------------+
```
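For reference, a small PySpark sketch that writes a feature parquet file with such a schema might look like this (the column subset and values here are hypothetical):
```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType

spark = SparkSession.builder.master("local[*]").getOrCreate()

# an integer id column plus a few int/float feature columns
schema = StructType([
    StructField("tweet_id", IntegerType(), False),
    StructField("language", IntegerType(), True),
    StructField("len_hashtags", FloatType(), True),
])
rows = [(924, 43, 0.0), (4741724, 6, 0.0)]
spark.createDataFrame(rows, schema).write.parquet("wnd_item.parquet", mode="overwrite")
```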
### The data schema in Redis
The user features, item features and user embedding vectors are saved in Redis.
The data saved in Redis is a key-value set.
#### Key in Redis
The key in Redis consists of 3 parts: key prefix, data type, and data id.
- Key prefix is `redisKeyPrefix` specified in the feature service config file.
- Data type is one of `user` or `item`.
- Data id is the value of `userIDColumn` or `itemIDColumn`.
Here is an example of key: `2tower_user:29`
#### Value in Redis
A row in the input parquet file will be converted to a Java array of objects, then serialized into a byte array, and encoded into a Base64 string.
#### Data schema entry
Every key prefix and data type combination has its data schema entry to save the corresponding column names. The key of the schema entry is `keyPrefix + dataType`, such as `2tower_user`. The value of the schema entry is a string of column names separated by `,`, such as `enaging_user_follower_count,enaging_user_following_count,enaging_user_is_verified`.
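As a minimal sketch (assuming a local Redis instance and the plain key layout described above, with the `2tower_` prefix from the example config), you can inspect the stored schema entry and a feature row from Python:
```python
import base64
import redis

# connect to the Redis instance used by the feature service
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# the schema entry holds the comma-separated column names for this prefix/type
col_names = r.get("2tower_user").split(",")

# a feature row is stored under "<prefix><type>:<id>" as a Base64 string of a
# Java-serialized object array; Python can decode the bytes, but deserializing
# the Java objects is handled by the JVM-side services
feature_bytes = base64.b64decode(r.get("2tower_user:29"))
print(col_names, len(feature_bytes))
```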
### Config for different services
You can pass some important information to services using `-c config.yaml`:
```bash
java -Dspark.master=local[*] -Dspark.driver.maxResultSize=2G -cp bigdl-friesian-serving-spark_2.4.6-0.14.0-SNAPSHOT.jar com.intel.analytics.bigdl.friesian.serving.recall.RecallServer -c config_recall.yaml
```
#### Ranking Service Config
Config with example:
```yaml
# Default: 8980, which port to create the server
servicePort: 8083
# Default: 0, open a port for prometheus monitoring tool, if set, user can check the
# performance using prometheus
monitorPort: 1234
# model path must be provided
modelPath: /home/yina/Documents/model/recys2021/wnd_813/recsys_wnd
# default: null, savedmodel input list if the model is tf savedmodel. If not provided, the inputs
# of the savedmodel will be arranged in alphabetical order
savedModelInputs: serving_default_input_1:0, serving_default_input_2:0, serving_default_input_3:0, serving_default_input_4:0, serving_default_input_5:0, serving_default_input_6:0, serving_default_input_7:0, serving_default_input_8:0, serving_default_input_9:0, serving_default_input_10:0, serving_default_input_11:0, serving_default_input_12:0, serving_default_input_13:0
# default: 1, number of models used in inference service
modelParallelism: 4
```
#### Feature Service Config
Config with example:
1. Load data into Redis and search data from Redis:
```yaml
### Basic setting
# Default: 8980, which port to create the server
servicePort: 8082
# Default: null, open a port for prometheus monitoring tool, if set, user can check the
# performance using prometheus
monitorPort: 1235
# 'kv' or 'inference' default: kv
serviceType: kv
# default: false, if need to load initial data to redis, set true
loadInitialData: true
# default: "", prefix for redis key
redisKeyPrefix:
# default: 0, item slot type on redis cluster. 0 means slot number use the default value 16384, 1 means all keys save to same slot, 2 means use the last character of id as hash tag.
redisClusterItemSlotType: 2
# default: null, if loadInitialData=true, initialUserDataPath or initialItemDataPath must be
# provided. Only support parquet file
initialUserDataPath: /home/yina/Documents/data/recsys/preprocess_output/wnd_user.parquet
initialItemDataPath: /home/yina/Documents/data/recsys/preprocess_output/wnd_exp1/wnd_item.parquet
# default: null, if loadInitialData=true and initialUserDataPath != null, userIDColumn and
# userFeatureColumns must be provided
userIDColumn: enaging_user_id
userFeatureColumns: enaging_user_follower_count,enaging_user_following_count
# default: null, if loadInitialData=true and initialItemDataPath != null, userIDColumn and
# userFeatureColumns must be provided
itemIDColumn: tweet_id
itemFeatureColumns: present_media, language, tweet_id, hashtags, present_links, present_domains, tweet_type, engaged_with_user_follower_count,engaged_with_user_following_count, len_hashtags, len_domains, len_links, present_media_language, tweet_id_engaged_with_user_id
# default: null, user model path or item model path must be provided if serviceType
# contains 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will
# be ignored
# userModelPath:
# default: null, user model path or item model path must be provided if serviceType
# contains 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will
# be ignored
# itemModelPath:
# default: 1, number of models used for inference
# modelParallelism:
### Redis Configuration
# default: localhost:6379
# redisUrl:
# default: 256, JedisPoolMaxTotal
# redisPoolMaxTotal:
```
2. Load user features into Redis. Get features from Redis, then use the model at `userModelPath` to do
inference and get the user embedding:
```yaml
### Basic setting
# Default: 8980, which port to create the server
servicePort: 8085
# Default: null, open a port for prometheus monitoring tool, if set, user can check the
# performance using prometheus
monitorPort: 1236
# 'kv' or 'inference' default: kv
serviceType: kv, inference
# default: false, if need to load initial data to redis, set true
loadInitialData: true
# default: ""
redisKeyPrefix: 2tower_
# default: 0, item slot type on redis cluster. 0 means slot number use the default value 16384, 1 means all keys save to same slot, 2 means use the last character of id as hash tag.
redisClusterItemSlotType: 2
# default: null, if loadInitialData=true, initialDataPath must be provided. Only support parquet
# file
initialUserDataPath: /home/yina/Documents/data/recsys/preprocess_output/guoqiong/vec_feature_user.parquet
# initialItemDataPath:
# default: null, if loadInitialData=true and initialUserDataPath != null, userIDColumn and
# userFeatureColumns must be provided
#userIDColumn: user
userIDColumn: enaging_user_id
userFeatureColumns: user
# default: null, if loadInitialData=true and initialItemDataPath != null, userIDColumn and
# userFeatureColumns must be provided
# itemIDColumn:
# itemFeatureColumns:
# default: null, user model path or item model path must be provided if serviceType
# includes 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will
# be ignored
userModelPath: /home/yina/Documents/model/recys2021/2tower/guoqiong/user-model
# default: null, user model path or item model path must be provided if serviceType
# contains 'inference'. If serviceType=kv, usermodelPath, itemModelPath and modelParallelism will
# be ignored
# itemModelPath:
# default: 1, number of models used for inference
# modelParallelism:
### Redis Configuration
# default: localhost:6379
# redisUrl:
# default: 256, JedisPoolMaxTotal
# redisPoolMaxTotal:
```
#### Recall Service Config
Config with example:
1. Load the initial item vectors from vec_feature_item.parquet and use the item model to build the faiss index:
```yaml
# Default: 8980, which port to create the server
servicePort: 8084
# Default: null, open a port for prometheus monitoring tool, if set, user can check the
# performance using prometheus
monitorPort: 1238
# default: 128, the dimensionality of the embedding vectors
indexDim: 50
# default: false, if load saved index, set true
# loadSavedIndex: true
# default: false, if true, the built index will be saved to indexPath. Ignored when
# loadSavedIndex=true
saveBuiltIndex: true
# default: null, path to saved index path, must be provided if loadSavedIndex=true
indexPath: ./2tower_item_full.idx
# default: false
getFeatureFromFeatureService: true
# default: localhost:8980, feature service target
featureServiceURL: localhost:8085
itemIDColumn: tweet_id
itemFeatureColumns: item
# default: null, user model path must be provided if getFeatureFromFeatureService=false
# userModelPath:
# default: null, item model path must be provided if loadSavedIndex=false and initialDataPath is
# not orca predict result
itemModelPath: /home/yina/Documents/model/recys2021/2tower/guoqiong/item-model
# default: null, Only support parquet file
initialDataPath: /home/yina/Documents/data/recsys/preprocess_output/guoqiong/vec_feature_item.parquet
# default: 1, number of models used in inference service
modelParallelism: 1
```
2. Load an existing faiss index:
```yaml
# Default: 8980, which port to create the server
servicePort: 8084
# Default: null, open a port for prometheus monitoring tool, if set, user can check the
# performance using prometheus
monitorPort: 1238
# default: 128, the dimensionality of the embedding vectors
# indexDim:
# default: false, if load saved index, set true
loadSavedIndex: true
# default: null, path to saved index path, must be provided if loadSavedIndex=true
indexPath: ./2tower_item_full.idx
# default: false
getFeatureFromFeatureService: true
# default: localhost:8980, feature service target
featureServiceURL: localhost:8085
# itemIDColumn:
# itemFeatureColumns:
# default: null, user model path must be provided if getFeatureFromFeatureService=false
# userModelPath:
# default: null, item model path must be provided if loadSavedIndex=false and initialDataPath is
# not orca predict result
# itemModelPath:
# default: null, Only support parquet file
# initialDataPath:
# default: 1, number of models used in inference service
# modelParallelism:
```
#### Recommender Service Config
Config with example:
```yaml
# Default: 8980, which port to create the server
servicePort: 8980
# Default: null, open a port for prometheus monitoring tool, if set, user can check the
# performance using prometheus
monitorPort: 1237
# default: null, must be provided, item column name
itemIDColumn: tweet_id
# default: null, must be provided, column names for inference, order related.
inferenceColumns: present_media_language, present_media, tweet_type, language, hashtags, present_links, present_domains, tweet_id_engaged_with_user_id, engaged_with_user_follower_count, engaged_with_user_following_count, enaging_user_follower_count, enaging_user_following_count, len_hashtags, len_domains, len_links
# default: 0, if set, ranking service request will be divided
inferenceBatch: 0
# default: localhost:8980, recall service target
recallServiceURL: localhost:8084
# default: localhost:8980, feature service target
featureServiceURL: localhost:8082
# default: localhost:8980, inference service target
rankingServiceURL: localhost:8083
```
### Run Java Client
#### Generate proto java files
You should initialize a Maven project and use the proto files in the [friesian gRPC project](https://github.com/analytics-zoo/friesian/tree/recsys-grpc/src/main/proto).
Make sure to add the following extensions and plugins in your pom.xml, and replace
*protocExecutable* with your own protoc executable.
```xml
<build>
<extensions>
<extension>
<groupId>kr.motd.maven</groupId>
<artifactId>os-maven-plugin</artifactId>
<version>1.6.2</version>
</extension>
</extensions>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.xolstice.maven.plugins</groupId>
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<protocArtifact>com.google.protobuf:protoc:3.12.0:exe:${os.detected.classifier}</protocArtifact>
<pluginId>grpc-java</pluginId>
<pluginArtifact>io.grpc:protoc-gen-grpc-java:1.37.0:exe:${os.detected.classifier}</pluginArtifact>
<protocExecutable>/home/yina/Documents/protoc/bin/protoc</protocExecutable>
</configuration>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>compile-custom</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
```
Then you can generate the gRPC files with
```bash
mvn clean install
```
#### Call recommend service function using blocking stub
You can check the [Recommend service client example](https://github.com/analytics-zoo/friesian/blob/recsys-grpc/src/main/java/grpc/recommend/RecommendClient.java) on Github
```java
import com.intel.analytics.bigdl.friesian.serving.grpc.generated.recommender.RecommenderGrpc;
import com.intel.analytics.bigdl.friesian.serving.grpc.generated.recommender.RecommenderProto.*;
public class RecommendClient {
public static void main(String[] args) {
// Create a channel
ManagedChannel channel = ManagedChannelBuilder.forTarget(targetURL).usePlaintext().build();
// Init a recommend service blocking stub
RecommenderGrpc.RecommenderBlockingStub blockingStub = RecommenderGrpc.newBlockingStub(channel);
// Construct a request
int[] userIds = new int[]{1};
int candidateNum = 50;
int recommendNum = 10;
RecommendRequest.Builder request = RecommendRequest.newBuilder();
for (int id : userIds) {
request.addID(id);
}
request.setCandidateNum(candidateNum);
request.setRecommendNum(recommendNum);
RecommendIDProbs recommendIDProbs = null;
try {
recommendIDProbs = blockingStub.getRecommendIDs(request.build());
logger.info(recommendIDProbs.getIDProbListList());
} catch (StatusRuntimeException e) {
logger.warn("RPC failed: " + e.getStatus().toString());
}
}
}
```
### Run Python Client
Install the python packages listed below (you may encounter [pyspark error](https://stackoverflow.com/questions/58700384/how-to-fix-typeerror-an-integer-is-required-got-type-bytes-error-when-tryin) if you have python>=3.8 installed, try to downgrade to python<=3.7 and try again).
```bash
pip install jupyter notebook==6.1.4 grpcio grpcio-tools pandas fastparquet pyarrow
```
After you have started the services successfully, you can call them from Python as follows.
#### Generate proto python files
Generate the files with
```bash
python -m grpc_tools.protoc -I../../protos --python_out=<path_to_output_folder> --grpc_python_out=<path_to_output_folder> <path_to_friesian>/src/main/proto/*.proto
```
#### Call recommend service function using blocking stub
You can check the [Recommend service client example](https://github.com/analytics-zoo/friesian/blob/recsys-grpc/Serving/WideDeep/recommend_client.ipynb) on Github
```python
# create a channel
channel = grpc.insecure_channel('localhost:8980')
# create a recommend service stub
stub = recommender_pb2_grpc.RecommenderStub(channel)
request = recommender_pb2.RecommendRequest(recommendNum=10, candidateNum=50, ID=[36407])
results = stub.getRecommendIDs(request)
print(results.IDProbList)
```
### Scale-out for Big Data
#### Redis Cluster
For a large data set, a standalone Redis server may not have enough memory to store the whole data set; data sharding and Redis cluster are supported to handle it. You only need to set up a Redis cluster to make it work.
First, start N Redis instances on N machines:
```
redis-server --cluster-enabled yes --cluster-config-file nodes-0.conf --cluster-node-timeout 50000 --appendonly no --save "" --logfile 0.log --daemonize yes --protected-mode no --port 6379
```
On each machine, choose a different port and start another M instances (M>=1) as the replica nodes of the above N instances.
Then, run the initialization command on one machine; if you chose M=1 above, use `--cluster-replicas 1`:
```
redis-cli --cluster create 172.168.3.115:6379 172.168.3.115:6380 172.168.3.116:6379 172.168.3.116:6380 172.168.3.117:6379 172.168.3.117:6380 --cluster-replicas 1
```
After this, the Redis Cluster is ready.
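You can verify that the cluster has formed correctly with `redis-cli`, assuming one of the nodes runs on the local machine at port 6379.
```bash
# Check the overall cluster state and the assignment of the 16384 hash slots
redis-cli -p 6379 cluster info
# List all master and replica nodes in the cluster
redis-cli -p 6379 cluster nodes
```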
#### Scale Service with Envoy
Each of the services can be scaled out. It is recommended to use the same resources (e.g., a single machine with the same CPU and memory) to test which service is the bottleneck; from empirical observations, the vector search and inference services usually are.
##### How to run Envoy:
1. [Download](https://www.envoyproxy.io/docs/envoy/latest/start/install) and deploy Envoy (the steps below use Docker as an example):
* download: `docker pull envoyproxy/envoy-dev:21df5e8676a0f705709f0b3ed90fc2dbbd63cfc5`
2. Run the command: `docker run --rm -it -p 9082:9082 -p 9090:9090 envoyproxy/envoy-dev:79ade4aebd02cf15bd934d6d58e90aa03ef6909e --config-yaml "$(cat path/to/service-specific-envoy.yaml)" --parent-shutdown-time-s 1000000`
3. Validate: run `netstat -tnlp` to check whether the Envoy process is listening on the port specified in the Envoy config file.
4. For details on Envoy and a sample procedure, read [envoy](envoy.md).

View file

@ -0,0 +1,6 @@
User Guide
=========================
Getting Started
===========================================

View file

@ -0,0 +1,2 @@
Install Locally
=========================

View file

@ -0,0 +1,28 @@
# Paper
## Paper
* Dai, Jason Jinquan, et al. "BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022. [paper](https://arxiv.org/ftp/arxiv/papers/2204/2204.01715.pdf) [video]() [demo]()
* Dai, Jason Jinquan, et al. "BigDL: A distributed deep learning framework for big data." Proceedings of the ACM Symposium on Cloud Computing. 2019. [paper](https://arxiv.org/abs/1804.05839)
## Citing
If you've found BigDL useful for your project, you may cite the [paper](https://arxiv.org/abs/1804.05839) as follows:
```
@inproceedings{SOCC2019_BIGDL,
title={BigDL: A Distributed Deep Learning Framework for Big Data},
author={Dai, Jason (Jinquan) and Wang, Yiheng and Qiu, Xin and Ding, Ding and Zhang, Yao and Wang, Yanzhang and Jia, Xianyan and Zhang, Li (Cherry) and Wan, Yan and Li, Zhichao and Wang, Jiao and Huang, Shengsheng and Wu, Zhongyuan and Wang, Yang and Yang, Yuhao and She, Bowen and Shi, Dongjie and Lu, Qi and Huang, Kai and Song, Guoqiong},
booktitle={Proceedings of the ACM Symposium on Cloud Computing},
publisher={Association for Computing Machinery},
pages={50--60},
year={2019},
series={SoCC'19},
doi={10.1145/3357223.3362707},
url={https://arxiv.org/pdf/1804.05839.pdf}
}
```

View file

@ -0,0 +1,2 @@
Use Cases
============================

View file

@ -1,4 +1,4 @@
AutoML
***************
Nano provides built-in AutoML support through hyperparameter optimization.

View file

@ -0,0 +1,8 @@
Nano Key Features
================================
* `PyTorch Training <pytorch_train.html>`_
* `PyTorch Inference <pytorch_inference.html>`_
* `TensorFlow Training <tensorflow_train.html>`_
* `TensorFlow Inference <tensorflow_inference.html>`_
* `AutoML <hpo.html>`_

View file

@ -0,0 +1,36 @@
# Nano Installation
Note: For Windows users, we recommend using Windows Subsystem for Linux 2 (WSL2) to run BigDL-Nano. Please refer to the [Nano Windows install guide](../Howto/windows_guide.md) for instructions.
BigDL-Nano can be installed using pip and we recommend installing BigDL-Nano in a conda environment.
For PyTorch users, you can install bigdl-nano along with some dependencies specific to PyTorch using the following commands.
```bash
conda create -n env
conda activate env
pip install bigdl-nano[pytorch]
```
For TensorFlow users, you can install bigdl-nano along with some dependencies specific to TensorFlow using the following commands.
```bash
conda create -n env
conda activate env
pip install bigdl-nano[tensorflow]
```
After installing bigdl-nano, you can run the following command to set up a few environment variables.
```bash
source bigdl-nano-init
```
The `bigdl-nano-init` script will export a few environment variables according to your hardware to maximize performance.
In a conda environment, `source bigdl-nano-init` will also be added to `$CONDA_PREFIX/etc/conda/activate.d/`, which will automatically run when you activate your current environment.
In a pure pip environment, you need to run `source bigdl-nano-init` every time you open a new shell to get optimal performance, and run `source bigdl-nano-unset-env` if you want to unset these environment variables.
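Putting the steps above together, a typical session in a pure pip environment might look like the following sketch; `train.py` is a placeholder for your own training script.
```bash
pip install bigdl-nano[pytorch]
# export the performance-related environment variables for the current shell
source bigdl-nano-init
python train.py
# optionally restore the original environment variables afterwards
source bigdl-nano-unset-env
```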
---

View file

@ -1,49 +1,11 @@
# Nano in 5 minutes
BigDL-Nano is a Python package to transparently accelerate PyTorch and TensorFlow applications on Intel hardware. It provides a unified and easy-to-use API for several optimization techniques and tools, so that users can only apply a few lines of code changes to make their PyTorch or TensorFlow code run faster.
---
### **PyTorch Bite-sized Example**
BigDL-Nano supports both PyTorch and PyTorch Lightning models and most optimizations require only changing a few "import" lines in your code and adding a few flags.
@ -74,7 +36,8 @@ MyNano(use_ipex=True, num_processes=2).train()
For more details on BigDL-Nano's PyTorch usage, please refer to the [PyTorch Training](../QuickStart/pytorch_train.md) and [PyTorch Inference](../QuickStart/pytorch_inference.md) page.
### **TensorFlow Bite-sized Example**
BigDL-Nano supports `tensorflow.keras` API and most optimizations require only changing a few "import" lines in your code and adding a few flags.
@ -104,4 +67,4 @@ model.compile(optimizer='adam',
model.fit(x_train, y_train, epochs=5, num_processes=4)
```
For more details on BigDL-Nano's TensorFlow usage, please refer to the [TensorFlow Training](../QuickStart/tensorflow_train.md) and [TensorFlow Inference](../QuickStart/tensorflow_inference.md) page.

View file

@ -1,4 +1,4 @@
# PyTorch Inference
BigDL-Nano provides several APIs which can help users easily apply optimizations on inference pipelines to improve latency and throughput. Currently, performance accelerations are achieved by integrating extra runtimes as inference backend engines or using quantization methods on full-precision trained models to reduce computation during inference. InferenceOptimizer (`bigdl.nano.pytorch.InferenceOptimizer`) provides the APIs for all optimizations that you need for inference.
@ -70,7 +70,7 @@ y_hat = ort_model(x)
trainer.validate(ort_model, dataloader)
trainer.test(ort_model, dataloader)
trainer.predict(ort_model, dataloader)
# note that `ort_model` is not trainable any more, so you can't use it like
# trainer.fit(ort_model, dataloader) # this is illegal
```
### OpenVINO Acceleration
@ -93,7 +93,7 @@ trainer = Trainer()
trainer.validate(ort_model, dataloader)
trainer.test(ort_model, dataloader)
trainer.predict(ort_model, dataloader)
# note that `ort_model` is not trainable any more, so you can't use it like
# trainer.fit(ort_model, dataloader) # this is illegal
```
@ -122,7 +122,7 @@ trainer.validate(q_model, dataloader)
trainer.test(q_model, dataloader)
trainer.predict(q_model, dataloader)
```
This is the most basic usage to quantize a model with defaults: INT8 precision, and without a search tuning space to control the accuracy drop.
**Quantization with ONNXRuntime accelerator**
@ -146,7 +146,7 @@ Using `accelerator='onnxruntime'` actually equals to converting the model from P
ort_model = InferenceOptimizer.trace(model, accelerator='onnxruntime', input_sample=x)
ort_q_model = InferenceOptimizer.quantize(ort_model, accelerator='onnxruntime', calib_dataloader=dataloader)
# run inference with transparent acceleration
y_hat = ort_q_model(x)
trainer.validate(ort_q_model, dataloader)
trainer.test(ort_q_model, dataloader)
@ -174,7 +174,7 @@ Same as using ONNXRuntime accelerator, it equals to converting the model from Py
ov_model = InferenceOptimizer.trace(model, accelerator='openvino', input_sample=x)
ov_q_model = InferenceOptimizer.quantize(ov_model, accelerator='openvino', calib_dataloader=dataloader)
# run inference with transparent acceleration
y_hat = ov_q_model(x)
trainer.validate(ov_q_model, dataloader)
trainer.test(ov_q_model, dataloader)

View file

@ -1,4 +1,4 @@
# PyTorch Training
BigDL-Nano can be used to accelerate PyTorch or PyTorch-Lightning applications on training workloads. The optimizations in BigDL-Nano are delivered through an extended version of PyTorch-Lightning `Trainer`. These optimizations are either enabled by default or can be easily turned on by setting a parameter or calling a method.

View file

@ -1,12 +1,13 @@
# TensorFlow Inference
BigDL-Nano provides several APIs which can help users easily apply optimizations on inference pipelines to improve latency and throughput. Currently, performance accelerations are achieved by integrating extra runtimes as inference backend engines or using quantization methods on full-precision trained models to reduce computation during inference. Keras Model (`bigdl.nano.tf.keras.Model`) and Sequential (`bigdl.nano.tf.keras.Sequential`) provide the APIs for all optimizations that you need for inference.
For quantization, BigDL-Nano provides only post-training quantization in `Model.quantize()` for users to infer with models of 8-bit precision. Quantization-Aware Training is not available for now. Model conversion to 16-bit like BF16 and FP16 will be coming soon.
Before you go ahead with these APIs, you have to make sure BigDL-Nano is correctly installed for TensorFlow. If not, please follow [this](../Overview/nano.md) to set up your environment.
## Quantization
Quantization is widely used to compress models to a lower precision, which not only reduces the model size but also accelerates inference. BigDL-Nano provides the `Model.quantize()` API for users to quickly obtain a quantized model with accuracy control by specifying a few arguments. `Sequential` has similar usage, so we will only show how to use an instance of `Model` to enable the quantization pipeline here.
To use INC as your quantization engine, you can choose accelerator as `None` or `'onnxruntime'`. Otherwise, `accelerator='openvino'` means using OpenVINO POT to do quantization.

View file

@ -1,4 +1,4 @@
# TensorFlow Training
BigDL-Nano can be used to accelerate TensorFlow Keras applications on training workloads. The optimizations in BigDL-Nano are delivered through BigDL-Nano's `Model` and `Sequential` classes, which have identical APIs with `tf.keras.Model` and `tf.keras.Sequential`. For most cases, you can just replace your `tf.keras.Model` with `bigdl.nano.tf.keras.Model` and `tf.keras.Sequential` with `bigdl.nano.tf.keras.Sequential` to benefit from BigDL-Nano.
@ -38,6 +38,6 @@ model.compile(optimizer='adam',
model.fit(train_ds, epochs=3, validation_data=val_ds, num_processes=2)
```
Note that, different from the conventions in [BigDL-Nano PyTorch multi-instance training](./pytorch_train.html#multi-instance-training), the effective batch size will not change in TensorFlow multi-instance training, which means it is still the batch size you specify in your dataset. This is because TensorFlow's `MultiWorkerMirroredStrategy` will try to split the batch into multiple sub-batches for different workers. We chose this behavior to match the semantics of TensorFlow distributed training.
When you do want to increase your effective `batch_size`, you can do so by directly changing it in your dataset definition and you may also want to gradually increase the learning rate linearly to the `batch_size`, as described in this [paper](https://arxiv.org/abs/1706.02677) published by Facebook.

View file

@ -0,0 +1,63 @@
BigDL-Nano
=========================
**BigDL-Nano** (or **Nano** for short) is a Python package to transparently accelerate PyTorch and TensorFlow applications on Intel hardware. It provides a unified and easy-to-use API for several optimization techniques and tools, so that users can only apply a few lines of code changes to make their PyTorch or TensorFlow code run faster.
-------
.. grid:: 1 2 2 2
:gutter: 2
.. grid-item-card::
**Get Started**
^^^
Documents in this section help you get started quickly with Nano.
+++
:bdg-link:`Nano in 5 minutes <./Overview/nano.html>` |
:bdg-link:`Installation <./Overview/install.html>` |
:bdg-link:`Tutorials <./QuickStart/index.html>`
.. grid-item-card::
**Key Features Guide**
^^^
Each guide in this section provides you with in-depth information, concepts and knowledge about Nano key features.
+++
:bdg:`PyTorch` :bdg-link:`Infer <./Overview/pytorch_inference.html>` :bdg-link:`Train <./Overview/pytorch_train.html>` |
:bdg:`TensorFlow` :bdg-link:`Infer <./Overview/tensorflow_inference.html>` :bdg-link:`Train <./Overview/tensorflow_train.html>`
.. grid-item-card::
**How-to Guide**
^^^
How-to Guide provides bite-sized, actionable examples of how to use specific Nano features, different from our tutorials
which are full-length examples each implementing a full usage scenario.
+++
:bdg-link:`How-to-Guide <./Howto/index.html>`
.. grid-item-card::
**API Document**
^^^
API Document provides detailed description of Nano APIs.
+++
:bdg-link:`API Document <../PythonAPI/Nano/index.html>`
.. toctree::
:hidden:
BigDL-Nano Document <self>

View file

@ -2,25 +2,8 @@
---
**Orca `AutoEstimator` provides similar APIs as Orca `Estimator` for distributed hyper-parameter tuning.**
### **1. AutoEstimator**
@ -28,11 +11,11 @@ To use [TensorFlow/Keras AutoEstimator](#tensorflow-keras-autoestimator), you ne
To perform distributed hyper-parameter tuning, user can first create an Orca `AutoEstimator` from standard TensorFlow Keras or PyTorch model, and then call `AutoEstimator.fit`.
Under the hood, the Orca `AutoEstimator` generates different trials and schedules them on each node in the cluster. Each trial runs a different combination of hyper parameters, sampled from the user-desired hyper-parameter space.
HDFS is used to save temporary results of each trial and all the results will be finally transferred to the driver for further analysis.
### **2. Pytorch AutoEstimator**
User could pass *Creator Function*s, including *Data Creator Function*, *Model Creator Function* and *Optimizer Creator Function* to `AutoEstimator` for training.
The *Creator Function*s should take a parameter of `config` as input and get the hyper-parameter values from `config` to enable hyper-parameter search.
@ -64,7 +47,7 @@ class LeNet(nn.Module):
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4*4*50, fc1_hidden_size)
self.fc2 = nn.Linear(fc1_hidden_size, 10)
def forward(self, x):
pass
@ -75,7 +58,7 @@ def model_creator(config):
```
#### **2.3 Optimizer Creator Function**
*Optimizer Creator Function* takes `model` and `config` as input, and returns a `torch.optim.Optimizer` object.
```python
import torch
def optim_creator(model, config):
@ -170,7 +153,7 @@ search_space = {
```
#### **4.2 Advanced Search Algorithms**
Beside grid search and random search, user could also choose to use some advanced hyper-parameter optimization methods,
such as [Ax](https://ax.dev/), [Bayesian Optimization](https://github.com/fmfn/BayesianOptimization), [Scikit-Optimize](https://scikit-optimize.github.io), etc. We support all *Search Algorithms* in [Ray Tune](https://docs.ray.io/en/master/index.html). View the [Ray Tune Search Algorithms](https://docs.ray.io/en/master/tune/api_docs/suggestion.html) for more details.
Note that you should install the dependency for your search algorithm manually.
@ -207,7 +190,7 @@ We support all *Schedulers* in [Ray Tune](https://docs.ray.io/en/master/index.ht
User can pass the *Scheduler* name to `scheduler` in `AutoEstimator.fit`. The *Scheduler* names supported are "fifo", "hyperband", "async_hyperband", "median_stopping_rule", "hb_bohb", "pbt", "pbt_replay".
The default `scheduler` is "fifo", which just runs trials in submission order.
See examples below about how to use *Scheduler* in `AutoEstimator`.
```python
scheduler_params = dict(
max_t=50,

View file

@ -0,0 +1,2 @@
Orca Key Features
=================================

View file

@ -0,0 +1,8 @@
Orca Key Features
=================================
* `Orca Context <orca-context.html>`_
* `Distributed Data Processing <data-parallel-processing.html>`_
* `Distributed Training and Inference <distributed-training-inference.html>`_
* `Distributed Hyper Parameter Tuning <distributed-tuning.html>`_
* `RayOnSpark <ray.html>`_

View file

@ -0,0 +1,45 @@
# Installation
## To use Distributed Data processing, training, and/or inference
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment.
```bash
conda create -n py37 python=3.7 # "py37" is conda environment name, you can use any name you like.
conda activate py37
pip install bigdl-orca
```
You can install bigdl-orca nightly build version using
```bash
pip install --pre --upgrade bigdl-orca
```
## To use RayOnSpark
There are some additional dependencies required for running [RayOnSpark](ray.md). Use the extra key `[ray]` to install them.
```bash
pip install bigdl-orca[ray]
```
or to install nightly build, use
```bash
pip install --pre --upgrade bigdl-orca[ray]
```
## To use Orca AutoML
There are some additional dependencies required for Orca AutoML support. Use the extra key `[automl]` to install them.
```bash
pip install bigdl-orca[automl]
```
_Note that with extra key of [automl], `pip` will automatically install the additional dependencies for distributed hyper-parameter tuning,
including `ray[tune]==1.9.2`, `scikit-learn`, `tensorboard`, `xgboost`._
To use [Pytorch Estimator](#pytorch-autoestimator), you need to install Pytorch with `pip install torch==1.8.1`.
To use [TensorFlow/Keras AutoEstimator](#tensorflow-keras-autoestimator), you need to install Tensorflow with `pip install tensorflow==1.15.0`.
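For example, to prepare an environment for the PyTorch `AutoEstimator`, the steps above can be combined as follows; the environment name is arbitrary.
```bash
conda create -n bigdl-orca-automl python=3.7
conda activate bigdl-orca-automl
pip install bigdl-orca[automl]
pip install torch==1.8.1
```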

View file

@ -6,7 +6,7 @@
This error occurs while running Orca TF2 Estimator with spark backend, which may be because the previous pyspark tensorflow job was not cleaned completely. You can retry later or you can set the spark config `spark.python.worker.reuse=false` in your application.
If you are using `init_orca_context(cluster_mode="yarn-client")`:
```
conf = {"spark.python.worker.reuse": "false"}
init_orca_context(cluster_mode="yarn-client", conf=conf)
@ -19,10 +19,10 @@ If you are using `init_orca_context(cluster_mode="yarn-client")`:
### **RuntimeError: Inter op parallelism cannot be modified after initialization**
This error occurs if you build your TensorFlow model on the driver rather than on workers. You should build the complete model in `model_creator` which runs on each worker node. You can refer to the following examples:
**Wrong Example**
```
model = ...
def model_creator(config):
model.compile(...)
@ -85,3 +85,43 @@ To solve this issue, you need to set the path of `libhdfs.so` in Cloudera to the
# For yarn-cluster mode
spark-submit --conf spark.executorEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64 \
--conf spark.yarn.appMasterEnv.ARROW_LIBHDFS_DIR=/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib64
### **Spark Dynamic Allocation**
By design, BigDL does not support Spark Dynamic Allocation mode, and needs to allocate fixed resources for deep learning model training. Thus if your environment has already configured Spark Dynamic Allocation, or stipulated that Spark Dynamic Allocation must be used, you may encounter the following error:
> **requirement failed: Engine.init: spark.dynamicAllocation.maxExecutors and spark.dynamicAllocation.minExecutors must be identical in dynamic allocation for BigDL**
>
Here we provide a workaround for running BigDL under Spark Dynamic Allocation mode.
For `spark-submit` cluster mode, the first solution is to disable the Spark Dynamic Allocation mode in `SparkConf` when you submit your application as follows:
```bash
spark-submit --conf spark.dynamicAllocation.enabled=false
```
Otherwise, if you cannot set this configuration due to your cluster settings, you can set `spark.dynamicAllocation.minExecutors` equal to `spark.dynamicAllocation.maxExecutors` as follows:
```bash
spark-submit --conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=2
```
For other cluster modes, such as `yarn` and `k8s`, our program will initiate `SparkContext` for you, and the Spark Dynamic Allocation mode is disabled by default. Thus, generally you wouldn't encounter such a problem.
If you are using Spark Dynamic Allocation, you have to disable barrier execution mode at the very beginning of your application as follows:
```python
from bigdl.orca import OrcaContext
OrcaContext.barrier_mode = False
```
For Spark Dynamic Allocation mode, we also recommend manually setting `num_ray_nodes` and `ray_node_cpu_cores` equal to `spark.dynamicAllocation.minExecutors` and `spark.executor.cores` respectively. You can specify `num_ray_nodes` and `ray_node_cpu_cores` in `init_orca_context` as follows:
```python
init_orca_context(..., num_ray_nodes=2, ray_node_cpu_cores=4)
```

View file

@ -1,30 +1,12 @@
# Orca in 5 minutes
### Overview
Most AI projects start with a Python notebook running on a single laptop; however, one usually needs to go through a mountain of pains to scale it to handle larger data set in a distributed fashion. The _**Orca**_ library seamlessly scales out your single node Python notebook across large clusters (so as to process distributed Big Data).
---
### **TensorFlow Bite-sized Example**
This section uses TensorFlow 1.15, and you should install TensorFlow before running this example:
```bash
@ -37,7 +19,7 @@ First, initialize [Orca Context](orca-context.md):
from bigdl.orca import init_orca_context, OrcaContext
# cluster_mode can be "local", "k8s" or "yarn"
sc = init_orca_context(cluster_mode="local", cores=4, memory="10g", num_nodes=1)
```
Next, perform [data-parallel processing in Orca](data-parallel-processing.md) (supporting standard Spark Dataframes, TensorFlow Dataset, PyTorch DataLoader, Pandas, etc.):
@ -47,7 +29,7 @@ from pyspark.sql.functions import array
spark = OrcaContext.get_spark_session()
df = spark.read.parquet(file_path)
df = df.withColumn('user', array('user')) \
.withColumn('item', array('item'))
```
@ -57,24 +39,19 @@ Finally, use [sklearn-style Estimator APIs in Orca](distributed-training-inferen
from tensorflow import keras
from bigdl.orca.learn.tf.estimator import Estimator
user = keras.layers.Input(shape=[1])
item = keras.layers.Input(shape=[1])
feat = keras.layers.concatenate([user, item], axis=1)
predictions = keras.layers.Dense(2, activation='softmax')(feat)
model = keras.models.Model(inputs=[user, item], outputs=predictions)
model.compile(optimizer='rmsprop',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
est = Estimator.from_keras(keras_model=model)
est.fit(data=df,
batch_size=64,
epochs=4,
feature_cols=['user', 'item'],
label_cols=['label'])
```
## Get Started
See [TensorFlow](../QuickStart/orca-tf-quickstart.md) and [PyTorch](../QuickStart/orca-pytorch-quickstart.md) quickstart for more details.

View file

@ -2,9 +2,9 @@
---
[Ray](https://github.com/ray-project/ray) is an open source distributed framework for emerging AI applications.
With the _**RayOnSpark**_ support packaged in [BigDL Orca](../Overview/orca.md),
users can seamlessly integrate Ray applications into the big data processing pipeline on the underlying Big Data cluster
(such as [Hadoop/YARN](../../UserGuide/hadoop.md) or [K8s](../../UserGuide/k8s.md)).
_**Note:** BigDL has been tested on Ray 1.9.2 and you are highly recommended to use this tested version._
@ -12,8 +12,8 @@ _**Note:** BigDL has been tested on Ray 1.9.2 and you are highly recommended to
### **1. Install**
We recommend using [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to prepare the Python environment.
When installing bigdl-orca with pip, you can specify the extras key `[ray]` to install the additional dependencies
for running Ray (i.e. `ray==1.9.2`, `psutil`, `aiohttp==3.7.0`, `aioredis==1.1.0`, `setproctitle`, `hiredis==1.1.0`, `async-timeout==3.0.1`):
```bash
@ -23,7 +23,7 @@ conda activate py37
pip install bigdl-orca[ray]
```
View [Python User Guide](../../UserGuide/python.html#install) and [Orca User Guide](../Overview/orca.md) for more installation instructions.
---
### **2. Initialize**
@ -45,9 +45,9 @@ You can input the following RayOnSpark related arguments when you `init_orca_con
- `extra_params`: The key value dict for extra options to launch ray. For example, `extra_params={"dashboard-port": "11281", "temp-dir": "/tmp/ray/"}`.
- `include_webui`: Default is True for including web ui when starting ray.
- `system_config`: The key value dict for overriding RayConfig defaults. Mainly for testing purposes. An example for system_config could be: `{"object_spilling_config":"{\"type\":\"filesystem\", \"params\":{\"directory_path\":\"/tmp/spill\"}}"}`.
- `num_ray_nodes`: The number of ray processes to start across the cluster. For Spark local mode, you don't need to specify this value.
For Spark cluster mode, it is default to be the number of Spark executors. If spark.executor.instances can't be detected in your SparkContext, you need to explicitly specify this. It is recommended that num_ray_nodes is not larger than the number of Spark executors to make sure there are enough resources in your cluster.
- `ray_node_cpu_cores`: The number of available cores for each ray process. For Spark local mode, it is default to be the number of Spark local cores.
For Spark cluster mode, it is default to be the number of cores for each Spark executor. If spark.executor.cores or spark.cores.max can't be detected in your SparkContext, you need to explicitly specify this. It is recommended that ray_node_cpu_cores is not larger than the number of cores for each Spark executor to make sure there are enough resources in your cluster.
By default, the Ray cluster would be launched using Spark barrier execution mode, you can turn it off via the configurations of `OrcaContext`:
@ -58,7 +58,7 @@ from bigdl.orca import OrcaContext
OrcaContext.barrier_mode = False
```
View [Orca Context](../Overview/orca-context.md) for more details.
---
### **3. Run**
@ -72,7 +72,7 @@ View [Orca Context](../../Orca/Overview/orca-context.md) for more details.
class Counter(object):
def __init__(self):
self.n = 0
def increment(self):
self.n += 1
return self.n
@ -82,11 +82,11 @@ View [Orca Context](../../Orca/Overview/orca-context.md) for more details.
print(ray.get([c.increment.remote() for c in counters]))
```
- You can retrieve the information of the Ray cluster via [`OrcaContext`](../Overview/orca-context.md):
```python
from bigdl.orca import OrcaContext
ray_ctx = OrcaContext.get_ray_context()
address_info = ray_ctx.address_info # The dictionary information of the ray cluster, including node_ip_address, object_store_address, webui_url, etc.
redis_address = ray_ctx.redis_address # The redis address of the ray cluster.
@ -96,7 +96,7 @@ View [Orca Context](../../Orca/Overview/orca-context.md) for more details.
```python
from bigdl.orca import stop_orca_context
stop_orca_context()
```

View file

@ -0,0 +1,44 @@
# Orca Tutorial
- [**Orca TensorFlow 1.15 Quickstart**](./orca-tf-quickstart.html)
> ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf_lenet_mnist.ipynb) &nbsp;![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf_lenet_mnist.ipynb)
In this guide we will describe how to scale out TensorFlow 1.15 programs using Orca in 4 simple steps.
---------------------------
- [**Orca TensorFlow 2 Quickstart**](./orca-tf2keras-quickstart.html)
> ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf2_keras_lenet_mnist.ipynb) &nbsp;![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/tf2_keras_lenet_mnist.ipynb)
In this guide we will describe how to scale out TensorFlow 2 programs using Orca in 4 simple steps.
---------------------------
- [**Orca Keras 2.3 Quickstart**](./orca-keras-quickstart.html)
> ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/keras_lenet_mnist.ipynb) &nbsp;![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/keras_lenet_mnist.ipynb)
In this guide we will describe how to scale out Keras 2.3 programs using Orca in 4 simple steps.
---------------------------
- [**Orca PyTorch Quickstart**](./orca-pytorch-quickstart.html)
> ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist.ipynb) &nbsp;![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/main/python/orca/colab-notebook/quickstart/pytorch_lenet_mnist.ipynb)
In this guide we will describe how to scale out PyTorch programs using Orca in 4 simple steps.
---------------------------
- [**Orca RayOnSpark Quickstart**](./ray-quickstart.html)
> ![](../../../../image/colab_logo_32px.png)[Run in Google Colab](https://colab.research.google.com/github/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb) &nbsp;![](../../../../image/GitHub-Mark-32px.png)[View source on GitHub](https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/colab-notebook/quickstart/ray_parameter_server.ipynb)
In this guide, we will describe how to use RayOnSpark to directly run Ray programs on Big Data clusters in 2 simple steps.
---------------------------

View file

@ -33,7 +33,7 @@ elif cluster_mode == "yarn": # For Hadoop/YARN cluster
sc = init_orca_context(cluster_mode="yarn", num_nodes=2, cores=2, memory="10g", driver_memory="10g", driver_cores=1, init_ray_on_spark=True)
```
This is the only place where you need to specify local or distributed mode. See [here](./../Overview/ray.md#initialize) for more RayOnSpark related arguments when you `init_orca_context`.
By default, the Ray cluster would be launched using Spark barrier execution mode, you can turn it off via the configurations of `OrcaContext`:
@ -43,7 +43,7 @@ from bigdl.orca import OrcaContext
OrcaContext.barrier_mode = False
```
View [Orca Context](./../Overview/orca-context.md) for more details.
**Note:** You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on Hadoop YARN cluster. View [Hadoop User Guide](./../../UserGuide/hadoop.md) for more details.
@ -76,10 +76,10 @@ dim = 10
class ParameterServer(object):
def __init__(self, dim):
self.parameters = np.zeros(dim)
def get_parameters(self):
return self.parameters
def update_parameters(self, update):
self.parameters += update

View file

@ -0,0 +1,63 @@
BigDL-Orca
=========================
Most AI projects start with a Python notebook running on a single laptop; however, one usually needs to go through a mountain of pains to scale it to handle larger data set in a distributed fashion. The **BigDL-Orca** (or **Orca** for short) library seamlessly scales out your single node Python notebook across large clusters (so as to process distributed Big Data).
-------
.. grid:: 1 2 2 2
:gutter: 2
.. grid-item-card::
**Get Started**
^^^
Documents in this section help you get started quickly with Orca.
+++
:bdg-link:`Orca in 5 minutes <./Overview/orca.html>` |
:bdg-link:`Installation <./Overview/install.html>`
.. grid-item-card::
**Key Features Guide**
^^^
Each guide in this section provides you with in-depth information, concepts and knowledge about Orca key features.
+++
:bdg-link:`Data <./Overview/data-parallel-processing.html>` |
:bdg-link:`Estimator <./Overview/distributed-training-inference.html>` |
:bdg-link:`RayOnSpark <./Overview/ray.html>`
.. grid-item-card::
**Tutorials**
^^^
Orca Tutorials and Examples.
+++
:bdg-link:`Tutorials <./QuickStart/index.html>`
.. grid-item-card::
**API Document**
^^^
API Document provides detailed description of Orca APIs.
+++
:bdg-link:`API Document <../PythonAPI/Orca/index.html>`
.. toctree::
:hidden:
BigDL-Orca Document <self>

View file

@ -26,7 +26,7 @@ az group create \
--location myLocation \
--output none
```
#### 2.2.2 Create Linux client with SGX support
Create Linux VM through Azure [CLI](https://docs.microsoft.com/en-us/azure/developer/javascript/tutorial/nodejs-virtual-machine-vm/create-linux-virtual-machine-azure-cli)/[Portal](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal)/Powershell.
For size of the VM, please choose DC-V3 Series VM with more than 4 vCPU cores.
@ -37,30 +37,32 @@ On `Subscribe` page, input your subscription, your Azure container registry, you
* Go to your Azure container registry, check `Repositories`, and find `intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene`
* Log in to the created VM. Then log in to your Azure container registry and pull the BigDL PPML image using this command:
```bash
docker pull myContainerRegistry/intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene
```
* Start container of this image

```bash
#!/bin/bash
export LOCAL_IP=YOUR_LOCAL_IP
export DOCKER_IMAGE=intel_corporation/bigdl-ppml-trusted-big-data-ml-python-graphene

sudo docker run -itd \
    --privileged \
    --net=host \
    --cpuset-cpus="0-5" \
    --oom-kill-disable \
    --device=/dev/gsgx \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
    --name=spark-local \
    -e LOCAL_IP=$LOCAL_IP \
    -e SGX_MEM_SIZE=64G \
    $DOCKER_IMAGE bash
```
### 2.3 Create AKS (Azure Kubernetes Service) or use existing AKS
First, log in to your client VM and enter your BigDL PPML container:
@ -89,34 +91,35 @@ You can check the information by running:
/ppml/trusted-big-data-ml/azure/create-aks.sh --help
```
### 2.4 Create Azure Data Lake Store Gen 2
#### 2.4.1 Create Data Lake Storage account or use an existing one.
The example command to create Data Lake store is as below:
```bash
az dls account create --account myDataLakeAccount --location myLocation --resource-group myResourceGroup
```
* Create Container to put user data

Example command to create container
```bash
az storage fs create -n myFS --account-name myDataLakeAccount --auth-mode login
```
* Create folder, upload file/folder

Example command to create folder
```bash
az storage fs directory create -n myDirectory -f myFS --account-name myDataLakeAccount --auth-mode login
```
Example command to upload file
```bash
az storage fs file upload -s "path/to/file" -p myDirectory/file -f myFS --account-name myDataLakeAccount --auth-mode login
```
Example command to upload directory
```bash
az storage fs directory upload -f myFS --account-name myDataLakeAccount -s "path/to/directory" -d myDirectory --recursive
```
#### 2.4.2 Access data in Hadoop through ABFS(Azure Blob Filesystem) driver
You can access Data Lake Storage in the Hadoop filesystem with a URI of the following form: ```abfs[s]://file_system@account_name.dfs.core.windows.net/<path>/<path>/<file_name>```
##### Authentication
The ABFS driver supports two forms of authentication so that the Hadoop application may securely access resources contained within a Data Lake Storage Gen2 capable account.
- Shared Key: This permits users to access ALL resources in the account. The key is encrypted and stored in the Hadoop configuration.
@@ -124,13 +127,13 @@ The ABFS driver supports two forms of authentication so that the Hadoop applicat
By default, in our solution, we use shared key authentication.
- Get the access key list of the storage account:
```bash
az storage account keys list -g MyResourceGroup -n myDataLakeAccount
```
Use one of the keys for authentication.
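As an illustration (the account, container and directory names follow the placeholders used above, and the exact configuration depends on your Hadoop/Spark setup), you can pass the shared key as the `fs.azure.account.key.<account>.dfs.core.windows.net` property and list a directory over ABFS:
```bash
# Placeholder values: myDataLakeAccount, myFS and myDirectory as in the examples above
export AZURE_STORAGE_KEY=<one-of-the-access-keys>
hadoop fs \
  -D fs.azure.account.key.myDataLakeAccount.dfs.core.windows.net=$AZURE_STORAGE_KEY \
  -ls abfss://myFS@myDataLakeAccount.dfs.core.windows.net/myDirectory
```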
### 2.5 Create Azure Key Vault
#### 2.5.1 Create or use an existing Azure Key Vault
Example command to create key vault:
```bash
az keyvault create -n myKeyVault -g myResourceGroup -l location
```
@@ -142,29 +145,30 @@ Take note of the following properties for use in the next section:
* The name of your Azure key vault resource
* The Azure tenant ID that the subscription belongs to
#### 2.5.2 Set access policy for the client VM
* Run the following command to get the system identity:
```bash
az vm identity assign -g myResourceGroup -n myVM
```
The output would be like this:
```bash
{
  "systemAssignedIdentity": "ff5505d6-8f72-4b99-af68-baff0fbd20f5",
  "userAssignedIdentities": {}
}
```
Take note of the `systemAssignedIdentity` of the client VM.
* Set access policy for client VM

Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <mySystemAssignedIdentity> --secret-permissions all --key-permissions all --certificate-permissions all
```
#### 2.5.3 AKS access Key Vault
##### 2.5.3.1 Set access for AKS VM ScaleSet
###### a. Find your VM ScaleSet in your AKS, and assign system managed identity to VM ScaleSet.
```bash
az vm identity assign -g myResourceGroup -n myAKSVMSS
```
@@ -179,50 +183,53 @@ userAssignedIdentities:
principalId: xxxxx
```
Take note of the `principalId` in the first line as the System Managed Identity of your VMSS.
###### b. Set access policy for AKS VM ScaleSet
Example command:
```bash
az keyvault set-policy --name myKeyVault --object-id <systemManagedIdentityOfVMSS> --secret-permissions get --key-permissions all
```
##### 2.5.3.2 Set access for AKS
###### a. Enable Azure Key Vault Provider for Secrets Store CSI Driver support
Example command:
```bash
az aks enable-addons --addons azure-keyvault-secrets-provider --name myAKSCluster --resource-group myResourceGroup
```
* Verify the Azure Key Vault Provider for Secrets Store CSI Driver installation

Example command:
```bash
kubectl get pods -n kube-system -l 'app in (secrets-store-csi-driver, secrets-store-provider-azure)'
```
Be sure that a Secrets Store CSI Driver pod and an Azure Key Vault Provider pod are running on each node in your cluster's node pools.

* Enable the Azure Key Vault Provider for Secrets Store CSI Driver to track secret updates in the key vault
```bash
az aks update -g myResourceGroup -n myAKSCluster --enable-secret-rotation
```
###### b. Provide an identity to access the Azure Key Vault
There are several ways to provide an identity for the Azure Key Vault Provider for Secrets Store CSI Driver to access Azure Key Vault: `An Azure Active Directory pod identity`, `user-assigned identity` or `system-assigned managed identity`. In our solution, we use a user-assigned managed identity.
* Enable managed identity in AKS
```bash
az aks update -g myResourceGroup -n myAKSCluster --enable-managed-identity
```
* Get the user-assigned managed identity that you created when you enabled a managed identity on your AKS cluster

Run:
```bash
az aks show -g myResourceGroup -n myAKSCluster --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId -o tsv
```
The output would be like:
```bash
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
Take note of this output as the user-assigned managed identity of your Azure KeyVault Secrets Provider.
* Grant your user-assigned managed identity permissions that enable it to read your key vault and view its contents

Example command:
```bash
az keyvault set-policy -n myKeyVault --key-permissions get --spn xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
az keyvault set-policy -n myKeyVault --secret-permissions get --spn xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
###### c. Create a SecretProviderClass to access your Key Vault
On your client docker container, edit the `/ppml/trusted-big-data-ml/azure/secretProviderClass.yaml` file: modify `<client-id>` to your user-assigned managed identity of the Azure KeyVault Secrets Provider, and modify `<key-vault-name>` and `<tenant-id>` to your real key vault name and tenant id.
Then run:

@@ -0,0 +1,8 @@
Tutorials & Examples
=====================================
* `A Hello World Example <../Overview/quicktour.html>`__ is a very simple example for getting started.
* `PPML e2e Example <../QuickStart/end-to-end.html>`__ introduces the end-to-end PPML workflow using SimpleQuery as an example.
* You can also find Trusted Data Analysis, Trusted ML, Trusted DL and Trusted FL examples in `more examples <https://github.com/intel-analytics/BigDL/tree/main/ppml/docs/examples.md>`__.

@@ -0,0 +1,35 @@
# PPML Introduction
## 1. What is BigDL PPML?
<video src="https://user-images.githubusercontent.com/61072813/184758908-da01f8ea-8f52-4300-9736-8c5ee981d4c0.mp4" width="100%" controls></video>
---
Protecting data privacy and confidentiality is critical in a world where data is everywhere. In recent years, more and more countries have enacted data privacy legislation or are expected to pass comprehensive legislation to protect data privacy; the importance of privacy and data protection is increasingly recognized.
To better protect sensitive data, it's necessary to ensure security for all dimensions of the data lifecycle: data at rest, data in transit, and data in use. Data being transferred on a network is `in transit`, data in storage is `at rest`, and data being processed is `in use`.
<p align="center">
<img src="https://user-images.githubusercontent.com/61072813/177720405-60297d62-d186-4633-8b5f-ff4876cc96d6.png" alt="data lifecycle" width='390px' height='260px'/>
</p>
To protect data in transit, enterprises often choose to encrypt sensitive data prior to moving it, or use encrypted connections (HTTPS, SSL, TLS, FTPS, etc.) to protect the contents of data in transit. For protecting data at rest, enterprises can simply encrypt sensitive files prior to storing them or choose to encrypt the storage drive itself. However, the third state, data in use, has always been a weakly protected target. Three emerging solutions seek to reduce the data-in-use attack surface: homomorphic encryption, multi-party computation, and confidential computing.
Among these security technologies, [Confidential computing](https://www.intel.com/content/www/us/en/security/confidential-computing.html) protects data in use by performing computation in a hardware-based [Trusted Execution Environment (TEE)](https://en.wikipedia.org/wiki/Trusted_execution_environment). [Intel® SGX](https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html) is Intel's Trusted Execution Environment (TEE), offering hardware-based memory encryption that isolates specific application code and data in memory. [Intel® TDX](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html) is the next generation of Intel's Trusted Execution Environment (TEE), introducing new architectural elements to help deploy hardware-isolated virtual machines (VMs) called trust domains (TDs).
[PPML](https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html) (Privacy Preserving Machine Learning) in [BigDL 2.0](https://github.com/intel-analytics/BigDL) provides a Trusted Cluster Environment for secure Big Data & AI applications, even in an untrusted cloud environment. By combining Intel Software Guard Extensions (SGX) with several other security technologies (e.g., attestation, key management service, private set intersection, federated learning, homomorphic encryption, etc.), BigDL PPML enables end-to-end security for entire distributed workflows, such as Apache Spark, Apache Flink, XGBoost, TensorFlow, PyTorch, etc.
## 2. Why BigDL PPML?
PPML allows organizations to explore powerful AI techniques while working to minimize the security risks associated with handling large amounts of sensitive data. PPML protects data at rest, in transit and in use: compute and memory protected by SGX Enclaves, storage (e.g., data and model) protected by encryption, network communication protected by remote attestation and Transport Layer Security (TLS), and optional Federated Learning support.
<p align="left">
<img src="https://user-images.githubusercontent.com/61072813/177922914-f670111c-e174-40d2-b95a-aafe92485024.png" alt="data lifecycle" width='600px' />
</p>
With BigDL PPML, you can run trusted Big Data & AI applications
- **Trusted Spark SQL & Dataframe**: with the trusted Big Data analytics and ML/DL support, users can run standard Spark data analysis (such as Spark SQL, Dataframe, MLlib, etc.) in a secure and trusted fashion.
- **Trusted ML (Machine Learning)**: with the trusted Big Data analytics and ML/DL support, users can run distributed machine learning (such as MLlib, XGBoost) in a secure and trusted fashion.
- **Trusted DL (Deep Learning)**: with the trusted Big Data analytics and ML/DL support, users can run distributed deep learning (such as BigDL, Orca, Nano, DLlib) in a secure and trusted fashion.
- **Trusted FL (Federated Learning)**: with PSI (Private Set Intersection), Secured Aggregation and trusted federated learning support, users can build a united model across different parties without compromising privacy, even if these parties have different datasets or features.

@@ -0,0 +1,14 @@
Advanced Topic
====================
* `Privacy Preserving Machine Learning (PPML) User Guide <ppml.html>`_
* `Trusted Big Data Analytics and ML <trusted_big_data_analytics_and_ml.html>`_
* `Trusted FL (Federated Learning) <trusted_fl.html>`_
* `Secure Your Services <../QuickStart/secure_your_services.html>`_
* `Building Linux Kernel from Source with SGX Enabled <../QuickStart/build_kernel_with_sgx.html>`_
* `Deploy the Intel SGX Device Plugin for Kubernetes <../QuickStart/deploy_intel_sgx_device_plugin_for_kubernetes.html>`_
* `Trusted Cluster Serving with Graphene on Kubernetes <../QuickStart/trusted-serving-on-k8s-guide.html>`_
* `TPC-H with Trusted SparkSQL on Kubernetes <../QuickStart/tpc-h_with_sparksql_on_k8s.html>`_
* `TPC-DS with Trusted SparkSQL on Kubernetes <../QuickStart/tpc-ds_with_sparksql_on_k8s.html>`_
* `Privacy Preserving Machine Learning (PPML) on Azure User Guide <azure_ppml.html>`_

@@ -230,29 +230,30 @@ Follow the guide below to run Spark on Kubernetes manually. Alternatively, you c
1. Enter the `BigDL/ppml/trusted-big-data-ml/python/docker-graphene` dir. Refer to the previous section about [preparing data, keys and passwords](#2221-start-ppml-container). Then run the following commands to generate your enclave key and add it to your Kubernetes cluster as a secret.
```bash
kubectl apply -f keys/keys.yaml
kubectl apply -f password/password.yaml
cd kubernetes
bash enclave-key-to-secret.sh
```
2. Create the [RBAC (Role-based access control)](https://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac):
```bash
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
```
3. Generate the K8s config file, modifying `YOUR_DIR` to the location where you want to store the config:
```bash
kubectl config view --flatten --minify > /YOUR_DIR/kubeconfig
```
4. Create a K8s secret; the secret value `YOUR_SECRET` should be the same as the password you specified in step 1:
```bash
kubectl create secret generic spark-secret --from-literal secret=YOUR_SECRET
```
##### 2.2.3.2 Start the client container
@@ -309,75 +310,75 @@ sudo docker run -itd \
1. Run `docker exec -it spark-local-k8s-client bash` to enter the container. Then run the following command to init the Spark local K8s client.
```bash
./init.sh
```
2. We assume you have a working Network File System (NFS) configured for your Kubernetes cluster. Configure the `nfsvolumeclaim` on the last line to the name of the Persistent Volume Claim (PVC) of your NFS. Please prepare the following and put them in your NFS directory:
   - The data (in a directory called `data`)
   - The kubeconfig file.
3. Run the following command to start the Spark-Pi example. When the application runs in `cluster` mode, you can run `kubectl get pod` to get the name and status of your K8s pod (e.g., driver-xxxx). Then you can run `kubectl logs -f driver-xxxx` to get the output of your application.
```bash
#!/bin/bash
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
/opt/jdk8/bin/java \
    -cp '/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*' \
    -Xmx8g \
    org.apache.spark.deploy.SparkSubmit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode $SPARK_MODE \
    --name spark-pi-sgx \
    --conf spark.driver.host=$SPARK_LOCAL_IP \
    --conf spark.driver.port=$RUNTIME_DRIVER_PORT \
    --conf spark.driver.memory=$RUNTIME_DRIVER_MEMORY \
    --conf spark.driver.cores=$RUNTIME_DRIVER_CORES \
    --conf spark.executor.cores=$RUNTIME_EXECUTOR_CORES \
    --conf spark.executor.memory=$RUNTIME_EXECUTOR_MEMORY \
    --conf spark.executor.instances=$RUNTIME_EXECUTOR_INSTANCES \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-driver-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.network.timeout=10000000 \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.python.use.daemon=false \
    --conf spark.python.worker.reuse=false \
    --conf spark.kubernetes.sgx.enabled=$SGX_ENABLED \
    --conf spark.kubernetes.sgx.driver.mem=$SGX_DRIVER_MEM \
    --conf spark.kubernetes.sgx.driver.jvm.mem=$SGX_DRIVER_JVM_MEM \
    --conf spark.kubernetes.sgx.executor.mem=$SGX_EXECUTOR_MEM \
    --conf spark.kubernetes.sgx.executor.jvm.mem=$SGX_EXECUTOR_JVM_MEM \
    --conf spark.kubernetes.sgx.log.level=$SGX_LOG_LEVEL \
    --conf spark.authenticate=true \
    --conf spark.authenticate.secret=$secure_password \
    --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.authenticate.enableSaslEncryption=true \
    --conf spark.network.crypto.enabled=true \
    --conf spark.network.crypto.keyLength=128 \
    --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
    --conf spark.io.encryption.enabled=true \
    --conf spark.io.encryption.keySizeBits=128 \
    --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
    --conf spark.ssl.enabled=true \
    --conf spark.ssl.port=8043 \
    --conf spark.ssl.keyPassword=$secure_password \
    --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.keyStorePassword=$secure_password \
    --conf spark.ssl.keyStoreType=JKS \
    --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.trustStorePassword=$secure_password \
    --conf spark.ssl.trustStoreType=JKS \
    --class org.apache.spark.examples.SparkPi \
    --verbose \
    local:///ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar 100 2>&1 | tee spark-pi-sgx-$SPARK_MODE.log
```
You can run your own Spark application after changing `--class` and the jar path.
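For instance, to adapt the command above to your own application, only the final lines need to change; the class name, jar path and log name below are placeholders:
```bash
    ...
    --class com.example.YourSparkApp \
    --verbose \
    local:///ppml/trusted-big-data-ml/work/your-app.jar <your-app-args> 2>&1 | tee your-app-sgx-$SPARK_MODE.log
```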

@@ -0,0 +1,92 @@
# A Hello World Example
In this section, you can get started by running a simple native Python HelloWorld program and a simple native Spark Pi program locally in a BigDL PPML client container, to get an initial understanding of how PPML is used.
## a. Prepare Keys
* Generate `ssl_key`
Download scripts from [here](https://github.com/intel-analytics/BigDL).
```bash
cd BigDL/ppml/
sudo bash scripts/generate-keys.sh
```
This script will generate keys under the `keys/` folder.
* Generate `enclave-key.pem`
```bash
openssl genrsa -3 -out enclave-key.pem 3072
```
This command generates a file `enclave-key.pem`, which is used to sign the image.
## b. Start the BigDL PPML client container
```
#!/bin/bash
# ENCLAVE_KEY_PATH means the absolute path to the "enclave-key.pem" in step a
# KEYS_PATH means the absolute path to the keys folder in step a
# LOCAL_IP means your local IP address.
export ENCLAVE_KEY_PATH=YOUR_LOCAL_ENCLAVE_KEY_PATH
export KEYS_PATH=YOUR_LOCAL_KEYS_PATH
export LOCAL_IP=YOUR_LOCAL_IP
export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:devel
sudo docker pull $DOCKER_IMAGE
sudo docker run -itd \
--privileged \
--net=host \
--cpuset-cpus="0-5" \
--oom-kill-disable \
--device=/dev/gsgx \
--device=/dev/sgx/enclave \
--device=/dev/sgx/provision \
-v $ENCLAVE_KEY_PATH:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \
-v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
-v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \
--name=bigdl-ppml-client-local \
-e LOCAL_IP=$LOCAL_IP \
-e SGX_MEM_SIZE=64G \
$DOCKER_IMAGE bash
```
## c. Run Python HelloWorld in BigDL PPML Client Container
Run the [script](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/start-scripts/start-python-helloworld-sgx.sh) to run trusted [Python HelloWorld](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/examples/helloworld.py) in BigDL PPML client container:
```
sudo docker exec -it bigdl-ppml-client-local bash work/start-scripts/start-python-helloworld-sgx.sh
```
Check the log:
```
sudo docker exec -it bigdl-ppml-client-local cat /ppml/trusted-big-data-ml/test-helloworld-sgx.log | egrep "Hello World"
```
The result should look something like this:
> Hello World
## d. Run Spark Pi in BigDL PPML Client Container
Run the [script](https://github.com/intel-analytics/BigDL/blob/main/ppml/trusted-big-data-ml/python/docker-graphene/start-scripts/start-spark-local-pi-sgx.sh) to run trusted [Spark Pi](https://github.com/apache/spark/blob/v3.1.2/examples/src/main/python/pi.py) in BigDL PPML client container:
```bash
sudo docker exec -it bigdl-ppml-client-local bash work/start-scripts/start-spark-local-pi-sgx.sh
```
Check the log:
```bash
sudo docker exec -it bigdl-ppml-client-local cat /ppml/trusted-big-data-ml/test-pi-sgx.log | egrep "roughly"
```
The result should look something like this:
> Pi is roughly 3.146760
<br />

@@ -0,0 +1,504 @@
## Develop your own Big Data & AI applications with BigDL PPML
First you need to create a `PPMLContext`, which wraps `SparkSession` and provides methods to read an encrypted data file into a plain-text RDD/DataFrame and to write a DataFrame to an encrypted data file. Then you can read & write data through `PPMLContext`.
If you are familiar with Spark, you may find that the usage of `PPMLContext` is very similar to Spark.
### 1. Create PPMLContext
- create a PPMLContext with `appName`
This is the simplest way to create a `PPMLContext`. When you don't need to read/write encrypted files, you can use this way to create a `PPMLContext`.
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
val sc = PPMLContext.initPPMLContext("MyApp")
```
</details>
<details>
<summary>python</summary>
```python
from bigdl.ppml.ppml_context import *
sc = PPMLContext("MyApp")
```
</details>
If you want to read/write encrypted files, then you need to provide more information.
- create a PPMLContext with `appName` & `ppmlArgs`
`ppmlArgs` is a Map of PPML arguments; it varies according to the kind of Key Management Service (KMS) you are using. A Key Management Service (KMS) is used to generate the `primaryKey` and `dataKey` to encrypt/decrypt data. We provide 3 types of KMS: SimpleKeyManagementService, EHSMKeyManagementService, and AzureKeyManagementService.
Refer to [KMS Utils](https://github.com/intel-analytics/BigDL/blob/main/ppml/services/kms-utils/docker/README.md) to use KMS to generate `primaryKey` and `dataKey`, then you are ready to create **PPMLContext** with `ppmlArgs`.
- For `SimpleKeyManagementService`:
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
val ppmlArgs: Map[String, String] = Map(
"spark.bigdl.kms.type" -> "SimpleKeyManagementService",
"spark.bigdl.kms.simple.id" -> "your_app_id",
"spark.bigdl.kms.simple.key" -> "your_app_key",
"spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
"spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
)
val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs)
```
</details>
<details>
<summary>python</summary>
```python
from bigdl.ppml.ppml_context import *
ppml_args = {"kms_type": "SimpleKeyManagementService",
"simple_app_id": "your_app_id",
"simple_app_key": "your_app_key",
"primary_key_path": "/your/primary/key/path/primaryKey",
"data_key_path": "/your/data/key/path/dataKey"
}
sc = PPMLContext("MyApp", ppml_args)
```
</details>
- For `EHSMKeyManagementService`:
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
val ppmlArgs: Map[String, String] = Map(
"spark.bigdl.kms.type" -> "EHSMKeyManagementService",
"spark.bigdl.kms.ehs.ip" -> "your_server_ip",
"spark.bigdl.kms.ehs.port" -> "your_server_port",
"spark.bigdl.kms.ehs.id" -> "your_app_id",
"spark.bigdl.kms.ehs.key" -> "your_app_key",
"spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
"spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
)
val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs)
```
</details>
<details>
<summary>python</summary>
```python
from bigdl.ppml.ppml_context import *
ppml_args = {"kms_type": "EHSMKeyManagementService",
"kms_server_ip": "your_server_ip",
"kms_server_port": "your_server_port"
"ehsm_app_id": "your_app_id",
"ehsm_app_key": "your_app_key",
"primary_key_path": "/your/primary/key/path/primaryKey",
"data_key_path": "/your/data/key/path/dataKey"
}
sc = PPMLContext("MyApp", ppml_args)
```
</details>
- For `AzureKeyManagementService`
The parameter `clientId` is not required; you don't have to provide it.
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
val ppmlArgs: Map[String, String] = Map(
"spark.bigdl.kms.type" -> "AzureKeyManagementService",
"spark.bigdl.kms.azure.vault" -> "key_vault_name",
"spark.bigdl.kms.azure.clientId" -> "client_id",
"spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
"spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
)
val sc = PPMLContext.initPPMLContext("MyApp", ppmlArgs)
```
</details>
<details>
<summary>python</summary>
```python
from bigdl.ppml.ppml_context import *
ppml_args = {"kms_type": "AzureKeyManagementService",
"azure_vault": "your_azure_vault",
"azure_client_id": "your_azure_client_id",
"primary_key_path": "/your/primary/key/path/primaryKey",
"data_key_path": "/your/data/key/path/dataKey"
}
sc = PPMLContext("MyApp", ppml_args)
```
</details>
- create a PPMLContext with `sparkConf` & `appName` & `ppmlArgs`
If you need to set Spark configurations, you can provide a `SparkConf` with Spark configurations to create a `PPMLContext`.
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import org.apache.spark.SparkConf
val ppmlArgs: Map[String, String] = Map(
"spark.bigdl.kms.type" -> "SimpleKeyManagementService",
"spark.bigdl.kms.simple.id" -> "your_app_id",
"spark.bigdl.kms.simple.key" -> "your_app_key",
"spark.bigdl.kms.key.primary" -> "/your/primary/key/path/primaryKey",
"spark.bigdl.kms.key.data" -> "/your/data/key/path/dataKey"
)
val conf: SparkConf = new SparkConf().setMaster("local[4]")
val sc = PPMLContext.initPPMLContext(conf, "MyApp", ppmlArgs)
```
</details>
<details>
<summary>python</summary>
```python
from bigdl.ppml.ppml_context import *
from pyspark import SparkConf
ppml_args = {"kms_type": "SimpleKeyManagementService",
"simple_app_id": "your_app_id",
"simple_app_key": "your_app_key",
"primary_key_path": "/your/primary/key/path/primaryKey",
"data_key_path": "/your/data/key/path/dataKey"
}
conf = SparkConf()
conf.setMaster("local[4]")
sc = PPMLContext("MyApp", ppml_args, conf)
```
</details>
### 2. Read and Write Files
To read/write data, you should set the `CryptoMode`:
- `plain_text`: no encryption
- `AES/CBC/PKCS5Padding`: for CSV, JSON and text file
- `AES_GCM_V1`: for PARQUET only
- `AES_GCM_CTR_V1`: for PARQUET only
To write data, you should set the `write` mode:
- `overwrite`: Overwrite existing data with the content of dataframe.
- `append`: Append content of the dataframe to existing data or table.
- `ignore`: Ignore current write operation if data / table already exists without any error.
- `error`: Throw an exception if data or table already exists.
- `errorifexists`: Throw an exception if data or table already exists.
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}
// read data
val df = sc.read(cryptoMode = PLAIN_TEXT)
...
// write data
sc.write(dataFrame = df, cryptoMode = AES_CBC_PKCS5PADDING)
.mode("overwrite")
...
```
</details>
<details>
<summary>python</summary>
```python
from bigdl.ppml.ppml_context import *
# read data
df = sc.read(crypto_mode = CryptoMode.PLAIN_TEXT)
...
# write data
sc.write(dataframe = df, crypto_mode = CryptoMode.AES_CBC_PKCS5PADDING)
.mode("overwrite")
...
```
</details>
<details><summary>expand to see the examples of reading/writing CSV, PARQUET, JSON and text file</summary>
The following examples use `sc` to represent an initialized `PPMLContext`.
**read/write CSV file**
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}
// read a plain csv file and return a DataFrame
val plainCsvPath = "/plain/csv/path"
val df1 = sc.read(cryptoMode = PLAIN_TEXT).option("header", "true").csv(plainCsvPath)
// write a DataFrame as a plain csv file
val plainOutputPath = "/plain/output/path"
sc.write(df1, PLAIN_TEXT)
.mode("overwrite")
.option("header", "true")
.csv(plainOutputPath)
// read an encrypted csv file and return a DataFrame
val encryptedCsvPath = "/encrypted/csv/path"
val df2 = sc.read(cryptoMode = AES_CBC_PKCS5PADDING).option("header", "true").csv(encryptedCsvPath)
// write a DataFrame as an encrypted csv file
val encryptedOutputPath = "/encrypted/output/path"
sc.write(df2, AES_CBC_PKCS5PADDING)
.mode("overwrite")
.option("header", "true")
.csv(encryptedOutputPath)
```
</details>
<details>
<summary>python</summary>
```python
# import
from bigdl.ppml.ppml_context import *
# read a plain csv file and return a DataFrame
plain_csv_path = "/plain/csv/path"
df1 = sc.read(CryptoMode.PLAIN_TEXT).option("header", "true").csv(plain_csv_path)
# write a DataFrame as a plain csv file
plain_output_path = "/plain/output/path"
sc.write(df1, CryptoMode.PLAIN_TEXT)
.mode('overwrite')
.option("header", True)
.csv(plain_output_path)
# read an encrypted csv file and return a DataFrame
encrypted_csv_path = "/encrypted/csv/path"
df2 = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).option("header", "true").csv(encrypted_csv_path)
# write a DataFrame as an encrypted csv file
encrypted_output_path = "/encrypted/output/path"
sc.write(df2, CryptoMode.AES_CBC_PKCS5PADDING)
.mode('overwrite')
.option("header", True)
.csv(encrypted_output_path)
```
</details>
**read/write PARQUET file**
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_GCM_CTR_V1, PLAIN_TEXT}
// read a plain parquet file and return a DataFrame
val plainParquetPath = "/plain/parquet/path"
val df1 = sc.read(PLAIN_TEXT).parquet(plainParquetPath)
// write a DataFrame as a plain parquet file
val plainOutputPath = "/plain/output/path"
sc.write(df1, PLAIN_TEXT)
.mode("overwrite")
.parquet(plainOutputPath)
// read an encrypted parquet file and return a DataFrame
val encryptedParquetPath = "/encrypted/parquet/path"
val df2 = sc.read(AES_GCM_CTR_V1).parquet(encryptedParquetPath)
// write a DataFrame as an encrypted parquet file
val encryptedOutputPath = "/encrypted/output/path"
sc.write(df2, AES_GCM_CTR_V1)
.mode("overwrite")
.parquet(encryptedOutputPath)
```
</details>
<details>
<summary>python</summary>
```python
# import
from bigdl.ppml.ppml_context import *
# read a plain parquet file and return a DataFrame
plain_parquet_path = "/plain/parquet/path"
df1 = sc.read(CryptoMode.PLAIN_TEXT).parquet(plain_parquet_path)
# write a DataFrame as a plain parquet file
plain_output_path = "/plain/output/path"
sc.write(df1, CryptoMode.PLAIN_TEXT)
.mode('overwrite')
.parquet(plain_output_path)
# read an encrypted parquet file and return a DataFrame
encrypted_parquet_path = "/encrypted/parquet/path"
df2 = sc.read(CryptoMode.AES_GCM_CTR_V1).parquet(encrypted_parquet_path)
# write a DataFrame as an encrypted parquet file
encrypted_output_path = "/encrypted/output/path"
sc.write(df2, CryptoMode.AES_GCM_CTR_V1)
.mode('overwrite')
.parquet(encrypted_output_path)
```
</details>
**read/write JSON file**
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}
// read a plain json file and return a DataFrame
val plainJsonPath = "/plain/json/path"
val df1 = sc.read(PLAIN_TEXT).json(plainJsonPath)
// write a DataFrame as a plain json file
val plainOutputPath = "/plain/output/path"
sc.write(df1, PLAIN_TEXT)
.mode("overwrite")
.json(plainOutputPath)
// read an encrypted json file and return a DataFrame
val encryptedJsonPath = "/encrypted/json/path"
val df2 = sc.read(AES_CBC_PKCS5PADDING).json(encryptedJsonPath)
// write a DataFrame as an encrypted json file
val encryptedOutputPath = "/encrypted/output/path"
sc.write(df2, AES_CBC_PKCS5PADDING)
.mode("overwrite")
.json(encryptedOutputPath)
```
</details>
<details>
<summary>python</summary>
```python
# import
from bigdl.ppml.ppml_context import *
# read a plain json file and return a DataFrame
plain_json_path = "/plain/json/path"
df1 = sc.read(CryptoMode.PLAIN_TEXT).json(plain_json_path)
# write a DataFrame as a plain json file
plain_output_path = "/plain/output/path"
sc.write(df1, CryptoMode.PLAIN_TEXT)
.mode('overwrite')
.json(plain_output_path)
# read an encrypted json file and return a DataFrame
encrypted_json_path = "/encrypted/json/path"
df2 = sc.read(CryptoMode.AES_CBC_PKCS5PADDING).json(encrypted_json_path)
# write a DataFrame as an encrypted json file
encrypted_output_path = "/encrypted/output/path"
sc.write(df2, CryptoMode.AES_CBC_PKCS5PADDING)
.mode('overwrite')
.json(encrypted_output_path)
```
</details>
**read textfile**
<details open>
<summary>scala</summary>
```scala
import com.intel.analytics.bigdl.ppml.PPMLContext
import com.intel.analytics.bigdl.ppml.crypto.{AES_CBC_PKCS5PADDING, PLAIN_TEXT}
// read from a plain csv file and return an RDD
val plainCsvPath = "/plain/csv/path"
val rdd1 = sc.textfile(plainCsvPath) // the default cryptoMode is PLAIN_TEXT
// read from an encrypted csv file and return an RDD
val encryptedCsvPath = "/encrypted/csv/path"
val rdd2 = sc.textfile(path=encryptedCsvPath, cryptoMode=AES_CBC_PKCS5PADDING)
```
</details>
<details>
<summary>python</summary>
```python
# import
from bigdl.ppml.ppml_context import *
# read from a plain csv file and return an RDD
plain_csv_path = "/plain/csv/path"
rdd1 = sc.textfile(plain_csv_path) # the default crypto_mode is "plain_text"
# read from an encrypted csv file and return an RDD
encrypted_csv_path = "/encrypted/csv/path"
rdd2 = sc.textfile(path=encrypted_csv_path, crypto_mode=CryptoMode.AES_CBC_PKCS5PADDING)
```
</details>
</details>
For more usage of the `PPMLContext` Python API, please refer to [PPMLContext Python API](https://github.com/intel-analytics/BigDL/blob/main/python/ppml/src/bigdl/ppml/README.md).

@@ -0,0 +1,175 @@
# PPML End-to-End Workflow Example
## E2E Architecture Overview
In this section we take SimpleQuery as an example to go through the entire BigDL PPML end-to-end workflow. SimpleQuery is a simple example that queries developers between the ages of 20 and 40 from people.csv.
<p align="center">
<img src="https://user-images.githubusercontent.com/61072813/178393982-929548b9-1c4e-4809-a628-10fafad69628.png" alt="data lifecycle" />
</p>
<video src="https://user-images.githubusercontent.com/61072813/184758702-4b9809f9-50ac-425e-8def-0ea1c5bf1805.mp4" width="100%" controls></video>
---
## Step 0. Prepare your environment
To secure your Big Data & AI applications in BigDL PPML manner, you should prepare your environment first, including K8s cluster setup, K8s-SGX plugin setup, key/password preparation, key management service (KMS) and attestation service (AS) setup, BigDL PPML client container preparation. **Please follow the detailed steps in** [Prepare Environment](./docs/prepare_environment.md).
## Step 1. Encrypt and Upload Data
Encrypt the input data of your Big Data & AI applications (here we use SimpleQuery) and then upload encrypted data to the nfs server. More details in [Encrypt Your Data](./services/kms-utils/docker/README.md#3-enroll-generate-key-encrypt-and-decrypt).
1. Generate the input data `people.csv` for SimpleQuery application
You can use [generate_people_csv.py](https://github.com/analytics-zoo/ppml-e2e-examples/blob/main/spark-encrypt-io/generate_people_csv.py). The usage of the script is `python generate_people_csv.py </save/path/of/people.csv> <num_lines>`; see the example command after this list.
2. Encrypt `people.csv`
```
docker exec -i $KMSUTIL_CONTAINER_NAME bash -c "bash /home/entrypoint.sh encrypt $appid $apikey $input_file_path"
```
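For example, to generate a 10,000-line `people.csv` as mentioned in step 1 above (the output path and line count below are only illustrative):
```bash
# Illustrative invocation; adjust the output path and number of lines to your setup
python generate_people_csv.py /ppml/trusted-big-data-ml/work/data/simplequery/people.csv 10000
```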
## Step 2. Build Big Data & AI applications
To build your own Big Data & AI applications, refer to [develop your own Big Data & AI applications with BigDL PPML](#4-develop-your-own-big-data--ai-applications-with-bigdl-ppml). The code of SimpleQuery is [here](https://github.com/intel-analytics/BigDL/blob/main/scala/ppml/src/main/scala/com/intel/analytics/bigdl/ppml/examples/SimpleQuerySparkExample.scala); it is already built into bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar, and the jar is included in the PPML image.
## Step 3. Attestation
To enable attestation, you should have a running Attestation Service (EHSM-KMS here for example) in your environment. (You can start a KMS referring to [this link](https://github.com/intel-analytics/BigDL/tree/main/ppml/services/kms-utils/docker)). Configure your KMS app_id and app_key with `kubectl`, and then configure KMS settings in `spark-driver-template.yaml` and `spark-executor-template.yaml` in the container.
``` bash
kubectl create secret generic kms-secret --from-literal=app_id=your-kms-app-id --from-literal=app_key=your-kms-app-key
```
Configure `spark-driver-template.yaml` for example. (`spark-executor-template.yaml` is similar)
``` yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: spark-driver
    securityContext:
      privileged: true
    env:
      - name: ATTESTATION
        value: "true"
      - name: ATTESTATION_URL
        value: your_attestation_url
      - name: ATTESTATION_ID
        valueFrom:
          secretKeyRef:
            name: kms-secret
            key: app_id
      - name: ATTESTATION_KEY
        valueFrom:
          secretKeyRef:
            name: kms-secret
            key: app_key
  ...
```
You should get `Attestation Success!` in logs after you [submit a PPML job](#step-4-submit-job) if the quote generated with user report is verified successfully by Attestation Service, or you will get `Attestation Fail! Application killed!` and the job will be stopped.
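As a quick check (the pod-name filter below is only an example and mirrors the log commands used later in this guide), you can grep the driver log for the attestation result after submitting the job:
```bash
# Example only: find the first driver pod and search its log for the attestation result
kubectl logs $( kubectl get pod | grep "driver" -m 1 | cut -d " " -f1 ) | grep -E "Attestation (Success|Fail)"
```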
## Step 4. Submit Job
When the Big Data & AI application and its input data are prepared, you are ready to submit BigDL PPML jobs. You need to choose the deploy mode and the way to submit the job first.
* **There are 4 modes to submit job**:
1. **local mode**: run jobs locally without connecting to a cluster. It is exactly the same as using spark-submit to run your application: `$SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] target.jar`; the driver and executors are not protected by SGX.
<p align="left">
<img src="https://user-images.githubusercontent.com/61072813/174703141-63209559-05e1-4c4d-b096-6b862a9bed8a.png" width='250px' />
</p>
2. **local SGX mode**: run jobs locally with SGX guarded. As the picture shows, the client JVM is running in an SGX Enclave so that the driver and executors can be protected.
<p align="left">
<img src="https://user-images.githubusercontent.com/61072813/174703165-2afc280d-6a3d-431d-9856-dd5b3659214a.png" width='250px' />
</p>
3. **client SGX mode**: run jobs in k8s client mode with SGX guarded. As we know, in K8s client mode, the driver is deployed locally as an external client to the cluster. With **client SGX mode**, the executors running in the K8S cluster are protected by SGX, and the driver running in the client is also protected by SGX.
<p align="left">
<img src="https://user-images.githubusercontent.com/61072813/174703216-70588315-7479-4b6c-9133-095104efc07d.png" width='500px' />
</p>
4. **cluster SGX mode**: run jobs in k8s cluster mode with SGX guarded. As we know, in K8s cluster mode, the driver is deployed on the k8s worker nodes like executors. With **cluster SGX mode**, the driver and executors running in K8S cluster are protected by SGX.
<p align="left">
<img src="https://user-images.githubusercontent.com/61072813/174703234-e45b8fe5-9c61-4d17-93ef-6b0c961a2f95.png" width='500px' />
</p>
* **There are two options to submit PPML jobs**:
* use [PPML CLI](./docs/submit_job.md#ppml-cli) to submit jobs manually
* use [helm chart](./docs/submit_job.md#helm-chart) to submit jobs automatically
Here we use **k8s client mode** and **PPML CLI** to run SimpleQuery. To check other modes, please see [PPML CLI Usage Examples](./docs/submit_job.md#usage-examples). Alternatively, you can also use Helm to submit jobs automatically, see the details in [Helm Chart Usage](./docs/submit_job.md#helm-chart).
<details><summary>expand to see details of submitting SimpleQuery</summary>
1. enter the ppml container
```
docker exec -it bigdl-ppml-client-k8s bash
```
2. run simplequery on k8s client mode
```
#!/bin/bash
export secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin`
bash bigdl-ppml-submit.sh \
--master $RUNTIME_SPARK_MASTER \
--deploy-mode client \
--sgx-enabled true \
--sgx-log-level error \
--sgx-driver-memory 64g \
--sgx-driver-jvm-memory 12g \
--sgx-executor-memory 64g \
--sgx-executor-jvm-memory 12g \
--driver-memory 32g \
--driver-cores 8 \
--executor-memory 32g \
--executor-cores 8 \
--num-executors 2 \
--conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
--name simplequery \
--verbose \
--class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
--jars local:///ppml/trusted-big-data-ml/spark-encrypt-io-0.3.0-SNAPSHOT.jar \
local:///ppml/trusted-big-data-ml/work/data/simplequery/spark-encrypt-io-0.3.0-SNAPSHOT.jar \
--inputPath /ppml/trusted-big-data-ml/work/data/simplequery/people_encrypted \
--outputPath /ppml/trusted-big-data-ml/work/data/simplequery/people_encrypted_output \
--inputPartitionNum 8 \
--outputPartitionNum 8 \
--inputEncryptModeValue AES/CBC/PKCS5Padding \
--outputEncryptModeValue AES/CBC/PKCS5Padding \
--primaryKeyPath /ppml/trusted-big-data-ml/work/data/simplequery/keys/primaryKey \
--dataKeyPath /ppml/trusted-big-data-ml/work/data/simplequery/keys/dataKey \
        --kmsType EHSMKeyManagementService \
--kmsServerIP your_ehsm_kms_server_ip \
--kmsServerPort your_ehsm_kms_server_port \
--ehsmAPPID your_ehsm_kms_appid \
--ehsmAPIKEY your_ehsm_kms_apikey
```
3. check runtime status: exit the container or open a new terminal
To check the logs of the Spark driver, run
```
sudo kubectl logs $( sudo kubectl get pod | grep "simplequery.*-driver" -m 1 | cut -d " " -f1 )
```
To check the logs of a Spark executor, run
```
sudo kubectl logs $( sudo kubectl get pod | grep "simplequery-.*-exec" -m 1 | cut -d " " -f1 )
```
4. If you set up [PPML Monitoring](docs/prepare_environment.md#optional-k8s-monitioring-setup), you can check the PPML Dashboard to monitor the status at http://kubernetes_master_url:3000
![image](https://user-images.githubusercontent.com/61072813/179948818-a2f6844f-0009-49d1-aeac-2e8c5a7ef677.png)
</details>
<br />
## Step 5. Decrypt and Read Result
When the job is done, you can decrypt and read the result of the job. More details in [Decrypt Job Result](./services/kms-utils/docker/README.md#3-enroll-generate-key-encrypt-and-decrypt).
```
docker exec -i $KMSUTIL_CONTAINER_NAME bash -c "bash /home/entrypoint.sh decrypt $appid $apikey $input_path"
```
## Video Demo
<video src="https://user-images.githubusercontent.com/61072813/184758643-821026c3-40e0-4d4c-bcd3-8a516c55fc01.mp4" width="100%" controls></video>

@@ -10,205 +10,205 @@
1. Download and compile tpc-ds
```bash
git clone --recursive https://github.com/intel-analytics/zoo-tutorials.git
cd /path/to/zoo-tutorials
git clone https://github.com/databricks/tpcds-kit.git
cd tpcds-kit/tools
make OS=LINUX
```
2. Generate data
```bash
cd /path/to/zoo-tutorials
cd tpcds-spark/spark-sql-perf
sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData -d <dsdgenDir> -s <scaleFactor> -l <dataDir> -f parquet"
```
`dsdgenDir` is the path of `tpcds-kit/tools`, `scaleFactor` is the size of the data (for example, `-s 1` will generate 1 GB of data), and `dataDir` is the path to store the generated data.
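For example, to generate a 1 GB dataset (the paths below are placeholders for your local checkout and output directory):
```bash
# Placeholders: adjust the tools path and the output directory to your environment
sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData -d /path/to/zoo-tutorials/tpcds-kit/tools -s 1 -l /path/to/tpcds-data -f parquet"
```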
### Deploy PPML TPC-DS on Kubernetes
1. Compile Kit
```bash
cd zoo-tutorials/tpcds-spark
sbt package
```
2. Create external tables
```bash
$SPARK_HOME/bin/spark-submit \
    --class "createTables" \
    --master <spark-master> \
    --driver-memory 20G \
    --executor-cores <executor-cores> \
    --total-executor-cores <total-cores> \
    --executor-memory 20G \
    --jars spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar \
    target/scala-2.12/tpcds-benchmark_2.12-0.1.jar <dataDir> <dsdgenDir> <scaleFactor>
```
3. Pull docker image
```bash
sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
```
4. Prepare SGX keys (following instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#11-prepare-the-keyspassworddataenclave-keypem "here")), make sure keys and tpcds-spark can be accessed on each K8S node
5. Start a bigdl-ppml enabled Spark K8S client container with configured local IP, key, tpc-ds and kuberconfig path
```bash
export ENCLAVE_KEY=/YOUR_DIR/keys/enclave-key.pem
export DATA_PATH=/YOUR_DIR/zoo-tutorials/tpcds-spark
export KEYS_PATH=/YOUR_DIR/keys
export SECURE_PASSWORD_PATH=/YOUR_DIR/password
export KUBERCONFIG_PATH=/YOUR_DIR/kuberconfig
export LOCAL_IP=$local_ip
export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
sudo docker run -itd \
    --privileged \
    --net=host \
    --name=spark-local-k8s-client \
    --oom-kill-disable \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
    -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \
    -v $DATA_PATH:/ppml/trusted-big-data-ml/work/tpcds-spark \
    -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \
    -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \
    -v $KUBERCONFIG_PATH:/root/.kube/config \
    -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \
    -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
    -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \
    -e RUNTIME_DRIVER_HOST=$LOCAL_IP \
    -e RUNTIME_DRIVER_PORT=54321 \
    -e RUNTIME_EXECUTOR_INSTANCES=1 \
    -e RUNTIME_EXECUTOR_CORES=4 \
    -e RUNTIME_EXECUTOR_MEMORY=20g \
    -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
    -e RUNTIME_DRIVER_CORES=4 \
    -e RUNTIME_DRIVER_MEMORY=10g \
    -e SGX_MEM_SIZE=64G \
    -e SGX_LOG_LEVEL=error \
    -e LOCAL_IP=$LOCAL_IP \
    $DOCKER_IMAGE bash
```
6. Attach to the client container
```bash
sudo docker exec -it spark-local-k8s-client bash
```
7. Modify `spark-executor-template.yaml`, adding the host paths of `enclave-key`, `tpcds-spark` and `kuberconfig`:
```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: spark-executor
    securityContext:
      privileged: true
    volumeMounts:
      ...
      - name: tpcds
        mountPath: /ppml/trusted-big-data-ml/work/tpcds-spark
      - name: kubeconf
        mountPath: /root/.kube/config
  volumes:
    - name: enclave-key
      hostPath:
        path: /root/keys/enclave-key.pem
    ...
    - name: tpcds
      hostPath:
        path: /path/to/tpcds-spark
    - name: kubeconf
      hostPath:
        path: /path/to/kuberconfig
```
8. Execute TPC-DS queries
The optional argument `QUERY` specifies the query numbers to run. Multiple query numbers should be separated by spaces, e.g. `1 2 3`. If no query number is specified, all queries (1-99) will be executed.
```bash
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
export HDFS_HOST=$hdfs_host_ip && \
export HDFS_PORT=$hdfs_port && \
export TPCDS_DIR=/ppml/trusted-big-data-ml/work/tpcds-spark && \
export OUTPUT_DIR=hdfs://$HDFS_HOST:$HDFS_PORT/tpc-ds/output && \
export QUERY=3 && \
/opt/jdk8/bin/java \
  -cp "$TPCDS_DIR/target/scala-2.12/tpcds-benchmark_2.12-0.1.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*" \
  -Xmx10g \
  -Dbigdl.mklNumThreads=1 \
  org.apache.spark.deploy.SparkSubmit \
  --master $RUNTIME_SPARK_MASTER \
  --deploy-mode client \
  --name spark-tpcds-sgx \
  --conf spark.driver.host=$LOCAL_IP \
  --conf spark.driver.port=54321 \
  --conf spark.driver.memory=10g \
  --conf spark.driver.blockManager.port=10026 \
  --conf spark.blockManager.port=10025 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=5000000 \
  --conf spark.worker.timeout=600 \
  --conf spark.python.use.daemon=false \
  --conf spark.python.worker.reuse=false \
  --conf spark.network.timeout=10000000 \
  --conf spark.starvation.timeout=250000 \
  --conf spark.rpc.askTimeout=600 \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.io.compression.codec=lz4 \
  --conf spark.sql.shuffle.partitions=8 \
  --conf spark.speculation=false \
  --conf spark.executor.heartbeatInterval=10000000 \
  --conf spark.executor.instances=24 \
  --executor-cores 8 \
  --total-executor-cores 192 \
  --executor-memory 16G \
  --properties-file /ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/conf/spark-bigdl.conf \
  --conf spark.kubernetes.authenticate.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
  --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  --conf spark.kubernetes.executor.podNamePrefix=spark-tpcds-sgx \
  --conf spark.kubernetes.sgx.enabled=true \
  --conf spark.kubernetes.sgx.executor.mem=32g \
  --conf spark.kubernetes.sgx.executor.jvm.mem=6g \
  --conf spark.kubernetes.sgx.log.level=$SGX_LOG_LEVEL \
  --conf spark.authenticate=true \
  --conf spark.authenticate.secret=$secure_password \
  --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
  --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
  --conf spark.authenticate.enableSaslEncryption=true \
  --conf spark.network.crypto.enabled=true \
  --conf spark.network.crypto.keyLength=128 \
  --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
  --conf spark.io.encryption.enabled=true \
  --conf spark.io.encryption.keySizeBits=128 \
  --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
  --conf spark.ssl.enabled=true \
  --conf spark.ssl.port=8043 \
  --conf spark.ssl.keyPassword=$secure_password \
  --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
  --conf spark.ssl.keyStorePassword=$secure_password \
  --conf spark.ssl.keyStoreType=JKS \
  --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
  --conf spark.ssl.trustStorePassword=$secure_password \
  --conf spark.ssl.trustStoreType=JKS \
  --class "TPCDSBenchmark" \
  --verbose \
  $TPCDS_DIR/target/scala-2.12/tpcds-benchmark_2.12-0.1.jar \
  $OUTPUT_DIR $QUERY
```
After the benchmark finishes, the performance results are saved as `part-*.csv` files under the `<OUTPUT_DIR>/performance` directory.
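To sanity-check the output, you can list and preview the result file; the commands below are a minimal sketch assuming `OUTPUT_DIR` is the HDFS path exported in the submit command above.
```bash
# Assumes OUTPUT_DIR is the same HDFS path used by the benchmark job above
hdfs dfs -ls $OUTPUT_DIR/performance
hdfs dfs -cat "$OUTPUT_DIR/performance/part-*.csv" | head
```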

@@ -8,198 +8,198 @@
### Prepare TPC-H kit and data ###
1. Generate data
Go to the [TPC Download](https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp) site, choose the `TPC-H` source code, then download the TPC-H toolkit. **Follow the download instructions carefully.**
After you have downloaded and uncompressed the TPC-H tools zip file, go to the `dbgen` directory, create a `makefile` based on `makefile.suite`, modify it according to the prompts inside, and run `make`.
This should generate an executable called `dbgen`.
```
./dbgen -h
```
This gives you the various options for generating the tables. The simplest case is running:
```
./dbgen
```
This generates tables with the extension `.tbl` at scale 1 (default), roughly 1GB in total across all tables. For tables of a different size, you can use the `-s` option:
```
./dbgen -s 10
```
This will generate roughly 10GB of input data.
You need to move all the `.tbl` files to a new directory as raw data (see the upload sketch after this list).
You can then either upload the data to a remote file system or read it locally.
2. Encrypt Data
Encrypt the data with the specified Key Management Service (`SimpleKeyManagementService`, `EHSMKeyManagementService`, or `AzureKeyManagementService`). Details can be found here: https://github.com/intel-analytics/BigDL/tree/main/ppml/services/kms-utils/docker
The example below encrypts data with `SimpleKeyManagementService`:
```
java -cp "$BIGDL_HOME/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar:$SPARK_HOME/conf/:$SPARK_HOME/jars/*:$BIGDL_HOME/jars/*" \
  -Xmx10g \
  com.intel.analytics.bigdl.ppml.examples.tpch.EncryptFiles \
  --inputPath xxx/dbgen-input \
  --outputPath xxx/dbgen-encrypted \
  --kmsType SimpleKeyManagementService \
  --simpleAPPID xxxxxxxxxxxx \
  --simpleAPPKEY xxxxxxxxxxxx \
  --primaryKeyPath /path/to/simple_encrypted_primary_key \
  --dataKeyPath /path/to/simple_encrypted_data_key
```
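Below is a minimal sketch of the raw-data preparation mentioned in step 1: collecting the generated `.tbl` files into one directory and (optionally) uploading them to HDFS. The directory names and the HDFS target path are illustrative assumptions, not part of the original instructions.
```bash
# Illustrative paths: adjust to your environment
mkdir -p /path/to/dbgen-input
mv ./*.tbl /path/to/dbgen-input/

# Optional: upload the raw data to HDFS so that Spark can read it remotely
hdfs dfs -mkdir -p /tpch/dbgen-input
hdfs dfs -put /path/to/dbgen-input/*.tbl /tpch/dbgen-input/
```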
### Deploy PPML TPC-H on Kubernetes ###
1. Pull docker image
```
sudo docker pull intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
```
2. Prepare SGX keys (following the instructions [here](https://github.com/intel-analytics/BigDL/tree/main/ppml/trusted-big-data-ml/python/docker-graphene#11-prepare-the-keyspassworddataenclave-keypem "here")) and make sure the keys and `tpch-spark` can be accessed on each K8S node
3. Start a bigdl-ppml-enabled Spark K8S client container with the local IP, key, tpch and kuberconfig paths configured
```
export ENCLAVE_KEY=/path/to/enclave-key.pem
export SECURE_PASSWORD_PATH=/path/to/password
export DATA_PATH=/path/to/data
export KEYS_PATH=/path/to/keys
export KUBERCONFIG_PATH=/path/to/kuberconfig
export LOCAL_IP=$local_ip
export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-big-data-ml-python-graphene:2.1.0-SNAPSHOT
sudo docker run -itd \
  --privileged \
  --net=host \
  --name=spark-local-k8s-client \
  --oom-kill-disable \
  --device=/dev/sgx/enclave \
  --device=/dev/sgx/provision \
  -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
  -v $SECURE_PASSWORD_PATH:/ppml/trusted-big-data-ml/work/password \
  -v $ENCLAVE_KEY:/graphene/Pal/src/host/Linux-SGX/signer/enclave-key.pem \
  -v $DATA_PATH:/ppml/trusted-big-data-ml/work/data \
  -v $KEYS_PATH:/ppml/trusted-big-data-ml/work/keys \
  -v $KUBERCONFIG_PATH:/root/.kube/config \
  -e RUNTIME_SPARK_MASTER=k8s://https://$LOCAL_IP:6443 \
  -e RUNTIME_K8S_SERVICE_ACCOUNT=spark \
  -e RUNTIME_K8S_SPARK_IMAGE=$DOCKER_IMAGE \
  -e RUNTIME_DRIVER_HOST=$LOCAL_IP \
  -e RUNTIME_DRIVER_PORT=54321 \
  -e RUNTIME_EXECUTOR_INSTANCES=1 \
  -e RUNTIME_EXECUTOR_CORES=4 \
  -e RUNTIME_EXECUTOR_MEMORY=20g \
  -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
  -e RUNTIME_DRIVER_CORES=4 \
  -e RUNTIME_DRIVER_MEMORY=10g \
  -e SGX_MEM_SIZE=64G \
  -e SGX_LOG_LEVEL=error \
  -e LOCAL_IP=$LOCAL_IP \
  $DOCKER_IMAGE bash
```
4. Attach to the client container
```
sudo docker exec -it spark-local-k8s-client bash
```
5. Modify `spark-executor-template.yaml`, adding the host paths of `enclave-key`, `tpch-spark` and `kuberconfig`:
```
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: spark-executor
    securityContext:
      privileged: true
    volumeMounts:
      ...
      - name: tpch
        mountPath: /ppml/trusted-big-data-ml/work/tpch-spark
      - name: kubeconf
        mountPath: /root/.kube/config
  volumes:
    - name: enclave-key
      hostPath:
        path: /root/keys/enclave-key.pem
    ...
    - name: tpch
      hostPath:
        path: /path/to/tpch-spark
    - name: kubeconf
      hostPath:
        path: /path/to/kuberconfig
```
6. Run PPML TPC-H
```bash
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
export INPUT_DIR=xxx/dbgen-encrypted && \
export OUTPUT_DIR=xxx/dbgen-output && \
/opt/jdk8/bin/java \
  -cp '/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/lib/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*' \
  -Xmx10g \
  -Dbigdl.mklNumThreads=1 \
  org.apache.spark.deploy.SparkSubmit \
  --master $RUNTIME_SPARK_MASTER \
  --deploy-mode client \
  --name spark-tpch-sgx \
  --conf spark.driver.host=$LOCAL_IP \
  --conf spark.driver.port=54321 \
  --conf spark.driver.memory=10g \
  --conf spark.driver.blockManager.port=10026 \
  --conf spark.blockManager.port=10025 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=5000000 \
  --conf spark.worker.timeout=600 \
  --conf spark.python.use.daemon=false \
  --conf spark.python.worker.reuse=false \
  --conf spark.network.timeout=10000000 \
  --conf spark.starvation.timeout=250000 \
  --conf spark.rpc.askTimeout=600 \
  --conf spark.sql.autoBroadcastJoinThreshold=-1 \
  --conf spark.io.compression.codec=lz4 \
  --conf spark.sql.shuffle.partitions=8 \
  --conf spark.speculation=false \
  --conf spark.executor.heartbeatInterval=10000000 \
  --conf spark.executor.instances=24 \
  --executor-cores 8 \
  --total-executor-cores 192 \
  --executor-memory 16G \
  --properties-file /ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/conf/spark-bigdl.conf \
  --conf spark.kubernetes.authenticate.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
  --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  --conf spark.kubernetes.executor.podNamePrefix=spark-tpch-sgx \
  --conf spark.kubernetes.sgx.enabled=true \
  --conf spark.kubernetes.sgx.executor.mem=32g \
  --conf spark.kubernetes.sgx.executor.jvm.mem=10g \
  --conf spark.kubernetes.sgx.log.level=$SGX_LOG_LEVEL \
  --conf spark.authenticate=true \
  --conf spark.authenticate.secret=$secure_password \
  --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
  --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
  --conf spark.authenticate.enableSaslEncryption=true \
  --conf spark.network.crypto.enabled=true \
  --conf spark.network.crypto.keyLength=128 \
  --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
  --conf spark.io.encryption.enabled=true \
  --conf spark.io.encryption.keySizeBits=128 \
  --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
  --conf spark.ssl.enabled=true \
  --conf spark.ssl.port=8043 \
  --conf spark.ssl.keyPassword=$secure_password \
  --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
  --conf spark.ssl.keyStorePassword=$secure_password \
  --conf spark.ssl.keyStoreType=JKS \
  --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
  --conf spark.ssl.trustStorePassword=$secure_password \
  --conf spark.ssl.trustStoreType=JKS \
  --conf spark.bigdl.kms.type=SimpleKeyManagementService \
  --conf spark.bigdl.kms.simple.id=simpleAPPID \
  --conf spark.bigdl.kms.simple.key=simpleAPIKEY \
  --conf spark.bigdl.kms.key.primary=xxxx/primaryKey \
  --conf spark.bigdl.kms.key.data=xxxx/dataKey \
  --class com.intel.analytics.bigdl.ppml.examples.tpch.TpchQuery \
  --verbose \
  /ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/lib/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT-jar-with-dependencies.jar \
  $INPUT_DIR $OUTPUT_DIR aes/cbc/pkcs5padding plain_text [QUERY]
```
The optional parameter [QUERY] is the number of the query to run, e.g. 1, 2, ..., 22.
The result is written to OUTPUT_DIR. There should be a file called TIMES.TXT with content formatted like:
>Q01 39.80204010
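If `OUTPUT_DIR` points to an HDFS location, the timing summary can be inspected directly as sketched below (for a local directory, plain `cat` works instead); this assumes the same `OUTPUT_DIR` that was passed to the job above.
```bash
# Assumes OUTPUT_DIR is the HDFS output path passed to the TPC-H job above
hdfs dfs -cat $OUTPUT_DIR/TIMES.TXT
```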

@@ -0,0 +1,71 @@
BigDL-PPML
=========================

Protecting privacy and confidentiality is critical for large-scale data analysis and machine learning. BigDL PPML (BigDL Privacy Preserving Machine Learning) combines various low-level hardware and software security technologies (e.g., Intel® Software Guard Extensions (Intel® SGX), Security Key Management, Remote Attestation, Data Encryption, Federated Learning, etc.) so that users can continue applying standard Big Data and AI technologies (such as Apache Spark, Apache Flink, TensorFlow, PyTorch, etc.) without sacrificing privacy.

----------------------

.. grid:: 1 2 2 2
    :gutter: 2

    .. grid-item-card::

        **Get Started**
        ^^^

        The documents in this section help you get started quickly with PPML.

        +++
        :bdg-link:`Introduction <./Overview/intro.html>` |
        :bdg-link:`Hello World Example <./Overview/quicktour.html>`

    .. grid-item-card::

        **User Guide**
        ^^^

        In-depth information about PPML features and concepts, together with step-by-step guides.

        +++
        :bdg-link:`User Guide <./Overview/userguide.html>` |
        :bdg-link:`Advanced Topics <./Overview/misc.html>`

    .. grid-item-card::

        **Tutorials**
        ^^^

        PPML tutorials and examples.

        +++
        :bdg-link:`End-to-End Example <./Overview/examples.html>` |
        :bdg-link:`More Examples <https://github.com/intel-analytics/BigDL/blob/main/ppml/docs/examples.md>`

    .. grid-item-card::

        **Videos**
        ^^^

        Videos and demos that help you quickly understand the architecture and start hands-on work.

        +++
        :bdg-link:`Introduction <./Overview/intro.html#what-is-bigdl-ppml>` |
        :bdg-link:`E2E Workflow <./QuickStart/end-to-end.html#e2e-architecture-overview>` |
        :bdg-link:`E2E Demo <./QuickStart/end-to-end.html#video-demo>`

.. toctree::
    :hidden:

    BigDL-PPML Document <self>

@@ -1,3 +1,7 @@
# Clipping
--------
## ConstantGradientClipping ##
Set constant gradient clipping during the training process.

File diff suppressed because it is too large.

@@ -1,4 +1,5 @@
# Model Freeze
To "freeze" a model means to exclude some layers of the model from training.
```scala

@@ -0,0 +1,13 @@
DLlib API
==================

.. toctree::
    :maxdepth: 1

    model.rst
    core_layers.md
    optim-Methods.md
    regularizers.md
    learningrate-Scheduler.md
    freeze.md
    clipping.md

@@ -1,3 +1,8 @@
# Learning Rate Scheduler
--------
## Poly ##
**Scala:**

@@ -0,0 +1,17 @@
Model/Sequential
==================

dllib.keras.models.Model
---------------------------

.. autoclass:: bigdl.dllib.keras.models.Model
    :members:
    :undoc-members:

dllib.keras.models.Sequential
-----------------------------

.. autoclass:: bigdl.dllib.keras.models.Sequential
    :members:
    :undoc-members:

@@ -1,3 +1,7 @@
# Optimizer
--------
## Adam ##
**Scala:**
@@ -11,16 +15,16 @@ optim = Adam(learningrate=1e-3, learningrate_decay=0.0, beta1=0.9, beta2=0.999,
An implementation of Adam optimization, first-order gradient-based optimization of stochastic objective functions. http://arxiv.org/pdf/1412.6980.pdf
`learningRate` learning rate. Default value is 1e-3.
`learningRateDecay` learning rate decay. Default value is 0.0.
`beta1` first moment coefficient. Default value is 0.9.
`beta2` second moment coefficient. Default value is 0.999.
`Epsilon` for numerical stability. Default value is 1e-8.
**Scala example:**
```scala
@@ -66,7 +70,7 @@ def rosenBrock(x: Tensor[Float]): (Float, Tensor[Float]) = {
  dxout.narrow(1, 2, d - 1).add(x0)
  (fout, dxout)
}
val x = Tensor(2).fill(0)
> print(optm.optimize(rosenBrock, x))
(0.0019999996
@@ -76,7 +80,7 @@ val x = Tensor(2).fill(0)
**Python example:**
```python
optim_method = Adam(learningrate=0.002)
optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
@@ -104,10 +108,10 @@ optim_method = SGD(learningrate=1e-3,learningrate_decay=0.0,weightdecay=0.0,
    weightdecays=None,bigdl_type="float")
```
A plain implementation of SGD which provides the optimize method. After setting the
optimization method when creating the Optimizer, the Optimizer will call the optimization method at the end of
each iteration.
**Scala example:**
```scala
val optimMethod = new SGD[Float](learningRate= 1e-3,learningRateDecay=0.0,
@@ -123,7 +127,7 @@ optim_method = SGD(learningrate=1e-3,learningrate_decay=0.0,weightdecay=0.0,
    momentum=0.0,dampening=DOUBLEMAX,nesterov=False,
    leaningrate_schedule=None,learningrates=None,
    weightdecays=None,bigdl_type="float")
optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
@@ -136,7 +140,7 @@ optimizer = Optimizer(
## Adadelta ##
*AdaDelta* implementation for *SGD*
It has been proposed in `ADADELTA: An Adaptive Learning Rate Method`.
http://arxiv.org/abs/1212.5701.
@@ -302,7 +306,7 @@ optimizer.setOptimMethod(optimMethod)
optim_method = LBFGS(max_iter=20, max_eval=DOUBLEMAX, \
    tol_fun=1e-5, tol_x=1e-9, n_correction=100, \
    learning_rate=1.0, line_search=None, line_search_options=None)
optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,
@@ -353,7 +357,7 @@ optimizer.setOptimMethod(optimMethod)
optim_method = Ftrl(learningrate = 5e-3, \
    learningrate_power = -0.5, \
    initial_accumulator_value = 0.01)
optimizer = Optimizer(
    model=mlp_model,
    training_rdd=train_data,

@@ -1,3 +1,7 @@
# Regularizer
--------
## L1 Regularizer ##
**Scala:**

@@ -0,0 +1,7 @@
Friesian API
==================

.. toctree::
    :maxdepth: 2

    feature.rst

@@ -1,3 +1,6 @@
Orca AutoML
============================
orca.automl.auto_estimator
---------------------------
@@ -11,7 +14,7 @@ A general estimator supports automatic model tuning. It allows users to fit and
orca.automl.hp
----------------------------------------

Sampling specs to be used in search space configuration.

.. automodule:: bigdl.orca.automl.hp
    :members:

@@ -0,0 +1,15 @@
Orca Context
============

orca.init_orca_context
-------------------------

.. automodule:: bigdl.orca.common
    :members: init_orca_context
    :undoc-members:
    :show-inheritance:

@@ -0,0 +1,20 @@
Orca Data
=========

orca.data.XShards
---------------------------

.. autoclass:: bigdl.orca.data.XShards
    :members:
    :undoc-members:
    :show-inheritance:

orca.data.pandas
---------------------------

.. automodule:: bigdl.orca.data.pandas.preprocessing
    :members:
    :undoc-members:
    :show-inheritance:

@@ -0,0 +1,10 @@
Orca API
==================

.. toctree::
    :maxdepth: 2

    context.rst
    data.rst
    orca.rst
    automl.rst

@@ -1,4 +1,4 @@
Orca Learn
==========
orca.learn.bigdl.estimator
@@ -88,12 +88,3 @@ orca.learn.openvino.estimator
    :members:
    :undoc-members:
    :show-inheritance:
AutoML
------------------------------
.. toctree::
:maxdepth: 2
automl.rst

@@ -44,10 +44,10 @@ output of Cluster Serving job information should be displayed, if not, go to [Pr
1. `Duplicate registration of device factory for type XLA_CPU with the same priority 50`
This error is caused by the Flink ClassLoader. Please put the Cluster Serving related jars into `${FLINK_HOME}/lib` (see the sketch after this list).
2. `servable Manager config dir not exist`
Check if `servables.yaml` exists in the current directory. If not, download it from [github](https://github.com/intel-analytics/bigdl/blob/master/ppml/trusted-realtime-ml/scala/docker-graphene/servables.yaml).
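The following is a hedged sketch of the two fixes above; the jar location is an illustrative placeholder, and the raw-download URL is assumed from the GitHub link in item 2.
```bash
# Fix 1: copy the Cluster Serving jars into Flink's lib directory (illustrative source path)
cp /path/to/cluster-serving-jars/*.jar ${FLINK_HOME}/lib/

# Fix 2: download servables.yaml into the current working directory (raw URL assumed from the link above)
wget https://raw.githubusercontent.com/intel-analytics/bigdl/master/ppml/trusted-realtime-ml/scala/docker-graphene/servables.yaml
```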
### Still, I get no result
If you still get an empty result, raise an issue [here](https://github.com/intel-analytics/bigdl/issues) and post the output/log of your serving job.

@@ -0,0 +1,66 @@
Cluster Serving
=========================

BigDL Cluster Serving is a lightweight, distributed, real-time serving solution that supports a wide range of deep learning models (such as TensorFlow, PyTorch, Caffe, BigDL and OpenVINO models). It provides a simple pub/sub API, so that users can easily send their inference requests to the input queue (using a simple Python API); Cluster Serving will then automatically manage the scale-out and real-time model inference across a large cluster (using distributed streaming frameworks such as Apache Spark Streaming, Apache Flink, etc.).

----------------------

.. grid:: 1 2 2 2
    :gutter: 2

    .. grid-item-card::

        **Get Started**
        ^^^

        The documents in this section help you get started quickly with Cluster Serving.

        +++
        :bdg-link:`Serving in 5 minutes <./QuickStart/serving-quickstart.html>` |
        :bdg-link:`Installation <./ProgrammingGuide/serving-installation.html>`

    .. grid-item-card::

        **Key Features Guide**
        ^^^

        Each guide in this section provides you with in-depth information, concepts and knowledge about Cluster Serving key features.

        +++
        :bdg-link:`Start Serving <./ProgrammingGuide/serving-start.html>` |
        :bdg-link:`Inference <./ProgrammingGuide/serving-inference.html>`

    .. grid-item-card::

        **Examples**
        ^^^

        Cluster Serving examples and tutorials.

        +++
        :bdg-link:`Examples <./Example/example.html>`

    .. grid-item-card::

        **MISC**
        ^^^

        Cluster Serving FAQ and contribution guide.

        +++
        :bdg-link:`FAQ <./FAQ/faq.html>` |
        :bdg-link:`Contribute <./FAQ/contribute-guide.html>`

.. toctree::
    :hidden:

    Cluster Serving Document <self>

Some files were not shown because too many files have changed in this diff.