# Distributed Hyper-Parameter Tuning
---
**Orca `AutoEstimator` provides APIs similar to Orca `Estimator` for distributed hyper-parameter tuning.**
### **1. AutoEstimator**
To perform distributed hyper-parameter tuning, users can first create an Orca `AutoEstimator` from a standard TensorFlow Keras or PyTorch model, and then call `AutoEstimator.fit`.

Under the hood, the Orca `AutoEstimator` generates different trials and schedules them on each node in the cluster. Each trial runs a different combination of hyper-parameters, sampled from the user-defined hyper-parameter space.

HDFS is used to save the temporary results of each trial, and all results are finally transferred to the driver for further analysis.

### **2. Pytorch AutoEstimator**
Users could pass *Creator Function*s, including the *Data Creator Function*, *Model Creator Function* and *Optimizer Creator Function*, to `AutoEstimator` for training.

Each *Creator Function* should take `config` as input and read the hyper-parameter values from `config` to enable hyper-parameter search.

#### **2.1 Data Creator Function**
You can define the train and validation datasets using a *Data Creator Function*. The *Data Creator Function* takes `config` as input and returns a `torch.utils.data.DataLoader` object, as shown below.

```python
import torch
from torchvision import datasets, transforms

# "batch_size" is the hyper-parameter to be tuned.
# `dir` is the directory for downloading/loading the MNIST dataset.
def train_loader_creator(config):
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST(dir, train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=config["batch_size"], shuffle=True)
    return train_loader
```

The input data for a Pytorch `AutoEstimator` can be a *Data Creator Function* or a tuple of numpy ndarrays in the form of (x, y), where x is the training input data and y is the training target data.
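
For illustration, a minimal sketch of passing numpy ndarrays instead of a *Data Creator Function* is shown below (the arrays here are hypothetical toy data whose shapes simply match the MNIST-style LeNet used in this guide):

```python
import numpy as np

# Hypothetical toy arrays standing in for a real dataset:
# x has shape (num_samples, channels, height, width), y holds integer class labels.
x = np.random.rand(100, 1, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=(100,))

train_data = (x, y)  # an (x, y) tuple that can be passed as `data` to `AutoEstimator.fit`
```
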
#### **2.2 Model Creator Function**
The *Model Creator Function* also takes `config` as input and returns a `torch.nn.Module` object, as shown below.
```python
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self, fc1_hidden_size=500):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, fc1_hidden_size)
        self.fc2 = nn.Linear(fc1_hidden_size, 10)

    def forward(self, x):
        # A standard LeNet-style forward pass for 28x28 MNIST images.
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def model_creator(config):
    # "fc1_hidden_size" is the hyper-parameter to be tuned.
    model = LeNet(fc1_hidden_size=config["fc1_hidden_size"])
    return model
```

#### **2.3 Optimizer Creator Function**
The *Optimizer Creator Function* takes `model` and `config` as input and returns a `torch.optim.Optimizer` object.

```python
import torch

def optim_creator(model, config):
    # "lr" is the hyper-parameter to be tuned.
    return torch.optim.Adam(model.parameters(), lr=config["lr"])
```

Note that the `optimizer` argument in the Pytorch `AutoEstimator` constructor can be either an *Optimizer Creator Function* or a string naming a Pytorch optimizer. The above *Optimizer Creator Function* has the same functionality as passing "Adam".
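
For instance, a minimal sketch of passing the optimizer by name instead of a creator function (reusing `model_creator` defined above and the same constructor arguments as in section 2.4 below):

```python
# The optimizer is specified by name instead of an Optimizer Creator Function.
auto_est = AutoEstimator.from_torch(model_creator=model_creator,
                                    optimizer="Adam",
                                    loss=nn.NLLLoss(),
                                    logs_dir="/tmp/orca_automl_logs",
                                    resources_per_trial={"cpu": 2},
                                    name="lenet_mnist")
```
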
#### **2.4 Create and Fit Pytorch AutoEstimator**
Users can create a Pytorch `AutoEstimator` as below.

```python
from bigdl.orca.automl.auto_estimator import AutoEstimator

auto_est = AutoEstimator.from_torch(model_creator=model_creator,
                                    optimizer=optim_creator,
                                    loss=nn.NLLLoss(),
                                    logs_dir="/tmp/orca_automl_logs",
                                    resources_per_trial={"cpu": 2},
                                    name="lenet_mnist")
```

Then users can perform distributed hyper-parameter tuning as follows. For more details about the `search_space` argument, see the *Search Space and Search Algorithms* [section](#search-space-and-search-algorithms) below.

```python
auto_est.fit(data=train_loader_creator,
             validation_data=test_loader_creator,
             search_space=search_space,
             n_sampling=2,
             epochs=1,
             metric="accuracy")
```

Finally, users can get the best learned model and the best hyper-parameters for further deployment.

```python
best_model = auto_est.get_best_model()    # a `torch.nn.Module` object
best_config = auto_est.get_best_config()  # a dictionary of hyper-parameter names and values
```

View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-auto-estimator) for more details.
### **3. TensorFlow/Keras AutoEstimator**
Users can create an `AutoEstimator` for TensorFlow Keras from a `tf.keras` model (using a *Model Creator Function*). For example:
```python
import tensorflow as tf
from bigdl.orca.automl.auto_estimator import AutoEstimator

def model_creator(config):
    # "hidden_size" and "lr" are the hyper-parameters to be tuned.
    model = tf.keras.models.Sequential([tf.keras.layers.Dense(config["hidden_size"],
                                                              input_shape=(1,)),
                                        tf.keras.layers.Dense(1)])
    model.compile(loss="mse",
                  optimizer=tf.keras.optimizers.SGD(config["lr"]),
                  metrics=["mse"])
    return model

auto_est = AutoEstimator.from_keras(model_creator=model_creator,
                                    logs_dir="/tmp/orca_automl_logs",
                                    resources_per_trial={"cpu": 2},
                                    name="auto_keras")
```

Then users can perform distributed hyper-parameter tuning as follows. For more details about `search_space`, see the *Search Space and Search Algorithms* [section](#search-space-and-search-algorithms) below.

```python
auto_est.fit(data=train_data,
             validation_data=val_data,
             search_space=search_space,
             n_sampling=2,
             epochs=1,
             metric="accuracy")
```

The `data` and `validation_data` arguments of the `fit` method can only be a tuple of numpy ndarrays; a *Data Creator Function* is not supported yet. The numpy ndarrays should be in the form of (x, y), where x is the training input data and y is the training target data.
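
For example, a minimal sketch of preparing such tuples (the arrays here are hypothetical toy data that simply match the `input_shape=(1,)` and single output of the Keras model defined above):

```python
import numpy as np

# Hypothetical toy data matching the model's input_shape=(1,) and single output.
x = np.random.rand(100, 1).astype(np.float32)
y = np.random.rand(100, 1).astype(np.float32)

train_data = (x[:80], y[:80])   # (x, y) tuple for training
val_data = (x[80:], y[80:])     # (x, y) tuple for validation
```
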
Finally, users can get the best learned model and the best hyper-parameters for further deployment.

```python
best_model = auto_est.get_best_model()    # the best Keras model learned during tuning
best_config = auto_est.get_best_config()  # a dictionary of hyper-parameter names and values
```

View the related [Python API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-auto-estimator) for more details.
### **4. Search Space and Search Algorithms**
For hyper-parameter optimization, users should define the search space of the hyper-parameter values for neural network training, as well as how to search through the chosen hyper-parameter space.
#### **4.1 Basic Search Algorithms**
For basic search algorithms like **Grid Search** and **Random Search**, we provide several sampling functions in `automl.hp`. See the [API doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-hp) for more details.
`AutoEstimator` requires a dictionary for the `search_space` argument in `fit`.
In the dictionary, the keys are the hyper-parameter names, and the values specify how to sample the search space for each hyper-parameter.
```python
from bigdl.orca.automl import hp

search_space = {
    "fc1_hidden_size": hp.grid_search([500, 600]),
    "lr": hp.loguniform(0.001, 0.1),
    "batch_size": hp.choice([160, 320, 640]),
}
```

#### **4.2 Advanced Search Algorithms**
Besides grid search and random search, users can also choose advanced hyper-parameter optimization methods, such as [Ax](https://ax.dev/), [Bayesian Optimization](https://github.com/fmfn/BayesianOptimization), [Scikit-Optimize](https://scikit-optimize.github.io), etc. We support all *Search Algorithms* in [Ray Tune](https://docs.ray.io/en/master/index.html). See [Ray Tune Search Algorithms](https://docs.ray.io/en/master/tune/api_docs/suggestion.html) for more details.
Note that you should install the dependency for your search algorithm manually.
Take Bayesian optimization as an example. You first need to install its dependency with:
```bash
pip install bayesian-optimization
```

Then pass the search algorithm name to `search_alg` in `AutoEstimator.fit`.
```python
from bigdl.orca.automl import hp

search_space = {
    "width": hp.uniform(0, 20),
    "height": hp.uniform(-100, 100)
}

auto_estimator.fit(
    data,
    search_space=search_space,
    metric="mean_loss",
    mode="min",
    search_alg="bayesopt",
)
```

See [API Doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/AutoML/automl.html#orca-automl-auto-estimator) for more details.
### **5. Scheduler**
A *Scheduler* can stop, pause, or tweak the hyper-parameters of running trials, making the hyper-parameter tuning process much more efficient.
We support all *Schedulers* in [Ray Tune](https://docs.ray.io/en/master/index.html). See [Ray Tune Schedulers](https://docs.ray.io/en/master/tune/api_docs/schedulers.html#schedulers-ref) for more details.
Users can pass the *Scheduler* name to `scheduler` in `AutoEstimator.fit`. The supported *Scheduler* names are "fifo", "hyperband", "async_hyperband", "median_stopping_rule", "hb_bohb", "pbt" and "pbt_replay". The default `scheduler` is "fifo", which simply runs trials in submission order.
See the example below on how to use a *Scheduler* in `AutoEstimator`.
```python
scheduler_params = dict(
    max_t=50,
    grace_period=1,
    reduction_factor=3,
    brackets=3,
)

auto_estimator.fit(
    data,
    search_space=search_space,
    metric="mean_loss",
    mode="min",
    search_alg="skopt",
    scheduler="AsyncHyperBand",
    scheduler_params=scheduler_params
)
```

The *Scheduler* shares the same parameters as the corresponding Ray Tune scheduler, and `scheduler_params` contains the extra parameters for the `scheduler` other than `metric` and `mode`.