	Use AutoXGBoost to auto-tune XGBoost parameters
In this guide we describe how to use Orca AutoXGBoost for automated XGBoost tuning.
Orca AutoXGBoost enables distributed, automated hyper-parameter tuning for XGBoost. It provides AutoXGBRegressor and AutoXGBClassifier, which tune scikit-learn's XGBRegressor and XGBClassifier respectively. See the XGBoost scikit-learn API for more details.
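Both classes are exposed from the same module. A minimal import sketch (the AutoXGBClassifier path is an assumption, mirroring the AutoXGBRegressor import used later in this guide):

from zoo.orca.automl.xgboost import AutoXGBRegressor   # tunes a scikit-learn XGBRegressor
from zoo.orca.automl.xgboost import AutoXGBClassifier  # assumed path, mirroring the regressor import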
Step 0: Prepare Environment
Conda is needed to prepare the Python environment for running this example. Please refer to the install guide for more details.
conda create -n zoo python=3.7 # zoo is conda environment name, you can use any name you like.
conda activate zoo
pip install analytics-zoo[ray]
pip install torch==1.7.1 torchvision==0.8.2
Step 1: Init Orca Context
from zoo.orca import init_orca_context, stop_orca_context

cluster_mode = "local"  # one of "local", "k8s" or "yarn"

if cluster_mode == "local":
    init_orca_context(cores=6, memory="2g", init_ray_on_spark=True)  # run in local mode
elif cluster_mode == "k8s":
    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=4, init_ray_on_spark=True)  # run on K8s cluster
elif cluster_mode == "yarn":
    init_orca_context(
        cluster_mode="yarn-client", cores=4, num_nodes=2, memory="2g", init_ray_on_spark=True,
        driver_memory="10g", driver_cores=1)  # run on Hadoop YARN cluster
This is the only place where you need to specify local or distributed mode. View Orca Context for more details.
Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir when running on Hadoop YARN cluster. View Hadoop User Guide for more details.
Step 2: Define Search Space
You should define a dictionary as your hyper-parameter search space.
Its keys are the names of the XGBRegressor hyper-parameters you want to search, and its values specify how each hyper-parameter should be sampled. See automl.hp for more details.
from zoo.orca.automl import hp

search_space = {
    "n_estimators": hp.grid_search([50, 100, 200]),  # try every listed value
    "max_depth": hp.choice([2, 4, 6]),               # sample one value per trial
}
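Besides grid search and choice, other samplers can be combined in the same dictionary. A small sketch, assuming the hp module mirrors Ray Tune's sampling functions (hp.randint and hp.uniform are assumptions not used elsewhere in this guide):

search_space = {
    "n_estimators": hp.randint(50, 200),     # assumed: integer drawn uniformly from the range
    "max_depth": hp.choice([2, 4, 6]),       # one value sampled per trial
    "learning_rate": hp.uniform(0.01, 0.3),  # assumed: float drawn uniformly from the range
}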
Step 3: Automatically fit and search with Orca AutoXGBoost
First create an AutoXGBRegressor.
from zoo.orca.automl.xgboost import AutoXGBRegressor

auto_xgb_reg = AutoXGBRegressor(cpus_per_trial=2,          # CPU cores allocated to each trial
                                name="auto_xgb_regressor",
                                min_child_weight=3,        # fixed XGBoost parameter, not searched
                                random_state=2)
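The fit call below expects in-memory train and validation splits (X_train, y_train, X_test, y_test). A minimal sketch of preparing them, assuming scikit-learn's California housing dataset; any numeric regression dataset works the same way:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Load a numeric regression dataset and hold out 20% as the validation set.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)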
Next, use the AutoXGBRegressor to fit and search for the best hyper-parameter set.
auto_xgb_reg.fit(data=(X_train, y_train),
                 validation_data=(X_test, y_test),
                 search_space=search_space,
                 n_sampling=2,   # number of samples drawn from the search space
                 metric="rmse")  # metric used to rank trials
Step 4: Get best model and hyper-parameters
You can get the best learned model and the best hyper-parameter set for further deployment. The best model is a scikit-learn XGBRegressor instance.
best_model = auto_xgb_reg.get_best_model()
best_config = auto_xgb_reg.get_best_config()
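Since the best model is a plain scikit-learn estimator, it can be used directly for inference. A short sketch:

# Predict with the tuned model and inspect the winning hyper-parameters.
y_pred = best_model.predict(X_test)
print(best_config)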
Note: You should call stop_orca_context() when your application finishes.
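For example, as the last line of the script:

stop_orca_context()  # releases the resources created by init_orca_context in Step 1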