
# Enable AutoML for XGBoost




In this guide, we describe how to use Orca AutoXGBoost for automated XGBoost tuning.

Orca AutoXGBoost enables distributed, automated hyper-parameter tuning for XGBoost. It includes AutoXGBRegressor and AutoXGBClassifier, which tune the sklearn XGBRegressor and XGBClassifier respectively. See the XGBoost scikit-learn API for more details.

## Step 0: Prepare Environment

Conda is needed to prepare the Python environment for running this example. Please refer to the install guide for more details.

## Step 1: Init Orca Context

```python
from bigdl.orca import init_orca_context, stop_orca_context

cluster_mode = "local"  # "local", "k8s" or "yarn"

if cluster_mode == "local":  # run in local mode
    init_orca_context(cores=6, memory="2g", init_ray_on_spark=True)
elif cluster_mode == "k8s":  # run on K8s cluster
    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=4, init_ray_on_spark=True)
elif cluster_mode == "yarn":  # run on Hadoop YARN cluster
    init_orca_context(
        cluster_mode="yarn-client", cores=4, num_nodes=2, memory="2g",
        init_ray_on_spark=True, driver_memory="10g", driver_cores=1)
```

This is the only place where you need to specify local or distributed mode. View Orca Context for more details.

Note: You should `export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir` when running on a Hadoop YARN cluster. View the Hadoop User Guide for more details.
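If you prefer to set this from within the Python script instead of the shell, a minimal sketch (an assumption on our part; the variable must be set before `init_orca_context` is called):

```python
import os

# Equivalent to `export HADOOP_CONF_DIR=...` in the shell;
# set it before calling init_orca_context
os.environ["HADOOP_CONF_DIR"] = "/path/to/hadoop/conf/dir"
```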

## Step 2: Define Search Space

You should define a dictionary as your hyper-parameter search space.

The keys are the hyper-parameter names of XGBRegressor you want to search over, and each value specifies how to sample that hyper-parameter. See automl.hp for more details.

```python
from bigdl.orca.automl import hp

search_space = {
    "n_estimators": hp.grid_search([50, 100, 200]),
    "max_depth": hp.choice([2, 4, 6]),
}
```
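Besides `grid_search` and `choice`, the `hp` module also offers samplers for continuous and integer ranges. A sketch of an alternative search space, assuming `hp.uniform` and `hp.randint` are available (they mirror Ray Tune's search space API):

```python
# Alternative search space (hp.uniform / hp.randint are assumptions
# based on Ray Tune's search space API):
search_space = {
    "n_estimators": hp.randint(50, 200),     # random integer in [50, 200)
    "max_depth": hp.choice([2, 4, 6]),       # pick one value from the list
    "learning_rate": hp.uniform(0.01, 0.3),  # uniform float in [0.01, 0.3]
}
```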

## Step 3: Automatically Fit and Search with Orca AutoXGBoost

First create an AutoXGBRegressor.

```python
from bigdl.orca.automl.xgboost import AutoXGBRegressor

auto_xgb_reg = AutoXGBRegressor(cpus_per_trial=2,
                                name="auto_xgb_regressor",
                                min_child_weight=3,
                                random_state=2)
```
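The `fit` call below expects in-memory data such as NumPy arrays or a pandas DataFrame. As a minimal sketch, the `X_train`, `y_train`, `X_test` and `y_test` arrays used in this guide could be prepared from a scikit-learn toy dataset:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# A minimal data-preparation sketch; any NumPy feature/target arrays will do.
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=2)
```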

Next, use the AutoXGBRegressor to fit and search for the best hyper-parameter set. `n_sampling` is the number of trials to sample from the search space.

```python
auto_xgb_reg.fit(data=(X_train, y_train),
                 validation_data=(X_test, y_test),
                 search_space=search_space,
                 n_sampling=2,
                 metric="rmse")
```

## Step 4: Get the Best Model and Hyper-Parameters

You can get the best learned model and the best hyper-parameter set for further deployment. The best model is an sklearn XGBRegressor instance.

```python
best_model = auto_xgb_reg.get_best_model()
best_config = auto_xgb_reg.get_best_config()
```
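Since `best_model` is a plain sklearn XGBRegressor, you can evaluate it directly; a minimal sketch, assuming the `X_test`/`y_test` arrays from the data-preparation step above:

```python
from sklearn.metrics import mean_squared_error

# best_model behaves like any sklearn XGBRegressor
y_pred = best_model.predict(X_test)
print("test rmse:", mean_squared_error(y_test, y_pred, squared=False))
```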

Note: You should call `stop_orca_context()` when your application finishes.
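For example, at the very end of the script:

```python
stop_orca_context()  # release the resources held by the Orca context
```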