	Use AutoXGBoost to auto-tune XGBoost parameters
In this guide we describe how to use Orca AutoXGBoost for automated XGBoost tuning.
Orca AutoXGBoost enables distributed, automated hyper-parameter tuning for XGBoost. It provides AutoXGBRegressor and AutoXGBClassifier, which tune scikit-learn's XGBRegressor and XGBClassifier respectively. See the XGBoost scikit-learn API for more details.
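Both classes are exposed from the same module. A minimal import sketch (the AutoXGBClassifier path is an assumption, mirroring the AutoXGBRegressor import used later in this guide):

from zoo.orca.automl.xgboost import AutoXGBRegressor   # tunes a scikit-learn XGBRegressor
from zoo.orca.automl.xgboost import AutoXGBClassifier  # assumed path, mirroring the regressor import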
Step 0: Prepare Environment
Conda is needed to prepare the Python environment for running this example. Please refer to the install guide for more details.
conda create -n zoo python=3.7 # zoo is conda environment name, you can use any name you like.
conda activate zoo
pip install analytics-zoo[ray]
pip install torch==1.7.1 torchvision==0.8.2
Step 1: Init Orca Context
from zoo.orca import init_orca_context, stop_orca_context

cluster_mode = "local"  # one of "local", "k8s" or "yarn"

if cluster_mode == "local":
    init_orca_context(cores=6, memory="2g", init_ray_on_spark=True)  # run in local mode
elif cluster_mode == "k8s":
    init_orca_context(cluster_mode="k8s", num_nodes=2, cores=4, init_ray_on_spark=True)  # run on K8s cluster
elif cluster_mode == "yarn":
    init_orca_context(
        cluster_mode="yarn-client", cores=4, num_nodes=2, memory="2g", init_ray_on_spark=True,
        driver_memory="10g", driver_cores=1)  # run on Hadoop YARN cluster
This is the only place where you need to specify local or distributed mode. View Orca Context for more details.
Note: You should export HADOOP_CONF_DIR=/path/to/hadoop/conf/dir when running on Hadoop YARN cluster. View Hadoop User Guide for more details.
Step 2: Define Search Space
You should define a dictionary as your hyper-parameter search space.
Its keys are the names of the XGBRegressor hyper-parameters you want to search, and its values specify how each hyper-parameter should be sampled. See automl.hp for more details.
from zoo.orca.automl import hp

search_space = {
    "n_estimators": hp.grid_search([50, 100, 200]),  # try every listed value
    "max_depth": hp.choice([2, 4, 6]),               # sample one value per trial
}
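Besides grid search and choice, other samplers can be combined in the same dictionary. A small sketch, assuming the hp module mirrors Ray Tune's sampling functions (hp.randint and hp.uniform are assumptions not used elsewhere in this guide):

search_space = {
    "n_estimators": hp.randint(50, 200),     # assumed: integer drawn uniformly from the range
    "max_depth": hp.choice([2, 4, 6]),       # one value sampled per trial
    "learning_rate": hp.uniform(0.01, 0.3),  # assumed: float drawn uniformly from the range
}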
Step 3: Automatically fit and search with Orca AutoXGBoost
First create an AutoXGBRegressor.
from zoo.orca.automl.xgboost import AutoXGBRegressor

auto_xgb_reg = AutoXGBRegressor(cpus_per_trial=2,          # CPU cores allocated to each trial
                                name="auto_xgb_regressor",
                                min_child_weight=3,        # fixed XGBoost parameter, not searched
                                random_state=2)
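The fit call below expects in-memory train and validation splits (X_train, y_train, X_test, y_test). A minimal sketch of preparing them, assuming scikit-learn's California housing dataset; any numeric regression dataset works the same way:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

# Load a numeric regression dataset and hold out 20% as the validation set.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)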
Next, use the AutoXGBRegressor to fit and search for the best hyper-parameter set.
auto_xgb_reg.fit(data=(X_train, y_train),
                 validation_data=(X_test, y_test),
                 search_space=search_space,
                 n_sampling=2,   # number of samples drawn from the search space
                 metric="rmse")  # metric used to rank trials
Step 4: Get best model and hyper-parameters
You can get the best learned model and the best hyper-parameter set for further deployment. The best model is a scikit-learn XGBRegressor instance.
best_model = auto_xgb_reg.get_best_model()
best_config = auto_xgb_reg.get_best_config()
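Since the best model is a plain scikit-learn estimator, it can be used directly for inference. A short sketch:

# Predict with the tuned model and inspect the winning hyper-parameters.
y_pred = best_model.predict(X_test)
print(best_config)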
Note: You should call stop_orca_context() when your application finishes.
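For example, as the last line of the script:

stop_orca_context()  # releases the resources created by init_orca_context in Step 1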