All Products
Search
Document Center

:XGBoost gbtree

Last Updated:Jul 25, 2025

Extreme Gradient Boosting (XGBoost) is an efficient gradient boosting tree algorithm and is widely used for classification and regression. The XGBoost gbtree model uses decision trees as base learners. This topic describes how to use a real estate dataset to train the XGBoost gbtree model and use the trained XGBoost gbtree model to predict the housing price.

Prepare data

For example, the Boston dataset is used as the training dataset. The following table describes the data structure of the dataset.

Column name

Description

Data type

crim

The urban crime rate per capita.

FLOAT

zn

The proportion of residential properties that exceed 25,000 square feet.

FLOAT

indus

The proportion of non-retail commercial properties per town.

FLOAT

chas

Specifies whether a river channel is nearby.

INT

nox

The nitrogen oxide concentration (one in ten million).

FLOAT

rm

The average number of rooms per residential building.

FLOAT

age

The proportion of principal residences built before the year of 1940.

FLOAT

dis

The average distance to five employment centers.

FLOAT

rad

The distance to the expressway.

INT

tax

The total property tax rate per USD 10,000.

INT

ptratio

The proportion of teachers and students per town.

FLOAT

b

The proportion of African-American urban residents.

FLOAT

lstat

The proportion of residents that have low income.

FLOAT

medv

The average value of principal residences.

FLOAT

Train a model

Use the boston.train table to train a model and the boston.test table to predict data. During model training, RDS SQLFlow automatically classifies training datasets to generate a training dataset and a validation dataset.

  1. Build MySQL-based rds_sqlflow.

  2. Connect to rds_sqlflow from the RDS SQLFlow client.

  3. In the RDS SQLFlow command-line interface of the RDS SQLFlow client, execute the following SQL statements to train a model:

    SELECT * FROM boston.train
    TO TRAIN xgboost.gbtree
    WITH
        objective="reg:squarederror",
        train.num_boost_round = 30
    COLUMN crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat
    LABEL medv
    INTO sqlflow_models.my_xgb_regression_model;

    The SELECT statement extracts training data from the boston.train table. The TO TRAIN statement specifies the model to be used, which is xgboost.gbtree. The WITH statement configures training parameters. The COLUMN statement specifies feature columns. The LABEL statement specifies the target column. Finally, the INTO statement defines the storage location of the trained model. For more information, see Training syntax.

Use a model to predict data

  1. After the xgboost.gbtree model is trained, use the model and execute the following SQL statements to predict housing prices:

    SELECT * FROM boston.test
    TO PREDICT boston.predict.medv
    USING sqlflow_models.my_xgb_regression_model;

    The SELECT statement specifies the dataset required for prediction. The TO PREDICT statement specifies the table that stores the prediction results. The USING statement specifies the model to be used. For more information, see Prediction syntax.

  2. After the prediction is complete, execute the following SQL statement to view the prediction results:

    SELECT * FROM boston.predict;

References

Introduction to RDS Custom