This topic describes the gradient boosting regression tree (GBRT) algorithm.

Overview

The GBRT algorithm is a member of the boosting family. It uses the forward stagewise algorithm, with the weak learner restricted to the CART regression tree model. The idea of the forward stagewise algorithm is as follows: at each step, a decision tree is selected based on the current model and the fitted function so that adding the tree to the model minimizes the loss function.

GBRT consists of the following parts:
  • Regression tree (RT): a type of decision tree that is used to predict real (continuous) values. GBRT is an iterative regression tree algorithm that consists of multiple regression trees. The conclusions of all regression trees are accumulated to obtain the final result.
  • Gradient boosting (GB): The final result is determined by iterating over multiple trees. Each tree learns from the conclusions and residuals of the trees before it.
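
The two parts above can be illustrated with a minimal sketch. It is not the implementation behind model_class = 'gbrt'. It assumes a single feature, depth-1 regression trees (stumps), the least-squares objective, and a hypothetical learning_rate parameter that the parameter table in this topic does not list. Each round fits a stump to the residuals of the current model and adds its scaled prediction to the running total.

```python
# Minimal sketch of gradient boosting with regression stumps (least-squares).
# Assumptions for illustration only: one feature, depth-1 trees, and a
# hypothetical learning_rate parameter.

def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by minimizing squared error."""
    values = sorted(set(xs))
    mean = sum(residuals) / len(residuals)
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Only one distinct x value: no split is possible, predict the mean.
        return values[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbrt(xs, ys, n_estimators=100, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For the least-squares loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict_gbrt(model, x):
    """Accumulate the scaled outputs of all stumps on top of the base value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - predict_gbrt(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

With n_estimators set to 0, the sketch predicts the mean of y for every input. Each additional round can only lower the training error for this objective, which is why a larger n_estimators fits the training data more closely.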

Scenarios

GBRT is a regression model that is primarily used to fit continuous numeric values.

GBRT can be applied to epidemiology. For example, early evidence on human mortality and morbidity comes from observational studies that use regression analysis. If mortality (or morbidity) is the y_cols variable to be fitted in a regression model, socioeconomic status, education, and income can be used as independent variables (x_cols).

Parameters

The following table describes the parameters that you can specify in the model_parameter parameter of the CREATE MODEL statement when you create a model. Select the values based on your needs.

Parameter | Description
n_estimators | The number of iterations. More iterations generally produce a closer fit to the training data. The value is usually a positive integer. Default value: 100.
objective | The learning task and its learning objective. Default value: ls. Valid values:
  • ls: least-squares.
  • lad: least absolute deviation.
  • huber: combines least-squares and least absolute deviation.
max_depth | The maximum depth of each tree. Default value: 7.
Note: If this parameter is set to -1, the depth of the tree is not limited. However, to prevent overfitting, we recommend that you set this parameter to an appropriate value.
random_state | The random state. The value is usually a positive integer. Default value: 1.
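
The three objective values differ in the quantity that each new tree is fitted to. The following sketch shows the textbook negative gradient for each loss. It is an illustration and not the service's implementation: the delta threshold for huber is a hypothetical parameter, and this topic does not state how the threshold is chosen.

```python
# Textbook negative gradients for the three objectives.
# Assumption for illustration only: `delta` is a hypothetical threshold
# for the huber loss; the source does not document how it is set.

def sign(value):
    """Return 1.0, -1.0, or 0.0 depending on the sign of value."""
    return float((value > 0) - (value < 0))


def negative_gradient(y, prediction, objective="ls", delta=1.0):
    """Return the target that the next tree is fitted to for one sample."""
    residual = y - prediction
    if objective == "ls":
        # Least squares, loss 0.5 * residual^2: the target is the residual.
        return residual
    if objective == "lad":
        # Least absolute deviation, loss |residual|: only the sign is used,
        # so large outliers do not dominate the fit.
        return sign(residual)
    if objective == "huber":
        # Huber: behaves like ls for small residuals and like lad
        # (scaled by delta) for large ones.
        if abs(residual) <= delta:
            return residual
        return delta * sign(residual)
    raise ValueError("objective must be 'ls', 'lad', or 'huber'")
```

For a sample with y = 3.0 and a current prediction of 1.0, ls passes the full residual to the next tree, while lad and huber (with delta = 1.0) cap it, which makes them less sensitive to outliers.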

Examples

Create a model and an offline training task.
/*polar4ai*/
CREATE MODEL gbrt1 WITH
( model_class = 'gbrt', x_cols = 'dx1,dx2', y_cols = 'y',
  model_parameter = (objective = 'ls')) AS (SELECT * FROM db4ai.testdata1)
Use the model for prediction.
/*polar4ai*/
SELECT dx1, dx2 FROM
PREDICT(MODEL gbrt1, SELECT * FROM db4ai.testdata1 LIMIT 10)
WITH (x_cols = 'dx1,dx2', y_cols = '')
Note: The columns specified in x_cols and y_cols must contain floating-point or integer data.