This topic describes the gradient boosting regression tree (GBRT) algorithm.
Overview
The GBRT algorithm is a member of the boosting family. It uses the forward stagewise algorithm, with the weak learner restricted to the classification and regression tree (CART) regression model. The idea of the forward stagewise algorithm is as follows: at each iteration, a decision tree is selected based on the current model and the fitting function so that adding the tree minimizes the loss function.
- Regression tree (RT): a type of decision tree that is used to predict real values. GBRT is an iterative algorithm that consists of multiple regression trees. The conclusions of all regression trees are accumulated to obtain the final result.
- Gradient boosting (GB): The final result is obtained by iterating over multiple trees. Each tree learns from the conclusions and residuals of all previous trees.
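The two ideas above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation behind model_class = 'gbrt'. It assumes a single numeric feature, a squared-error (ls) loss, one-split trees (stumps) as the weak learner, and a learning_rate shrinkage factor. The function names and learning_rate are hypothetical and do not appear in the parameter table of this topic.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree that minimizes squared error."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds lie halfway between consecutive distinct x values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: predict the mean residual everywhere.
        m = mean(residuals)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbrt(xs, ys, n_estimators=100, learning_rate=0.1):
    """Forward stagewise fitting: each tree is fitted to the current residuals."""
    base = mean(ys)                      # initial model: a constant
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Accumulate the new tree's (shrunken) output into the model.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_gbrt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean([(y - predict_gbrt(model, x)) ** 2 for x, y in zip(xs, ys)])


# Illustrative data only: y = x^2 on eight points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [x * x for x in xs]
model = fit_gbrt(xs, ys, n_estimators=100)
print(mse(model, xs, ys))
```

In this sketch, the final prediction is the sum of the initial constant and the outputs of all trees, and each added tree lowers the training error because it is fitted to what the previous trees left unexplained.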
Scenarios
GBRT is a regression model that is primarily used to fit values.
GBRT can be applied to epidemiology. For example, early evidence on human mortality and morbidity came from observational studies that used regression analysis. If mortality (or morbidity) is the y_cols variable to be fitted in a regression model, socioeconomic status, education, and income can be used as independent variables (x_cols).
Parameters
The parameters in the following table are the values of the model_parameter parameter in the CREATE MODEL statement that is used to create a model. You can select the values based on your needs.
Parameter | Description |
---|---|
n_estimators | The number of iterations. More iterations generally fit the training data more closely. The value is usually a positive integer. Default value: 100. |
objective | The learning task and its learning objectives. Default value: ls. Valid values:
|
max_depth | The maximum depth of the tree. Default value: 7. Note: If this parameter is set to -1, the depth of the tree is not limited. However, to prevent overfitting, we recommend that you set this parameter to an appropriate value. |
random_state | The random state. This parameter is usually a positive integer. Default value: 1. |
Examples
/*polar4ai*/
CREATE MODEL gbrt1 WITH
(model_class = 'gbrt', x_cols = 'dx1,dx2', y_cols = 'y',
 model_parameter = (objective = 'ls')) AS (SELECT * FROM db4ai.testdata1)

/*polar4ai*/
SELECT dx1, dx2 FROM
PREDICT(MODEL gbrt1, SELECT * FROM db4ai.testdata1 LIMIT 10)
WITH (x_cols = 'dx1,dx2', y_cols = '')
The columns specified by x_cols and y_cols must contain floating-point or integer data.