
PolarDB: Random forest regression algorithm

Last Updated:Mar 28, 2026

Random forest regression is a PolarDB AI (Polar4AI) algorithm for predicting continuous numerical values. It builds multiple independent decision trees in parallel by randomly sampling both rows and features, then averages their outputs to produce the final prediction.

When to use this algorithm

Use random forest regression when the prediction target is a continuous number, such as a price, count, or score, and the dataset has dozens of feature dimensions and high accuracy requirements. For targets limited to a fixed set of categories, use a classification algorithm instead.

Example: Predict the hourly popularity of a social media topic. Input features include the number of discussion groups, participant count, and engagement level. The output is the average number of active discussion groups per hour — a positive floating-point value.

How it works

  1. Randomly select samples and features from the training data.

  2. Build multiple independent decision trees in parallel, each producing its own prediction.

  3. Average the predictions of all trees to get the final regression output.

Using random selection for both samples and features reduces overfitting and improves generalization compared to a single decision tree.
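The three steps above can be sketched in plain Python. This is an illustrative simplification, not the Polar4AI implementation: each "tree" here is a one-split regression stump on a single randomly chosen feature, whereas real random forests grow full trees and re-sample candidate features at every split.

```python
import random

def fit_stump(X, y, feat):
    """Fit a one-split regression 'tree': pick the threshold on the
    given feature that minimizes the squared error of the two leaf means."""
    best = None
    for t in sorted(set(row[feat] for row in X)):
        left = [yi for row, yi in zip(X, y) if row[feat] <= t]
        right = [yi for row, yi in zip(X, y) if row[feat] > t]
        if not right:          # t is the maximum value: no valid split
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((yi - lm) ** 2 for yi in left)
               + sum((yi - rm) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:           # constant feature: predict the global mean
        m = sum(y) / len(y)
        return (feat, float("inf"), m, m)
    _, t, lm, rm = best
    return (feat, t, lm, rm)

def predict_stump(stump, x):
    feat, t, lm, rm = stump
    return lm if x[feat] <= t else rm

def fit_forest(X, y, n_estimators=25, seed=1):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_estimators):
        # Step 1: bootstrap-sample rows and pick a random feature.
        rows = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in rows], [y[i] for i in rows]
        # Step 2: each tree is trained independently of the others.
        forest.append(fit_stump(Xb, yb, rng.randrange(d)))
    return forest

def predict_forest(forest, x):
    # Step 3: the final prediction is the average over all trees.
    return sum(predict_stump(s, x) for s in forest) / len(forest)
```

Because every tree prediction is a mean of training targets, the averaged output always stays within the observed range of y, which is one reason the ensemble is robust to outlier trees.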

Input requirements

| Column | Required type | Notes |
| --- | --- | --- |
| x_cols (feature columns) | Floating-point or integer | All feature columns must be numeric. |
| y_cols (target column) | Floating-point or integer | The model predicts a continuous numeric value. |

Parameters

Pass parameters through the model_parameter option in a CREATE MODEL statement.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| n_estimators | Positive integer | 100 | Number of decision trees. A higher value generally improves fit but increases training time. |
| objective | String | mse | Loss function used during training. Valid values: mse (mean squared error) and mae (mean absolute error). |
| max_features | String, integer, or float | "sqrt" | Maximum number of features considered at each split. See the following table for accepted values. |
| max_depth | Positive integer or None | None | Maximum depth of each tree. When set to None, tree depth is not limited. |
| n_jobs | Positive integer | 4 | Number of parallel threads. A higher value speeds up model creation. |
| random_state | Positive integer | 1 | Random seed for reproducibility. |
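For reference, the two objective options correspond to the standard regression loss definitions. These plain-Python sketches use function names of our choosing; they are not part of the Polar4AI API:

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of squared residuals.
    Penalizes large errors more heavily than small ones."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: the average of absolute residuals.
    Less sensitive to outliers than mse."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```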

Accepted values for max_features

| Value | Maximum features used |
| --- | --- |
| "sqrt" (default) | sqrt(n_features) |
| "log2" | log2(n_features) |
| Integer | The specified integer, between 0 and n_features (inclusive) |
| Float | max_features × n_features |
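The mapping in this table can be expressed as a small helper. The function name is hypothetical, and the truncation behavior (rounding down, keeping at least one feature) is our assumption rather than documented engine behavior:

```python
import math

def resolve_max_features(value, n_features):
    """Translate a max_features setting into a per-split feature count.
    Hypothetical helper mirroring the accepted-values table; exact
    rounding in the engine is an assumption (we truncate toward zero)."""
    if value == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if value == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(value, int):
        if not 0 <= value <= n_features:
            raise ValueError("integer must lie in [0, n_features]")
        return value
    if isinstance(value, float):
        return max(1, int(value * n_features))
    raise ValueError(f"unsupported max_features value: {value!r}")
```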

Examples

The following examples walk through the full workflow: create a model, evaluate it, then run predictions. All statements use the /*polar4ai*/ prefix to route the SQL to the Polar4AI engine.

Create a model

/*polar4ai*/CREATE MODEL randomforestreg1 WITH
(model_class = 'randomforestreg', x_cols = 'dx1,dx2', y_cols = 'y',
 model_parameter = (objective = 'mse')) AS (SELECT * FROM db4ai.testdata1);

Evaluate the model

Run EVALUATE to score the model against labeled data. The metrics='r2_score' option returns the R² (coefficient of determination), which measures how well the model explains variance in the target variable.

/*polar4ai*/SELECT dx1,dx2 FROM EVALUATE(MODEL randomforestreg1,
SELECT * FROM db4ai.testdata1 LIMIT 10) WITH
(x_cols = 'dx1,dx2', y_cols = 'y', metrics = 'r2_score');
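For reference, r2_score is the standard coefficient of determination, 1 − SS_res / SS_tot. A minimal sketch of the metric itself (not the Polar4AI internals):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    1.0 means perfect predictions; 0.0 means no better than
    predicting the mean. Undefined when y_true is constant."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```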

Run predictions

/*polar4ai*/SELECT dx1,dx2 FROM
PREDICT(MODEL randomforestreg1, SELECT * FROM db4ai.testdata1 LIMIT 10)
WITH (x_cols = 'dx1,dx2');