Platform For AI: Random forest

Last Updated: Apr 02, 2026

A random forest is a classifier that consists of multiple decision trees. Its classification result is the mode of the classes output by the individual trees, that is, the class that receives the most votes.
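The voting rule can be illustrated with a minimal Python sketch. The function name `majority_vote` and the sample votes are hypothetical and are not part of the component. The tie-breaking behavior shown here (the label seen first wins) is an assumption of this sketch, not documented behavior of the component.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class label among the per-tree predictions."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    # most_common(1) returns [(label, count)]; on a tie, the label that
    # was encountered first is returned (an assumption of this sketch).
    return Counter(tree_predictions).most_common(1)[0][0]


# Three hypothetical trees vote on one record.
votes = ["good", "bad", "good"]
print(majority_vote(votes))
```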

Component configuration

You can configure the parameters for the random forest component in one of the following ways.

Method 1: Use the GUI

You can configure the component parameters on the Designer workflow page.

Tab

Parameter

Description

Field Settings

Feature Columns

The columns used as features for training. By default, all columns are selected except for the label column and the weight column.

Excluded Columns

The columns that are not used for training. This parameter cannot be used with Feature Columns.

Forced Conversion Columns

The columns to parse as discrete types regardless of their data type. By default, columns are parsed based on the following rules:

  • Columns of the STRING, BOOLEAN, and DATETIME types are parsed as discrete types.

  • Columns of the DOUBLE and BIGINT types are parsed as continuous types.

Note

To parse a BIGINT column as CATEGORICAL, you must use the forceCategorical parameter to specify the type.

Weight Column Name

The column used to weight each sample row. Only numeric types are supported.

Label Column

The label column in the input table. STRING and numeric types are supported.

Parameter Settings

Number of Trees in the Forest

The value must be an integer from 1 to 1,000.

Position of an Individual Tree in the Forest

If the forest has N trees, indexed from 0, and algorithmTypes=[a,b], then:

  • Trees with an index in [0,a) use the ID3 algorithm.

  • Trees with an index in [a,b) use the CART algorithm.

  • Trees with an index in [b,N) use the C4.5 algorithm.

For example, in a five-tree forest, if you set this parameter to [2,4], trees 0 and 1 use the ID3 algorithm, trees 2 and 3 use the CART algorithm, and tree 4 uses the C4.5 algorithm. If you enter None, the algorithms are evenly distributed among the trees in the forest.

Number of Random Features for a Single Tree

The value must be in the range of [1,N], where N is the number of features.

Minimum Number of Records on a Leaf Node

A positive integer. The default value is 2.

Minimum Ratio of Records on a Leaf Node to Its Parent Node

The value must be in the range of [0,1]. The default value is 0.

Maximum Depth of a Single Tree

The value must be in the range of [1,+∞). The default value is infinity.

Number of Random Records for a Single Tree

The value must be in the range of (1000,1000000]. The default value is 100,000.
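The range rule described for Position of an Individual Tree in the Forest can be sketched in Python as follows. This is an illustration only: the function `algorithm_for_tree` is hypothetical, and the sketch assumes that trees are indexed from 0 and that the last range ends at the final tree.

```python
def algorithm_for_tree(index, n_trees, a, b):
    """Map a zero-based tree index to an algorithm for algorithmTypes=[a,b]."""
    if not 0 <= index < n_trees:
        raise ValueError("index must be in [0, n_trees)")
    if not 0 <= a <= b <= n_trees:
        raise ValueError("expected 0 <= a <= b <= n_trees")
    if index < a:
        return "ID3"   # indexes in [0, a)
    if index < b:
        return "CART"  # indexes in [a, b)
    return "C4.5"      # remaining indexes, from b to the last tree


# A five-tree forest with algorithmTypes=[2,4].
assignment = [algorithm_for_tree(i, 5, 2, 4) for i in range(5)]
print(assignment)
```

Under these assumptions, the first two trees map to ID3, the next two to CART, and the last one to C4.5.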

Method 2: Use a PAI command

You can configure the component parameters using a PAI command. You can use the SQL script component to run PAI commands. For more information, see the SQL script topic.

 PAI -name randomforests
     -project algo_public
     -DinputTableName="pai_rf_test_input"
     -DmodelName="pai_rf_test_model"
     -DforceCategorical="f1"
     -DlabelColName="class"
     -DfeatureColNames="f0,f1"
     -DmaxRecordSize="100000"
     -DminNumPer="0"
     -DminNumObj="2"
     -DtreeNum="3";

Parameter

Required

Description

Default value

inputTableName

Yes

The input table.

None

inputTablePartitions

No

The partitions in the input table that are used for training. The following formats are supported:

  • partition_name=value

  • name1=value1/name2=value2: a multi-level format

Note

If you specify multiple partitions, separate them with commas (,).

All partitions

labelColName

Yes

The name of the label column in the input table.

None

modelName

Yes

The name of the output model.

None

treeNum

Yes

The number of trees in the forest. The value must be an integer from 1 to 1000.

100

excludedColNames

No

The columns that are not used for training. This parameter cannot be used with featureColNames.

Empty

weightColName

No

The name of the weight column in the input table.

None

featureColNames

No

The names of the feature columns in the input table that are used for training.

All columns except for the ones specified by labelColName and weightColName.

forceCategorical

No

The following parsing rules apply:

  • Columns of the STRING, BOOLEAN, and DATETIME types are parsed as discrete types.

  • Columns of the DOUBLE and BIGINT types are parsed as continuous types.

Note

To parse a BIGINT column as CATEGORICAL, you must use the forceCategorical parameter to specify the type.

INT is parsed as a continuous type.

algorithmTypes

No

The algorithm that each tree uses, based on the position of the tree in the forest. If a forest has N trees, indexed from 0, and algorithmTypes=[a,b] is specified:

  • Trees with an index in [0,a) use the ID3 algorithm.

  • Trees with an index in [a,b) use the CART algorithm.

  • Trees with an index in [b,N) use the C4.5 algorithm.

For example, in a forest that has five trees, if you set this parameter to [2,4], trees 0 and 1 use the ID3 algorithm, trees 2 and 3 use the CART algorithm, and tree 4 uses the C4.5 algorithm. If you enter None, the algorithms are evenly distributed in the forest.

The algorithms are evenly distributed in the forest.

randomColNum

No

The number of random features selected for each split when a single tree is generated. The value must be in the range of [1,N], where N is the number of features.

log₂N

minNumObj

No

The minimum number of records on a leaf node. The value must be a positive integer.

2

minNumPer

No

The minimum ratio of records on a leaf node to its parent node. The value must be in the range of [0,1].

0.0

maxTreeDeep

No

The maximum depth of a single tree. The value must be in the range of [1,+∞).

infinity

maxRecordSize

No

The number of random records for a single tree. The value must be in the range of (1000,1000000].

100000
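The documented value ranges can be checked before a command is submitted. The following Python sketch is one possible way to do that. The helpers `validate_params` and `build_pai_command` are hypothetical and are not part of PAI. They only encode the ranges listed in the table above and assemble the command text; they do not run it.

```python
def validate_params(params):
    """Return a list of problems found in a dict of PAI command parameters."""
    errors = []
    for name in ("inputTableName", "labelColName", "modelName", "treeNum"):
        if name not in params:
            errors.append(f"{name} is required")
    if "featureColNames" in params and "excludedColNames" in params:
        errors.append("excludedColNames cannot be used with featureColNames")
    if "treeNum" in params and not 1 <= int(params["treeNum"]) <= 1000:
        errors.append("treeNum must be an integer from 1 to 1000")
    if "minNumObj" in params and int(params["minNumObj"]) < 1:
        errors.append("minNumObj must be a positive integer")
    if "minNumPer" in params and not 0 <= float(params["minNumPer"]) <= 1:
        errors.append("minNumPer must be in [0,1]")
    if "maxTreeDeep" in params and int(params["maxTreeDeep"]) < 1:
        errors.append("maxTreeDeep must be at least 1")
    if "maxRecordSize" in params:
        size = int(params["maxRecordSize"])
        if not 1000 < size <= 1000000:
            errors.append("maxRecordSize must be in (1000,1000000]")
    return errors


def build_pai_command(params):
    """Assemble the PAI command text after validating the parameters."""
    errors = validate_params(params)
    if errors:
        raise ValueError("; ".join(errors))
    lines = ["PAI -name randomforests", "    -project algo_public"]
    lines += [f'    -D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


example = {
    "inputTableName": "pai_rf_test_input",
    "modelName": "pai_rf_test_model",
    "labelColName": "class",
    "featureColNames": "f0,f1",
    "treeNum": 3,
}
print(build_pai_command(example))
```

Returning a list of messages instead of failing on the first problem lets a caller report every invalid parameter at once.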

Examples

  1. Use an SQL statement to generate training data.

    create table pai_rf_test_input as
    select * from
    (
      select 1 as f0,2 as f1, "good" as class
      union all
      select 1 as f0,3 as f1, "good" as class
      union all
      select 1 as f0,4 as f1, "bad" as class
      union all
      select 0 as f0,3 as f1, "good" as class
      union all
      select 0 as f0,4 as f1, "bad" as class
    )tmp;
  2. Submit the parameters for the random forest component using a PAI command.

    PAI -name randomforests
         -project algo_public
         -DinputTableName="pai_rf_test_input"
         -DmodelName="pai_rf_test_model"
         -DforceCategorical="f1"
         -DlabelColName="class"
         -DfeatureColNames="f0,f1"
         -DmaxRecordSize="100000"
         -DminNumPer="0"
         -DminNumObj="2"
         -DtreeNum="3";
  3. View the Predictive Model Markup Language (PMML) of the model.

    <?xml version="1.0" encoding="utf-8"?>
    <PMML xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="4.2" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
      <Header copyright="Copyright (c) 2014, Alibaba Inc." description="">
        <Application name="ODPS/PMML" version="0.1.0"/>
        <Timestamp>Tue, 12 Jul 2016 07:04:48 GMT</Timestamp>
      </Header>
      <DataDictionary numberOfFields="2">
        <DataField name="f0" optype="continuous" dataType="integer"/>
        <DataField name="f1" optype="continuous" dataType="integer"/>
        <DataField name="class" optype="categorical" dataType="string">
          <Value value="bad"/>
          <Value value="good"/>
        </DataField>
      </DataDictionary>
      <MiningModel modelName="xlab_m_random_forests_1_75078_v0" functionName="classification" algorithmName="RandomForests">
        <MiningSchema>
          <MiningField name="f0" usageType="active"/>
          <MiningField name="f1" usageType="active"/>
          <MiningField name="class" usageType="target"/>
        </MiningSchema>
        <Segmentation multipleModelMethod="majorityVote">
          <Segment id="0">
            <True/>
            <TreeModel modelName="xlab_m_random_forests_1_75078_v0" functionName="classification" algorithmName="RandomForests">
              <MiningSchema>
                <MiningField name="f0" usageType="active"/>
                <MiningField name="f1" usageType="active"/>
                <MiningField name="class" usageType="target"/>
              </MiningSchema>
              <Node id="1">
                <True/>
                <ScoreDistribution value="bad" recordCount="2"/>
                <ScoreDistribution value="good" recordCount="3"/>
                <Node id="2" score="good">
                  <SimplePredicate field="f1" operator="equal" value="2"/>
                  <ScoreDistribution value="good" recordCount="1"/>
                </Node>
                <Node id="3" score="good">
                  <SimplePredicate field="f1" operator="equal" value="3"/>
                  <ScoreDistribution value="good" recordCount="2"/>
                </Node>
                <Node id="4" score="bad">
                  <SimplePredicate field="f1" operator="equal" value="4"/>
                  <ScoreDistribution value="bad" recordCount="2"/>
                </Node>
              </Node>
            </TreeModel>
          </Segment>
          <Segment id="1">
            <True/>
            <TreeModel modelName="xlab_m_random_forests_1_75078_v0" functionName="classification" algorithmName="RandomForests">
              <MiningSchema>
                <MiningField name="f0" usageType="active"/>
                <MiningField name="f1" usageType="active"/>
                <MiningField name="class" usageType="target"/>
              </MiningSchema>
              <Node id="1">
                <True/>
                <ScoreDistribution value="bad" recordCount="2"/>
                <ScoreDistribution value="good" recordCount="3"/>
                <Node id="2" score="good">
                  <SimpleSetPredicate field="f1" booleanOperator="isIn">
                    <Array n="2" type="integer">2 3</Array>
                  </SimpleSetPredicate>
                  <ScoreDistribution value="good" recordCount="3"/>
                </Node>
                <Node id="3" score="bad">
                  <SimpleSetPredicate field="f1" booleanOperator="isNotIn">
                    <Array n="2" type="integer">2 3</Array>
                  </SimpleSetPredicate>
                  <ScoreDistribution value="bad" recordCount="2"/>
                </Node>
              </Node>
            </TreeModel>
          </Segment>
          <Segment id="2">
            <True/>
            <TreeModel modelName="xlab_m_random_forests_1_75078_v0" functionName="classification" algorithmName="RandomForests">
              <MiningSchema>
                <MiningField name="f0" usageType="active"/>
                <MiningField name="f1" usageType="active"/>
                <MiningField name="class" usageType="target"/>
              </MiningSchema>
              <Node id="1">
                <True/>
                <ScoreDistribution value="bad" recordCount="2"/>
                <ScoreDistribution value="good" recordCount="3"/>
                <Node id="2" score="bad">
                  <SimplePredicate field="f0" operator="lessOrEqual" value="0.5"/>
                  <ScoreDistribution value="bad" recordCount="1"/>
                  <ScoreDistribution value="good" recordCount="1"/>
                </Node>
                <Node id="3" score="good">
                  <SimplePredicate field="f0" operator="greaterThan" value="0.5"/>
                  <ScoreDistribution value="bad" recordCount="1"/>
                  <ScoreDistribution value="good" recordCount="2"/>
                </Node>
              </Node>
            </TreeModel>
          </Segment>
        </Segmentation>
      </MiningModel>
    </PMML>
  4. View the visual output of the model.

    [Figure: Visual output of the random forest model]
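To see how the three segments in the PMML output combine under multipleModelMethod="majorityVote", the trees can be transcribed by hand into Python. This is a rough sketch, not a PMML evaluator: the function names are hypothetical, the predicates are copied from the three Segment elements, and `tree_0` only handles the f1 values that appear in the training data.

```python
from collections import Counter


def tree_0(f0, f1):
    # Segment 0: one branch per f1 value (2, 3, or 4).
    return {2: "good", 3: "good", 4: "bad"}[f1]


def tree_1(f0, f1):
    # Segment 1: f1 isIn {2, 3} -> good, isNotIn -> bad.
    return "good" if f1 in (2, 3) else "bad"


def tree_2(f0, f1):
    # Segment 2: f0 <= 0.5 -> bad, f0 > 0.5 -> good.
    return "bad" if f0 <= 0.5 else "good"


def forest_predict(f0, f1):
    """Majority vote over the three transcribed trees."""
    votes = [tree(f0, f1) for tree in (tree_0, tree_1, tree_2)]
    return Counter(votes).most_common(1)[0][0]


# The five rows from pai_rf_test_input: (f0, f1, class).
training_rows = [
    (1, 2, "good"),
    (1, 3, "good"),
    (1, 4, "bad"),
    (0, 3, "good"),
    (0, 4, "bad"),
]
for f0, f1, label in training_rows:
    print(f0, f1, label, forest_predict(f0, f1))
```

With three trees and two classes, every vote has a strict majority, so no tie-breaking rule is needed in this example.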