Platform For AI: Random Sampling

Last Updated: Jun 14, 2023

The Random Sampling component randomly samples the input data. You can specify the proportion or number of samples. The samples are independent of each other.
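The two ways of specifying a sample, by number or by proportion, can be sketched in Python. This is only an illustration of the idea, not the component's implementation: the helper `random_sample` and its argument names are hypothetical, and rounding the proportion to a row count is an assumption.

```python
import random


def random_sample(rows, sample_size=None, sample_ratio=None, seed=None):
    """Hypothetical helper: draw a simple random sample without replacement."""
    if sample_size is None and sample_ratio is None:
        raise ValueError("specify sample_size or sample_ratio")
    if sample_size is None:
        # Assumption: the proportion is converted to a row count by rounding.
        sample_size = int(round(len(rows) * sample_ratio))
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(rows, sample_size)


rows = list(range(1000))
by_count = random_sample(rows, sample_size=50, seed=7)   # 50 rows
by_ratio = random_sample(rows, sample_ratio=0.1, seed=7)  # 100 rows
```

When both arguments are passed, this sketch uses the count and ignores the proportion.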

Configure the component

You can use one of the following methods to configure the Random Sampling component.

Method 1: Configure the component on the pipeline page

Configure the component parameters on the pipeline page of Machine Learning Designer.

| Tab | Parameter | Description |
| --- | --- | --- |
| Parameters Setting | Sample Size | The value must be a positive integer. |
| Parameters Setting | Sampling Fraction | The value must be a floating-point number. Valid values: (0,1). |
| Parameters Setting | Sampling with Replacement | By default, this check box is not selected. If you select this check box, sampling with replacement is enabled. |
| Parameters Setting | Random Seed | By default, the system determines the value. |
| Tuning | Cores | The value must be a positive integer. By default, the system determines the value. |
| Tuning | Memory Size per Core | The value must be a positive integer. Unit: MB. Valid values: (1,65536). By default, the system determines the value. |
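The effect of the Sampling with Replacement and Random Seed settings can be illustrated with a minimal Python sketch. The function `sample_rows` is hypothetical and uses Python's standard `random` module; it does not reproduce the component's internal random number generator, so the actual rows selected by the component will differ.

```python
import random


def sample_rows(rows, k, replace=False, seed=None):
    """Hypothetical helper: draw k rows with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same row may be drawn more than once,
        # so k may exceed the number of rows.
        return [rng.choice(rows) for _ in range(k)]
    # Without replacement: each row appears at most once, so k cannot
    # exceed len(rows) (random.sample raises ValueError otherwise).
    return rng.sample(rows, k)


rows = ["a", "b", "c"]
with_replacement = sample_rows(rows, 5, replace=True, seed=42)
first_draw = sample_rows(rows, 2, seed=42)
second_draw = sample_rows(rows, 2, seed=42)  # same seed, same result
```

Fixing the seed is what makes a sampling run repeatable in this sketch; leaving it unset yields a different draw on each run.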

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name RandomSample \
    -project algo_public \
    -Dlifecycle="28" \
    -DoutputTableName="test2" \
    -Dreplace="false" \
    -DsampleSize="500" \
    -DinputPartitions="pt=20150501" \
    -DinputTableName="bank_data_partition";

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | No default value |
| inputTablePartitions | No | The partitions that are selected from the input table for sampling. The following formats are supported: Partition_name=value; name1=value1/name2=value2 (multi-level partitions). Note: Separate multiple partitions with commas (,). | No default value |
| outputTableName | Yes | The name of the output table. | No default value |
| sampleSize | No | The number of samples. Note: If both the sampleSize and sampleRatio parameters are empty, an error is returned. If both the sampleSize and sampleRatio parameters are specified, the sampleSize parameter takes precedence. | No default value |
| sampleRatio | No | The sampling proportion. The value must be a floating-point number. Valid values: (0,1). | No default value |
| replace | No | Specifies whether to enable sampling with replacement. The value must be of the BOOLEAN type. | false |
| randomSeed | No | The random seed. The value must be a positive integer. | Determined by the system |
| lifecycle | No | The lifecycle of the output table. Valid values: [1,3650]. | No default value |
| coreNum | No | The number of cores used in computing. The value must be a positive integer. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Valid values: (1,65536). Unit: MB. | Determined by the system |
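The parameter rules above can be checked before a command is submitted. The following Python sketch assembles a RandomSample command string and applies the documented constraints. The function `build_random_sample_command` and its argument names are hypothetical; it only formats text and does not call any PAI API. Partition selection is left out for brevity.

```python
def build_random_sample_command(input_table, output_table,
                                sample_size=None, sample_ratio=None,
                                replace=False, random_seed=None,
                                lifecycle=None):
    """Hypothetical helper: validate parameters and format a PAI command."""
    # At least one of sampleSize and sampleRatio must be given.
    if sample_size is None and sample_ratio is None:
        raise ValueError("specify sampleSize or sampleRatio")

    params = [("inputTableName", input_table),
              ("outputTableName", output_table)]

    if sample_size is not None:
        # sampleSize takes precedence, so sampleRatio is not emitted.
        if sample_size <= 0 or int(sample_size) != sample_size:
            raise ValueError("sampleSize must be a positive integer")
        params.append(("sampleSize", int(sample_size)))
    else:
        if not 0 < sample_ratio < 1:
            raise ValueError("sampleRatio must be in (0,1)")
        params.append(("sampleRatio", sample_ratio))

    params.append(("replace", "true" if replace else "false"))

    if random_seed is not None:
        if random_seed <= 0 or int(random_seed) != random_seed:
            raise ValueError("randomSeed must be a positive integer")
        params.append(("randomSeed", int(random_seed)))

    if lifecycle is not None:
        if not 1 <= lifecycle <= 3650:
            raise ValueError("lifecycle must be in [1,3650]")
        params.append(("lifecycle", lifecycle))

    lines = ["PAI -name RandomSample", "    -project algo_public"]
    lines += ['    -D{}="{}"'.format(name, value) for name, value in params]
    # Join with a backslash line continuation and end with a semicolon.
    return " \\\n".join(lines) + ";"


command = build_random_sample_command(
    "bank_data_partition", "test2",
    sample_size=500, sample_ratio=0.1, lifecycle=28)
print(command)
```

Because both a size and a ratio are passed in the usage example, the generated command contains only `-DsampleSize`, mirroring the precedence rule in the table.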