This topic describes the Add ID Column component provided by Machine Learning Studio. The component adds an ID column to a table as the first column.

Background information

The Add ID Column component supports tables of up to 1,000,000,000 rows by 1,023 columns.

Configure the component

You can use one of the following methods to configure the Add ID Column component.

Method 1: Configure the component on the pipeline page

Configure the component parameters on the pipeline page of Machine Learning Designer.
| Tab | Parameter | Description |
| --- | --- | --- |
| Parameters Setting | All Selected by Default | By default, all columns in the input table are selected. Specific columns may not be used for training. These columns do not affect the prediction result. |
| Parameters Setting | ID Column | The name of the ID column to add. Default value: append_id. |
| Tuning | Cores | The number of cores. |
| Tuning | Memory Size per Core | The memory size of each core. Unit: MB. Valid values: (1,65536). |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name AppendId
    -project algo_public
    -DinputTableName=maple_test_appendid_basic_input
    -DoutputTableName=maple_test_appendid_basic_output;
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | No default value |
| selectedColNames | No | The columns that are selected from the input table for training. Separate column names with commas (,). Columns of the INT and DOUBLE types are supported. If the input data is in the sparse format, columns of the STRING type are supported. | All columns |
| inputTablePartitions | No | The partitions that are selected from the input table for training. Supported formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,). | All partitions |
| outputTableName | Yes | The name of the output table. | No default value |
| IDColName | No | The name of the added ID column. | append_id |
| lifecycle | No | The lifecycle of the output table. | No default value |
| coreNum | No | The number of cores. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. Valid values: (1,65536). | Determined by the system |
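
The required and optional parameters listed above can be combined into one PAI command. The following Python sketch only assembles the command text, which could then be pasted into the SQL Script component; it does not call PAI. The helper name build_append_id_command and the sample values (row_id, col0,col1, 2, 2048) are hypothetical and chosen for illustration. The parameter names are taken from the parameter table.

```python
def build_append_id_command(input_table, output_table, **options):
    """Return the text of a PAI AppendId command.

    Hypothetical helper for illustration only: it formats text and
    does not submit anything to PAI.
    """
    # Optional parameter names, as listed in the parameter table.
    allowed = {
        "selectedColNames", "inputTablePartitions", "IDColName",
        "lifecycle", "coreNum", "memSizePerCore",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")

    lines = [
        "PAI -name AppendId",
        "    -project algo_public",
        f"    -DinputTableName={input_table}",
        f"    -DoutputTableName={output_table}",
    ]
    for name, value in options.items():
        # Lists such as column names are joined with commas.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        lines.append(f"    -D{name}={value}")
    return "\n".join(lines) + ";"


command = build_append_id_command(
    "maple_test_appendid_basic_input",
    "maple_test_appendid_basic_output",
    IDColName="row_id",                 # sample value; the default is append_id
    selectedColNames=["col0", "col1"],  # sample value
    coreNum=2,                          # sample value
    memSizePerCore=2048,                # sample value, in MB
)
print(command)
```

Omitting an optional parameter leaves the corresponding default from the table in effect, so the two required table names are enough for a minimal command.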

Example

PAI -name AppendId
    -project algo_public
    -DinputTableName=maple_test_appendid_basic_input
    -DoutputTableName=maple_test_appendid_basic_output;
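  • Input data
    | col0 | col1 | col2 | col3 | col4 |
    | --- | --- | --- | --- | --- |
    | 10 | 0.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 11 | 1.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | false |
    | 12 | 2.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 13 | 3.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 14 | 4.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
  • Output table
    | append_id | col0 | col1 | col2 | col3 | col4 |
    | --- | --- | --- | --- | --- | --- |
    | 0 | 10 | 0.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 1 | 11 | 1.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | false |
    | 2 | 12 | 2.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 3 | 13 | 3.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |
    | 4 | 14 | 4.0 | aaaa | Thu Oct 01 00:00:00 CST 2015 | true |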
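
The effect shown in this example can be mimicked locally in plain Python. This is only a sketch of the observable result: an ID column placed first, with values that start at 0 and increase by 1 per row, as in the output table. The function name append_id is hypothetical, the sample rows are the first three input rows as read from the example, and the sketch makes no claim about how the component assigns IDs internally.

```python
def append_id(header, rows, id_col_name="append_id"):
    """Return (new_header, new_rows) with an ID column placed first.

    Hypothetical local helper. IDs start at 0 and follow row order,
    matching the sample output table of this example.
    """
    new_header = [id_col_name] + list(header)
    new_rows = [[i] + list(row) for i, row in enumerate(rows)]
    return new_header, new_rows


header = ["col0", "col1", "col2", "col3", "col4"]
rows = [
    [10, 0.0, "aaaa", "Thu Oct 01 00:00:00 CST 2015", True],
    [11, 1.0, "aaaa", "Thu Oct 01 00:00:00 CST 2015", False],
    [12, 2.0, "aaaa", "Thu Oct 01 00:00:00 CST 2015", True],
]

new_header, new_rows = append_id(header, rows)
print(new_header)
for row in new_rows:
    print(row)
```

Passing a different id_col_name plays the role of the IDColName parameter in this sketch.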