This topic describes the Data Type Conversion component provided by Machine Learning Studio. You can use this component to convert features of any data type into the STRING, DOUBLE, or INT data type. If an exception occurs while a value is being converted, the component imputes a default value that you specify.

Background information

  • You can convert the data types of table fields.
  • You can convert fields to several target data types in a single run.
  • You can select whether to retain the original columns.
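
The fallback behavior can be illustrated with a minimal Python sketch. It is not the component's implementation: the convert helper and the DEFAULTS and CASTS tables are hypothetical names, and the sketch assumes that a conversion "fails" whenever the corresponding Python cast raises an exception, which may differ from what the component treats as a failure.

```python
# Hypothetical helper that mimics "convert, or impute a default on failure".
# The defaults mirror default_double_value, default_int_value, and
# default_string_value from the PAI command parameters.
DEFAULTS = {"double": 0.0, "int": 0, "string": ""}
CASTS = {"double": float, "int": int, "string": str}


def convert(value, target, default=None):
    """Cast value to target; return the fallback if the cast raises."""
    fallback = DEFAULTS[target] if default is None else default
    try:
        return CASTS[target](value)
    except (TypeError, ValueError):
        # Assumption: any Python casting error counts as a failed conversion.
        return fallback


print(convert("3.5", "double"))          # cast succeeds
print(convert("abc", "double"))          # cast fails, default 0.0 is imputed
print(convert("abc", "int", default=-1)) # cast fails, custom default is imputed
```

Keeping the defaults in one table per target type matches the way the component exposes one default value for each of the three target types.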

Configure the component

You can configure the component by using one of the following methods:
  • Machine Learning Platform for AI (PAI) console
    Tab | Parameter | Description
    Fields Setting | Convert to Double Type Columns | The columns to convert to the DOUBLE data type.
     | Default Imputed Value When Conversion Fails | The default value that is imputed when conversion to the DOUBLE data type fails.
     | Convert to Int Type Columns | The columns to convert to the INT data type.
     | Default Imputed Value When Conversion Fails | The default value that is imputed when conversion to the INT data type fails.
     | Convert to String Type Columns | The columns to convert to the STRING data type.
     | Default Imputed Value When Conversion Fails | The default value that is imputed when conversion to the STRING data type fails.
     | Reserve Original Columns | Specifies whether to retain the original columns. The prefix typed_ is added to column names.
     | Memory Size per Node | The memory size of each core. Unit: MB. Valid values: 1024 to 64 × 1024.
     | Cores | The number of cores used in computing. This parameter is used with Memory Size per Node. Valid values: [1,9999].
  • PAI command
    pai -project algo_public
        -name type_transform_v1
        -DinputTable=type_test
        -Dcols_to_string="f0"
        -Ddefault_double_value=0.0
        -DoutputTable=type_test_output;
    Parameter | Required | Description | Default value
    inputTable | Yes | The name of the input table. | No default value
    inputTablePartitions | No | The partitions of the input table to convert. Supported formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,). | All partitions
    outputTable | Yes | The name of the output table that stores the converted data. | No default value
    reserveOldFeat | No | Specifies whether to retain the original columns. | No default value
    cols_to_double | No | The feature columns to convert to the DOUBLE data type. | No default value
    cols_to_string | No | The feature columns to convert to the STRING data type. | No default value
    cols_to_int | No | The feature columns to convert to the INT data type. | No default value
    default_int_value | No | The value that is imputed when conversion to the INT data type fails. | 0
    default_double_value | No | The value that is imputed when conversion to the DOUBLE data type fails. | 0.0
    default_string_value | No | The value that is imputed when conversion to the STRING data type fails. | ""
    coreNum | No | The number of cores. This parameter is used with memSizePerCore. Valid values: [1,9999]. | Automatically allocated
    memSizePerCore | No | The memory size of each core. Unit: MB. Valid values: [1024,64 × 1024]. | Automatically allocated
    lifecycle | No | The lifecycle of the output table. | 7
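
If you generate PAI commands from a script, the parameter rules above can be checked before the command is submitted. The following Python sketch is only an illustration: build_pai_command, REQUIRED, and RANGES are hypothetical names that are not part of PAI, and the sketch only renders the command text. It uses the required flags and valid ranges from the parameter table and the command layout of the sample command.

```python
# Hypothetical helper: renders a PAI command string and checks the
# documented constraints. It does not submit anything to PAI.
REQUIRED = ("inputTable", "outputTable")
RANGES = {"coreNum": (1, 9999), "memSizePerCore": (1024, 64 * 1024)}


def build_pai_command(name, params, project="algo_public"):
    """Return the command text for the given component name and parameters."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for key, (low, high) in RANGES.items():
        if key in params and not low <= int(params[key]) <= high:
            raise ValueError(f"{key} must be in [{low}, {high}]")
    lines = [f"pai -project {project}", f"    -name {name}"]
    lines += [f"    -D{key}={value}" for key, value in params.items()]
    return "\n".join(lines) + ";"


command = build_pai_command(
    "type_transform_v1",
    {
        "inputTable": "type_test",
        "cols_to_double": "f0",
        "default_double_value": 0.0,
        "outputTable": "type_test_output",
    },
)
print(command)
```

The sketch pairs cols_to_double with default_double_value so that the default applies to the type that is actually being converted.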

Example

  • Test data
    create table transform_test as
    select * from
    (
    select true as f0,2.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,4.0 as f1,1 as f2 from dual union all
    select true as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,4.0 as f1,1 as f2 from dual union all
    select true as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,5.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select true as f0,4.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select true as f0,4.0 as f1,1 as f2 from dual
    )tmp;
  • Input data
    f0 f1 f2
    false 3.0 1
    false 3.0 1
    true 2.0 1
    true 4.0 1
    false 4.0 1
    false 3.0 1
    false 3.0 1
    true 3.0 1
    false 4.0 1
    true 4.0 1
    false 5.0 1
    true 3.0 1
  • PAI command for data type conversion
    pai -project projectxlib4
        -name type_transform_v1
        -DinputTable=transform_test
        -Dcols_to_double=f0
        -Dcols_to_int=f1
        -Dcols_to_string=f2
        -DoutputTable=trans_test_output;
  • Output
    Output table
    f0 f1 f2
    0.0 3 1
    0.0 3 1
    1.0 2 1
    1.0 4 1
    0.0 4 1
    0.0 3 1
    1.0 3 1
    0.0 4 1
    0.0 3 1
    0.0 5 1
    1.0 3 1
    1.0 4 1
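
The mapping from the input rows to the output rows can be checked with a short Python sketch. It assumes that Python's float, int, and str casts stand in for the component's BOOLEAN to DOUBLE, DOUBLE to INT, and INT to STRING conversions, which agrees with the values listed above but is not a statement about how the component works internally. The transform function and variable names are hypothetical. Because the input and output listings do not present rows in the same order, the sketch compares the rows as multisets.

```python
from collections import Counter

# Input rows (f0, f1, f2) as listed in the example.
input_rows = [
    (False, 3.0, 1), (False, 3.0, 1), (True, 2.0, 1), (True, 4.0, 1),
    (False, 4.0, 1), (False, 3.0, 1), (False, 3.0, 1), (True, 3.0, 1),
    (False, 4.0, 1), (True, 4.0, 1), (False, 5.0, 1), (True, 3.0, 1),
]

# Output rows as listed in the output table; f2 is now a string.
output_rows = [
    (0.0, 3, "1"), (0.0, 3, "1"), (1.0, 2, "1"), (1.0, 4, "1"),
    (0.0, 4, "1"), (0.0, 3, "1"), (1.0, 3, "1"), (0.0, 4, "1"),
    (0.0, 3, "1"), (0.0, 5, "1"), (1.0, 3, "1"), (1.0, 4, "1"),
]


def transform(row):
    """Apply cols_to_double=f0, cols_to_int=f1, cols_to_string=f2 to one row."""
    f0, f1, f2 = row
    return (float(f0), int(f1), str(f2))


converted = [transform(row) for row in input_rows]
# Compare as multisets because row order is not preserved in the listings.
print(Counter(converted) == Counter(output_rows))
```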