This topic describes the Data Type Conversion component provided by Machine Learning Designer. You can use the component to convert features of any data type into features of the STRING, DOUBLE, or INT data type. If a value fails to convert, the component fills in a default value that you can specify.

Background information

  • You can convert the data types of table fields.
  • You can convert multiple table fields to different data types at the same time.
  • You can convert fields of ODPS 2.0 numeric data types, such as DECIMAL, FLOAT, and INT.
    Note This feature is available only in the China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Zhangjiakou), and China (Chengdu) regions.
  • You can select whether to retain the original columns.

Configure the component

You can use one of the following methods to configure the component parameters.

Method 1: Using the Machine Learning Platform for AI (PAI) console

Configure the component parameters on the pipeline page of Machine Learning Designer.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Convert to Double Type Columns | The columns to convert to the DOUBLE data type. |
| | Default Imputed Value When Conversion Fails | The default value that is imputed when conversion to the DOUBLE data type fails. |
| | Convert to Int Type Columns | The columns to convert to the INT data type. |
| | Default Imputed Value When Conversion Fails | The default value that is imputed when conversion to the INT data type fails. |
| | Convert to String Type Columns | The columns to convert to the STRING data type. |
| | Default Imputed Value When Conversion Fails | The default value that is imputed when conversion to the STRING data type fails. |
| | Reserve Original Columns | Specifies whether to retain the original columns. Column names are prefixed with typed_ after data type conversion. |
| | Memory Size per Node | Valid values: 1024 to 65536 (64 × 1024). Unit: MB. |
| | Cores | The number of cores used in computing. This parameter must be used with the Memory Size per Node parameter. Valid values: [1,9999]. |
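The fallback behavior of the default imputed values can be pictured with a short Python sketch. It only illustrates the idea of "convert, and use the default if conversion fails"; it is not the component's implementation. The helper name `convert_or_default` and the sample values are hypothetical, and Python's own parsing rules stand in for the component's conversion rules, which may differ.

```python
def convert_or_default(value, target, default):
    """Convert value with target (float, int, or str); return default if conversion fails."""
    try:
        return target(value)
    except (TypeError, ValueError):
        # Assumption: any value that cannot be parsed counts as a failed conversion.
        return default


# Hypothetical column with one unparseable string and one missing value.
raw = ["1.5", "abc", None, "4"]

# Convert to DOUBLE with 0.0 as the default imputed value.
as_double = [convert_or_default(v, float, 0.0) for v in raw]

# Convert to INT with -1 as the default imputed value.
as_int = [convert_or_default(v, int, -1) for v in ["7", "x"]]

print(as_double)
print(as_int)
```

Choosing a default that cannot occur in valid data (such as -1 for a non-negative count) makes failed conversions easy to spot in the output table.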

Method 2: Using PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
pai -project algo_public
    -name type_transform_v1
    -DinputTable=type_test
    -Dcols_to_string="f0"
    -Ddefault_double_value=0.0
    -DoutputTable=type_test_output;
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTable | Yes | The name of the input table. | None |
| inputTablePartitions | No | The partitions of the input table to use. Specify this parameter in one of the following formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,). | All partitions |
| outputTable | Yes | The name of the output table that stores the converted data. | None |
| reserveOldFeat | No | Specifies whether to retain the original columns. | None |
| cols_to_double | No | The feature columns to convert to the DOUBLE data type. | None |
| cols_to_string | No | The feature columns to convert to the STRING data type. | None |
| cols_to_int | No | The feature columns to convert to the INT data type. | None |
| default_int_value | No | The default value that is used when conversion to the INT data type fails. | 0 |
| default_double_value | No | The default value that is used when conversion to the DOUBLE data type fails. | 0.0 |
| default_string_value | No | The default value that is used when conversion to the STRING data type fails. | "" |
| coreNum | No | The number of cores. This parameter must be used with the memSizePerCore parameter. Valid values: [1,9999]. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Valid values: 1024 to 65536 (64 × 1024). Unit: MB. | Determined by the system |
| lifecycle | No | The lifecycle of the output table. | 7 |
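If you generate PAI commands from a script, a small helper such as the following Python sketch can assemble the command text from the parameters listed above. The function names `format_partitions` and `build_pai_command`, the partition names `ds` and `hh`, and the partition values are hypothetical; the sketch only formats a string and does not submit anything to PAI. Whether a value needs quotation marks in your environment is not covered here.

```python
def format_partitions(partitions):
    """Format partitions for inputTablePartitions.

    Each partition is a list of (name, value) levels. Levels are joined
    with '/', and multiple partitions are joined with ','.
    """
    return ",".join(
        "/".join(f"{name}={value}" for name, value in levels)
        for levels in partitions
    )


def build_pai_command(params, project="algo_public", name="type_transform_v1"):
    """Render a PAI command with one -D<parameter>=<value> line per entry."""
    lines = [f"pai -project {project}", f"    -name {name}"]
    lines += [f"    -D{key}={value}" for key, value in params.items()]
    return "\n".join(lines) + ";"


command = build_pai_command({
    "inputTable": "type_test",
    # Hypothetical two-level partitions; assumes type_test is partitioned this way.
    "inputTablePartitions": format_partitions([
        [("ds", "20240101"), ("hh", "00")],
        [("ds", "20240102"), ("hh", "00")],
    ]),
    "cols_to_double": "f0",
    "default_double_value": 0.0,
    "outputTable": "type_test_output",
    "lifecycle": 7,
})
print(command)
```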

Examples

  • Test data
    create table transform_test as
    select * from
    (
    select true as f0,2.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,4.0 as f1,1 as f2 from dual union all
    select true as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,4.0 as f1,1 as f2 from dual union all
    select true as f0,3.0 as f1,1 as f2 from dual union all
    select false as f0,5.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select true as f0,4.0 as f1,1 as f2 from dual union all
    select false as f0,3.0 as f1,1 as f2 from dual union all
    select true as f0,4.0 as f1,1 as f2 from dual
    )tmp;
  • Training data
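    | f0 | f1 | f2 |
    | --- | --- | --- |
    | false | 3.0 | 1 |
    | false | 3.0 | 1 |
    | true | 2.0 | 1 |
    | true | 4.0 | 1 |
    | false | 4.0 | 1 |
    | false | 3.0 | 1 |
    | false | 3.0 | 1 |
    | true | 3.0 | 1 |
    | false | 4.0 | 1 |
    | true | 4.0 | 1 |
    | false | 5.0 | 1 |
    | true | 3.0 | 1 |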
  • PAI command for training
    pai -project projectxlib4
        -name type_transform_v1
        -DinputTable=transform_test
        -Dcols_to_double=f0
        -Dcols_to_int=f1
        -Dcols_to_string=f2
        -DoutputTable=trans_test_output;
  • Output
    Result table
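    | f0 | f1 | f2 |
    | --- | --- | --- |
    | 0.0 | 3 | 1 |
    | 0.0 | 3 | 1 |
    | 1.0 | 2 | 1 |
    | 1.0 | 4 | 1 |
    | 0.0 | 4 | 1 |
    | 0.0 | 3 | 1 |
    | 1.0 | 3 | 1 |
    | 0.0 | 4 | 1 |
    | 0.0 | 3 | 1 |
    | 0.0 | 5 | 1 |
    | 1.0 | 3 | 1 |
    | 1.0 | 4 | 1 |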
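To check your reading of the result table, the same conversions can be mimicked locally in Python: f0 (BOOLEAN) to DOUBLE, f1 (DOUBLE) to INT, and f2 to STRING. This is a rough sketch that assumes Python's `float`, `int`, and `str` behave like the component for these particular values; it does not call PAI, and the helper name `convert_row` is hypothetical.

```python
# Rows of transform_test as (f0, f1, f2), in the order of the CREATE TABLE statement.
rows = [
    (True, 2.0, 1), (False, 3.0, 1), (False, 4.0, 1), (True, 3.0, 1),
    (False, 3.0, 1), (False, 4.0, 1), (True, 3.0, 1), (False, 5.0, 1),
    (False, 3.0, 1), (True, 4.0, 1), (False, 3.0, 1), (True, 4.0, 1),
]


def convert_row(row):
    """Mimic cols_to_double=f0, cols_to_int=f1, cols_to_string=f2 for one row."""
    f0, f1, f2 = row
    return (float(f0), int(f1), str(f2))


converted = [convert_row(row) for row in rows]
for row in converted:
    print(row)
```

The converted rows contain the same values as the result table. The row order can differ because the table has no defined row order.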