Type Conversion is a data processing component that converts features from any data type to STRING, DOUBLE, or INT. This component also fills missing values when a conversion error occurs to ensure data integrity and consistency.
Algorithm description
- Converts the data type of a table field to another type.
- Simultaneously converts multiple fields to different data types.
- Converts fields of ODPS 2.0 numeric data types, such as decimal, float, and int.
  Note: This feature is available only in the China (Beijing), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Zhangjiakou), and China (Chengdu) regions.
- Provides an option to keep the original data columns.
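The fill-on-error behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the component's implementation: the helper name `convert_or_default` is hypothetical, and Python's casting rules are assumed here and may differ from the rules the component applies.

```python
from typing import Any, Callable


def convert_or_default(value: Any, caster: Callable[[Any], Any], default: Any) -> Any:
    """Cast value with caster; fall back to default if the value is missing or the cast fails."""
    if value is None:
        # Missing value: use the configured fill value.
        return default
    try:
        return caster(value)
    except (TypeError, ValueError):
        # Conversion error: use the configured fill value.
        return default


print(convert_or_default("3.5", float, 0.0))  # parses cleanly
print(convert_or_default("abc", float, 0.0))  # conversion error, falls back to 0.0
print(convert_or_default(None, int, 0))       # missing value, falls back to 0
```

Each target type gets its own fill value, which mirrors the separate defaults for DOUBLE, INT, and STRING in the component configuration.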
Component Configuration
Method 1: Visual configuration
In the Designer workflow, add the Type Conversion component and configure its parameters in the right-side pane:
| Parameter type | Parameter | Description |
| --- | --- | --- |
| Field Settings | Columns to convert to DOUBLE | Converts the selected fields to the DOUBLE type. |
| | Default fill value for DOUBLE conversion errors | The default value to use if the conversion to the DOUBLE type fails. |
| | Columns to convert to INT | Converts the selected fields to the INT type. |
| | Default fill value for INT conversion errors | The default value to use if the conversion to the INT type fails. |
| | Columns to convert to STRING | Converts the selected fields to the STRING type. |
| | Default fill value for STRING conversion errors | The default value to use if the conversion to the STRING type fails. |
| | Keep Original Columns | Specifies whether to keep the original data columns. The prefix for column names is "typed_". |
| | Memory Size per Node | The memory size of each node. Value range: 1024 MB to 65536 MB. |
| | Number of Nodes | The number of nodes. Used together with the Memory Size per Node parameter. Value range: 1 to 9999. |
Method 2: PAI command
You can use a PAI command to configure the Type Conversion component. You can run PAI commands using a SQL Script component. For more information, see SQL Script.
pai -project algo_public
-name type_transform_v1
-DinputTable=type_test
-Dcols_to_string="f0"
-Ddefault_double_value=0.0
-DoutputTable=type_test_output;
| Parameter | Required | Default value | Description |
| --- | --- | --- | --- |
| inputTable | Yes | None | The name of the input table. |
| inputTablePartitions | No | All partitions | The partitions of the input table to process. Note: If you specify multiple partitions, separate them with commas (,). |
| outputTable | Yes | None | The output table that stores the type conversion results. |
| reserveOldFeat | No | None | Specifies whether to keep the original data columns. |
| cols_to_double | No | None | The feature columns to convert to the DOUBLE type. |
| cols_to_string | No | None | The feature columns to convert to the STRING type. |
| cols_to_int | No | None | The feature columns to convert to the INT type. |
| default_int_value | No | 0 | The fill value used for columns converted to INT when a feature field is empty. |
| default_double_value | No | 0.0 | The fill value used for columns converted to DOUBLE when a feature field is empty. |
| default_string_value | No | "" | The fill value used for columns converted to STRING when a feature field is empty. |
| coreNum | No | Calculated automatically | The number of nodes. Use this parameter with memSizePerCore. Value range: 1 to 9999. |
| memSizePerCore | No | Calculated automatically | The memory size of a single node, in MB. Value range: 1024 to 65536. |
| lifecycle | No | 7 | The lifecycle of the output table. |
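When the command is assembled by a script, the required parameters and value ranges from the table can be checked before submission. The following is a rough sketch under stated assumptions: `build_pai_command` is a hypothetical helper, not part of any PAI SDK, and it only formats text in the same shape as the command shown earlier. Values are emitted verbatim, so any quoting is left to the caller.

```python
REQUIRED = ("inputTable", "outputTable")
# Value ranges taken from the parameter table.
RANGES = {"coreNum": (1, 9999), "memSizePerCore": (1024, 65536)}


def build_pai_command(project: str, params: dict) -> str:
    """Format a type_transform_v1 command string after basic validation."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    for name, (low, high) in RANGES.items():
        if name in params and not low <= int(params[name]) <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    lines = [f"pai -project {project}", "-name type_transform_v1"]
    lines += [f"-D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


print(build_pai_command("algo_public", {
    "inputTable": "type_test",
    "cols_to_string": "f0",
    "default_double_value": 0.0,
    "outputTable": "type_test_output",
}))
```

The resulting string still has to be run in a SQL Script component; the helper does not submit anything.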
Example
- Generate test data
  create table transform_test as
  select * from (
    select true as f0, 2.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 4.0 as f1, 1 as f2
    union all select true as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 4.0 as f1, 1 as f2
    union all select true as f0, 3.0 as f1, 1 as f2
    union all select false as f0, 5.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select true as f0, 4.0 as f1, 1 as f2
    union all select false as f0, 3.0 as f1, 1 as f2
    union all select true as f0, 4.0 as f1, 1 as f2
  ) tmp;
- View the test data
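| f0 | f1 | f2 |
| --- | --- | --- |
| false | 3.0 | 1 |
| false | 3.0 | 1 |
| true | 2.0 | 1 |
| true | 4.0 | 1 |
| false | 4.0 | 1 |
| false | 3.0 | 1 |
| false | 3.0 | 1 |
| true | 3.0 | 1 |
| false | 4.0 | 1 |
| true | 4.0 | 1 |
| false | 5.0 | 1 |
| true | 3.0 | 1 |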
- PAI command
  pai -project projectxlib4
  -name type_transform_v1
  -DinputTable=transform_test
  -Dcols_to_double=f0
  -Dcols_to_int=f1
  -Dcols_to_string=f2
  -DoutputTable=trans_test_output;
- Output description
Result table
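| f0 | f1 | f2 |
| --- | --- | --- |
| 0.0 | 3 | 1 |
| 0.0 | 3 | 1 |
| 1.0 | 2 | 1 |
| 1.0 | 4 | 1 |
| 0.0 | 4 | 1 |
| 0.0 | 3 | 1 |
| 1.0 | 3 | 1 |
| 0.0 | 4 | 1 |
| 0.0 | 3 | 1 |
| 0.0 | 5 | 1 |
| 1.0 | 3 | 1 |
| 1.0 | 4 | 1 |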
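To see why the result table looks this way, the same three conversions can be reproduced on a few of the sample rows in plain Python. This is an illustrative sketch only: Python's `float`, `int`, and `str` are assumed stand-ins for the component's DOUBLE, INT, and STRING conversions, and they happen to agree with the values shown in the result table for this data.

```python
# Three rows from the test data: (f0 BOOLEAN, f1 DOUBLE, f2 INT).
rows = [(False, 3.0, 1), (True, 2.0, 1), (False, 5.0, 1)]

# f0 -> DOUBLE, f1 -> INT, f2 -> STRING, matching the example command.
converted = [(float(f0), int(f1), str(f2)) for f0, f1, f2 in rows]

for row in converted:
    print(row)
```

Here `true` and `false` become 1.0 and 0.0, the whole-number doubles in f1 become integers, and f2 keeps the same digits but is stored as text.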