This topic describes how to use the Table to KV component provided by Machine Learning Designer to convert a table in the common format into a table in the key-value format.

Limits

  • The output table does not show the null values in the input table. You can specify the columns that you want to retain in the output table. The specified columns are retained in their original formats.
  • If an input key_map table exists, the columns that are converted into the output table are the columns whose keys exist in both the key_map and key-value tables.
  • If an input key_map table exists and a data type specified in the key_map table is different from the data type of a specific column in the input table, the output key_map table uses the data type that you specified.
  • The columns that you want to convert into the key-value format in the input table must be of the BIGINT or DOUBLE data type.

Configure the Table to KV component

You can use one of the following methods to configure the Table to KV component.

Method 1: Configure the Table to KV component in a visualized manner

You can configure the parameters of the Table to KV component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
Tab Parameter Description
Fields Setting Columns to Convert The names of the columns that you want to convert.
Reserved Columns The names of the columns that you want to retain.
KV Delimiter The delimiters that are used between keys and values. Colons (:) are used by default.
KV Pair Delimiter The delimiters that are used between key-value pairs. Commas (,) are used by default.
Parameters Setting Convert Columns to IDs Specifies whether to convert columns into IDs. Valid values:
  • Yes
  • No
Tuning Cores The number of cores. The system automatically allocates cores that are used for training based on the amount of input data.
Memory Size The memory size of each core. The system automatically allocates the memory based on the amount of input data. Unit: MB.

Method 2: Configure the Table to KV component by using PAI commands

You can configure the parameters of the Table to KV component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see SQL Script. The following table describes the parameters.
PAI -name TableToKV
    -project algo_public
    -DinputTableName=maple_tabletokv_basic_input
    -DoutputTableName=maple_tabletokv_basic_output
    -DselectedColNames=col0,col1,col2
    -DappendColNames=rowid;
Parameter Required Description Default value
inputTableName Yes The name of the input table. No default value
inputTablePartitions No The partitions that are selected from the input table for training. You can specify this parameter in the Partition_name=value format.

If multi-level partitions exist, you can specify this parameter in the name1=value1/name2=value2; format.

Separate multiple partitions with commas (,).

All partitions selected
selectedColNames No The names of the columns that you select to convert. The data types of the columns that you select must be BIGINT or DOUBLE. All columns selected
appendColNames No The names of the columns that you want to retain. The specified columns are retained in their original formats. No default value
outputTableName Yes The name of the output table. No default value
kvDelimiter No The delimiters that are used between keys and values. Colons (:)
itemDelimiter No The delimiters that are used between key-value pairs. Commas (,)
convertColToIndexId No Specifies whether to convert columns into IDs. Valid values:
  • 1: Columns are converted into IDs.
  • 0: Columns are not converted into IDs.
0
inputKeyMapTableName No The name of the input index table.

This parameter takes effect only when the convertColToIndexId parameter is set to 1. If this parameter is not specified, the system automatically calculates a set of IDs.

null
outputKeyMapTableName Determined based on convertColToIndexId The name of the output index table. This parameter is required only when the convertColToIndexId parameter is set to 1. No default value
lifecycle No The lifecycle of the output table. The value of this parameter must be a positive integer. No default value
coreNum No The number of cores. The value of this parameter must be a positive integer. Valid values: [1,9999]. This parameter must be used together with the memSizePerCore parameter. Determined by the system
memSizePerCore No The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [1024,64 × 1024]. Determined by the system

Example

  • Input data
    drop table if exists test;
    create table test as
    select * from
        (
            select 0 as rowid, 1 as col0, 1.1 as col1, 2 as col2  from dual union all
            select 1 as rowid, 0 as col0, 1.2 as col1, 3 as col2 from dual union all
            select 2 as rowid, 1 as col0, 2.3 as col1, 4 as col2 from dual union all
            select 3 as rowid, 1 as col0, 0.0 as col1, 5 as col2 from dual
        ) tmp;
  • PAI commands
    PAI -name TableToKV
        -project algo_public
        -DinputTableName=test
        -DoutputTableName=test_output
        -DselectedColNames=col0,col1,col2
        -DconvertColToIndexId=1
        -DoutputKeyMapTableName=test_key_map
        -DappendColNames=rowid;
  • Output tables
    Output table test_output
    rowid kv
    0 0:1,1:1.1,2:2
    1 0:0,1:1.2,2:3
    2 0:1,1:2.3,2:4
    3 0:1,1:0,2:5
    Output table test_key_map
    col_name col_index col_datatype
    col0 0 bigint
    col1 1 double
    col2 2 bigint