This topic describes the Convert Row, Column, and Value to KV Pair component provided by Machine Learning Designer (formerly known as Machine Learning Studio).

The component converts a trituple table (row,col,value) to a key-value table (row,[col_id:value]).

The data type of the trituple table (row,col,value) is "XXD" or "XXL". "X" represents any data type, "D" represents the DOUBLE data type, and "L" represents the BIGINT data type. In the key-value table generated after conversion, the data types of row and value are the same as those of the original input data. The generated index table of col maps to the col_id column. The data type of the col_id column is BIGINT.

The following tables provide a conversion example.
  • The following table is a trituple table.
    idwordcount
    01a10
    01b20
    01c30
  • The following table is the generated key-value table.
    idkey_value
    011:10;2:20;3:30
    Note You can customize the delimiter between a key and a value, as well as between two key-value pairs in the key_value column.
  • The following table is the generated index table.
    keykey_id
    a1
    b2
    c3

You can configure the component by using the Machine Learning Platform for AI (PAI) console or a PAI command.

Configure the component

You can use one of the following methods to configure the Convert Row, Column, and Value to KV Pair component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Row, Column, and Value to KV Pair component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
TabParameterDescription
Fields SettingColumns Reserved During KV ConversionThe name of the column that remains unchanged after a trituple table is converted to a key-value table.
Output KeysThe keys in the key-value table.
Output ValuesThe values in the key-value table.
Key Column in Input Index TableThe name of the key column in the index table.
Index ID Column of Key in Input Index TableThe name of the index column in the index table.
KV DelimiterThe delimiter used to separate a key and a value in the key-value table. The default delimiter is a colon (:).
KV Pair DelimiterThe delimiter used to separate key-value pairs. Default value: comma (,).
TuningTotal Number of InstancesThe total number of instances. The value must be a positive integer. By default, the system calculates the value based on the amount of input data.
Memory Size (MB)The total memory. The value must be a positive integer. By default, the system calculates the value based on the amount of input data.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name triple_to_kv
    -project algo_public
    -DinputTableName=test_data
    -DoutputTableName=test_kv_out
    -DindexOutputTableName=test_index_out
    -DidColName=id
    -DkeyColName=word
    -DvalueColName=count
    -DinputTablePartitions=ds=test1
    -DindexInputTableName=test_index_input
    -DindexInputKeyColName=word
    -DindexInputKeyIdColName=word_id
    -DkvDelimiter=:
    -DpairDelimiter=;
    -Dlifecycle=3
ParameterRequiredDescriptionDefault value
inputTableNameYesThe name of the input table. No default value
idColNameYesThe name of the column that remains unchanged after a trituple table is converted to a key-value table. No default value
keyColNameYesThe name of the column that lists keys in the key-value table. No default value
valueColNameYesThe name of the column that lists values in the key-value table. No default value
outputTableNameYesThe name of the generated key-value table. No default value
indexOutputTableNameYesThe name of the generated index table. No default value
indexInputTableNameNoThe name of the existing index table. The table must contain data. No default value
indexInputKeyColNameNoThe name of the key column in the index table. This parameter is required if you set the indexInputTableName parameter. No default value
indexInputKeyIdColNameNoThe name of the index column in the index table. This parameter is required if you set the indexInputTableName parameter. No default value
inputTablePartitionsNoThe partition names of the input table. You can enter only one partition name. No default value
kvDelimiterNoThe delimiter used to separate a key and a value in the key-value table. Colons (:)
pairDelimiterNoThe delimiter used to separate key-value pairs. Comma (,)
lifecycleNoThe lifecycle of the output table. No default value
coreNumNoThe total number of instances. The value must be a positive integer. Automatically calculated based on the amount of input data
memSizePerCoreNoThe total memory. The value must be a positive integer. Automatically calculated based on the amount of input data

Examples

  • Input
    drop table if exists triple2kv_test_input;
    create table triple2kv_test_input as
    select
      *
    from
    (
      select '01' as id, 'a' as word, 10 as count from dual
        union all
          select '01' as id, 'b' as word, 20 as count from dual
        union all
          select '01' as id, 'c' as word, 30 as count from dual
        union all
          select '02' as id, 'a' as word, 100 as count from dual
        union all
          select '02' as id, 'd' as word, 200 as count from dual
        union all
          select '02' as id, 'e' as word, 300 as count from dual
    ) tmp;
  • PAI command
    PAI -name triple_to_kv
        -project algo_public
        -DinputTableName=triple2kv_test_input
        -DoutputTableName=triple2kv_test_input_out
        -DindexOutputTableName=triple2kv_test_input_index_out
        -DidColName=id
        -DkeyColName=word
        -DvalueColName=count
        -Dlifecycle=1;
  • Output
    • Key-value table triple2kv_test_input_out
      +------------+------------+
      | id         | key_value  |
      +------------+------------+
      | 02         | 1:100;4:200;5:300 |
      | 01         | 1:10;2:20;3:30 |
      +------------+------------+
    • Index table triple2kv_test_input_index_out
      +------------+------------+
      | key        | key_id     |
      +------------+------------+
      | a          | 1          |
      | b          | 2          |
      | c          | 3          |
      | d          | 4          |
      | e          | 5          |
      +------------+------------+