This topic describes the Convert Row, Column, and Value to KV Pair component provided by Machine Learning Studio.

The component converts a trituple table (row,col,value) to a key-value table (row,[col_id:value]).

The data type of the trituple table (row,col,value) is "XXD" or "XXL". "X" represents any data type, "D" represents the DOUBLE data type, and "L" represents the BIGINT data type. In the key-value table generated after conversion, the data types of row and value are the same as those of the original input data. The generated index table of col maps to the col_id column. The data type of the col_id column is BIGINT.

The following tables provide a conversion example.
  • The following table is a trituple table.
    id word count
    01 a 10
    01 b 20
    01 c 30
  • The following table is the generated key-value table.
    id key_value
    01 1:10;2:20;3:30
    Note The delimiter between a key and a value, as well as between two key-value pairs in the key_value column can be customized.
  • The following table is the generated index table.
    key key_id
    a 1
    b 2
    c 3

You can configure the component by using the Machine Learning Platform for AI console or a PAI command.

Configure the component

  • Machine Learning Platform for AI console
    Tab Parameter Description
    Fields Settings Columns Reserved During KV Conversion The column name remains unchanged after a trituple table is converted to a key-value table.
    Output Keys The keys in the key-value table.
    Output Values The values in the key-value table.
    Key Column in Input Index Table The name of the key column in the index table.
    Index ID Column of Key in Input Index Table The name of the index column in the index table.
    KV Delimiter The delimiter used to separate a key and a value in the key-value table. The default delimiter is a colon (:).
    KV Pair Delimiter The delimiter used to separate key-value pairs. The default delimiter is a comma (,).
    Tuning Total Number of Instances The total number of instances. The value must be a positive integer. By default, the system calculates the value based on the amount of input data.
    Memory Size (MB) The total memory. The value must be a positive integer. By default, the system calculates the value based on the amount of input data.
  • PAI command
    PAI -name triple_to_kv
        -project algo_public
        -DinputTableName=test_data
        -DoutputTableName=test_kv_out
        -DindexOutputTableName=test_index_out
        -DidColName=id
        -DkeyColName=word
        -DvalueColName=count
        -DinputTablePartitions=ds=test1
        -DindexInputTableName=test_index_input
        -DindexInputKeyColName=word
        -DindexInputKeyIdColName=word_id
        -DkvDelimiter=:
        -DpairDelimiter=;
        -Dlifecycle=3
    Parameter Required Description Default value
    inputTableName Yes The name of the input table. No default value
    idColName Yes The column name remains unchanged after a trituple table is converted to a key-value table. No default value
    keyColName Yes The name of the column that lists keys in the key-value table. No default value
    valueColName Yes The name of the column that lists values in the key-value table. No default value
    outputTableName Yes The name of the generated key-value table. No default value
    indexOutputTableName Yes The name of the generated index table. No default value
    indexInputTableName No The name of the existing index table. The table must contain data. No default value
    indexInputKeyColName No The name of the key column in the index table. This parameter is required if you configure indexInputTableName. No default value
    indexInputKeyIdColName No The name of the index column in the index table. This parameter is required if you configure indexInputTableName. No default value
    inputTablePartitions No The partition names of the input table. You can enter only one partition name. No default value
    kvDelimiter No The delimiter used to separate a key and a value in the key-value table. Colon (:)
    pairDelimiter No The delimiter used to separate key-value pairs. Comma (,)
    lifecycle No The lifecycle of the output table. No default value
    coreNum No The total number of instances. The value must be a positive integer. Automatically calculated based on the amount of input data
    memSizePerCore No The total memory. The value must be a positive integer. Automatically calculated based on the amount of input data

Example

  • Input
    drop table if exists triple2kv_test_input;
    create table triple2kv_test_input as
    select
      *
    from
    (
      select '01' as id, 'a' as word, 10 as count from dual
        union all
          select '01' as id, 'b' as word, 20 as count from dual
        union all
          select '01' as id, 'c' as word, 30 as count from dual
        union all
          select '02' as id, 'a' as word, 100 as count from dual
        union all
          select '02' as id, 'd' as word, 200 as count from dual
        union all
          select '02' as id, 'e' as word, 300 as count from dual
    ) tmp;
  • PAI command
    PAI -name triple_to_kv
        -project algo_public
        -DinputTableName=triple2kv_test_input
        -DoutputTableName=triple2kv_test_input_out
        -DindexOutputTableName=triple2kv_test_input_index_out
        -DidColName=id
        -DkeyColName=word
        -DvalueColName=count
        -Dlifecycle=1;
  • Output
    • Key-value table triple2kv_test_input_out
      +------------+------------+
      | id         | key_value  |
      +------------+------------+
      | 02         | 1:100;4:200;5:300 |
      | 01         | 1:10;2:20;3:30 |
      +------------+------------+
    • Index table triple2kv_test_input_index_out
      +------------+------------+
      | key        | key_id     |
      +------------+------------+
      | a          | 1          |
      | b          | 2          |
      | c          | 3          |
      | d          | 4          |
      | e          | 5          |
      +------------+------------+