You can use the KV to Table component to convert a table in the key-value format into a table in a common format. The key is converted into a column name of the table, and the value is converted into the value of the column that corresponds to the row.

Background information

Definition of the key-value table format: The key represents the column name index, and the value supports data of the BIGINT, DOUBLE, or STRING type. Users can import a custom key_map table into this component. The key_map table contains mappings of column names and keys. Regardless of whether a key_map table is imported, this component exports a key_map table to record the mappings of column names and keys after conversion, such as 1:10, 2:20, and 3:30.

Definition of the key_map table format: The key_map table format contains mappings of column names and indexes and data type information, which correspond to the col_name, col_index, and col_datatype parameters. The values of the three parameters must be of the STRING data type. If you do not specify col_datatype, the default value of this parameter is double.
col_name col_index col_datatype
col1 1 bigint
col2 2 double

KV to Table

You can configure the component by using one of the following methods:
  • Machine Learning Platform for AI console
    Tab Parameter Description
    Fields Setting KV Column The names of the columns that contain data in the key-value format.
    Appended Columns The names of appended columns.
    KV Delimiter The delimiter used between keys and values. Colons (:) are used by default.
    KV Pair Delimiter The delimiter used between key-value pairs. Commas (,) are used by default.
    Parameters Setting Reserve the First 1,200 Columns Specifies whether to reserve only the first 1,200 columns if the number of columns in the key_map table after conversion exceeds 1,200.
    Tuning Computing Cores The number of cores. The system automatically allocates the number of cores used for training based on the volume of input data.
    Memory Size per Core The memory size of each core. The system automatically allocates the memory size based on the volume of input data. Unit: MB.
  • PAI command
    PAI -name KVToTable
        -project algo_public
        -DinputTableName=test
        -DoutputTableName=test_out
        -DoutputKeyMapTableName=test_keymap_out
        -DkvColName=kv;
    Parameter Required Description Default value
    inputTableName Yes The name of the input table. No default value
    kvColName Yes The names of the columns that contain data in the key-value format. No default value
    outputTableName Yes The name of the output table. No default value
    outputKeyMapTableName Yes The name of the output index table. No default value
    inputKeyMapTableName No The name of the input index table. No default value
    appendColName No The names of appended columns. No default value
    inputTablePartitions No The partitions selected from the input table for training. Specify this parameter in one of the following formats:
    • Partition_name=value
    • name1=value1/name2=value2: multi-level partitions
    Note If you specify multiple partitions, separate them with commas (,).
    All partitions
    kvDelimiter No The delimiter used between keys and values. :
    itemDelimiter No The delimiter used between key-value pairs. ,
    top1200 No Specifies whether to reserve only the first 1,200 columns. Valid values:
    • true
    • false
    true
    lifecycle No The lifecycle of the output table. No default value
    coreNum No The number of cores used in computing. Set the value to a positive integer. Automatically allocated
    memSizePerCore No The memory size of each core. Unit: MB. Valid values: (100,64 × 1024). Automatically allocated

Example

  • Data generation
        drop table if exists test;
            create table test as
              select
                *
              from
              (
                select '1:1,2:2,3:-3.3' as kv from dual
                union all
                select '1:10,2:20,3:-33.3' as kv from dual
              ) tmp;
  • PAI command
     PAI -name KVToTable
        -project algo_public
        -DinputTableName=test
        -DoutputTableName=test_out
        -DoutputKeyMapTableName=test_keymap_out
        -DkvColName=kv;
  • Output
    • Output table
      +------------+------------+------------+
      | kv_1       | kv_2       | kv_3       |
      +------------+------------+------------+
      | 1.0        | 2.0        | -3.3       |
      | 10.0       | 20.0       | -33.3      |
      +------------+------------+------------+
    • Output mapping table
      +------------+------------+------------+
      | col_name   | col_index  | col_type   |
      +------------+------------+------------+
      | kv_1       | 1          | double     |
      | kv_2       | 2          | double     |
      | kv_3       | 3          | double     |
      +------------+------------+------------+

Algorithm scale

The converted columns include the appended columns and the columns converted by using the KV to Table component. The columns converted by using the KV to Table component are exported before the appended columns. If the number of converted columns exceeds the maximum number of columns of the key_map table, and the top1200 parameter is set to true, the maximum number of columns is exported. Otherwise, an error is reported. A maximum of 1,200 columns can be exported.

The number of data records cannot exceed 100 million.

FAQ

  • Q: Which columns are converted if the input includes a key_map table?

    A: The converted columns are the columns whose keys exist in both the key_map and key-value tables.

  • Q: What is the data type of the key column after conversion if the input includes a key_map table?

    A: The data type of the converted key column is the same as that of the key_map table. If the data type of the key_map table is not specified, the data type of the converted key column is DOUBLE.

  • Q: What is the naming convention for the converted key column if a key_map table is imported?

    A: The naming convention is the names of columns in key:value format + "" + key.

    The following characters are not supported:

    %&()*+-. /;<>=?

  • Q: Why are column names contradictory to each other?

    A: If you specify an appended column, and the name of the appended column is the same as that of the converted key column, an error is reported.

  • Q: Which types of columns can be converted?

    A: Only the columns of the numeric data types can be converted.

  • Q: What would happen if the length of the column name exceeds 128 characters?

    A: Only the first 128 characters of the column name are reserved.

  • Q: What do I do if a row contains duplicate keys?

    A: Calculate the sum of the values that correspond to these keys.