You can use the KV to Table component to convert a table in the key-value format into a table in a common format. The key is converted into a column name of the table, and the value is converted into the value of the column that corresponds to the row.

Background information

Definition of the key-value table format: The key represents the column name index, and the value supports data of the BIGINT, DOUBLE, or STRING type. You can import a custom key_map table into this component. The key_map table contains the mappings of column names and keys. Regardless of whether a key_map table is imported, this component exports a key_map table to record the mappings of column names and keys after conversion, such as 1:10, 2:20, and 3:30.

Definition of the key_map table format: The key_map table format contains the mappings of column names and indexes and data type information, which correspond to the col_name, col_index, and col_datatype parameters. The values of the three parameters must be of the STRING data type. If you do not specify the col_datatype parameter, the default value of this parameter is double.
col_namecol_indexcol_datatype
col11bigint
col22double

Component parameters

You can use one of the following methods to configure the KV to Table component:

Method 1: Configure the component on the pipeline page

Configure the component on the pipeline page of Machine Learning Designer.
TabParameterDescription
Fields SettingKV ColumnThe names of the columns that contain data in the key-value format.
Appended ColumnsThe names of appended columns.
KV DelimiterThe delimiter used between keys and values. By default, colons (:) are used.
KV Pair DelimiterThe delimiter used between key-value pairs. By default, commas (,) are used.
Parameters SettingReserve the First 1,200 ColumnsSpecifies whether to reserve only the first 1,200 columns if the number of columns in the key_map table after conversion exceeds 1,200.
TuningComputing CoresThe number of cores. The system automatically allocates cores used for training based on the volume of input data.
Memory Size per CoreThe memory size of each core. The system automatically allocates the memory based on the volume of input data. Unit: MB.

Method 2: Use PAI commands

Configure the component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see SQL Script.
PAI -name KVToTable
    -project algo_public
    -DinputTableName=test
    -DoutputTableName=test_out
    -DoutputKeyMapTableName=test_keymap_out
    -DkvColName=kv;
ParameterRequiredDescriptionDefault value
inputTableNameYesThe name of the input table. N/A
kvColNameYesThe names of the columns that contain data in the key-value format. N/A
outputTableNameYesThe name of the output table. N/A
outputKeyMapTableNameYesThe name of the output index table. N/A
inputKeyMapTableNameNoThe name of the input index table. N/A
appendColNameNoThe names of appended columns. N/A
inputTablePartitionsNoThe partitions that are selected from the input table for training. Specify this parameter in one of the following formats:
  • Partition_name=value
  • name1=value1/name2=value2: multi-level partitions
Note If you specify multiple partitions, separate them with commas (,).
All partitions
kvDelimiterNoThe delimiter used between keys and values. Colons (:)
itemDelimiterNoThe delimiter used between key-value pairs. Commas (,)
top1200NoSpecifies whether to reserve only the first 1,200 columns. Valid values:
  • true
  • false
true
lifecycleNoThe lifecycle of the output table. N/A
coreNumNoThe number of cores used in computing. The value must be a positive integer. Determined by the system
memSizePerCoreNoThe memory size of each core. Unit: MB. Valid values: (100,64 × 1024). Determined by the system

Examples

  • Generate input data
    drop table if exists test;
    create table test as
    select * from
        (
            select '1:1,2:2,3:-3.3' as kv from dual union all
            select '1:10,2:20,3:-33.3' as kv from dual
        ) tmp;
  • Run PAI commands
     PAI -name KVToTable
        -project algo_public
        -DinputTableName=test
        -DoutputTableName=test_out
        -DoutputKeyMapTableName=test_keymap_out
        -DkvColName=kv;
  • Output description
    • Output table
      +------------+------------+------------+
      | kv_1       | kv_2       | kv_3       |
      +------------+------------+------------+
      | 1.0        | 2.0        | -3.3       |
      | 10.0       | 20.0       | -33.3      |
      +------------+------------+------------+
    • Output mapping table
      +------------+------------+------------+
      | col_name   | col_index  | col_type   |
      +------------+------------+------------+
      | kv_1       | 1          | double     |
      | kv_2       | 2          | double     |
      | kv_3       | 3          | double     |
      +------------+------------+------------+

Algorithm scale

The converted columns include the appended columns and the columns converted by using the KV to Table component. The columns converted by using the KV to Table component are exported before the appended columns. If the number of converted columns exceeds the maximum number of columns of the key_map table, and the top1200 parameter is set to true, the maximum number of columns is exported. Otherwise, an error is reported. A maximum of 1,200 columns can be exported.

The number of data records cannot exceed 100 million.

FAQ

  • Q: Which columns are converted if the input includes a key_map table?

    A: The converted columns are the columns whose keys exist in both the key_map and key-value tables.

  • Q: What is the data type of the key column after conversion if the input includes a key_map table?

    A: The data type of the converted key column is the same as that of the key_map table. If the data type of the key_map table is not specified, the data type of the converted key column is DOUBLE.

  • Q: What is the naming convention for the converted key column if a key_map table is imported?

    A: The naming convention is Name of the column in the key:value format + "" + Key.

    The following characters are not supported:

    %&()*+-./;<>=?

  • Q: Why are column names conflict with each other?

    A: If you specify an appended column and the name of the appended column is the same as that of the converted key column, an error is reported.

  • Q: Which types of columns can be converted?

    A: Only columns of the numeric data types can be converted.

  • Q: What would happen if the length of the column name exceeds 128 characters?

    A: Only the first 128 characters of the column name are reserved.

  • Q: What do I do if a row contains duplicate keys?

    A: Calculate the sum of the values that correspond to these keys.