This topic describes the Convert Row, Column, and Value to KV Pair component provided by Machine Learning Designer (formerly known as Machine Learning Studio).
The component converts a trituple table (row,col,value) to a key-value table (row,[col_id:value]).
The data type of the trituple table (row,col,value) is "XXD" or "XXL". "X" represents any data type, "D" represents the DOUBLE data type, and "L" represents the BIGINT data type. In the key-value table generated after conversion, the data types of row and value are the same as those of the original input data. The generated index table of col maps to the col_id column. The data type of the col_id column is BIGINT.
- The following table is a trituple table.
id word count 01 a 10 01 b 20 01 c 30 - The following table is the generated key-value table.
id key_value 01 1:10;2:20;3:30 Note You can customize the delimiter between a key and a value, as well as between two key-value pairs in the key_value column. - The following table is the generated index table.
key key_id a 1 b 2 c 3
You can configure the component by using the Machine Learning Platform for AI (PAI) console or a PAI command.
Configure the component
You can use one of the following methods to configure the Convert Row, Column, and Value to KV Pair component.
Method 1: Configure the component on the pipeline page
Tab | Parameter | Description |
---|---|---|
Fields Setting | Columns Reserved During KV Conversion | The name of the column that remains unchanged after a trituple table is converted to a key-value table. |
Output Keys | The keys in the key-value table. | |
Output Values | The values in the key-value table. | |
Key Column in Input Index Table | The name of the key column in the index table. | |
Index ID Column of Key in Input Index Table | The name of the index column in the index table. | |
KV Delimiter | The delimiter used to separate a key and a value in the key-value table. The default delimiter is a colon (:). | |
KV Pair Delimiter | The delimiter used to separate key-value pairs. Default value: comma (,). | |
Tuning | Total Number of Instances | The total number of instances. The value must be a positive integer. By default, the system calculates the value based on the amount of input data. |
Memory Size (MB) | The total memory. The value must be a positive integer. By default, the system calculates the value based on the amount of input data. |
Method 2: Use PAI commands
PAI -name triple_to_kv
-project algo_public
-DinputTableName=test_data
-DoutputTableName=test_kv_out
-DindexOutputTableName=test_index_out
-DidColName=id
-DkeyColName=word
-DvalueColName=count
-DinputTablePartitions=ds=test1
-DindexInputTableName=test_index_input
-DindexInputKeyColName=word
-DindexInputKeyIdColName=word_id
-DkvDelimiter=:
-DpairDelimiter=;
-Dlifecycle=3
Parameter | Required | Description | Default value |
---|---|---|---|
inputTableName | Yes | The name of the input table. | No default value |
idColName | Yes | The name of the column that remains unchanged after a trituple table is converted to a key-value table. | No default value |
keyColName | Yes | The name of the column that lists keys in the key-value table. | No default value |
valueColName | Yes | The name of the column that lists values in the key-value table. | No default value |
outputTableName | Yes | The name of the generated key-value table. | No default value |
indexOutputTableName | Yes | The name of the generated index table. | No default value |
indexInputTableName | No | The name of the existing index table. The table must contain data. | No default value |
indexInputKeyColName | No | The name of the key column in the index table. This parameter is required if you set the indexInputTableName parameter. | No default value |
indexInputKeyIdColName | No | The name of the index column in the index table. This parameter is required if you set the indexInputTableName parameter. | No default value |
inputTablePartitions | No | The partition names of the input table. You can enter only one partition name. | No default value |
kvDelimiter | No | The delimiter used to separate a key and a value in the key-value table. | Colons (:) |
pairDelimiter | No | The delimiter used to separate key-value pairs. | Comma (,) |
lifecycle | No | The lifecycle of the output table. | No default value |
coreNum | No | The total number of instances. The value must be a positive integer. | Automatically calculated based on the amount of input data |
memSizePerCore | No | The total memory. The value must be a positive integer. | Automatically calculated based on the amount of input data |
Examples
- Input
drop table if exists triple2kv_test_input; create table triple2kv_test_input as select * from ( select '01' as id, 'a' as word, 10 as count from dual union all select '01' as id, 'b' as word, 20 as count from dual union all select '01' as id, 'c' as word, 30 as count from dual union all select '02' as id, 'a' as word, 100 as count from dual union all select '02' as id, 'd' as word, 200 as count from dual union all select '02' as id, 'e' as word, 300 as count from dual ) tmp;
- PAI command
PAI -name triple_to_kv -project algo_public -DinputTableName=triple2kv_test_input -DoutputTableName=triple2kv_test_input_out -DindexOutputTableName=triple2kv_test_input_index_out -DidColName=id -DkeyColName=word -DvalueColName=count -Dlifecycle=1;
- Output
- Key-value table triple2kv_test_input_out
+------------+------------+ | id | key_value | +------------+------------+ | 02 | 1:100;4:200;5:300 | | 01 | 1:10;2:20;3:30 | +------------+------------+
- Index table triple2kv_test_input_index_out
+------------+------------+ | key | key_id | +------------+------------+ | a | 1 | | b | 2 | | c | 3 | | d | 4 | | e | 5 | +------------+------------+
- Key-value table triple2kv_test_input_out