This topic describes the data types and parameters supported by Table Store Reader-Internal and how to configure it by using the code editor.

Table Store is a NoSQL database service built on the Apsara distributed operating system that allows you to store and access large amounts of structured data in real time. Table Store organizes data into instances and tables. It uses data sharding and load balancing to scale seamlessly as the data volume grows.

Table Store Reader-Internal is used to export data for the Table Store Internal model, whereas Table Store Reader is used to export data for the Table Store Public model.

The Table Store Internal model supports two modes: multi-version mode and normal mode. Table Store Reader-Internal can export data in either mode:

  • Multi-version mode: Table Store stores multiple versions of column values, and this mode allows you to export data of multiple versions.

    Table Store Reader-Internal converts each cell to a 4-tuple of a one-dimensional table: PrimaryKey (one to four columns), ColumnName, Timestamp, and Value. This process is similar to that for the multi-version mode of HBase Reader. Each {PrimaryKey, ColumnName, Timestamp, Value} tuple is sent to a writer as four columns in Data Integration records.

  • Normal mode: This mode allows you to export the latest version of each column in each row, which is the same as the normal mode of HBase Reader. For more information, see the normal mode of HBase Reader in Configure HBase Reader.
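
The difference between the two modes can be sketched in Java as follows. This is only an illustrative model, not the plug-in's code: the VersionedCell and ModeSketch classes and the in-memory list of cells are hypothetical stand-ins for the data that Table Store returns, and expanding the primary key columns at the front of each record is an assumption. In multi-version mode every cell version becomes one record. In normal mode only the newest version of each column is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one versioned cell; not a Table Store SDK type.
final class VersionedCell {
    final String columnName;
    final long timestamp;
    final Object value;

    VersionedCell(String columnName, long timestamp, Object value) {
        this.columnName = columnName;
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class ModeSketch {
    // Multi-version mode: one record per cell version, laid out as
    // primary key columns, column name, timestamp, value.
    static List<List<Object>> toMultiVersionRecords(List<Object> primaryKey,
                                                    List<VersionedCell> cells) {
        List<List<Object>> records = new ArrayList<>();
        for (VersionedCell cell : cells) {
            List<Object> record = new ArrayList<>(primaryKey);
            record.add(cell.columnName);
            record.add(cell.timestamp);
            record.add(cell.value);
            records.add(record);
        }
        return records;
    }

    // Normal mode: keep only the value with the largest timestamp per column.
    static Map<String, Object> toLatestValues(List<VersionedCell> cells) {
        Map<String, VersionedCell> latest = new LinkedHashMap<>();
        for (VersionedCell cell : cells) {
            VersionedCell seen = latest.get(cell.columnName);
            if (seen == null || cell.timestamp > seen.timestamp) {
                latest.put(cell.columnName, cell);
            }
        }
        Map<String, Object> values = new LinkedHashMap<>();
        for (Map.Entry<String, VersionedCell> entry : latest.entrySet()) {
            values.put(entry.getKey(), entry.getValue().value);
        }
        return values;
    }
}
```

For a row with two versions of attr1 and one version of attr2, toMultiVersionRecords returns three records, whereas toLatestValues returns one value per column.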

Table Store Reader-Internal connects to a Table Store server and reads data by using the official Java SDK. Table Store Reader-Internal makes the read process more robust with features such as automatic retries when a timeout or exception occurs.
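
The retry behavior mentioned above can be pictured with the following minimal Java sketch. It does not reproduce the plug-in's actual retry policy, which this topic does not describe: the RetrySketch class, the attempt limit, and the doubling delay are all assumptions chosen for illustration.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
    // Runs the call and retries failed attempts, pausing between attempts
    // with a delay that doubles each time. Rethrows the last failure when
    // all attempts are used up. The policy itself is an assumption.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

A read that times out twice and then succeeds would return its result on the third attempt when maxAttempts is 3 or more.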

Table Store Reader-Internal supports all Table Store data types. The following table lists the data types supported by Table Store Reader-Internal.
Data Integration data type | Table Store data type
LONG | INTEGER
DOUBLE | DOUBLE
STRING | STRING
BOOLEAN | BOOLEAN
BYTES | BINARY
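
For readers who need the mapping in code, for example to check the types of a downstream schema, the table above can be written as a small Java lookup. The TypeMappingSketch class and its method name are hypothetical. Only the five type pairs come from this topic.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class TypeMappingSketch {
    // Table Store data type -> Data Integration data type, as listed above.
    static final Map<String, String> TABLE_STORE_TO_DATA_INTEGRATION;

    static {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("INTEGER", "LONG");
        mapping.put("DOUBLE", "DOUBLE");
        mapping.put("STRING", "STRING");
        mapping.put("BOOLEAN", "BOOLEAN");
        mapping.put("BINARY", "BYTES");
        TABLE_STORE_TO_DATA_INTEGRATION = Collections.unmodifiableMap(mapping);
    }

    // Looks up the Data Integration type for a Table Store type name
    // (case-insensitive) and rejects names that are not in the table.
    static String toDataIntegrationType(String tableStoreType) {
        String mapped = TABLE_STORE_TO_DATA_INTEGRATION.get(
                tableStoreType.toUpperCase(Locale.ROOT));
        if (mapped == null) {
            throw new IllegalArgumentException("Unsupported type: " + tableStoreType);
        }
        return mapped;
    }
}
```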

Parameters

Parameter Description Required Default value
mode The mode in which Table Store Reader-Internal reads data. Valid values: normal and multiVersion. Required: Yes. Default value: None.
endpoint The endpoint of the Table Store server. Required: Yes. Default value: None.
accessId The AccessKey ID for accessing Table Store. Required: Yes. Default value: None.
accessKey The AccessKey secret for accessing Table Store. Required: Yes. Default value: None.
instanceName The name of the Table Store instance. An instance is the entity that you use to access and manage Table Store.

After you activate the Table Store service, you must create an instance in the console before you can create and manage tables. Instances are the basic unit for managing Table Store resources. All access control and resource metering for applications are performed at the instance level.

Required: Yes. Default value: None.
table The name of the source table. You can specify only one table as the source table. Multi-table synchronization is not required for Table Store. Required: Yes. Default value: None.
range The range of the data to export, in the format of [begin,end).
  • If the value of the begin parameter is smaller than that of the end parameter, data is read in forward order.
  • If the value of the begin parameter is larger than that of the end parameter, data is read in reverse order.
  • The value of the begin parameter cannot be equal to that of the end parameter.
  • The following value types are supported: STRING, INT, and BINARY. Binary values are passed in as Base64-encoded strings. INF_MIN represents an infinitely small value and INF_MAX represents an infinitely large value.
Required: No. Default value: By default, data is read from the beginning of the table to the end of the table.
range:{"begin"} The start of the data to export. Enter an empty array, a primary key prefix, or a complete primary key. In forward order, the default primary key suffix is INF_MIN. In reverse order, the default primary key suffix is INF_MAX.

This parameter specifies the value range of the Table Store primary key and is used for data filtering. If you do not specify this parameter, the minimum value is used by default.

The JSON format does not support binary data. If the value type in the PrimaryKey column is BINARY, you must use the Java method Base64.encodeBase64String to convert binary data to a string, and then enter the string as the value of the parameter. A Java example is described as follows:
  • byte[] bytes = "hello".getBytes();: constructs binary data, which is the byte value of the string hello.
  • String inputValue = Base64.encodeBase64String(bytes): calls the Base64.encodeBase64String method to convert the binary data to a string.

After you run the preceding code, the inputValue variable contains the string "aGVsbG8=".

Finally, set this parameter to {"type":"binary","value":"aGVsbG8="}.

Required: No. Default value: Data is read from the beginning of the table.
range:{"end"} The end of the data to export. Enter an empty array, a primary key prefix, or a complete primary key. In forward order, the default primary key suffix is INF_MAX. In reverse order, the default primary key suffix is INF_MIN.

The JSON format does not support binary data. If the value type in the PrimaryKey column is BINARY, you must use the Java method Base64.encodeBase64String to convert binary data to a string, and then enter the string as the value of the parameter. A Java example is described as follows:

  • byte[] bytes = "hello".getBytes();: constructs binary data, which is the byte value of the string hello.
  • String inputValue = Base64.encodeBase64String(bytes): calls the Base64.encodeBase64String method to convert the binary data to a string.

After you run the preceding code, the inputValue variable contains the string "aGVsbG8=".

Finally, set this parameter to {"type":"binary","value":"aGVsbG8="}.

Required: No. Default value: Data is read until the end of the table.
range:{"split"} If a very large amount of data needs to be exported, you can specify this parameter to split the work of one node across multiple concurrent threads.
Note
  • Each value for the split parameter must be a value of the partition key, which is the first column of the primary key, and must have the same type as the partition key.
  • Each specified value must fall within the range defined by the begin and end parameters.
  • The values for the split parameter must be sorted in ascending or descending order, matching the read order that is determined by the values of the begin and end parameters.
Required: No. Default value: None. No split points are specified by default.
column The columns to be exported. Both regular and constant columns can be exported.

Mode: The multi-version mode is supported.

Regular column format: {"name":"{your column name}"}

Required: Yes. Default value: None.
timeRange (only applicable to the multi-version mode) The time range of the requested data, in the format of [begin,end).
Note: The value of the begin parameter must be smaller than that of the end parameter.
Required: No. Default value: The data of all versions is read by default.
timeRange:{"begin"} (only applicable to the multi-version mode) The start time for reading data. Valid values: 0 to LONG_MAX. Required: No. Default value: 0.
timeRange:{"end"} (only applicable to the multi-version mode) The end time for reading data. Valid values: 0 to LONG_MAX. Required: No. Default value: LONG_MAX (9223372036854775806L).
maxVersion (only applicable to the multi-version mode) The maximum number of versions of the requested data to read. Valid values: 1 to INT32_MAX. Required: No. Default value: The data of all versions is read by default.
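
The rules for the range and split parameters can be checked before a job is submitted. The following Java sketch shows one way to do so for a partition key of the STRING type. It is an illustration under stated assumptions, not the plug-in's validation logic: the RangeSketch class is hypothetical, values are compared with String.compareTo (which may differ from the byte order that Table Store uses), and split points are assumed to lie strictly between begin and end.

```java
import java.util.List;

final class RangeSketch {
    enum Direction { FORWARD, REVERSE }

    // begin < end means forward order, begin > end means reverse order,
    // and begin == end is not allowed.
    static Direction direction(String begin, String end) {
        int cmp = begin.compareTo(end);
        if (cmp == 0) {
            throw new IllegalArgumentException("begin must not be equal to end");
        }
        return cmp < 0 ? Direction.FORWARD : Direction.REVERSE;
    }

    // Checks that every split point lies between begin and end and that the
    // points follow the read order (ascending for forward, descending for reverse).
    static void validateSplit(String begin, String end, List<String> split) {
        Direction dir = direction(begin, end);
        String previous = begin;
        for (String point : split) {
            // Positive values mean "further along in the read order".
            int step = dir == Direction.FORWARD
                    ? point.compareTo(previous) : previous.compareTo(point);
            int remaining = dir == Direction.FORWARD
                    ? end.compareTo(point) : point.compareTo(end);
            if (step <= 0 || remaining <= 0) {
                throw new IllegalArgumentException(
                        "split point out of range or out of order: " + point);
            }
            previous = point;
        }
    }
}
```

With begin "a" and end "g", the split points "b" and "c" pass the check, whereas "c" followed by "b" is rejected because the read order is forward.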

Configure Table Store Reader-Internal by using the codeless UI

Currently, the codeless user interface (UI) is not supported for Table Store Reader-Internal.

Configure Table Store Reader-Internal by using the code editor

  • Multi-version mode
    {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "reader": {
                "plugin": "otsreader-internalreader",
                "parameter": {
                    "mode": "multiVersion",
                    "endpoint": "",
                    "accessId": "",
                    "accessKey": "",
                    "instanceName": "",
                    "table": "",
                    "range": {
                        "begin": [
                            {
                                "type": "string",
                                "value": "a"
                            },
                            {
                                "type": "INF_MIN"
                            }
                        ],
                        "end": [
                            {
                                "type": "string",
                                "value": "g"
                            },
                            {
                                "type": "INF_MAX"
                            }
                        ],
                        "split": [
                            {
                                "type": "string",
                                "value": "b"
                            },
                            {
                                "type": "string",
                                "value": "c"
                            }
                        ]
                    },
                    "column": [
                        {
                            "name": "attr1"
                        }
                    ],
                    "timeRange": {
                        "begin": 1400000000,
                        "end": 1600000000
                    },
                    "maxVersion": 10
                }
            },
            "writer": {}
        }
    }
  • Normal mode
    {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "reader": {
                "plugin": "otsreader-internalreader",
                "parameter": {
                    "mode": "normal",
                    "endpoint": "",
                    "accessId": "",
                    "accessKey": "",
                    "instanceName": "",
                    "table": "",
                    "range": {
                        "begin": [
                            {
                                "type": "string",
                                "value": "a"
                            },
                            {
                                "type": "INF_MIN"
                            }
                        ],
                        "end": [
                            {
                                "type": "string",
                                "value": "g"
                            },
                            {
                                "type": "INF_MAX"
                            }
                        ],
                        "split": [
                            {
                                "type": "string",
                                "value": "b"
                            },
                            {
                                "type": "string",
                                "value": "c"
                            }
                        ]
                    },
                    "column": [
                        {
                            "name": "pk1"
                        },
                        {
                            "name": "pk2"
                        },
                        {
                            "name": "attr1"
                        },
                        {
                            "type": "string",
                            "value": ""
                        },
                        {
                            "type": "int",
                            "value": ""
                        },
                        {
                            "type": "double",
                            "value": ""
                        },
                        {
                            "type": "binary",
                            "value": "aGVsbG8="
                        }
                    ]
                }
            },
            "writer": {}
        }
    }
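
The Base64 string used for binary values in these configurations, such as "aGVsbG8=", can be produced with a few lines of Java. The parameter descriptions refer to the Base64.encodeBase64String method. To stay self-contained, the sketch below swaps in the JDK's java.util.Base64 encoder, which yields the same standard Base64 text for this input. The BinaryValueSketch class and its method name are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryValueSketch {
    // Encodes raw bytes as standard Base64 and wraps them in the
    // {"type":"binary","value":"..."} form used in the configuration.
    static String toBinaryJson(byte[] bytes) {
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "{\"type\":\"binary\",\"value\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        // Prints {"type":"binary","value":"aGVsbG8="}
        System.out.println(toBinaryJson(bytes));
    }
}
```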