This topic describes the data types and parameters supported by Table Store Reader and how to configure it by using the code editor.

Table Store Reader reads incremental data from Table Store within a specified range. It currently supports the following read modes:
  • Reads data from the entire table.
  • Reads data based on the specified range.
  • Reads data from the specified shard.

Table Store is a NoSQL database service built on the Apsara distributed operating system. It allows you to store and access large amounts of structured data in real time. Table Store organizes data into instances and tables, and uses data sharding and load balancing to scale seamlessly as the data volume grows.

Table Store Reader connects to the Table Store server through the official Table Store Java SDK and reads data from the server. Then, Table Store Reader converts the data to a format that is readable by Data Integration based on the official data synchronization protocols, and sends the converted data to a writer.

Table Store Reader splits a sync node into concurrent tasks based on the table range to synchronize data in a Table Store table. Each thread runs one task.

Table Store Reader supports all Table Store data types. The following table lists the data types supported by Table Store Reader.
Category          Table Store data type
Integer           INTEGER
Floating point    DOUBLE
String            STRING
Boolean           BOOLEAN
Binary            BINARY
Note: Table Store does not support the DATE type. Applications typically store time values as LONG-type UNIX timestamps.
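Because time values are stored as integers, a date usually has to be converted to a UNIX timestamp before it is written and converted back after it is read. The following Python sketch shows one way to do this. The millisecond unit and the helper names are assumptions made for illustration; Table Store does not mandate them.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_unix_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to a UNIX timestamp in milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_unix_millis(ms: int) -> datetime:
    """Convert a UNIX timestamp in milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

stamp = to_unix_millis(datetime(2021, 1, 1, tzinfo=timezone.utc))
print(stamp, from_unix_millis(stamp).isoformat())
```

Whether to use seconds or milliseconds is an application decision; what matters is that every producer and consumer of the column agrees on the unit.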

Parameters

Parameter Description Required Default value
endpoint The endpoint of the Table Store server. For more information, see Endpoint. Yes None
accessId The AccessKey ID for accessing Table Store. Yes None
accessKey The AccessKey secret for accessing Table Store. Yes None
instanceName The name of the Table Store instance. An instance is the entity through which you use and manage Table Store.

After you activate the Table Store service, you must create an instance in the console before you can create and manage tables.

Instances are the basic unit for managing Table Store resources. All access control and resource measurement for applications are performed at the instance level.

Yes None
table The name of the source table. You can specify only one table as the source table. Multi-table synchronization is not required for Table Store. Yes None
column The columns to be synchronized from the source table, described as a JSON array. Because Table Store is a NoSQL database service, you must specify column names for Table Store Reader to read data.
  • You can specify common columns. For example, specify {"name":"col1"} for Table Store Reader to read the column named col1.
  • You can specify a subset of the columns. Table Store Reader reads only the specified columns.
  • You can specify constant columns. For example, specify {"type":"STRING", "value":"DataX"} to add a constant column of the STRING type whose value is DataX. The type parameter specifies the constant type. The supported types are STRING, INT, DOUBLE, BOOLEAN, BINARY, INF_MIN, and INF_MAX. If the constant type is BINARY, the constant value must be Base64-encoded. INF_MIN and INF_MAX represent the minimum and maximum values defined by Table Store. If you set the type to INF_MIN or INF_MAX, do not set the value. Otherwise, errors may occur.
  • You cannot specify a function or custom expression, because Table Store does not provide functions or expressions similar to those of SQL. Table Store Reader cannot read columns that contain functions or expressions.
Yes None
begin and end The range of the Table Store table from which data is read. You can specify both or neither of the two parameters. The begin and end parameters define the value ranges of the primary key columns in the Table Store table. Make sure that you specify a value range for every primary key column in the table. To leave a primary key column unbounded, specify {"type":"INF_MIN"} in begin and {"type":"INF_MAX"} in end. For example, to read certain data from a Table Store table with the primary key of [DeviceID, SellerID], specify the begin and end parameters in the following way:
"range": {
      "begin": [
        {"type":"INF_MIN"}, // The minimum value of the DeviceID field.
        {"type":"INT", "value":"0"} // The minimum value of the SellerID field.
      ], 
      "end": [
        {"type":"INF_MAX"}, // The maximum value of the DeviceID field.
        {"type":"INT", "value":"9999"} // The maximum value of the SellerID field.
      ]
    }
To read all data from the table, specify the begin and end parameters in the following way:
"range": {
      "begin": [
        {"type":"INF_MIN"}, // The minimum value of the DeviceID field.
        {"type":"INF_MIN"} // The minimum value of the SellerID field.
      ], 
      "end": [
        {"type":"INF_MAX"}, // The maximum value of the DeviceID field.
        {"type":"INF_MAX"} // The maximum value of the SellerID field.
      ]
    }
Yes None
split The custom rule for data sharding. This parameter is an advanced setting. We recommend that you do not set this parameter.

If data is unevenly distributed in a Table Store table and the automatic sharding feature of Table Store Reader does not split the data effectively, you can customize a sharding rule.

The split points specified by the split parameter must fall within the range specified by the begin and end parameters and must be values of the partition key column. That is, specify only partition key values in the split parameter, not values for every primary key column.

To read data from a Table Store table with the primary key of [DeviceID, SellerID], specify the following parameters:
"range": {
      "begin": [
        {"type":"INF_MIN"}, // The minimum value of the DeviceID field.
        {"type":"INF_MIN"} // The minimum value of the SellerID field.
      ],
      "end": [
        {"type":"INF_MAX"}, // The maximum value of the DeviceID field.
        {"type":"INF_MAX"} // The maximum value of the SellerID field.
      ],
      // The specified sharding rule. If you specify a sharding rule, the sync node is split into concurrent tasks based on the values of the begin, end, and split parameters. Data is sharded only based on the partition key, that is, the first column of the primary key.
      // The data type of the partition key can be INF_MIN, INF_MAX, STRING, or INT.
      "split": [
        {"type":"STRING", "value":"1"},
        {"type":"STRING", "value":"2"},
        {"type":"STRING", "value":"3"},
        {"type":"STRING", "value":"4"},
        {"type":"STRING", "value":"5"}
      ]
    }
No None
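The relationship between the begin, end, and split parameters can be sketched in a few lines of Python. The helpers below are hypothetical and only illustrate the rules described in the parameter table; they are not the splitting code that Table Store Reader actually runs. build_range checks that begin and end cover every primary key column and that INF_MIN and INF_MAX carry no value, and partition_intervals pairs consecutive partition key boundaries into candidate intervals.

```python
INF_MIN = {"type": "INF_MIN"}
INF_MAX = {"type": "INF_MAX"}

def build_range(begin, end, primary_key, split=None):
    """Build a "range" object after checking the rules from the parameter table."""
    if len(begin) != len(primary_key) or len(end) != len(primary_key):
        raise ValueError("begin and end must each cover all primary key columns")
    for point in list(begin) + list(end) + list(split or []):
        if point["type"] in ("INF_MIN", "INF_MAX") and "value" in point:
            raise ValueError("INF_MIN and INF_MAX must not carry a value")
    result = {"begin": list(begin), "end": list(end)}
    if split:
        result["split"] = list(split)
    return result

def partition_intervals(range_cfg):
    """Pair consecutive partition key boundaries: begin[0], split points, end[0]."""
    bounds = [range_cfg["begin"][0], *range_cfg.get("split", []), range_cfg["end"][0]]
    return list(zip(bounds, bounds[1:]))

# Illustrative values that mirror the [DeviceID, SellerID] example.
cfg = build_range(
    begin=[INF_MIN, INF_MIN],
    end=[INF_MAX, INF_MAX],
    primary_key=["DeviceID", "SellerID"],
    split=[{"type": "STRING", "value": str(i)} for i in range(1, 6)],
)
for low, high in partition_intervals(cfg):
    print(low, "->", high)
```

With five split points the sketch yields six intervals on the partition key. How the reader maps such intervals to concurrent tasks at run time is not specified here, so treat the output as a way to reason about a sharding rule rather than as a prediction of the task count.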

Configure Table Store Reader by using the codeless UI

Currently, the codeless user interface (UI) is not supported for Table Store Reader.

Configure Table Store Reader by using the code editor

The following code configures a node to read data from a Table Store table.
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"ots",// The reader type.
            "parameter":{
                "datasource":"",// The connection name.
                "column":[ // The columns to be synchronized.
                    {
                        "name":"column1" // The name of the column.
                    },
                    {
                        "name":"column2"
                    },
                    {
                        "name":"column3"
                    },
                    {
                        "name":"column4"
                    },
                    {
                        "name":"column5"
                    }
                ],
                "range":{
                    "split":[
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint1"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint2"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint3"
                        },
                        {
                            "type":"INF_MAX"
                        }
                    ],
                    "end":[
                        {
                            "type":"INF_MAX"
                        },
                        {
                            "type":"INF_MAX"
                        },
                        {
                            "type":"STRING",
                            "value":"end1"
                        },
                        {
                            "type":"INT",
                            "value":"100"
                        }
                    ],
                    "begin":[
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"STRING",
                            "value":"begin1"
                        },
                        {
                            "type":"INT",
                            "value":"0"
                        }
                    ]
                },
                "table":""// The name of the table to be synchronized.
            },
            "name":"Reader",
            "category":"reader"
        },
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1 // The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
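A job configuration of this shape can also be generated programmatically, which helps avoid hand-editing mistakes such as stray commas or unencoded binary constants. The following Python sketch is an illustration only: the helper names, the connection name my_ots, and the table name device_log are placeholders that do not appear in this topic, and the output is plain JSON without the // comments used in the example.

```python
import base64
import json

def name_column(name):
    """Describe a common column by name."""
    return {"name": name}

def constant_column(col_type, value=None):
    """Describe a constant column; BINARY values (bytes) are Base64-encoded."""
    if col_type in ("INF_MIN", "INF_MAX"):
        return {"type": col_type}  # these two types must not carry a value
    if col_type == "BINARY":
        value = base64.b64encode(value).decode("ascii")
    return {"type": col_type, "value": value}

def build_job(datasource, table, columns, key_range, concurrent=1):
    """Assemble a job with an ots reader step and a stream writer step."""
    reader = {
        "stepType": "ots",
        "parameter": {
            "datasource": datasource,
            "column": columns,
            "range": key_range,
            "table": table,
        },
        "name": "Reader",
        "category": "reader",
    }
    writer = {"stepType": "stream", "parameter": {}, "name": "Writer", "category": "writer"}
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader, writer],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": False, "concurrent": concurrent},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }

# Placeholder names and values, chosen only for this sketch.
job = build_job(
    datasource="my_ots",
    table="device_log",
    columns=[
        name_column("column1"),
        constant_column("STRING", "DataX"),
        constant_column("BINARY", b"\x00\x01"),
    ],
    key_range={"begin": [{"type": "INF_MIN"}], "end": [{"type": "INF_MAX"}]},
)
print(json.dumps(job, indent=4))
```

Serializing with json.dumps guarantees syntactically valid JSON, so problems like a trailing comma after the last key cannot occur in the generated configuration.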