This topic describes the data types and parameters that are supported by Tablestore Reader and how to configure Tablestore Reader by using the codeless user interface (UI) and the code editor.

Tablestore Reader reads data from Tablestore. You can specify a range to extract incremental data. Tablestore Reader can read data in the following ways:
  • Reads data from the entire table.
  • Reads data based on the specified range.
  • Reads data from the specified shard.

Tablestore is a NoSQL database service that is built on the Apsara distributed operating system and allows you to store and access large amounts of structured data in real time. Tablestore organizes data into instances and tables. It scales out seamlessly by using data sharding and load balancing technologies.

Tablestore Reader connects to the Tablestore server by using Tablestore SDK for Java and reads data from the server. Then, Tablestore Reader converts the data into a format that is readable by Data Integration based on the official data synchronization protocols, and sends the converted data to a writer.

Tablestore Reader splits a synchronization node into multiple concurrent tasks based on the table range to synchronize data in a Tablestore table. Each Tablestore Reader thread runs a task.
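The following Python sketch illustrates the idea of one read being divided into concurrent tasks, each covering its own slice of the partition key. It is not Tablestore Reader's implementation: the in-memory `TABLE`, the `TASKS` ranges, and the `read_task` helper are hypothetical stand-ins for a real table and for range reads through the Tablestore SDK.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory table: (partition key, value) rows sorted by partition key.
TABLE = [(f"device{i:03d}", i) for i in range(100)]

# Hypothetical task ranges [lo, hi) on the partition key. Together they cover
# the whole table without overlapping.
TASKS = [
    ("device000", "device025"),
    ("device025", "device050"),
    ("device050", "device075"),
    ("device075", "device100"),
]


def read_task(task):
    """Stand-in for one task: return rows whose partition key is in [lo, hi)."""
    lo, hi = task
    return [row for row in TABLE if lo <= row[0] < hi]


def run_tasks(tasks, threads=2):
    """Run tasks on a thread pool; map() returns results in task order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(read_task, tasks))


if __name__ == "__main__":
    chunks = run_tasks(TASKS)
    print([len(chunk) for chunk in chunks])  # [25, 25, 25, 25]
```

Because the task ranges do not overlap, concatenating the per-task results reproduces every row exactly once, which is the property any real sharding scheme has to preserve.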

Tablestore Reader supports all Tablestore data types. The following table lists the data types supported by Tablestore Reader.
| Category | Tablestore data type |
| --- | --- |
| Integer | INTEGER |
| Floating point | DOUBLE |
| String | STRING |
| Boolean | BOOLEAN |
| Binary | BINARY |
Note: Tablestore does not support the DATE type. At the application layer, time is represented by LONG-type UNIX timestamps.
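Because there is no DATE type, applications convert times to integers before writing and back after reading. The following Python sketch assumes millisecond precision and UTC; whether you store seconds or milliseconds is an application convention that Tablestore does not enforce.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to UNIX time in milliseconds.

    Floor division drops any sub-millisecond precision.
    """
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms: int) -> datetime:
    """Convert UNIX time in milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


if __name__ == "__main__":
    print(to_epoch_millis(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000
```

Integer arithmetic on `timedelta` is used instead of `dt.timestamp() * 1000` to avoid floating-point rounding.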

Parameters

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| endpoint | The endpoint of the Tablestore server. For more information, see Endpoints. | Yes | No default value |
| accessId | The AccessKey ID of the account that you use to connect to the Tablestore server. | Yes | No default value |
| accessKey | The AccessKey secret of the account that you use to connect to the Tablestore server. | Yes | No default value |
| instanceName | The name of the Tablestore instance. Instances are the basic units that you use to manage Tablestore resources, and access control and resource metering for applications are implemented at the instance level. After you activate Tablestore, you must create an instance in the Tablestore console before you can create and manage tables. | Yes | No default value |
| table | The name of the table from which you want to read data. You can specify only one table. Multi-table synchronization is not required for Tablestore. | Yes | No default value |
column: The names of the columns from which you want to read data, specified as a JSON array. Because Tablestore is a NoSQL database service, you must specify the column names for Tablestore Reader to read data.
  • You can specify regular columns. For example, {"name":"col1"} instructs Tablestore Reader to read the column named col1.
  • You can specify a subset of columns. Tablestore Reader reads only the specified columns.
  • You can specify constant columns. For example, {"type":"STRING", "value":"DataX"} adds a STRING-type column whose value is DataX in every record. The type parameter specifies the constant type. The supported types are STRING, INT, DOUBLE, BOOLEAN, BINARY, INF_MIN, and INF_MAX. If the constant type is BINARY, the value must be Base64-encoded. INF_MIN represents the minimum value defined by Tablestore, and INF_MAX represents the maximum value. If you set the type to INF_MIN or INF_MAX, do not set the value; otherwise, errors may occur.
  • You cannot specify functions or custom expressions. Tablestore does not provide SQL-like functions or expressions, so Tablestore Reader cannot read columns that are defined by them.
Required: Yes. Default value: none.
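A minimal Python sketch of assembling a column array that mixes regular and constant columns. The helper names `regular_column` and `constant_column` are hypothetical. The sketch applies two rules from the description above: BINARY constants are Base64-encoded, and INF_MIN or INF_MAX entries carry no value.

```python
import base64
import json

# Constant types listed in the column description above.
CONSTANT_TYPES = {"STRING", "INT", "DOUBLE", "BOOLEAN", "BINARY", "INF_MIN", "INF_MAX"}


def regular_column(name: str) -> dict:
    """A column that Tablestore Reader reads from the table."""
    return {"name": name}


def constant_column(col_type: str, value=None) -> dict:
    """A constant column entry. BINARY values (bytes) are Base64-encoded."""
    if col_type not in CONSTANT_TYPES:
        raise ValueError(f"unsupported constant type: {col_type}")
    if col_type in ("INF_MIN", "INF_MAX"):
        # INF_MIN and INF_MAX must not carry a value.
        if value is not None:
            raise ValueError(f"{col_type} must not have a value")
        return {"type": col_type}
    if value is None:
        raise ValueError(f"{col_type} requires a value")
    if col_type == "BINARY":
        value = base64.b64encode(value).decode("ascii")
    return {"type": col_type, "value": value}


if __name__ == "__main__":
    columns = [
        regular_column("col1"),
        constant_column("STRING", "DataX"),
        constant_column("BINARY", b"hello"),
    ]
    print(json.dumps(columns))
```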
begin and end: The range of the Tablestore table from which you want to read data. You must specify both parameters or neither of them.
The begin and end parameters specify a range for the primary key columns of the table. Specify a range for each primary key column. If you do not need to limit the range of a column, set its begin value to {"type":"INF_MIN"} and its end value to {"type":"INF_MAX"}. The type parameter specifies the data type of each boundary value.
Note
  • The number of values in begin and in end must equal the number of primary key columns. For example, if the table has n primary key columns (n is at least 1), begin and end must each contain n values.
  • If the table has multiple primary key columns and the range of the first primary key column is (INF_MIN, INF_MAX), Tablestore Reader does not apply the ranges of the other primary key columns and reads all data in the table. This is because rows are ordered by primary key column by column, and no real value equals INF_MIN or INF_MAX, so the other columns are never compared.
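The two rules in the note can be checked before a node is submitted. The following Python sketch is a hypothetical pre-flight check, not part of Tablestore Reader. The function names are illustrative, and `num_pk` is the number of primary key columns, which you would look up from the table schema yourself.

```python
SENTINELS = {"INF_MIN", "INF_MAX"}


def validate_range(begin: list, end: list, num_pk: int) -> list:
    """Return human-readable problems with a begin/end pair; empty means OK."""
    problems = []
    if len(begin) != num_pk or len(end) != num_pk:
        problems.append(
            f"begin has {len(begin)} values and end has {len(end)}, "
            f"but the table has {num_pk} primary key columns"
        )
    for side, bound in (("begin", begin), ("end", end)):
        for i, item in enumerate(bound):
            # INF_MIN and INF_MAX boundaries must not carry a value.
            if item.get("type") in SENTINELS and "value" in item:
                problems.append(f"{side}[{i}]: {item['type']} must not set a value")
    return problems


def reads_whole_table(begin: list, end: list) -> bool:
    """True if the first primary key column spans (INF_MIN, INF_MAX)."""
    return (
        bool(begin) and bool(end)
        and begin[0].get("type") == "INF_MIN"
        and end[0].get("type") == "INF_MAX"
    )
```

A range that passes `validate_range` but for which `reads_whole_table` returns True is valid, yet it reads every row regardless of the other columns' limits.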
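For example, to read data from a Tablestore table whose primary key columns are [DeviceID, SellerID], you can specify the begin and end parameters in one of the following ways:
  • Example 1:
    Set the range of DeviceID to (INF_MIN, INF_MAX) and the range of the INT-type SellerID column to (0, 9999). Because the first primary key column, DeviceID, spans (INF_MIN, INF_MAX), the SellerID range does not filter any rows, and all data in the table is read, as described in the preceding note.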
    "range": {
          "begin": [
            {"type":"INF_MIN"},  // The minimum value of the DeviceID field. 
            {"type":"INT", "value":"0"}  // The minimum value of the SellerID field. 
          ], 
          "end": [
            {"type":"INF_MAX"}, // The maximum value of the DeviceID field. 
            {"type":"INT", "value":"9999"} // The maximum value of the SellerID field. 
          ]
        }
  • Example 2:
    Extract all data from the Tablestore table.
    "range": {
          "begin": [
            {"type":"INF_MIN"},  // The minimum value of the DeviceID field. 
            {"type":"INF_MIN"} // The minimum value of the SellerID field. 
          ], 
          "end": [
            {"type":"INF_MAX"}, // The maximum value of the DeviceID field. 
            {"type":"INF_MAX"} // The maximum value of the SellerID field. 
          ]
        }
Required: Yes. Default value: empty.
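To see why the first primary key column dominates, it helps to model how a composite primary key is compared: column by column, like a tuple, with INF_MIN below and INF_MAX above every real value. The following Python sketch models that ordering with hypothetical sentinel objects and assumes, as in Tablestore range reads, that begin is inclusive and end is exclusive. It is an illustration, not the Tablestore SDK.

```python
# Hypothetical sentinels standing in for Tablestore's INF_MIN and INF_MAX.
INF_MIN = object()
INF_MAX = object()


def sort_key(pk: tuple) -> tuple:
    """Map a composite primary key to a tuple Python compares column by column."""
    ranked = []
    for value in pk:
        if value is INF_MIN:
            ranked.append((0, 0))  # below every real value
        elif value is INF_MAX:
            ranked.append((2, 0))  # above every real value
        else:
            ranked.append((1, value))
    return tuple(ranked)


def in_range(pk: tuple, begin: tuple, end: tuple) -> bool:
    """True if begin <= pk < end (begin inclusive, end exclusive)."""
    return sort_key(begin) <= sort_key(pk) < sort_key(end)


if __name__ == "__main__":
    row = ("device42", 123456)  # SellerID far outside 0..9999
    print(in_range(row, (INF_MIN, 0), (INF_MAX, 9999)))       # True
    print(in_range(row, ("device42", 0), ("device42", 9999)))  # False
```

The first call mirrors Example 1: the row is included even though its SellerID is outside (0, 9999), because the comparison is already decided on DeviceID.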
split: A custom sharding rule. This is an advanced parameter, and we recommend that you do not set it in most cases.

If data is unevenly distributed in a Tablestore table and the automatic sharding of Tablestore Reader does not produce balanced tasks, you can specify a custom sharding rule.

The split points that you specify must fall within the range that is specified by the begin and end parameters, and they must be values of the partition key, which is the first primary key column. In other words, specify only partition key values in the split parameter, not values for all primary key columns.

For example, to read data from a Tablestore table whose primary key columns are [DeviceID, SellerID], specify the following parameters:
"range": {
      "begin": [
        {"type":"INF_MIN"},  // The minimum value of the DeviceID field.
        {"type":"INF_MIN"}  // The minimum value of the SellerID field.
      ],
      "end": [
        {"type":"INF_MAX"}, // The maximum value of the DeviceID field.
        {"type":"INF_MAX"} // The maximum value of the SellerID field.
      ],
      // The custom sharding rule. If you specify it, the synchronization node is split into concurrent tasks based on the begin, end, and split parameters. Data is sharded only on the partition key, which is the first primary key column.
      // The data type of a split point can be INF_MIN, INF_MAX, STRING, or INT.
      "split": [
        {"type":"STRING", "value":"1"},
        {"type":"STRING", "value":"2"},
        {"type":"STRING", "value":"3"},
        {"type":"STRING", "value":"4"},
        {"type":"STRING", "value":"5"}
      ]
    }
Required: No. Default value: none.
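A hypothetical Python sketch of how split points could partition the begin/end range on the partition key: consecutive boundaries form one sub-range each, and split points outside the range are rejected, as the description requires. The function names, the sorting of split points, and the skipping of zero-width ranges (for example, when split repeats INF_MIN or INF_MAX) are assumptions for illustration; the sketch does not describe how Tablestore Reader processes split internally.

```python
def _rank(item: dict) -> tuple:
    """Order one typed partition-key value; INF_MIN/INF_MAX bracket real values."""
    kind = item["type"]
    if kind == "INF_MIN":
        return (0, 0)
    if kind == "INF_MAX":
        return (2, 0)
    if kind == "INT":
        return (1, int(item["value"]))  # compare INT values numerically
    if kind == "STRING":
        return (1, item["value"])
    raise ValueError(f"unsupported partition key type: {kind}")


def split_to_ranges(begin_first: dict, end_first: dict, split: list) -> list:
    """Cut [begin_first, end_first) on the partition key at each split point.

    begin_first and end_first are the first entries of begin and end.
    Returns (lower, upper) pairs, one per hypothetical concurrent task.
    """
    lo, hi = _rank(begin_first), _rank(end_first)
    for point in split:
        if not lo <= _rank(point) <= hi:
            raise ValueError(f"split point {point} is outside the begin/end range")
    bounds = [begin_first] + sorted(split, key=_rank) + [end_first]
    # Skip zero-width pairs, e.g. when split repeats INF_MIN or INF_MAX.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if _rank(a) < _rank(b)]


if __name__ == "__main__":
    points = [{"type": "STRING", "value": v} for v in "12345"]
    for lower, upper in split_to_ranges({"type": "INF_MIN"}, {"type": "INF_MAX"}, points):
        print(lower, "->", upper)
```

With the five STRING split points from the example above, the range is cut into six sub-ranges.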

Configure Tablestore Reader by using the codeless UI

This method is not supported.

Configure Tablestore Reader by using the code editor

The following code shows how to configure a synchronization node in the code editor to read data from a Tablestore table. For more information, see Create a sync node by using the code editor.
{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"ots",// The reader type. 
            "parameter":{
                "datasource":"",// The name of the data source. 
                "column":[// The names of the columns from which you want to read data. 
                    {
                        "name":"column1"// The name of the column. 
                    },
                    {
                        "name":"column2"
                    },
                    {
                        "name":"column3"
                    },
                    {
                        "name":"column4"
                    },
                    {
                        "name":"column5"
                    }
                ],
                "range":{
                    "split":[
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint1"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint2"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint3"
                        },
                        {
                            "type":"INF_MAX"
                        }
                    ],
                    "end":[
                        {
                            "type":"INF_MAX"
                        },
                        {
                            "type":"INF_MAX"
                        },
                        {
                            "type":"STRING",
                            "value":"end1"
                        },
                        {
                            "type":"INT",
                            "value":"100"
                        }
                    ],
                    "begin":[
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"STRING",
                            "value":"begin1"
                        },
                        {
                            "type":"INT",
                            "value":"0"
                        }
                    ]
                },
                "table":""// The name of the table from which you want to read data. 
            },
            "name":"Reader",
            "category":"reader"
        },
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1,// The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
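The configuration above contains // comments for explanation, which strict JSON does not allow. If you generate node configurations with a script, a minimal Python sketch such as the following produces comment-free JSON with the same structure. The function `build_ots_job` and the placeholder names `my_ots_source` and `my_table` are hypothetical, and the sketch makes no claim about whether the code editor itself accepts comments.

```python
import json


def build_ots_job(datasource, table, columns, begin, end, split=None,
                  concurrent=1, mbps="12"):
    """Build a Tablestore-to-stream job dict mirroring the example above."""
    if len(begin) != len(end):
        raise ValueError("begin and end must contain the same number of values")
    rng = {"begin": begin, "end": end}
    if split:
        rng["split"] = split  # optional custom sharding rule
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {
                "stepType": "ots",
                "parameter": {
                    "datasource": datasource,
                    "column": [{"name": c} for c in columns],
                    "range": rng,
                    "table": table,
                },
                "name": "Reader",
                "category": "reader",
            },
            {"stepType": "stream", "parameter": {}, "name": "Writer", "category": "writer"},
        ],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": True, "concurrent": concurrent, "mbps": mbps},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


if __name__ == "__main__":
    job = build_ots_job(
        "my_ots_source", "my_table", ["column1", "column2"],
        begin=[{"type": "INF_MIN"}], end=[{"type": "INF_MAX"}],
    )
    print(json.dumps(job, indent=2))
```

Generating the configuration from data structures also catches mistakes such as the missing comma after a value, because `json.dumps` always emits well-formed JSON.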