This topic describes the data types and parameters supported by Tablestore Reader and how to configure it by using the code editor.

Tablestore Reader reads incremental data from Tablestore within a specified range. It can read data in the following ways:
  • Reads data from the entire table.
  • Reads data based on the specified range.
  • Reads data from the specified shard.

Tablestore is a NoSQL database service that is built on the Apsara distributed operating system. The service allows you to store and access large volumes of structured data in real time. Tablestore organizes data into instances and tables. It uses data sharding and load balancing technologies to seamlessly expand the data scale.

Tablestore Reader connects to the Tablestore server by using Tablestore SDK for Java and reads data from the server. Tablestore Reader then converts the data into a format that Data Integration can read, based on the official data synchronization protocols, and sends the converted data to a writer.

Tablestore Reader splits a synchronization node into multiple concurrent tasks based on the table range to synchronize data in a Tablestore table. Each Tablestore Reader thread runs a task.

Tablestore Reader supports all Tablestore data types. The following table lists the data types.
Category          Tablestore data type
Integer           INTEGER
Floating point    DOUBLE
String            STRING
Boolean           BOOLEAN
Binary            BINARY
Note: Tablestore does not support the DATE data type. Applications use a UNIX timestamp of the LONG type (stored as INTEGER in Tablestore) to represent time.
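
The note above can be illustrated with a minimal Python sketch that converts between a datetime and an integer UNIX timestamp. The millisecond unit and the helper names to_epoch_millis and from_epoch_millis are assumptions made for this example; the text above only states that a UNIX timestamp is used.

```python
from datetime import datetime, timedelta, timezone

# Reference point for UNIX timestamps: 1970-01-01T00:00:00Z.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to UNIX epoch milliseconds (assumed unit)."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    # Dividing one timedelta by another with // yields an exact integer.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Convert UNIX epoch milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)
```

The integer returned by to_epoch_millis is the kind of value that would be stored in an integer column in place of a date.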

Parameters

Parameter Description Required Default value
endpoint The endpoint of the Tablestore server. For more information, see Endpoints. Yes None
accessId The AccessKey ID of the account that you use to connect to the Tablestore server. Yes None
accessKey The AccessKey secret of the account that you use to connect to the Tablestore server. Yes None
instanceName The name of the Tablestore instance. The instance is an entity for you to use and manage Tablestore.

After you activate the Tablestore service, you must create an instance in the Alibaba Cloud Management Console before you can create and manage tables.

Instances are the basic unit that you can use to manage Tablestore resources. Access control and resource measurement for applications are implemented at the instance level.

Yes None
table The name of the source table. You can specify only one table as the source table. Multi-table synchronization is not required for Tablestore. Yes None
column The columns that you want to synchronize from the source table. The columns are described in a JSON array. Tablestore is a NoSQL database service. You must specify column names for Tablestore Reader to read data.
  • You can specify common columns. For example, you can specify {"name":"col1"} for Tablestore Reader to read data in column 1.
  • You can specify partial columns. Tablestore Reader reads only the specified columns.
  • You can specify constant columns. For example, you can specify {"type":"STRING", "value":"DataX"} for Tablestore Reader to read the column in which data is of the STRING type and the data value is DataX. The type parameter specifies the constant type. The supported types are STRING, INT, DOUBLE, Boolean, BINARY, INF_MIN, and INF_MAX. If the constant type is BINARY, the constant value must be Base64-encoded. INF_MIN indicates the minimum value specified by Tablestore, and INF_MAX indicates the maximum value specified by Tablestore. If you set the type to INF_MIN or INF_MAX, do not set the value. If you set the value, errors may occur.
  • You cannot specify a function or custom expression. This is because Tablestore does not provide functions or expressions that are similar to those of SQL. Tablestore Reader cannot read columns that contain functions or expressions.
Yes None
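
As an illustration of the column rules above, the following Python sketch builds a column array that mixes common columns and constant columns. The helper names name_column, constant_column, and build_columns are hypothetical, and the type list is copied from the description above rather than from the reader's source code.

```python
import base64
import json

# Constant types as listed in the description above (spelled as in that text).
CONSTANT_TYPES = {"STRING", "INT", "DOUBLE", "Boolean", "BINARY", "INF_MIN", "INF_MAX"}
INFINITE_TYPES = {"INF_MIN", "INF_MAX"}


def name_column(name: str) -> dict:
    """A common column: read the column with this name."""
    return {"name": name}


def constant_column(col_type: str, value=None) -> dict:
    """A constant column. BINARY values are Base64-encoded; INF_* types take no value."""
    if col_type not in CONSTANT_TYPES:
        raise ValueError(f"unsupported constant type: {col_type}")
    if col_type in INFINITE_TYPES:
        if value is not None:
            raise ValueError(f"{col_type} must not carry a value")
        return {"type": col_type}
    if value is None:
        raise ValueError(f"{col_type} requires a value")
    if col_type == "BINARY":
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("BINARY constants must be given as bytes")
        value = base64.b64encode(value).decode("ascii")
    return {"type": col_type, "value": str(value)}


def build_columns() -> list:
    """Example column array: one named column and two constant columns."""
    return [
        name_column("col1"),
        constant_column("STRING", "DataX"),
        constant_column("BINARY", b"hi"),
    ]


if __name__ == "__main__":
    print(json.dumps(build_columns(), indent=2))
```

Rejecting a value for INF_MIN and INF_MAX in the helper mirrors the warning above that setting one may cause errors.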
begin and end The Tablestore table range from which you want to read data. You can specify both or neither of the two parameters.
The begin and end parameters define the value ranges of primary key columns in the Tablestore table. Make sure that you specify the value ranges for all primary key columns in the table. If you do not need to limit a range, specify the parameters as {"type":"INF_MIN"} and {"type":"INF_MAX"}. The type parameter specifies the type of the data that you want to read.
Note
  • Make sure that the number of primary keys is the same as that of the values for begin and that of the values for end. For example, the Tablestore table has n primary keys, and n is greater than or equal to 1. In this case, you must specify n values for both begin and end.
  • If the Tablestore table has multiple primary keys and the value range of the first scanned primary key is (INF_MIN,INF_MAX), Tablestore Reader does not scan other primary keys. Instead, it extracts all data from the table.
For example, to read data from a Tablestore table with the primary key of [DeviceID, SellerID], specify the begin and end parameters in one of the following ways:
  • Example 1:
    Extract INT-type data whose DeviceID is in the range of (INF_MIN,INF_MAX) and SellerID is in the range of (0,9999).
    "range": {
      "begin": [
        {"type":"INF_MIN"},  // The minimum value of the DeviceID field.
        {"type":"INT", "value":"0"}  // The minimum value of the SellerID field.
      ],
      "end": [
        {"type":"INF_MAX"},  // The maximum value of the DeviceID field.
        {"type":"INT", "value":"9999"}  // The maximum value of the SellerID field.
      ]
    }
  • Example 2:
    Extract full data from the Tablestore table.
    "range": {
      "begin": [
        {"type":"INF_MIN"},  // The minimum value of the DeviceID field.
        {"type":"INF_MIN"}  // The minimum value of the SellerID field.
      ],
      "end": [
        {"type":"INF_MAX"},  // The maximum value of the DeviceID field.
        {"type":"INF_MAX"}  // The maximum value of the SellerID field.
      ]
    }
Yes None
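
The rules for begin and end can be checked before a node is submitted. The following Python sketch is an assumption-laden illustration, not part of Tablestore Reader: validate_range, is_valid, and EXAMPLE_1 are hypothetical names, and the checks cover only the rules stated above (both or neither parameter, one value per primary key column, no value for INF_MIN or INF_MAX).

```python
INFINITE_TYPES = {"INF_MIN", "INF_MAX"}


def validate_range(range_cfg: dict, primary_key_count: int) -> None:
    """Raise ValueError if begin/end break the rules described above."""
    begin = range_cfg.get("begin")
    end = range_cfg.get("end")
    if begin is None and end is None:
        return  # Neither parameter is specified.
    if begin is None or end is None:
        raise ValueError("specify both begin and end, or neither")
    for label, bound in (("begin", begin), ("end", end)):
        if len(bound) != primary_key_count:
            raise ValueError(
                f"{label} has {len(bound)} values, expected {primary_key_count}"
            )
        for item in bound:
            if item.get("type") in INFINITE_TYPES and "value" in item:
                raise ValueError(f"{label}: {item['type']} must not carry a value")


def is_valid(range_cfg: dict, primary_key_count: int) -> bool:
    """Boolean wrapper around validate_range."""
    try:
        validate_range(range_cfg, primary_key_count)
    except ValueError:
        return False
    return True


# Example 1 above, for a table whose primary key is [DeviceID, SellerID].
EXAMPLE_1 = {
    "begin": [{"type": "INF_MIN"}, {"type": "INT", "value": "0"}],
    "end": [{"type": "INF_MAX"}, {"type": "INT", "value": "9999"}],
}
```

For the two-column primary key [DeviceID, SellerID], is_valid(EXAMPLE_1, 2) returns True, and the same range fails the check against a table with three primary key columns.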
split The custom rule for data sharding. This parameter is an advanced configuration item. We recommend that you do not set this parameter.

If data is unevenly distributed in a Tablestore table and the automatic sharding feature of Tablestore Reader fails to work, you can customize a sharding rule.

The split points that you specify in the split parameter must fall within the range that is specified by the begin and end parameters, and they must be values of the partition key. Specify only partition key values in the split parameter, not values for every primary key column.

To read data from a Tablestore table with the primary key of [DeviceID, SellerID], specify the following parameters:
"range": {
  "begin": [
    {"type":"INF_MIN"},  // The minimum value of the DeviceID field.
    {"type":"INF_MIN"}  // The minimum value of the SellerID field.
  ],
  "end": [
    {"type":"INF_MAX"},  // The maximum value of the DeviceID field.
    {"type":"INF_MAX"}  // The maximum value of the SellerID field.
  ],
  // The specified sharding rule. If you specify a sharding rule, the synchronization node is split into concurrent tasks based on the values of the begin, end, and split parameters. Data is sharded based only on the partition key, which is the first column of the primary key.
  // The data type of the partition key can be INF_MIN, INF_MAX, STRING, or INT.
  "split": [
    {"type":"STRING", "value":"1"},
    {"type":"STRING", "value":"2"},
    {"type":"STRING", "value":"3"},
    {"type":"STRING", "value":"4"},
    {"type":"STRING", "value":"5"}
  ]
}
No None
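
To show how split points might divide a range into tasks, the following Python sketch pairs consecutive partition key boundaries. This is a simplified model and not the reader's actual algorithm: task_ranges is a hypothetical helper, and it assumes that the split points are interior boundaries listed in ascending order between the begin and end values of the partition key.

```python
def task_ranges(begin: dict, end: dict, split_points: list) -> list:
    """Pair consecutive boundaries of the partition key into (start, stop) ranges."""
    bounds = [begin, *split_points, end]
    # zip pairs each boundary with its successor: n boundaries give n - 1 ranges.
    return list(zip(bounds, bounds[1:]))


# Partition key boundaries taken from the example above.
SPLIT_POINTS = [{"type": "STRING", "value": str(i)} for i in range(1, 6)]
RANGES = task_ranges({"type": "INF_MIN"}, {"type": "INF_MAX"}, SPLIT_POINTS)

if __name__ == "__main__":
    for start, stop in RANGES:
        print(start, "->", stop)
```

Under this model, the five split points in the example produce six sub-ranges, and an empty split list leaves the whole range as a single task.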

Configure Tablestore Reader by using the codeless UI

This method is not supported.

Configure Tablestore Reader by using the code editor

You can configure Tablestore Reader by using the code editor. For more information, see Create a sync node by using the code editor. The following code shows a configuration example:
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"ots",// The reader type.
            "parameter":{
                "datasource":"", // The data source.
                "column":[// The columns that you want to synchronize from the source table.
                    {
                        "name":"column1"// The name of the column.
                    },
                    {
                        "name":"column2"
                    },
                    {
                        "name":"column3"
                    },
                    {
                        "name":"column4"
                    },
                    {
                        "name":"column5"
                    }
                ],
                "range":{
                    "split":[
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint1"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint2"
                        },
                        {
                            "type":"STRING",
                            "value":"splitPoint3"
                        },
                        {
                            "type":"INF_MAX"
                        }
                    ],
                    "end":[
                        {
                            "type":"INF_MAX"
                        },
                        {
                            "type":"INF_MAX"
                        },
                        {
                            "type":"STRING",
                            "value":"end1"
                        },
                        {
                            "type":"INT",
                            "value":"100"
                        }
                    ],
                    "begin":[
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"INF_MIN"
                        },
                        {
                            "type":"STRING",
                            "value":"begin1"
                        },
                        {
                            "type":"INT",
                            "value":"0"
                        }
                    ]
                },
                "table":""// The name of the source table.
            },
            "name":"Reader",
            "category":"reader"
        },
        { 
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The concurrent parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1 // The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
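
Before you paste a configuration such as the one above into the code editor, a quick consistency check can catch mismatched step names. The following Python sketch is a hypothetical helper, not a DataWorks tool: check_hops and JOB are names invented for this example, and the rule that each hop must lead from a reader step to a writer step is an assumption inferred from the sample configuration.

```python
def check_hops(job: dict) -> list:
    """Return a list of problems; an empty list means every hop links a reader to a writer."""
    # Map each step name to its category ("reader" or "writer").
    categories = {step["name"]: step.get("category") for step in job.get("steps", [])}
    problems = []
    for hop in job.get("order", {}).get("hops", []):
        source, target = hop.get("from"), hop.get("to")
        if categories.get(source) != "reader":
            problems.append(f"hop source {source!r} is not a reader step")
        if categories.get(target) != "writer":
            problems.append(f"hop target {target!r} is not a writer step")
    return problems


# Trimmed version of the configuration above, without comments.
JOB = {
    "type": "job",
    "version": "2.0",
    "steps": [
        {"stepType": "ots", "parameter": {}, "name": "Reader", "category": "reader"},
        {"stepType": "stream", "parameter": {}, "name": "Writer", "category": "writer"},
    ],
    "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
}

if __name__ == "__main__":
    print(check_hops(JOB) or "hops are consistent")
```

The sample dictionary omits the // comments because Python's json module does not accept comments in JSON text.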