This topic describes the data types and parameters that Tablestore Reader supports and how to configure Tablestore Reader by using the codeless user interface (UI) and the code editor.
Tablestore Reader can read data in the following ways:
- From the entire table
- From a specified range
- From a specified shard
Tablestore is a NoSQL database service that is built on the Apsara distributed operating system and allows you to store and access large amounts of structured data in real time. Tablestore organizes data into instances and tables, and it scales seamlessly as data volume grows by using data sharding and load balancing technologies.
Tablestore Reader connects to the Tablestore server by using Tablestore SDK for Java and reads data from the server. Tablestore Reader then converts the data into a format that Data Integration can read, based on the official data synchronization protocols, and sends the converted data to a writer.
To synchronize data from a Tablestore table, Tablestore Reader splits the synchronization node into multiple tasks based on the table range and runs these tasks concurrently. Each Tablestore Reader thread runs one task.
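Because each thread runs one task, the number of tasks that run at the same time is capped by the maximum number of parallel threads, which the code editor script in this topic sets in the concurrent field of setting.speed. The following fragment is a minimal sketch that allows up to three parallel threads; the value 3 is an arbitrary example rather than a recommendation, and the other fields of setting are omitted.

```json
{
  "setting": {
    "speed": {
      "concurrent": 3
    }
  }
}
```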
Data types
The following table lists the Tablestore data types that Tablestore Reader supports.
Category | Tablestore data type |
---|---|
Integer | INTEGER |
Floating point | DOUBLE |
String | STRING |
Boolean | BOOLEAN |
Binary | BINARY |
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
endpoint | The endpoint of the Tablestore server. For more information, see Endpoints. | Yes | No default value |
accessId | The AccessKey ID of the account that you use to connect to the Tablestore server. | Yes | No default value |
accessKey | The AccessKey secret of the account that you use to connect to the Tablestore server. | Yes | No default value |
instanceName | The name of the Tablestore instance. An instance is the entity that you use to manage Tablestore and the basic unit for managing Tablestore resources. Access control and resource metering for applications are implemented at the instance level. After you activate Tablestore, you must create an instance in the Tablestore console before you can create and manage tables. | Yes | No default value |
table | The name of the table from which you want to read data. You can specify only one table. Multi-table synchronization is not required for Tablestore. | Yes | No default value |
column | The names of the columns from which you want to read data, specified as a JSON array. Because Tablestore is a NoSQL database service, you must specify the column names explicitly for Tablestore Reader to read data. | Yes | No default value |
begin and end | The range of the Tablestore table from which you want to read data. Specify both parameters or neither of them. In the code editor, begin and end are nested in the range parameter. The begin and end parameters specify a range for the primary key columns of the table, and you must specify a range for each primary key column. If you do not need to limit the range of a column, set its begin and end elements to {"type":"INF_MIN"} and {"type":"INF_MAX"}. The type field of each element specifies the data type of the boundary value, such as STRING or INT, or one of the special values INF_MIN and INF_MAX. For example, for a table whose primary keys are [DeviceID, SellerID], the begin and end arrays each contain two elements: the first for DeviceID and the second for SellerID. | Yes | Left empty |
split | The custom rule for data sharding. This is an advanced configuration item, and we recommend that you do not set it. If data is unevenly distributed in the Tablestore table and the automatic sharding feature of Tablestore Reader does not work as expected, you can specify a custom sharding rule. In the code editor, split is nested in the range parameter. The split points must fall within the range that is specified by the begin and end parameters and must be values of the partition key, which is the first primary key column. In other words, you specify only partition key values in the split parameter, not values of all the primary key columns. For example, for a table whose primary keys are [DeviceID, SellerID], specify only DeviceID values. | No | No default value |
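As a sketch of the begin, end, and split parameters for a table whose primary keys are [DeviceID, SellerID], the following fragment reads the entire table and supplies two custom split points. It assumes that DeviceID is a STRING column; the values "device_100" and "device_200" are hypothetical placeholders. The fragment follows the layout of the code editor script in this topic, where begin, end, and split are nested in the range object, and it leaves out comments so that it is valid JSON.

```json
{
  "range": {
    "begin": [
      {"type": "INF_MIN"},
      {"type": "INF_MIN"}
    ],
    "end": [
      {"type": "INF_MAX"},
      {"type": "INF_MAX"}
    ],
    "split": [
      {"type": "INF_MIN"},
      {"type": "STRING", "value": "device_100"},
      {"type": "STRING", "value": "device_200"},
      {"type": "INF_MAX"}
    ]
  }
}
```

The begin and end arrays contain one element per primary key column, in primary key order. The split array contains only DeviceID (partition key) values, and like the code editor script it starts with INF_MIN and ends with INF_MAX.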
Configure Tablestore Reader by using the codeless UI
This method is not supported.
Configure Tablestore Reader by using the code editor
```json
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"ots",// The reader type.
            "parameter":{
                "datasource":"",// The name of the data source.
                "column":[// The names of the columns from which you want to read data.
                    {"name":"column1"},// The name of the column.
                    {"name":"column2"},
                    {"name":"column3"},
                    {"name":"column4"},
                    {"name":"column5"}
                ],
                "range":{
                    "split":[// Custom split points. Specify values of the partition key (the first primary key column) only.
                        {"type":"INF_MIN"},
                        {"type":"STRING","value":"splitPoint1"},
                        {"type":"STRING","value":"splitPoint2"},
                        {"type":"STRING","value":"splitPoint3"},
                        {"type":"INF_MAX"}
                    ],
                    "end":[// The end of the range. One element per primary key column.
                        {"type":"INF_MAX"},
                        {"type":"INF_MAX"},
                        {"type":"STRING","value":"end1"},
                        {"type":"INT","value":"100"}
                    ],
                    "begin":[// The start of the range. One element per primary key column.
                        {"type":"INF_MIN"},
                        {"type":"INF_MIN"},
                        {"type":"STRING","value":"begin1"},
                        {"type":"INT","value":"0"}
                    ]
                },
                "table":""// The name of the table from which you want to read data.
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false disables bandwidth throttling, and the value true enables it. The mbps parameter takes effect only when throttle is set to true.
            "concurrent":1,// The maximum number of parallel threads.
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
```
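The comment on throttle in the preceding script states that mbps takes effect only when throttle is set to true. As a sketch, the following setting fragment disables bandwidth throttling and therefore omits mbps; the errorLimit and concurrent values are copied from the script, and comments are left out so that the fragment is valid JSON.

```json
{
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": false,
      "concurrent": 1
    }
  }
}
```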