OTSStream is a plug-in that allows you to export incremental data from Tablestore. This topic describes how to configure an OTSStream sync node.
Background information
Unlike plug-ins that are used to export full data, OTSStream supports only the multi-version mode and cannot be used to export data in specified columns. Incremental data can be considered as operation logs that include both the data and the operation information. For more information, see Configure OTSStream Reader.
- If the node is scheduled to run by day, the node reads the data that is generated during the last 24 hours but does not read the data that is generated in the last 5 minutes. We recommend that you schedule the node to run by hour.
- The end time that you specify cannot be later than the current system time, and the data that is generated in the last 5 minutes is not read. Therefore, the end time must be at least 5 minutes earlier than the scheduled time of the node.
- If the node is scheduled to run by day, the data that is read may be incomplete.
- The node cannot be scheduled to run by week or month.
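The scheduling constraints above can be sketched as a small calculation. The following Python sketch assumes an hourly schedule: it sets the end time 5 minutes before the scheduled time, sets the start time one hour before the end time, and formats both in the yyyymmddhh24miss format (`%Y%m%d%H%M%S` in Python). The `hourly_window` helper and the exact offsets are illustrative assumptions, not values that DataWorks requires.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumed lag: the end time must be at least 5 minutes before the scheduled time.
END_LAG = timedelta(minutes=5)
# Assumed window length for a node that is scheduled to run by hour.
WINDOW = timedelta(hours=1)
# yyyymmddhh24miss expressed as Python strftime codes.
FMT = "%Y%m%d%H%M%S"


def hourly_window(scheduled: datetime) -> Tuple[str, str]:
    """Hypothetical helper: return (startTime, endTime) strings for one run."""
    end = scheduled - END_LAG
    start = end - WINDOW
    return start.strftime(FMT), end.strftime(FMT)


if __name__ == "__main__":
    # Run scheduled at 17:00:00 on October 19, 2017.
    print(hourly_window(datetime(2017, 10, 19, 17, 0, 0)))
    # ('20171019155500', '20171019165500')
```

Because each window ends exactly where the next run's window starts, and the start time is included while the end time is excluded, consecutive hourly runs cover the incremental data without gaps or overlaps.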
The time period from the start time to the end time must include the time when operations are performed on the Tablestore table. For example, assume that you inserted two data records into a Tablestore table at 16:20:00 on October 19, 2017. You can set the start time to 20171019161000 and the end time to 20171019162600.
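The example above can be checked with a short Python sketch. It parses the yyyymmddhh24miss strings and tests whether the operation time falls within the window, treating the start time as included and the end time as excluded, as the parameter descriptions in this topic state. The `in_window` helper is hypothetical.

```python
from datetime import datetime

FMT = "%Y%m%d%H%M%S"  # yyyymmddhh24miss


def in_window(op_time: datetime, start: str, end: str) -> bool:
    """Return True if op_time is in [start, end): start included, end excluded."""
    return datetime.strptime(start, FMT) <= op_time < datetime.strptime(end, FMT)


if __name__ == "__main__":
    inserted_at = datetime(2017, 10, 19, 16, 20, 0)
    print(in_window(inserted_at, "20171019161000", "20171019162600"))  # True
```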
Create a connection
Configure a sync node on the codeless user interface (UI)
Configure a batch sync node by using the code editor
To configure the batch sync node by using the code editor, click Switch to Code Editor in the toolbar, and then click OK.
{
"type": "job",
"version": "1.0",
"configuration": {
"reader": {
"plugin": "otsstream",
"parameter": {
"datasource": "otsstream",// The name of the connection. Use the name of the connection that you have created.
"dataTable": "person",// The name of the table from which the incremental data is exported. You must enable the Stream feature for the table, either when you create the table or by calling the UpdateTable operation after you create the table.
"startTimeString": "${startTime}",// The start time (included) of the incremental data, in the yyyymmddhh24miss format.
"endTimeString": "${endTime}",// The end time (excluded) of the incremental data, in the yyyymmddhh24miss format.
"statusTable":"TableStoreStreamReaderStatusTable",// The name of the table that is used to store status records.
"maxRetries": 30,// The maximum number of retries of each request.
"isExportSequenceInfo": true// Specifies whether to export time series information. Set to true here because the writer writes the sequenceinfo column.
}
},
"writer": {
"plugin": "odps",
"parameter": {
"datasource":"odps_first",// The name of the connection.
"table": "person",// The name of the destination table.
"truncate": true,
"partition": "pt=${bdp.system.bizdate}",// The information about the partition.
"column": [// The columns to which data is written.
"id",
"colname",
"version",
"colvalue",
"optype",
"sequenceinfo"
]
}
},
"setting": {
"speed": {
"mbps": 7,// The maximum transmission rate.
"concurrent": 7// The maximum number of concurrent threads.
}
}
}
}
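In the configuration above, `${startTime}`, `${endTime}`, and `${bdp.system.bizdate}` are placeholders that are replaced with actual values when the node runs. The following Python sketch only illustrates that substitution step on a fragment of the configuration. The `substitute` helper and the sample values are hypothetical; the actual resolution is performed by DataWorks scheduling, not by this code.

```python
import re

# Matches ${name} placeholders, including dotted names such as ${bdp.system.bizdate}.
PLACEHOLDER = re.compile(r"\$\{([\w.]+)\}")


def substitute(template: str, values: dict) -> str:
    """Replace each ${name} with values[name]; leave unknown placeholders unchanged."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


if __name__ == "__main__":
    fragment = '"startTimeString": "${startTime}", "partition": "pt=${bdp.system.bizdate}"'
    sample = {"startTime": "20171019161000", "bdp.system.bizdate": "20171018"}
    print(substitute(fragment, sample))
```

Leaving unknown placeholders unchanged makes a missing value easy to spot in the output instead of silently producing an empty string.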
- startTimeString: The start time (included) of the incremental data, in the yyyymmddhh24miss format.
- endTimeString: The end time (excluded) of the incremental data, in the yyyymmddhh24miss format.
- startTimestampMillis: The start time (included) of the incremental data, in milliseconds. OTSStream Reader searches the table that is specified by the statusTable parameter for status records at the time that is specified by the startTimestampMillis parameter and reads data from this time point. If OTSStream Reader cannot find status records of this time point in that table, it reads the incremental data that is retained by the system from the first entry and skips the data that is written earlier than the time that is specified by the startTimestampMillis parameter.
- endTimestampMillis: The end time (excluded) of the incremental data, in milliseconds. OTSStream Reader reads data from the time that is specified by the startTimestampMillis parameter and stops reading data that is written at or later than the time that is specified by the endTimestampMillis parameter. If OTSStream Reader has read all the incremental data, the reading process ends even if the time that is specified by the endTimestampMillis parameter has not been reached.
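The read range described above can be sketched in Python. The sketch models incremental records as hypothetical `(timestamp_ms, payload)` tuples that are held in memory and sorted by timestamp. It skips records that are earlier than startTimestampMillis, stops at the first record at or after endTimestampMillis, and also stops when the records run out. It does not model the status table lookup or any Tablestore API.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[int, str]  # hypothetical (timestamp_ms, payload) pair


def read_range(records: Iterable[Record], start_ms: int, end_ms: int) -> Iterator[Record]:
    """Yield records with start_ms <= timestamp < end_ms.

    Assumes records are sorted by timestamp. Reading ends at the first record
    whose timestamp is >= end_ms, or when the records are exhausted,
    whichever comes first.
    """
    for ts, payload in records:
        if ts < start_ms:
            continue  # written before the start time: skipped
        if ts >= end_ms:
            break  # the end time is excluded
        yield ts, payload


if __name__ == "__main__":
    log = [(1000, "a"), (2000, "b"), (3000, "c")]
    print(list(read_range(log, 2000, 3000)))  # [(2000, 'b')]
    print(list(read_range(log, 2000, 9000)))  # [(2000, 'b'), (3000, 'c')]
```

The second call shows the case in which all the data has been read before the end time is reached: the loop simply ends.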
If isExportSequenceInfo is set to true ("isExportSequenceInfo": true), the system exports an extra column for time series information, which contains the time when the data is written. The default value is false, which indicates that no time series information is exported.
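The effect of isExportSequenceInfo on the output layout can be sketched as follows. The sketch assumes, based on the writer's column list in the code editor example, that each output row contains the primary key columns followed by the column name, version, column value, and operation type, and that sequenceinfo is appended only when isExportSequenceInfo is true. The column names are taken from the example destination table, and the `output_columns` helper is hypothetical. You could use a check like this to confirm that the writer's column list has the expected length.

```python
from typing import List


def output_columns(primary_keys: List[str], export_sequence_info: bool = False) -> List[str]:
    """Hypothetical helper: expected column order of the incremental output."""
    columns = list(primary_keys) + ["colname", "version", "colvalue", "optype"]
    if export_sequence_info:
        columns.append("sequenceinfo")  # extra column, exported only when enabled
    return columns


if __name__ == "__main__":
    print(output_columns(["id"]))        # 5 columns with the default value false
    print(output_columns(["id"], True))  # 6 columns, matching the writer's list
```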