Use DataX to migrate historical time series data from an OpenTSDB database to a Time Series Database (TSDB) instance. DataX is an open source offline data synchronization tool developed by Alibaba Group. It supports data synchronization between various data sources, including MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore (OTS), MaxCompute (previously known as Open Data Processing Service (ODPS)), and Distributed Relational Database Service (DRDS).
How it works
DataX orchestrates the migration through two plug-ins:
OpenTSDB Reader: Queries data from the OpenTSDB database backed by ApsaraDB for HBase.
TSDB Writer: Writes data points to the TSDB database by calling the HTTP endpoint /api/put.
Each plug-in runs as a parallel process. Both the OpenTSDB source (ApsaraDB for HBase) and the TSDB destination must be reachable from every process in the migration job. If either endpoint is inaccessible, the job throws a connection exception.
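To make the writer side more concrete, the following sketch builds the kind of JSON body that an OpenTSDB-compatible /api/put endpoint accepts. It is an illustration only: the metric name, tag, and value are made-up sample data, and the exact payload that TSDB Writer sends is an assumption rather than something this guide specifies. The code is written for Python 3 and only prints the payload; it does not contact any server.

```python
import json
from datetime import datetime, timezone


def make_data_point(metric, when, value, tags):
    """Build one data point in the OpenTSDB /api/put JSON shape."""
    return {
        "metric": metric,
        "timestamp": int(when.timestamp()),  # seconds since the Unix epoch
        "value": value,
        "tags": dict(tags),
    }


# Hypothetical sample values, chosen only for illustration.
point = make_data_point(
    "m",
    datetime(2019, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
    42.5,
    {"host": "web01"},
)

# /api/put accepts a single data point or a JSON array of data points.
body = json.dumps([point])
print(body)
```

Each record that OpenTSDB Reader produces ends up as one such data point on the TSDB side, which is why the TSDB HTTP endpoint has to be reachable for the whole duration of the job.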
Prerequisites
Before you begin, ensure that you have:
A Linux host with network access to both the OpenTSDB (ApsaraDB for HBase) cluster and the TSDB HTTP endpoint
Java Development Kit (JDK) 1.8 or later installed (JDK 1.8 recommended)
Python 2.6.x installed
OpenTSDB 2.3.x (DataX is compatible with OpenTSDB 2.3.x only; other versions may cause compatibility issues)
TSDB 2.4.x or later (earlier versions may cause compatibility issues)
Set up and verify DataX
Before running a migration, verify that DataX is installed and working correctly using the built-in Stream Reader and Stream Writer plug-ins. These plug-ins have no external dependencies and simulate a simple read/write cycle.
Download the DataX package and extract it to a directory. The rest of this guide refers to that directory as DATAX_HOME.
Run the built-in smoke test:
cd ${DATAX_HOME}
python bin/datax.py job/job.json
Confirm that the output ends with a summary similar to the following. A non-zero value in Read and write failures indicates an installation problem.
Task start time: 2019-04-26 11:18:07
Task end time: 2019-04-26 11:18:17
Execution time: 10s
Average traffic: 253.91KB/s
Write rate: 10000rec/s
Records obtained: 100000
Read and write failures: 0
Migrate data from OpenTSDB to TSDB
Configure the migration task
Create a configuration file named opentsdb2tsdb.json in the parent directory of datax/:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "opentsdbreader",
          "parameter": {
            "endpoint": "http://192.168.1.100:4242",
            "column": ["m"],
            "beginDateTime": "2019-01-01 00:00:00",
            "endDateTime": "2019-01-01 03:00:00"
          }
        },
        "writer": {
          "name": "tsdbhttpwriter",
          "parameter": {
            "endpoint": "http://192.168.1.101:8242"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
OpenTSDB Reader parameters
| Parameter | Type | Required | Description | Default | Example |
|---|---|---|---|---|---|
| endpoint | String | Yes | HTTP endpoint of the OpenTSDB database | None | http://127.0.0.1:4242 |
| column | Array | Yes | Metrics to migrate | [] | ["m"] |
| beginDateTime | String | Yes | Start of the time range to migrate. Used with endDateTime. Minutes and seconds are ignored: the value is rounded down to the nearest hour. For example, 2019-04-18 03:35:00 becomes 2019-04-18 03:00:00. | None | 2019-05-13 15:00:00 |
| endDateTime | String | Yes | End of the time range to migrate. Used with beginDateTime. Subject to the same hour-rounding behavior as beginDateTime. | None | 2019-05-13 17:00:00 |
TSDB Writer parameters
| Parameter | Type | Required | Description | Default | Example |
|---|---|---|---|---|---|
| endpoint | String | Yes | HTTP endpoint of the TSDB database | None | http://127.0.0.1:8242 |
| batchSize | Integer | No | Number of records written per batch. Must be greater than 0. | 100 | 100 |
| maxRetryTime | Integer | No | Maximum number of retries after a write failure. Must be greater than 1. | 3 | 3 |
| ignoreWriteError | Boolean | No | If true, write errors are ignored and the job continues. If false, the job stops when the retry limit is exceeded. | false | false |
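If you maintain several migration jobs, it can help to generate the configuration rather than edit it by hand. The sketch below is one possible way to do that in Python 3. The function name build_job and its argument names are hypothetical; the plug-in names, parameter names, defaults, and constraints are taken from the configuration and parameter tables above. The script only prints the JSON, so redirect the output to opentsdb2tsdb.json if you want to use it.

```python
import json


def build_job(reader_endpoint, writer_endpoint, metrics, begin, end,
              batch_size=100, max_retry_time=3, ignore_write_error=False,
              channel=1):
    """Return a DataX job dict for an OpenTSDB-to-TSDB migration."""
    # Constraints taken from the TSDB Writer parameter table.
    if batch_size <= 0:
        raise ValueError("batchSize must be greater than 0")
    if max_retry_time <= 1:
        raise ValueError("maxRetryTime must be greater than 1")

    reader = {
        "name": "opentsdbreader",
        "parameter": {
            "endpoint": reader_endpoint,
            "column": list(metrics),
            "beginDateTime": begin,
            "endDateTime": end,
        },
    }
    writer = {
        "name": "tsdbhttpwriter",
        "parameter": {
            "endpoint": writer_endpoint,
            "batchSize": batch_size,
            "maxRetryTime": max_retry_time,
            "ignoreWriteError": ignore_write_error,
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


# Same endpoints and time range as the sample configuration in this guide.
job = build_job(
    "http://192.168.1.100:4242",
    "http://192.168.1.101:8242",
    ["m"],
    "2019-01-01 00:00:00",
    "2019-01-01 03:00:00",
)
print(json.dumps(job, indent=2))
```

Writing the optional writer parameters out explicitly, even at their default values, makes it easier to see later which settings a given job actually ran with.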
Run the migration
cd ${DATAX_HOME}/..
ls
# datax/ datax.tar.gz opentsdb2tsdb.json
python datax/bin/datax.py opentsdb2tsdb.json
Verify the results
A successful migration produces output similar to:
Task start time: 2019-04-26 11:47:06
Task end time: 2019-04-26 11:47:16
Execution time: 10s
Average traffic: 98.92KB/s
Write rate: 868rec/s
Records obtained: 8685
Read and write failures: 0
Read and write failures: 0 confirms that all records were transferred without errors. If this value is greater than 0, see Troubleshooting for common failure causes.
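When migrations run unattended, the summary can be checked by a script instead of by eye. The following Python 3 sketch parses a summary into label and value pairs and extracts the failure count. It assumes the summary uses exactly the labels shown in this guide; the labels printed by your DataX build may differ, in which case the lookup key needs to be adjusted. The function names are hypothetical.

```python
def parse_summary(text):
    """Return the 'label: value' pairs of a DataX summary as a dict."""
    summary = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        label, sep, value = line.partition(":")
        if sep:
            summary[label.strip()] = value.strip()
    return summary


def failure_count(text):
    """Return the number of read and write failures reported in the summary."""
    return int(parse_summary(text)["Read and write failures"])


# Sample summary copied from this guide.
sample = """\
Task start time: 2019-04-26 11:47:06
Task end time: 2019-04-26 11:47:16
Execution time: 10s
Average traffic: 98.92KB/s
Write rate: 868rec/s
Records obtained: 8685
Read and write failures: 0
"""

if failure_count(sample) == 0:
    print("Migration finished without read or write failures.")
else:
    print("Migration reported failures; check the DataX task log.")
```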
Troubleshooting
Connection exception on start
The migration job fails immediately with a connection exception if the ApsaraDB for HBase cluster or the TSDB HTTP endpoint (/api/put) is unreachable from the host running DataX.
Steps to resolve:
Verify network connectivity from the migration host to the OpenTSDB endpoint (default port: 4242).
Verify network connectivity from the migration host to the TSDB endpoint (default port: 8242).
Confirm that both endpoints are reachable from every parallel process before retrying.
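The connectivity checks above can be scripted. The sketch below, written for Python 3, opens a TCP connection to each endpoint from the migration host. The endpoint URLs are the sample values from the configuration in this guide, and the function names are hypothetical. A successful TCP connection only shows that the port is open; it does not prove that the OpenTSDB or TSDB HTTP API behind it is healthy.

```python
import socket
from urllib.parse import urlparse


def host_port(endpoint):
    """Split an endpoint such as http://127.0.0.1:4242 into (host, port)."""
    parsed = urlparse(endpoint)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("endpoint must include a host and an explicit port")
    return parsed.hostname, parsed.port


def is_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(host_port(endpoint), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS errors
        return False


if __name__ == "__main__":
    # Sample endpoints from this guide; replace them with your own.
    endpoints = {
        "OpenTSDB": "http://192.168.1.100:4242",
        "TSDB": "http://192.168.1.101:8242",
    }
    for name, endpoint in endpoints.items():
        status = "reachable" if is_reachable(endpoint) else "NOT reachable"
        print("{0} ({1}): {2}".format(name, endpoint, status))
```

Run the script on the same host, and under the same network conditions, as the DataX job so that the result reflects what the migration processes will see.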
Non-zero read and write failures
Check the DataX task log for the specific error. Common causes and fixes:
| Cause | Fix |
|---|---|
| Network interruption between the migration host and the TSDB endpoint | Restore network connectivity and retry |
| TSDB write quota exceeded | Reduce batchSize or lower the channel concurrency setting |
| Retry limit reached | Increase maxRetryTime, or set ignoreWriteError to true to skip failed records and complete the job |
Unexpected time range in migrated data
beginDateTime and endDateTime are automatically rounded down to the nearest hour. For example, a range of [03:35, 04:55) becomes [03:00, 04:00). Recalculate your time range in whole hours and re-run the migration.
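The rounding rule can be reproduced with a few lines of Python 3, which is a convenient way to check a time range before you run the job. The sketch below is illustrative only: floor_to_hour mirrors the documented rounding, while ceil_to_hour is a hypothetical helper for widening the end of a range to the next whole hour so that the data you intended to migrate stays inside the range.

```python
from datetime import datetime, timedelta

FORMAT = "%Y-%m-%d %H:%M:%S"


def floor_to_hour(value):
    """Round a timestamp string down to the hour, as DataX does."""
    parsed = datetime.strptime(value, FORMAT)
    return parsed.replace(minute=0, second=0).strftime(FORMAT)


def ceil_to_hour(value):
    """Round a timestamp string up to the next whole hour if it is not aligned."""
    parsed = datetime.strptime(value, FORMAT)
    floored = parsed.replace(minute=0, second=0)
    if floored != parsed:
        floored += timedelta(hours=1)
    return floored.strftime(FORMAT)


# What DataX effectively uses for a begin time of 03:35.
print(floor_to_hour("2019-04-18 03:35:00"))
# An end time that still covers data up to 04:55 after rounding.
print(ceil_to_hour("2019-04-18 04:55:00"))
```

For the range in the example above, setting endDateTime to the value returned by ceil_to_hour keeps the data between 04:00 and 04:55 inside the migrated range.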
FAQ
Can I change the Java Virtual Machine (JVM) memory size for a migration job?
Yes. Pass the JVM flags directly when running DataX:
python datax/bin/datax.py opentsdb2tsdb.json -j "-Xms4096m -Xmx4096m"
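If you launch DataX from a wrapper script, the same command can be assembled as an argument list so that the JVM flags stay together as a single value for -j. The Python 3 sketch below only builds and prints the command. The function name and its defaults are hypothetical, and the paths assume the directory layout used in this guide.

```python
import shlex


def datax_command(job_file, xms_mb, xmx_mb, datax_py="datax/bin/datax.py"):
    """Build the DataX command line with explicit JVM heap settings."""
    jvm_flags = "-Xms{0}m -Xmx{1}m".format(xms_mb, xmx_mb)
    # The JVM flags are passed as one argument to the -j option.
    return ["python", datax_py, job_file, "-j", jvm_flags]


cmd = datax_command("opentsdb2tsdb.json", 4096, 4096)
# Print a shell-safe version of the command for review or logging.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run on a host where the python command resolves to the interpreter version that DataX expects.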