This topic describes how to use DataX to migrate data from an OpenTSDB database to a Time Series Database (TSDB) database. DataX is an open source tool developed by Alibaba Group.
Background
This section describes DataX and the two DataX plug-ins used for the migration: OpenTSDB Reader and TSDB Writer.
DataX
DataX is an offline data synchronization tool that is widely used within Alibaba Group. You can use DataX to efficiently synchronize data between heterogeneous data sources, including MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore (OTS), MaxCompute, and Distributed Relational Database Service (DRDS). MaxCompute was previously known as Open Data Processing Service (ODPS).
OpenTSDB Reader
OpenTSDB Reader is a plug-in powered by DataX. You can use OpenTSDB Reader to query data from an OpenTSDB database.
TSDB Writer
TSDB Writer is a plug-in powered by DataX. You can use TSDB Writer to write data points to a TSDB database that is developed by Alibaba Cloud.
Note
Make sure that the TSDB database is accessible to each process of the migration task.
TSDB Writer writes data by calling the /api/put HTTP endpoint of the TSDB database. Make sure that each process of the migration task can access this endpoint. Otherwise, a connection exception is thrown.
Make sure that the ApsaraDB for HBase cluster that serves as the underlying storage for OpenTSDB is accessible to each process of the migration task.
OpenTSDB Reader queries data from the ApsaraDB for HBase cluster. Make sure that each process of the migration task can access this cluster. Otherwise, a connection exception is thrown.
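Before you start the task, you can run a quick reachability check from each machine that will run a migration process. The Python sketch below defines a hypothetical helper, `endpoint_reachable`, which is not part of DataX. It only confirms that a TCP connection to a host and port can be opened. It does not call /api/put or speak the HBase client protocol. The HBase host and port to check depend on your cluster configuration and are left to you.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds.

    `endpoint` may be a URL such as "http://192.168.1.101:8242" or a bare
    "host:port" string. Only TCP reachability is tested; no HTTP request is sent.
    """
    if "://" not in endpoint:
        # urlparse only fills in hostname/port when a scheme prefix is present.
        endpoint = "tcp://" + endpoint
    parsed = urlparse(endpoint)
    host, port = parsed.hostname, parsed.port
    if host is None or port is None:
        raise ValueError(f"endpoint must include a host and a port: {endpoint!r}")
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `endpoint_reachable("http://192.168.1.101:8242")` checks the TSDB endpoint used in the sample job configuration in this topic. A result of False on any machine means that the migration process on that machine would hit the connection exception described above.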
The specified start time and end time are automatically rounded down to the hour.
For example, if the specified time range is [3:35, 4:55) on April 18, 2019, the time range is rounded to [3:00, 4:00). Data points in [4:00, 4:55) are therefore not migrated. To include them, extend the end time to the next full hour (5:00 in this example).
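To preview the time range that is actually migrated, you can apply the same rule yourself. This Python sketch only mirrors the documented behavior, which truncates both ends of the range to the hour. It is not DataX's own code. The `yyyy-MM-dd HH:mm:ss` input format is an assumption inferred from the parameter examples in this topic.

```python
from datetime import datetime

# Assumed timestamp format, inferred from the examples in this topic.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def round_down_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def effective_range(begin: str, end: str) -> tuple:
    """Return the half-open [begin, end) range after both ends are rounded down."""
    return (round_down_to_hour(datetime.strptime(begin, TIME_FORMAT)),
            round_down_to_hour(datetime.strptime(end, TIME_FORMAT)))


begin, end = effective_range("2019-04-18 03:35:00", "2019-04-18 04:55:00")
print(begin, end)  # 2019-04-18 03:00:00 2019-04-18 04:00:00
```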
Procedure
Configure an environment and install tools
Use a Linux operating system.
Install Java Development Kit (JDK) 1.8 or later. We recommend that you use JDK 1.8. You can download JDK from the
Install Python. We recommend that you use Python 2.6.x. You can download Python from the
DataX is compatible only with OpenTSDB 2.3.x. If a version of OpenTSDB other than 2.3.x is used, compatibility issues can occur.
DataX is compatible only with TSDB 2.4.x or later. If an earlier version of TSDB is used, compatibility issues can occur.
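The version requirements above can be checked with a small script. In this sketch, the hypothetical `check_environment` function takes version strings that you collect yourself, for example from the output of `java -version`. How you obtain the OpenTSDB and TSDB versions is outside its scope. The checks encode only the rules stated in this topic: JDK 1.8 or later, OpenTSDB 2.3.x, and TSDB 2.4.x or later.

```python
def parse_version(text: str) -> tuple:
    """Parse the leading numeric components of a version string.

    "2.3.1" -> (2, 3, 1); "1.8.0_201" -> (1, 8, 0). Parsing stops at the
    first component that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(jdk: str, opentsdb: str, tsdb: str) -> list:
    """Return a list of problems; an empty list means the documented requirements are met."""
    problems = []
    if parse_version(jdk) < (1, 8):
        problems.append(f"JDK {jdk}: 1.8 or later is required")
    if parse_version(opentsdb)[:2] != (2, 3):
        problems.append(f"OpenTSDB {opentsdb}: only 2.3.x is supported")
    if parse_version(tsdb) < (2, 4):
        problems.append(f"TSDB {tsdb}: 2.4.x or later is required")
    return problems
```

Tuple comparison handles versions of different lengths, so `(1, 8, 0)` passes the `>= (1, 8)` check and `(1, 7, 0)` fails it.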
Download DataX and the plug-ins.
Use the built-in script provided by DataX to test whether data can be migrated as expected.
The test uses the Stream Reader and Stream Writer plug-ins. These plug-ins have no external dependencies, so they are well suited to verifying that DataX works as expected. They simulate a simple data migration: Stream Reader generates random character strings, and Stream Writer receives the strings and prints them to your CLI.
Install DataX and the plug-ins.
Extract the DataX installation package to a directory of your choice, referred to here as DATAX_HOME. Then, run the built-in test job:
$ cd ${DATAX_HOME}
$ python bin/datax.py job/job.json
Check whether data is migrated as expected.
The following sample shows the summary information returned if the data is migrated as expected:
Task start time: 2019-04-26 11:18:07
Task end time: 2019-04-26 11:18:17
Execution time: 10s
Average traffic: 253.91KB/s
Write rate: 10000rec/s
Records obtained: 100000
Read and write failures: 0
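If you run the test from a script, you can check the summary automatically. The sketch below assumes the English labels shown in this topic ("Records obtained" and "Read and write failures"). The labels printed by your DataX build may differ, for example they may be in Chinese. In that case, adjust the regular expressions. The helper names are hypothetical. The same check also applies to the summary printed after the OpenTSDB-to-TSDB migration.

```python
import re

# Label patterns assumed from the sample summary in this topic.
SUMMARY_FIELDS = {
    "records": r"Records obtained:\s*(\d+)",
    "failures": r"Read and write failures:\s*(\d+)",
}


def parse_summary(text: str) -> dict:
    """Extract the record count and failure count from a DataX job summary."""
    result = {}
    for key, pattern in SUMMARY_FIELDS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"summary is missing the {key!r} field")
        result[key] = int(match.group(1))
    return result


def migrated_as_expected(text: str) -> bool:
    """True if at least one record was read and no read/write failures were reported."""
    summary = parse_summary(text)
    return summary["records"] > 0 and summary["failures"] == 0
```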
Configure and start a task to migrate data from an OpenTSDB database to a TSDB database.
Configure a task to migrate data.
Create a job configuration file named opentsdb2tsdb.json to migrate data from the OpenTSDB database to the TSDB database. The following sample shows the content of the file:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "opentsdbreader",
          "parameter": {
            "endpoint": "http://192.168.1.100:4242",
            "column": ["m"],
            "beginDateTime": "2019-01-01 00:00:00",
            "endDateTime": "2019-01-01 03:00:00"
          }
        },
        "writer": {
          "name": "tsdbhttpwriter",
          "parameter": {
            "endpoint": "http://192.168.1.101:8242"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
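If you generate job files for many metrics or time windows, building the configuration in code can be less error-prone than editing JSON by hand. This Python sketch follows the structure of the sample job. `build_job` and `write_job` are hypothetical helpers. The reader time parameters use the names listed in the OpenTSDB Reader parameter table (`beginDateTime`, `endDateTime`).

```python
import json


def build_job(opentsdb_endpoint: str, tsdb_endpoint: str, metrics: list,
              begin: str, end: str, channels: int = 1) -> dict:
    """Build an OpenTSDB-to-TSDB DataX job as a Python dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "opentsdbreader",
                        "parameter": {
                            "endpoint": opentsdb_endpoint,
                            "column": list(metrics),
                            "beginDateTime": begin,
                            "endDateTime": end,
                        },
                    },
                    "writer": {
                        "name": "tsdbhttpwriter",
                        "parameter": {"endpoint": tsdb_endpoint},
                    },
                }
            ],
            "setting": {"speed": {"channel": channels}},
        }
    }


def write_job(job: dict, path: str) -> None:
    """Serialize the job to a JSON file that datax.py can read."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job, f, indent=2)
```

For example, `write_job(build_job("http://192.168.1.100:4242", "http://192.168.1.101:8242", ["m"], "2019-01-01 00:00:00", "2019-01-01 03:00:00"), "opentsdb2tsdb.json")` writes a file equivalent to the sample.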
The following tables describe the parameters.
Parameters for OpenTSDB Reader
| Parameter | Type | Required | Description | Default value | Example |
| --- | --- | --- | --- | --- | --- |
| endpoint | String | Yes | The HTTP endpoint of the OpenTSDB database. | None | http://127.0.0.1:4242 |
| column | Array | Yes | The metrics that you want to migrate. | [] | ["m"] |
| beginDateTime | String | Yes | The beginning of the time range to migrate. This parameter is used together with the endDateTime parameter. | None | 2019-05-13 15:00:00 |
| endDateTime | String | Yes | The end of the time range to migrate. This parameter is used together with the beginDateTime parameter. | None | 2019-05-13 17:00:00 |
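Before you start a long migration, you can check the reader parameters locally. This sketch is a hypothetical pre-flight check, not a DataX feature. It verifies that the required parameters are present and that `column` is not empty. It also checks that the time range is not empty after both ends are rounded down to the hour, as described in the Note above. The timestamp format is assumed from the examples in the table.

```python
from datetime import datetime

# Assumed format, inferred from the examples in the parameter table.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
REQUIRED_READER_PARAMS = ("endpoint", "column", "beginDateTime", "endDateTime")


def _to_hour(value: str) -> datetime:
    """Parse a timestamp and round it down to the hour."""
    return datetime.strptime(value, TIME_FORMAT).replace(minute=0, second=0, microsecond=0)


def validate_reader_params(params: dict) -> None:
    """Raise ValueError if the OpenTSDB Reader parameters look unusable."""
    missing = [key for key in REQUIRED_READER_PARAMS if key not in params]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    if not params["column"]:
        raise ValueError("column must list at least one metric")
    begin = _to_hour(params["beginDateTime"])
    end = _to_hour(params["endDateTime"])
    if begin >= end:
        # Catches reversed ranges and ranges that shrink to nothing after rounding.
        raise ValueError("time range is empty after rounding down to the hour")
```

For example, a range of 15:10 to 15:50 passes a naive `begin < end` check. It is rejected here because both ends round down to 15:00.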
Parameters for TSDB Writer
| Parameter | Type | Required | Description | Default value | Example |
| --- | --- | --- | --- | --- | --- |
| endpoint | String | Yes | The HTTP endpoint of the TSDB database. | None | http://127.0.0.1:8242 |
| batchSize | Integer | No | The number of data points written in each batch. The value must be an integer greater than 0. | 100 | 100 |
| maxRetryTime | Integer | No | The maximum number of retries after a write fails. The value must be an integer greater than 1. | 3 | 3 |
| ignoreWriteError | Boolean | No | Specifies whether to ignore write errors. If this parameter is set to true, the system ignores write errors and continues to write data. If this parameter is set to false, the write task is terminated when data still cannot be written after the maximum number of retries specified by maxRetryTime. | false | false |
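The optional writer parameters and their defaults can be resolved in the same way before you write the job file. This sketch is hypothetical and encodes only the defaults and constraints listed in the TSDB Writer table. DataX performs its own validation at run time, and this check does not replace it.

```python
# Defaults taken from the TSDB Writer parameter table.
WRITER_DEFAULTS = {"batchSize": 100, "maxRetryTime": 3, "ignoreWriteError": False}


def _is_int(value) -> bool:
    """True for integers, excluding bool (which is a subclass of int in Python)."""
    return isinstance(value, int) and not isinstance(value, bool)


def resolve_writer_params(params: dict) -> dict:
    """Fill in documented defaults and enforce the documented constraints."""
    if not params.get("endpoint"):
        raise ValueError("endpoint is required")
    resolved = {**WRITER_DEFAULTS, **params}
    if not _is_int(resolved["batchSize"]) or resolved["batchSize"] <= 0:
        raise ValueError("batchSize must be an integer greater than 0")
    if not _is_int(resolved["maxRetryTime"]) or resolved["maxRetryTime"] <= 1:
        raise ValueError("maxRetryTime must be an integer greater than 1")
    if not isinstance(resolved["ignoreWriteError"], bool):
        raise ValueError("ignoreWriteError must be a Boolean")
    return resolved
```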
Start the migration task.
$ cd ${DATAX_HOME}/..
$ ls
datax/  datax.tar.gz  opentsdb2tsdb.json
$ python datax/bin/datax.py opentsdb2tsdb.json
Check whether data is migrated as expected.
The following sample shows the summary information returned if the data is migrated as expected:
Task start time: 2019-04-26 11:47:06
Task end time: 2019-04-26 11:47:16
Execution time: 10s
Average traffic: 98.92KB/s
Write rate: 868rec/s
Records obtained: 8685
Read and write failures: 0
FAQ
Can I change the Java Virtual Machine (JVM) memory size for a migration process?
Yes. Pass JVM options to datax.py with the -j option. For example, the following command sets both the initial heap size (-Xms) and the maximum heap size (-Xmx) to 4096 MB for the task that migrates data from the OpenTSDB database to the TSDB database:
python datax/bin/datax.py opentsdb2tsdb.json -j "-Xms4096m -Xmx4096m"
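If you launch migrations from a Python script, for example one job file per time window, you can build the same command as an argument list. This avoids shell quoting problems with the space inside the -j value. `datax_command` and `run_job` are hypothetical helpers. The `datax` directory layout and the `python` interpreter name follow the commands in this topic and may differ on your machine.

```python
import shlex
import subprocess


def datax_command(job_file: str, datax_home: str = "datax",
                  jvm_opts: str = "", python: str = "python") -> list:
    """Build the argument list for: python <datax_home>/bin/datax.py <job_file> [-j <jvm_opts>]."""
    cmd = [python, f"{datax_home}/bin/datax.py", job_file]
    if jvm_opts:
        # Passed as a single argument to datax.py's -j option, as in the command above.
        cmd += ["-j", jvm_opts]
    return cmd


def run_job(job_file: str, **kwargs) -> int:
    """Run a DataX job and return the exit code of the datax.py process."""
    return subprocess.run(datax_command(job_file, **kwargs)).returncode


cmd = datax_command("opentsdb2tsdb.json", jvm_opts="-Xms4096m -Xmx4096m")
print(shlex.join(cmd))
# prints: python datax/bin/datax.py opentsdb2tsdb.json -j '-Xms4096m -Xmx4096m'
```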