Background information
This topic describes how to use DataX to migrate data from OpenTSDB to Time Series Database (TSDB). DataX is an open source tool that is provided by Alibaba Cloud for data synchronization.
For more information about how to use DataX, see README.
This topic introduces the DataX tool, and the OpenTSDB Reader and TSDB Writer plug-ins that are used in the sample migration task.
DataX
DataX is an offline data synchronization tool that is widely used in Alibaba Group. DataX provides an efficient method to synchronize data between disparate data sources, such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore, MaxCompute, and Distributed Relational Database Service (DRDS).
OpenTSDB Reader
OpenTSDB Reader is a DataX plug-in that reads data from OpenTSDB.
TSDB Writer
TSDB Writer is a DataX plug-in that allows you to write data points into TSDB. TSDB is developed by Alibaba Cloud.
Quick start
Step 1: Prepare the environment
- Linux
- JDK (Only version 1.8 and later are supported. We recommend that you use version 1.8.)
- Python (We recommend that you use Python 2.6.x.)
- OpenTSDB (DataX is currently compatible with only OpenTSDB 2.3.x.)
- TSDB (DataX is currently compatible with only TSDB 2.4.x and later.)
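Before you run DataX, you can quickly confirm that suitable JDK and Python versions are installed on the migration host. The following commands are a simple sanity check; adjust the executable names if your environment differs.
$ java -version
$ python --version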
Step 2: Download DataX and its plug-ins
To download DataX and its plug-ins, click here.
Step 3: Use the default migration task of DataX to walk through the migration process
This topic uses an example to walk you through the migration process. In this example, Stream Reader and Stream Writer are used for data migration. These two plug-ins do not depend on external environments and are therefore suitable for a test run. Stream Reader generates random strings, and Stream Writer receives the strings and prints them to the screen. This simulates a simple data migration process.
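The default job/job.json file included in the DataX package implements this kind of test. The following configuration is a minimal sketch of a Stream-Reader-to-Stream-Writer task; the record count and column values are illustrative and may differ from the file that ships with your DataX version.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 10,
                        "column": [
                            { "type": "string", "value": "hello, DataX" },
                            { "type": "long", "value": 19890604 }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}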
Deploy the tool
Extract the downloaded installation package to a directory, for example, DATAX_HOME, and start the migration task.
$ cd ${DATAX_HOME}
$ python bin/datax.py job/job.json
Check whether the task is successful
You can view the summary after the migration task is complete. The following information indicates that the migration task is successful.
Time of task startup : 2019-04-26 11:18:07
Time of task end : 2019-04-26 11:18:17
Total time elapsed : 10s
Average traffic of the task : 253.91KB/s
Record writing speed : 10000rec/s
Total records read : 100000
Total read/write failures : 0
To view the recorded command lines, visit this web page: Quick start for data migration.
Step 4: Configure and start the task for migrating data from OpenTSDB to TSDB
The sample migration task that uses Stream Reader and Stream Writer shows that DataX can migrate data as expected. You can now configure a task that uses OpenTSDB Reader and TSDB Writer to migrate data from OpenTSDB to TSDB.
Configure a migration task
Configure a task named opentsdb2tsdb.json to migrate data from OpenTSDB to TSDB. The complete configuration is as follows. For more information about each parameter, see the “Parameters” section.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "opentsdbreader",
                    "parameter": {
                        "endpoint": "http://192.168.1.100:4242",
                        "column": [
                            "m"
                        ],
                        "beginDateTime": "2019-01-01 00:00:00",
                        "endDateTime": "2019-01-01 03:00:00"
                    }
                },
                "writer": {
                    "name": "tsdbhttpwriter",
                    "parameter": {
                        "endpoint": "http://192.168.1.101:8242"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
Start the OpenTSDB-to-TSDB migration task
$ cd ${DATAX_HOME}/..
$ ls
datax/ datax.tar.gz opentsdb2tsdb.json
$ python datax/bin/datax.py opentsdb2tsdb.json
Check whether the task is successful
You can view the summary after the migration task is complete. The following information indicates that the migration task is successful.
Time of task startup : 2019-04-26 11:47:06
Time of task end : 2019-04-26 11:47:16
Total time elapsed : 10s
Average traffic of the task : 98.92KB/s
Record writing speed : 868rec/s
Total records read : 8685
Total read/write failures : 0
To view the recorded command line, visit this web page: Migrate data from OpenTSDB to TSDB.
Parameters
The following tables describe the relevant parameters.
OpenTSDB Reader parameters
Parameter | Type | Required | Description | Default value | Example |
---|---|---|---|---|---|
endpoint | String | Yes | The endpoint that is used to connect to the source OpenTSDB database through HTTP. | N/A | http://127.0.0.1:4242 |
column | Array | Yes | The metrics to be migrated. | [] | ["m"] |
beginDateTime | String | Yes | The start time of a specified time range. The data points during the time range are to be migrated. You can determine the time range by specifying the beginDateTime and endDateTime parameters. | N/A | 2019-05-13 15:00:00 |
endDateTime | String | Yes | The end time of a specified time range. The data points during the time range are to be migrated. You can determine the time range by specifying the beginDateTime and endDateTime parameters. | N/A | 2019-05-13 17:00:00 |
TSDB Writer parameters
Parameter | Type | Required | Description | Default value | Example |
---|---|---|---|---|---|
endpoint | String | Yes | The endpoint that is used to connect to the destination TSDB database through HTTP. | N/A | http://127.0.0.1:8242 |
batchSize | Integer | No | The number of records that are written for each batch. The value must be greater than 0. | 100 | 100 |
maxRetryTime | Integer | No | The number of retries after a failure occurs. The value must be greater than 1. | 3 | 3 |
ignoreWriteError | Boolean | No | Specifies whether to ignore write errors. If you set this parameter to true, write errors are ignored and the write task continues. Otherwise, the write task stops. | false | false |
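For example, a writer block that explicitly sets the optional parameters might look like the following sketch. The endpoint and values are illustrative; if you omit the optional parameters, the defaults listed in the preceding table are used.
"writer": {
    "name": "tsdbhttpwriter",
    "parameter": {
        "endpoint": "http://192.168.1.101:8242",
        "batchSize": 100,
        "maxRetryTime": 3,
        "ignoreWriteError": false
    }
}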
Considerations
Ensure network connection with TSDB
TSDB Writer writes data by using the HTTP API. The specific API endpoint is /api/put. Therefore, the migration task processes must be able to access the HTTP API that is provided by TSDB. Otherwise, a connection error occurs.
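As a quick connectivity check, you can post a single test data point to the /api/put endpoint from the host that runs the migration task. This assumes that TSDB accepts the standard OpenTSDB-style JSON payload; the metric name, timestamp, and tags below are placeholders.
$ curl -i -X POST http://192.168.1.101:8242/api/put \
    -H "Content-Type: application/json" \
    -d '[{"metric":"connectivity.test","timestamp":1556247600,"value":1,"tags":{"host":"test"}}]'
An HTTP 204 or 200 response indicates that the endpoint is reachable and accepts writes.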
Ensure network connection with HBase
OpenTSDB Reader reads data by establishing direct connections to HBase. HBase is the underlying data storage system for OpenTSDB. Therefore, you must make sure that the migration task processes are connected to HBase clusters as expected. Otherwise, a connection error occurs.
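A simple reachability test from the migration host can rule out basic network issues. The addresses and ports below are placeholders; replace them with the ZooKeeper and HBase RegionServer endpoints that your OpenTSDB deployment actually uses.
$ nc -vz 192.168.1.102 2181
$ nc -vz 192.168.1.103 16020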
Only the hour components of the specified start time and end time are retained
If you specify the start time and end time, the minute and second components are automatically ignored. For example, if you specify [3:35, 4:55) on 2019-04-18, [3:00, 4:00) is used.
FAQ
Question: Can I change the JVM memory size for a migration process?
Answer: Yes, you can change the JVM memory size for a migration process. For example, if you migrate data from OpenTSDB to TSDB, run the following command to change the JVM memory size:
python datax/bin/datax.py opentsdb2tsdb.json -j "-Xms4096m -Xmx4096m"
Question: How can I set an IP address whitelist for TSDB?
Answer: To view the detailed procedure, you can navigate through Quick Start > Set the IP address whitelist in the TSDB documentation.
Question: How can I set an IP address whitelist for ApsaraDB for HBase?
Answer: To view the detailed procedure, you can navigate through Operation and Maintenance Guide > Configure the whitelist in the ApsaraDB for HBase documentation.
Question: How can I configure Virtual Private Cloud (VPC) settings if I run a migration task on an Elastic Compute Service (ECS) instance? What are the frequently asked questions about VPC?
Answer: For more information, see Cases for configuring ECS security groups and VPC FAQ.