Background information

For more information about DataX, see the DataX README.

This topic describes DataX and the two plug-ins that are used to migrate data: OpenTSDB Reader and TSDB Writer.

DataX

DataX is an offline data synchronization tool that is widely used within Alibaba Group. You can use DataX to efficiently synchronize data between various disparate data sources such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore (OTS), MaxCompute (previously known as ODPS), and Distributed Relational Database Service (DRDS).

OpenTSDB Reader

OpenTSDB Reader is a plug-in provided by DataX to read data from OpenTSDB.

TSDB Writer

TSDB Writer is a plug-in provided by DataX to write data to Lindorm TSDB.

Quick Start

Step 1: Prepare an environment
  • Linux
  • Java Development Kit (JDK): JDK 1.8 or later is required. We recommend that you use JDK 1.8.
  • Python: We recommend that you use Python 2.6.x.
  • OpenTSDB: We recommend that you use OpenTSDB 2.3.x. If you use other versions, compatibility issues may occur.
  • Lindorm TSDB: Only V2.4.x and later are supported. If you use earlier versions, compatibility issues may occur.
Step 2: Download DataX and the plug-ins

Click DataX to download DataX and the OpenTSDB Reader and TSDB Writer plug-ins.

Step 3: Use the built-in script of DataX to test whether the data can be migrated as expected

The test uses the Stream Reader and Stream Writer plug-ins, which do not require external dependencies. Together they simulate a simple data migration process: Stream Reader generates random character strings, and Stream Writer receives the strings and prints them to your terminal.

  • Install DataX

    Decompress the installation package to a specified directory and then run the datax.py script to start the migration task.

    $ cd ${DATAX_HOME}
    $ python bin/datax.py job/job.json        
  • Check the migration result

    If the following information is returned, the data is migrated:

    Task start time                   : 2019-04-26 11:18:07
    Task end time                     : 2019-04-26 11:18:17
    Time consumed                     : 10s
    Average traffic                   : 253.91KB/s
    Write rate                        : 10000rec/s
    Number of records obtained        : 100000
    Number of write and read failures : 0

    For more information about the commands, watch Data migration quick start.
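The success check described above can also be scripted. The following Python 3 sketch parses a summary in the "label : value" layout shown in this topic and reports whether the failure count is zero. The function names are hypothetical, and the sketch assumes that the output of your DataX version uses the same labels as the listing above, which may not be the case.

```python
def parse_summary(text):
    """Return the 'label : value' pairs of a DataX summary as a dict."""
    summary = {}
    for line in text.splitlines():
        # Split on the first " : " so timestamps such as 11:18:07 stay intact.
        label, separator, value = line.partition(" : ")
        if separator:
            summary[label.strip()] = value.strip()
    return summary


def migration_succeeded(text):
    """Return True if the summary reports zero write and read failures."""
    # Assumption: the label matches the listing in this topic exactly.
    summary = parse_summary(text)
    return summary.get("Number of write and read failures") == "0"
```

You can pass the captured console output of datax.py to migration_succeeded to decide whether to continue with the next step.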

Step 4: Configure a task to migrate data from OpenTSDB to Lindorm TSDB

After the test in Step 3 confirms that DataX works as expected, you can use the OpenTSDB Reader and TSDB Writer plug-ins to migrate data from OpenTSDB to Lindorm TSDB.

  • Configure a migration task

    Configure a task to migrate data from OpenTSDB to Lindorm TSDB and save the configuration file as opentsdb2tsdb.json. The following code provides a sample configuration. For more information about the parameters, see the "Parameter description" section.

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "opentsdbreader",
              "parameter": {
                "endpoint": "http://192.168.1.100:4242",
                "column": [
                  "m"
                ],
                "beginDateTime": "2019-01-01 00:00:00",
                "endDateTime": "2019-01-01 03:00:00"
              }
            },
            "writer": {
              "name": "tsdbhttpwriter",
              "parameter": {
                "endpoint": "http://192.168.1.101:8242"
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 1
          }
        }
      }
    }
                            
  • Start a task to migrate data from OpenTSDB to Lindorm TSDB
    $ cd ${DATAX_HOME}/..
    $ ls
      datax/  datax.tar.gz  opentsdb2tsdb.json
    $ python datax/bin/datax.py opentsdb2tsdb.json
                            
  • Check the migration result

    If the following information is returned, the data is migrated:

    Task start time                   : 2019-04-26 11:47:06
    Task end time                     : 2019-04-26 11:47:16
    Time consumed                     : 10s
    Average traffic                   : 98.92KB/s
    Write rate                        : 868rec/s
    Number of records obtained        : 8685
    Number of write and read failures : 0

    For more information about the commands, watch the video Migrate data from OpenTSDB to Lindorm TSDB.
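If you run migrations for several metrics or time ranges, the job file from the "Configure a migration task" step can be generated instead of edited by hand. The following Python 3 sketch builds a job with the same structure, using the parameter names listed in the "Parameter description" section. The helper names build_job and write_job are hypothetical, and the check that the start time precedes the end time is a safeguard added by this sketch rather than a documented DataX rule.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_job(opentsdb_endpoint, tsdb_endpoint, metrics, begin, end, channel=1):
    """Return a job dict shaped like the opentsdb2tsdb.json example."""
    # Sanity check added by this sketch, not a documented DataX rule.
    if datetime.strptime(begin, TIME_FORMAT) >= datetime.strptime(end, TIME_FORMAT):
        raise ValueError("beginDateTime must be earlier than endDateTime")
    reader = {
        "name": "opentsdbreader",
        "parameter": {
            "endpoint": opentsdb_endpoint,
            "column": list(metrics),
            "beginDateTime": begin,
            "endDateTime": end,
        },
    }
    writer = {"name": "tsdbhttpwriter", "parameter": {"endpoint": tsdb_endpoint}}
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(job, path="opentsdb2tsdb.json"):
    """Write the job dict to disk as indented JSON."""
    with open(path, "w") as handle:
        json.dump(job, handle, indent=2)
```

For example, write_job(build_job("http://192.168.1.100:4242", "http://192.168.1.101:8242", ["m"], "2019-01-01 00:00:00", "2019-01-01 03:00:00")) produces a file that can be passed to datax.py as shown above.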

Parameter description

The following tables describe the parameters.

OpenTSDB Reader
Parameter | Type | Required | Description | Default value | Example
endpoint | String | Yes | The HTTP endpoint of OpenTSDB. | None | http://127.0.0.1:4242
column | Array | Yes | The list of metrics to be migrated. | [] | ["m"]
beginDateTime | String | Yes | The start of the time range in which the data to be migrated was generated. This parameter is used together with the endDateTime parameter. | None | 2019-05-13 15:00:00
endDateTime | String | Yes | The end of the time range in which the data to be migrated was generated. This parameter is used together with the beginDateTime parameter. | None | 2019-05-13 17:00:00
TSDB Writer
Parameter | Type | Required | Description | Default value | Example
endpoint | String | Yes | The HTTP endpoint of Lindorm TSDB. | None | http://127.0.0.1:8242
batchSize | Integer | No | The number of data records to migrate at a time. The value must be an integer greater than 0. | 100 | 100
maxRetryTime | Integer | No | The maximum number of retries allowed after a failure occurs. The value must be an integer greater than 1. | 3 | 3
ignoreWriteError | Boolean | No | Specifies whether to ignore write errors. If this parameter is set to true, the system ignores write errors and attempts to write data again. If this parameter is set to false, the write task is terminated when the number of failed retries reaches the limit specified by the maxRetryTime parameter. | false | false
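The interaction of batchSize, maxRetryTime, and ignoreWriteError can be illustrated with a small Python 3 sketch. This is not the TSDB Writer implementation. It rests on two assumptions: maxRetryTime is treated as the total number of attempts per batch, and ignoreWriteError=true is read as "skip a batch that still fails after all attempts and continue with the next one". The write_all function and the send_batch callback are hypothetical.

```python
def write_all(records, send_batch, batch_size=100, max_retry_time=3,
              ignore_write_error=False):
    """Send records in batches and return the number of records written."""
    written = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        for attempt in range(1, max_retry_time + 1):
            try:
                send_batch(batch)
                written += len(batch)
                break
            except IOError:
                if attempt < max_retry_time:
                    continue  # try the same batch again
                if not ignore_write_error:
                    raise  # retry limit reached: terminate the task
                # Assumed ignoreWriteError=true behavior: drop this batch
                # and move on to the next one.
    return written
```

Under these assumptions, a value of false stops the task at the first batch that exhausts its attempts, whereas a value of true trades completeness for progress, so the failure count in the task summary is worth checking afterwards.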

Note

Make sure that DataX can access Lindorm TSDB.

TSDB Writer writes data by calling the /api/put HTTP API operation. Therefore, to ensure successful data migration, make sure that the processes of migration tasks can access the HTTP API provided by Lindorm TSDB. Otherwise, a connection exception is reported.

Make sure that DataX can access ApsaraDB for HBase that serves as the storage for OpenTSDB.

OpenTSDB Reader directly reads data from ApsaraDB for HBase. Therefore, to ensure successful data migration, make sure that the processes of your migration tasks can access the ApsaraDB for HBase clusters. Otherwise, a connection exception is reported.
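Both access requirements above can be checked before a task is started. The following Python 3 sketch extracts the host and port from an endpoint such as the ones used in the job configuration and tests whether a TCP connection can be opened from the machine that runs DataX. The function names are hypothetical. A successful TCP connection only shows that the port is reachable. It does not confirm that the /api/put operation or the ApsaraDB for HBase cluster accepts requests, and the ApsaraDB for HBase addresses to test depend on your deployment.

```python
import socket
from urllib.parse import urlsplit


def parse_endpoint(endpoint):
    """Split an endpoint such as http://host:port into (host, port)."""
    parts = urlsplit(endpoint)
    if parts.hostname is None or parts.port is None:
        raise ValueError("endpoint must look like http://host:port")
    return parts.hostname, parts.port


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, can_connect(*parse_endpoint("http://192.168.1.101:8242")) returns False if a whitelist or security group blocks the machine that runs the migration task.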

The specified start time and end time are automatically rounded down to the hour.

For example, if the specified time range is [3:35, 4:55) on April 18, 2019, the time range is converted to [3:00, 4:00).
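To preview the time range that a task covers, the rounding rule above can be reproduced in a few lines. The following Python 3 sketch assumes the "yyyy-MM-dd HH:mm:ss" timestamp format used in the job configuration. The function name round_down_to_hour is hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def round_down_to_hour(value):
    """Drop the minutes and seconds of a timestamp string."""
    parsed = datetime.strptime(value, TIME_FORMAT)
    rounded = parsed.replace(minute=0, second=0, microsecond=0)
    return rounded.strftime(TIME_FORMAT)
```

With the example above, round_down_to_hour("2019-04-18 03:35:00") gives 2019-04-18 03:00:00 and round_down_to_hour("2019-04-18 04:55:00") gives 2019-04-18 04:00:00. Under this rule, an end time of 05:00:00 or later would be needed for the range to cover data generated at 4:55.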

FAQ

Can I adjust the Java Virtual Machine (JVM) memory size for a migration process?

Yes, you can adjust the JVM memory size for a migration process. For example, you can run the following command to adjust the JVM memory size for the task that is used to migrate data from OpenTSDB to Lindorm TSDB:

python datax/bin/datax.py opentsdb2tsdb.json -j "-Xms4096m -Xmx4096m"
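If migration tasks are launched from a script, the same command can be assembled from a heap size given in megabytes. The following Python 3 sketch only builds the argument list. The helper name build_datax_command is hypothetical, and setting the initial and maximum heap size to the same value simply mirrors the command above.

```python
def build_datax_command(job_file, heap_mb, datax_py="datax/bin/datax.py"):
    """Return the datax.py argument list with -Xms and -Xmx set to heap_mb."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be a positive number of megabytes")
    jvm_options = "-Xms{0}m -Xmx{0}m".format(heap_mb)
    # The -j option carries the JVM parameters, as in the command above.
    return ["python", datax_py, job_file, "-j", jvm_options]
```

The returned list can be passed to subprocess.call, which keeps the JVM options together as a single argument without shell quoting.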
            

How do I configure an IP address whitelist for Lindorm TSDB?

How do I configure an IP address whitelist for ApsaraDB for HBase?

For information about how to configure an IP address whitelist for ApsaraDB for HBase, see Configure IP address whitelists and security groups in the Operation and Maintenance Guide of ApsaraDB for HBase.

If my migration task is to migrate data from OpenTSDB instances hosted on Elastic Compute Service (ECS) to a virtual private cloud (VPC), how do I configure the VPC, and what are the problems I may encounter?

See Use cases of ECS security groups and VPC FAQ.