Migrate data from an OpenTSDB database to a TSDB database

Last Updated: Aug 25, 2021

This topic describes how to use DataX to migrate data from an OpenTSDB database to a Time Series Database (TSDB) database. DataX is an open source tool developed by Alibaba Group.

Background

This section describes DataX, OpenTSDB Reader, and TSDB Writer. OpenTSDB Reader and TSDB Writer are the DataX plug-ins that are used to migrate data.

  • DataX

    DataX is an offline data synchronization tool that is widely used within Alibaba Group. You can use DataX to efficiently synchronize data between disparate data sources, including MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore (OTS), MaxCompute, and Distributed Relational Database Service (DRDS). MaxCompute was previously known as Open Data Processing Service (ODPS).

  • OpenTSDB Reader

    OpenTSDB Reader is a plug-in powered by DataX. You can use OpenTSDB Reader to query data from an OpenTSDB database.

  • TSDB Writer

    TSDB Writer is a plug-in powered by DataX. You can use TSDB Writer to write data points to a TSDB database that is developed by Alibaba Cloud.

Note

  • Make sure that the TSDB database is accessible to each process of the migration task.

    TSDB Writer calls the HTTP endpoint /api/put to write data. If you need to migrate data, make sure that each process of the migration task can access the HTTP endpoint provided by the TSDB database. Otherwise, a connection exception is thrown.

  • Make sure that ApsaraDB for HBase that serves as the underlying storage for OpenTSDB is accessible to each process of the migration task.

    OpenTSDB Reader queries data from ApsaraDB for HBase. If you need to migrate data, make sure that each process of the migration task can access the ApsaraDB for HBase cluster. Otherwise, a connection exception is thrown.

  • The specified start time and end time are automatically rounded down to the hour.

    For example, if the specified time range is [3:35, 4:55) on April 18, 2019, the time range that is actually used is [3:00, 4:00).
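
The rounding rule can be checked before a task is configured. The following is a minimal sketch in Python 3, not part of DataX: the helper names round_down_to_hour and effective_range are hypothetical, and the timestamp format is assumed to be the "yyyy-MM-dd HH:mm:ss" form used in the examples in this topic.

```python
from datetime import datetime

# Assumed timestamp format, matching the examples in this topic.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def round_down_to_hour(timestamp):
    """Truncate a timestamp string to the start of its hour."""
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    return parsed.replace(minute=0, second=0, microsecond=0).strftime(TIME_FORMAT)


def effective_range(begin, end):
    """Return the (begin, end) pair after both values are rounded down to the hour."""
    return round_down_to_hour(begin), round_down_to_hour(end)


if __name__ == "__main__":
    # The example from the note above: [3:35, 4:55) becomes [3:00, 4:00).
    print(effective_range("2019-04-18 03:35:00", "2019-04-18 04:55:00"))
```

Because the end time is also rounded down, data points in the last partial hour are not covered by the range. If those data points are needed, specify an end time on or after the next full hour.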

Procedure

  1. Configure an environment and install tools

    • Linux

    • Install Java Development Kit (JDK) 1.8 or later. We recommend that you use JDK 1.8. You can download JDK from the

    • Install Python. We recommend that you use Python 2.6.x. You can download Python from the

    • DataX is compatible only with OpenTSDB 2.3.x. If a version of OpenTSDB other than 2.3.x is used, compatibility issues can occur.

    • DataX is compatible only with TSDB 2.4.x or later. If an earlier version of TSDB is used, compatibility issues can occur.

    • Download DataX and the plug-ins.

  2. Use the built-in script provided by DataX to test whether data can be migrated as expected.

    The test uses the Stream Reader and Stream Writer plug-ins. Because these plug-ins do not require external dependencies, they are suitable for testing whether data can be migrated as expected. Together they simulate a simple data migration process: Stream Reader generates random character strings, and Stream Writer receives the strings and prints them to your CLI.

    1. Install DataX and the plug-ins.

      Extract the DataX installation package to a directory, referred to here as DATAX_HOME. Then, run the following commands to start the built-in test job:

      $ cd ${DATAX_HOME}
      $ python bin/datax.py job/job.json
    2. Check whether data is migrated as expected.

      The following sample shows the summary information returned if the data is migrated as expected:

      Task start time: 2019-04-26 11:18:07
      Task end time: 2019-04-26 11:18:17
      Execution time: 10s
      Average traffic: 253.91KB/s
      Write rate: 10000rec/s
      Records obtained: 100000
      Read and write failures: 0
  3. Configure and start a task to migrate data from an OpenTSDB database to a TSDB database.

    1. Configure a task to migrate data.

      Configure a task named opentsdb2tsdb.json to migrate data from an OpenTSDB database to a TSDB database. You can use the following sample code to configure the task:

      {
        "job": {
          "content": [
            {
              "reader": {
                "name": "opentsdbreader",
                "parameter": {
                  "endpoint": "http://192.168.1.100:4242",
                  "column": [
                    "m"
                  ],
                  "beginDateTime": "2019-01-01 00:00:00",
                  "endDateTime": "2019-01-01 03:00:00"
                }
              },
              "writer": {
                "name": "tsdbhttpwriter",
                "parameter": {
                  "endpoint": "http://192.168.1.101:8242"
                }
              }
            }
          ],
          "setting": {
            "speed": {
              "channel": 1
            }
          }
        }
      }

      The following tables describe the parameters.

      Parameters for OpenTSDB Reader

      | Parameter | Type | Required | Description | Default value | Example |
      |---|---|---|---|---|---|
      | endpoint | String | Yes | The HTTP endpoint of the OpenTSDB database. | None | http://127.0.0.1:4242 |
      | column | Array | Yes | The metrics that you want to migrate. | [] | ["m"] |
      | beginDateTime | String | Yes | The beginning of the time range to migrate. This parameter is used together with the endDateTime parameter. | None | 2019-05-13 15:00:00 |
      | endDateTime | String | Yes | The end of the time range to migrate. This parameter is used together with the beginDateTime parameter. | None | 2019-05-13 17:00:00 |

      Parameters for TSDB Writer

      | Parameter | Type | Required | Description | Default value | Example |
      |---|---|---|---|---|---|
      | endpoint | String | Yes | The HTTP endpoint of the TSDB database. | None | http://127.0.0.1:8242 |
      | batchSize | Integer | No | The number of data records that you want to migrate at a time. The value must be an integer greater than 0. | 100 | 100 |
      | maxRetryTime | Integer | No | The maximum number of retries allowed after a failure occurs. The value must be an integer greater than 1. | 3 | 3 |
      | ignoreWriteError | Boolean | No | Specifies whether to ignore write errors that remain after the maximum number of retries is reached. If this parameter is set to true, the system ignores the write errors and continues to write data. If this parameter is set to false, the task for writing data is terminated when the maximum number of retries is exceeded. | false | false |
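
A job file can also be generated by a script so that the constraints in the tables above are checked before the task starts. The following is a minimal sketch in Python 3 and is not part of DataX. The function name build_job is hypothetical. The plug-in names are taken from the sample configuration, and the parameter names and value constraints are taken from the tables above.

```python
import json


def build_job(opentsdb_endpoint, tsdb_endpoint, metrics, begin, end,
              batch_size=100, max_retry_time=3, ignore_write_error=False):
    """Build a job configuration as a dict, validating the documented constraints."""
    if not metrics:
        raise ValueError("column must list at least one metric")
    if batch_size <= 0:
        raise ValueError("batchSize must be an integer greater than 0")
    if max_retry_time <= 1:
        raise ValueError("maxRetryTime must be an integer greater than 1")
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "opentsdbreader",
                        "parameter": {
                            "endpoint": opentsdb_endpoint,
                            "column": list(metrics),
                            "beginDateTime": begin,
                            "endDateTime": end,
                        },
                    },
                    "writer": {
                        "name": "tsdbhttpwriter",
                        "parameter": {
                            "endpoint": tsdb_endpoint,
                            "batchSize": batch_size,
                            "maxRetryTime": max_retry_time,
                            "ignoreWriteError": ignore_write_error,
                        },
                    },
                }
            ],
            "setting": {"speed": {"channel": 1}},
        }
    }


if __name__ == "__main__":
    job = build_job("http://192.168.1.100:4242", "http://192.168.1.101:8242",
                    ["m"], "2019-01-01 00:00:00", "2019-01-01 03:00:00")
    # Print the configuration; redirect the output to opentsdb2tsdb.json to use it.
    print(json.dumps(job, indent=2))
```

The endpoints and the metric name in the usage example are the placeholder values from the sample configuration. Replace them with the values for your own databases.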

    2. Start the migration task.

      $ cd ${DATAX_HOME}/..
      $ ls
        datax/  datax.tar.gz  opentsdb2tsdb.json
      $ python datax/bin/datax.py opentsdb2tsdb.json
    3. Check whether data is migrated as expected.

      The following sample shows the summary information returned if the data is migrated as expected:

      Task start time: 2019-04-26 11:47:06
      Task end time: 2019-04-26 11:47:16
      Execution time: 10s
      Average traffic: 98.92KB/s
      Write rate: 868rec/s
      Records obtained: 8685
      Read and write failures: 0
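
If the task is run from a script, the summary can be checked automatically. The following is a minimal sketch in Python 3 that assumes the summary is printed as "label: value" lines with exactly the labels shown in the sample above. The actual labels may differ depending on the DataX version or locale, and the function names parse_summary and migration_succeeded are hypothetical.

```python
def parse_summary(text):
    """Parse 'label: value' lines into a dict of strings."""
    summary = {}
    for line in text.splitlines():
        # Split on the first ": " only, so that values such as timestamps stay intact.
        label, separator, value = line.strip().partition(": ")
        if separator:
            summary[label] = value
    return summary


def migration_succeeded(text):
    """Return True if the summary reports zero read and write failures."""
    # The label below is an assumption taken from the sample output in this topic.
    return parse_summary(text).get("Read and write failures") == "0"


if __name__ == "__main__":
    sample = """\
    Task start time: 2019-04-26 11:47:06
    Task end time: 2019-04-26 11:47:16
    Execution time: 10s
    Average traffic: 98.92KB/s
    Write rate: 868rec/s
    Records obtained: 8685
    Read and write failures: 0
    """
    print(migration_succeeded(sample))
```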

FAQ

Can I change the Java Virtual Machine (JVM) memory size for a migration process?

Yes. To change the JVM memory size for a task that migrates data from an OpenTSDB database to a TSDB database, run the following command:

python datax/bin/datax.py opentsdb2tsdb.json -j "-Xms4096m -Xmx4096m"
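
When the task is started from a script, the same command can be assembled programmatically. The following is a minimal sketch in Python 3. The function name datax_command is hypothetical, and the sketch assumes that the initial and maximum heap sizes are set to the same value, as in the command above.

```python
def datax_command(job_file, heap_mb, datax_script="datax/bin/datax.py"):
    """Build the argument list for a DataX run with a fixed JVM heap size in MB."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be greater than 0")
    # Same form as the command above: initial and maximum heap sizes are equal.
    jvm_options = "-Xms{0}m -Xmx{0}m".format(heap_mb)
    return ["python", datax_script, job_file, "-j", jvm_options]


if __name__ == "__main__":
    print(datax_command("opentsdb2tsdb.json", 4096))
```

The returned list can be passed to subprocess.run. Because the JVM options are a single list element, no additional shell quoting is needed.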