Background information

This topic describes how to use DataX to migrate data from a MySQL database to a database powered by the time series engine of ApsaraDB for Lindorm (Lindorm). DataX is an open source tool provided by Alibaba Group.

For information about how to use DataX, see the README of the DataX project.

The following sections describe DataX, the MySQL Reader plug-in, and the TSDB Writer plug-in. MySQL Reader and TSDB Writer are provided by DataX to migrate data.

DataX

DataX is an offline data synchronization tool that is widely used within Alibaba Group. You can use DataX to synchronize data between heterogeneous data sources such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB (ADS), HBase, Tablestore (OTS), MaxCompute, and Distributed Relational Database Service (DRDS).

MySQL Reader

DataX provides the MySQL Reader plug-in that is used to read data from a MySQL database.

TSDB Writer

DataX provides the TSDB Writer plug-in that is used to write data to a Lindorm time series database.

Quick Start

  1. Configure an environment.
    • Linux
    • Java Development Kit (JDK): You can use JDK 1.8 or later. We recommend that you use JDK 1.8.
    • Python: We recommend that you use Python 2.6.x.
    • MySQL: Only MySQL versions 5.x are supported. Earlier MySQL versions are not fully compatible with DataX.
    • Lindorm time series engine: Only V2.4.x and later are supported. Earlier versions are not fully compatible with DataX.
  2. Download DataX and the plug-ins.

    Click DataX to download DataX and the MySQL Reader and TSDB Writer plug-ins.

  3. Use the built-in script provided by DataX to test whether data can be migrated.

    The plug-ins used in the test are Stream Reader and Stream Writer. These plug-ins do not require external dependencies. Stream Reader and Stream Writer are used to simulate a data migration process. Stream Reader generates random strings. Stream Writer receives the strings and then prints them on your CLI.
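The built-in test job that you run in the next section is a JSON file in the same format as a real migration task. The following is a sketch of what such a stream-to-stream job looks like; the parameter names (sliceRecordCount, print) follow common DataX stream examples and may differ slightly in your DataX version:

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "type": "string", "value": "DataX" },
              { "type": "long", "value": 19890604 }
            ],
            "sliceRecordCount": 100000
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": true,
            "encoding": "UTF-8"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```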

Install DataX

Decompress the installation package to a directory of your choice. Then, run the datax.py script to start the built-in test task:

$ cd ${DATAX_HOME}
$ python bin/datax.py job/job.json
            

Check the test result.

If output similar to the following is returned, the test task ran successfully:

Task start time                  : 2019-04-26 11:18:07
Task end time                    : 2019-04-26 11:18:17
Time consumed                    : 10s
Average traffic                  : 253.91KB/s
Write rate                       : 10000rec/s
Number of records obtained       : 100000
Number of write and read failures: 0

For more information, watch Data migration quick start.

Step 4: Configure and start a task to migrate data from a MySQL database to a Lindorm time series database

In the previous step, the Stream Reader and Stream Writer plug-ins were used to test the migration process of DataX. The test results indicate that DataX can migrate data. The following parts describe how to use the MySQL Reader and TSDB Writer plug-ins to migrate data from a MySQL database to a Lindorm time series database.

Configure a migration task

Configure a task to migrate data from a MySQL database to a Lindorm time series database. In this example, the task file is named mysql2tsdb.json. The following code provides an example of the task configuration. For more information about the parameters, see the "Parameter description" section.

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": [
              "name",
              "type",
              "create_time",
              "price"
            ],
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://127.0.0.1:3306/datax"
                ],
                "table": [
                  "book"
                ]
              }
            ],
            "password": "yuzhouwan",
            "splitPk": "id",
            "username": "root"
          }
        },
        "writer": {
          "name": "tsdbwriter",
          "parameter": {
            "column": [
              "name",
              "type",
              "create_time",
              "price"
            ],
            "columnType": [
              "tag",
              "tag",
              "timestamp",
              "metric_num"
            ],
            "sourceDbType": "RDB",
            "tsdbAddress": "localhost",
            "tsdbPort": 8242
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
            

Start the task

$ cd ${DATAX_HOME}/..
$ ls
  datax/  datax.tar.gz  mysql2tsdb.json
$ python datax/bin/datax.py mysql2tsdb.json
            

Check the migration result

If output similar to the following is returned, the data has been migrated:

Task start time                  : 2019-05-21 18:25:16
Task end time                    : 2019-05-21 18:25:27
Time consumed                    : 11s
Average traffic                  : 3B/s
Write rate                       : 0rec/s
Number of records obtained       : 3
Number of write and read failures: 0

For more information, watch Migrate data from a MySQL database to a Lindorm time series database.

Parameter description

The following tables describe the parameters.

MySQL Reader

Parameter | Type | Required | Description | Default value | Example
jdbcUrl | String | Yes | The URL that is used to connect to your MySQL database over Java Database Connectivity (JDBC). | None | jdbc:mysql://127.0.0.1:3306/datax
username | String | Yes | The username that is used to connect to the MySQL database. | None | root
password | String | Yes | The password that is used to connect to the MySQL database. | None | root
table | String | Yes | The table that contains the data to be migrated. | None | book
column | Array | Yes | The names of the columns that you want to migrate. | [] | ["m"]
splitPk | String | No | If splitPk is specified, its value is used to shard the data so that the table can be read in parallel. | None | id
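Conceptually, splitPk lets DataX divide the table into non-overlapping primary-key ranges so that multiple channels can read concurrently. The following is a rough illustration of that idea only, not DataX's actual implementation:

```python
def split_ranges(min_pk, max_pk, channels):
    """Split [min_pk, max_pk] into roughly equal, non-overlapping
    ranges, one per channel (illustration only, not DataX's code)."""
    step = (max_pk - min_pk + 1) // channels or 1
    ranges = []
    lo = min_pk
    while lo <= max_pk:
        hi = min(lo + step - 1, max_pk)
        ranges.append((lo, hi))
        lo = hi + 1
    # Each range then maps to a query such as:
    #   SELECT ... FROM book WHERE id >= lo AND id <= hi
    return ranges

print(split_ranges(1, 100, 4))
```

Each resulting range can be scanned by a separate channel, which is why splitPk should be a column with evenly distributed values, such as an auto-increment primary key.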

TSDB Writer

Parameter | Type | Required | Description | Default value | Example
sourceDbType | String | No | The type of the source database. Valid values: TSDB and RDB. TSDB specifies a time series engine, such as OpenTSDB, InfluxDB, Prometheus, or TimescaleDB. RDB specifies a relational database engine, such as MySQL, Oracle, PostgreSQL, or Distributed Relational Database Service (DRDS). | TSDB | RDB
column | Array | Yes | The names of the columns that you want to migrate. | [] | ["name","type","create_time","price"]
columnType | Array | Yes | The data types of the columns after they are migrated to a Lindorm time series database. Valid values: timestamp, tag, metric_num, and metric_string. timestamp indicates that the column contains timestamps. tag indicates that the column contains tags. metric_num indicates that the column contains numeric metric values. metric_string indicates that the column contains metric values of the STRING type. | [] | ["tag","tag","timestamp","metric_num"]
tsdbAddress | String | Yes | The IP address of your Lindorm time series database. | None | 127.0.0.1
tsdbPort | int | Yes | The port of your Lindorm time series database. | None | 8242
batchSize | int | No | The number of data records that are migrated at a time. The value must be an integer greater than 0. | 100 | 100
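To see how column and columnType work together, the following sketch maps one relational row to an OpenTSDB-style data point according to the tag, timestamp, and metric_num roles. This is a conceptual illustration; the actual payload that TSDB Writer builds may differ:

```python
def row_to_datapoint(columns, column_types, row):
    """Map one relational row to an OpenTSDB-style data point,
    following the tag/timestamp/metric roles of columnType
    (conceptual sketch, not TSDB Writer's actual code)."""
    point = {"tags": {}}
    for name, ctype, value in zip(columns, column_types, row):
        if ctype == "tag":
            point["tags"][name] = value
        elif ctype == "timestamp":
            point["timestamp"] = value
        elif ctype in ("metric_num", "metric_string"):
            point["metric"] = name
            point["value"] = value
    return point

# Sample row from the book table used in this topic (values are made up).
columns = ["name", "type", "create_time", "price"]
column_types = ["tag", "tag", "timestamp", "metric_num"]
row = ("Hamlet", "drama", 1558433116, 19.9)
print(row_to_datapoint(columns, column_types, row))
```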

Note

Make sure that DataX can access the Lindorm time series database.

TSDB Writer writes data by calling the /api/put HTTP API operation. To ensure successful data migration, make sure that the processes of the migration task can access the HTTP API provided by the Lindorm time series engine. Otherwise, a connection exception is reported.
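Before you start a task, a quick way to verify reachability is to test whether the host and port of the time series engine accept TCP connections. The host name and port below are placeholders for your own values:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within
    the timeout. This is a basic reachability check only; it does not
    validate the /api/put HTTP endpoint itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical address; replace with your Lindorm host and port.
print(can_connect("localhost", 8242))
```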

Make sure that DataX can access the MySQL database.

MySQL Reader reads data by using JDBC. To ensure successful data migration, make sure that the processes of the migration task can connect to the MySQL database over JDBC. Otherwise, a connection exception is reported.

The sequence of column names must be consistent.

Make sure that the sequence of the column names configured in TSDB Writer is the same as that of the column names configured in MySQL Reader. Otherwise, data cannot be correctly processed.
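Because a misaligned column sequence fails at the data level rather than with an obvious error, a small pre-flight check of the job file can catch it early. The helper below is a hypothetical sketch, not part of DataX; it verifies that the reader and writer declare the same columns in the same order and that columnType has one entry per column:

```python
import json

def check_job(job_json):
    """Raise ValueError if the reader and writer column lists disagree
    or if columnType does not match the column count
    (pre-flight check sketch; not part of DataX itself)."""
    job = json.loads(job_json)
    for content in job["job"]["content"]:
        reader_cols = content["reader"]["parameter"]["column"]
        writer_params = content["writer"]["parameter"]
        if reader_cols != writer_params["column"]:
            raise ValueError("reader and writer columns differ")
        if len(writer_params["column"]) != len(writer_params["columnType"]):
            raise ValueError("columnType length mismatch")
    return True

# Minimal job file for illustration.
job_json = """
{"job": {"content": [{
  "reader": {"name": "mysqlreader",
             "parameter": {"column": ["name", "price"]}},
  "writer": {"name": "tsdbwriter",
             "parameter": {"column": ["name", "price"],
                           "columnType": ["tag", "metric_num"]}}
}]}}
"""
print(check_job(job_json))
```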

FAQ

Can I adjust the Java virtual machine (JVM) memory size for a migration process?

Yes, you can adjust the JVM memory size for a migration process. If you want to adjust the JVM memory size for a task that migrates data from a MySQL database to a Lindorm time series database, run the following command:

python datax/bin/datax.py mysql2tsdb.json -j "-Xms4096m -Xmx4096m"    

How do I configure an IP address whitelist for my Lindorm time series database?

If my migration task is to migrate data from OpenTSDB instances hosted on ECS to Virtual Private Cloud (VPC), how do I configure the VPC, and what are the problems I may encounter?

See Use cases of ECS security groups and VPC FAQ.