All Products
Search
Document Center

Migrate data from MySQL to TSDB

Last Updated: Sep 07, 2020

Background information

This topic describes how to DataX to migrate data from MySQL to Time Series Database (TSDB). DataX is an open source tool that is provided by Alibaba Cloud for data synchronization.

For more information about how to use DataX, see README.

This topic introduces the DataX tool, and the MySQL Reader and TSDB Writer plug-ins that are used in the sample migration task.

DataX

DataX is an offline data synchronization tool that is widely used in Alibaba Group. DataX provides an efficient method to synchronize data between disparate data sources, such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore, MaxCompute, and Distributed Relational Database Service (DRDS).

MySQL Reader

MySQL Reader is a DataX plug-in that allows you to read data from MySQL.

TSDB Writer

TSDB Writer is a DataX plug-in that allows you to write data points into TSDB. TSDB is developed by Alibaba Cloud.

Quick start

Step 1: Prepare the environment

  • Linux
  • JDK (Only version 1.8 and later are supported. We recommend that you use version 1.8.)
  • Python (We recommend that you use Python 2.6.x.)
  • MySQL (DataX is currently compatible with only MySQL 5.x.)
  • TSDB (DataX is currently compatible with only TSDB 2.4.x and later.)

Step 2: Download DataX and its plug-ins

To download DataX and its plug-ins, click here.

Step 3: Use the default migration task of DataX to walk through the migration process

This topic uses an example to describe the migration process. In this example, Stream Reader and Stream Writer are used for data migration. These two plug-ins do not depend on external environments, and therefore are suitable to be used in the test. Stream Reader generates random strings, and Stream Writer receives the strings and prints the strings to the screens. This simulates a simple data migration process.

Deploy the tool

Extract the downloaded installation package to a directory, for example, DATAX_HOME, and start the migration task.

  1. $ cd ${DATAX_HOME}
  2. $ python bin/datax.py job/job.json

Check whether the task is successful

You can view the summary after the migration task is complete. The following information indicates that the migration task is successful.

  1. Time of task startup : 2019-04-26 11:18:07
  2. Time of task end : 2019-04-26 11:18:17
  3. Total time elapsed : 10s
  4. Average traffic of the task : 253.91KB/s
  5. Record writing speed : 10000rec/s
  6. Total records read : 100000
  7. Total read/write failures : 0

To view the recorded command lines, visit this web page: Quick start for data migration.

Step 4: Configure and start the task for migrating data from MySQL to TSDB

The sample migration task that uses Stream Reader and Stream Writer shows that DataX can be used to migrate data as expected. Now, you can start migrating data from MySQL to TSDB. MySQL Reader and TSDB Writer can be used for data migration.

Configure a migration task

Configure a task named mysql2tsdb.json to synchronously migrate data from MySQL to TSDB. The complete configuration information is described as follows. For more information about each parameter, see the “Parameters” section.

  1. {
  2. "job": {
  3. "content": [
  4. {
  5. "reader": {
  6. "name": "mysqlreader",
  7. "parameter": {
  8. "column": [
  9. "name",
  10. "type",
  11. "create_time",
  12. "price"
  13. ],
  14. "connection": [
  15. {
  16. "jdbcUrl": [
  17. "jdbc:mysql://127.0.0.1:3306/datax"
  18. ],
  19. "table": [
  20. "book"
  21. ]
  22. }
  23. ],
  24. "password": "yuzhouwan",
  25. "splitPk": "id",
  26. "username": "root"
  27. }
  28. },
  29. "writer": {
  30. "name": "tsdbwriter",
  31. "parameter": {
  32. "column": [
  33. "name",
  34. "type",
  35. "create_time",
  36. "price"
  37. ],
  38. "columnType": [
  39. "tag",
  40. "tag",
  41. "timestamp",
  42. "metric_num"
  43. ],
  44. "sourceDbType": "RDB",
  45. "tsdbAddress": "localhost",
  46. "tsdbPort": 8242
  47. }
  48. }
  49. }
  50. ],
  51. "setting": {
  52. "speed": {
  53. "channel": 1
  54. }
  55. }
  56. }
  57. }

Start the MySQL-to-TSDB migration task

  1. $ cd ${DATAX_HOME}/..
  2. $ ls
  3. datax/ datax.tar.gz mysql2tsdb.json
  4. $ python datax/bin/datax.py mysql2tsdb.json

Check whether the task is successful

You can view the summary after the migration task is complete. The following information indicates that the migration task is successful.

  1. Time of task startup : 2019-05-21 18:25:16
  2. Time of task end : 2019-05-21 18:25:27
  3. Total time elapsed : 11s
  4. Average traffic of the task : 3B/s
  5. Record writing speed : 0rec/s
  6. Total records read : 3
  7. Total read/write failures : 0

To view the recorded command line, visit this web page: Migrate data from MySQL to TSDB.

Parameters

The following tables describe the relevant parameters.

MySQL Reader parameters

Parameter Type Required Description Default value Example
jdbcUrl String Yes The JDBC connection URL of the database. N/A jdbc:mysql://127.0.0.1:3306/datax
username String Yes The username that is used to connect to the data source. N/A root
password String Yes The password that is used to connect to the data source. N/A root
table String Yes The table to be synchronized. N/A book
column Array Yes The columns to be synchronized in the table. [] ["m"]
splitPk String No The primary key based on which the table is sharded. N/A (By default, the parameter is not specified.) id

TSDB Writer parameters

Parameter Type Required Description Default value Example
sourceDbType String No Valid values: TSDB and RDB. The value TSDB indicates database types such as OpenTSDB, InfluxDB, Prometheus, and TimescaleDB. The value RDB indicates database types such as MySQL, Oracle, PostgreSQL, and DRDS. TSDB RDB
column Array Yes The columns to be synchronized in the table. [] ["name","type","create_time","price"]
columnType Array Yes The column type in TSDB. The table fields in a relational database are organized based on the column types in TSDB.
Valid values:
timestamp,
tag,
metric_num (numeric values),
and metric_string (string values).
[] ["tag","tag","timestamp","metric_num"]
tsdbAddress String Yes The IP address of TSDB. N/A 127.0.0.1
tsdbPort Integer Yes The port number of TSDB. N/A 8242
batchSize Integer No The number of records that are written for each batch. The value must be greater than 0. 100 100

Considerations

Network connection with TSDB

TSDB Writer writes data by using the HTTP API. The specific API endpoint is /api/put. Therefore, the migration task processes must be able to access the HTTP API that is provided by TSDB. Otherwise, a connection error occurs.

Network connection with MySQL

MySQL Reader reads data by using JDBC. Therefore, the migration task processes must be able to access the JDBC API that is provided by MySQL. Otherwise, a connection error occurs.

Consistency of the column sequence between MySQL Reader and TSDB Writer

You must make sure that the sequence of columns in TSDB Writer is the same as that in MySQL Reader. Otherwise, the data becomes disordered.

FAQ

Question: Can I change the JVM memory size for a migration process?

Answer: Yes, you can change the JVM memory size for a migration process. For example, if you migrate data from MySQL to TSDB, run the following command to change the JVM memory size:

  1. python datax/bin/datax.py mysql2tsdb.json -j "-Xms4096m -Xmx4096m"

Question: How can I set an IP address whitelist for TSDB?

Answer: To view the detailed procedure, you can navigate through Quick Start > Set the IP address whitelist in the TSDB documentation.

Question: How can I configure Virtual Private Cloud (VPC) settings if I run a migration task on an Elastic Compute Service (ECS) instance? What are the frequently asked questions about VPC?

Answer: For more information, see Cases for configuring ECS security groups and VPC FAQ.