Background information
This topic describes how to use DataX to migrate data from MySQL to Time Series Database (TSDB). DataX is an open source tool that is provided by Alibaba Cloud for data synchronization.
For more information about how to use DataX, see README.
This topic introduces the DataX tool, and the MySQL Reader and TSDB Writer plug-ins that are used in the sample migration task.
DataX
DataX is an offline data synchronization tool that is widely used in Alibaba Group. DataX provides an efficient method to synchronize data between disparate data sources, such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore, MaxCompute, and Distributed Relational Database Service (DRDS).
MySQL Reader
MySQL Reader is a DataX plug-in that allows you to read data from MySQL.
TSDB Writer
TSDB Writer is a DataX plug-in that allows you to write data points into TSDB. TSDB is developed by Alibaba Cloud.
Quick start
Step 1: Prepare the environment
- Linux
- JDK (Only versions 1.8 and later are supported. We recommend that you use version 1.8.)
- Python (We recommend that you use Python 2.6.x.)
- MySQL (DataX is currently compatible with only MySQL 5.x.)
- TSDB (DataX is currently compatible with only TSDB 2.4.x and later.)
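You can run the following commands on the machine to confirm that the required Java and Python versions are installed. These commands are only a quick illustrative check:
$ java -version
$ python -V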
Step 2: Download DataX and its plug-ins
To download DataX and its plug-ins, click here.
Step 3: Use the default migration task of DataX to walk through the migration process
This topic uses an example to describe the migration process. In this example, Stream Reader and Stream Writer are used for data migration. These two plug-ins do not depend on external environments and are therefore suitable for testing. Stream Reader generates random strings, and Stream Writer receives the strings and prints them to the screen. This simulates a simple data migration process.
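For reference, the default job/job.json that is shipped with DataX is a stream-to-stream job of roughly the following shape. The values shown here are illustrative, and the exact fields may vary slightly across DataX versions:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "value": "DataX", "type": "string" },
              { "value": 19890604, "type": "long" }
            ],
            "sliceRecordCount": 100000
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}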
Deploy the tool
Extract the downloaded installation package to a directory, for example, DATAX_HOME, and start the migration task.
$ cd ${DATAX_HOME}
$ python bin/datax.py job/job.json
Check whether the task is successful
You can view the summary after the migration task is complete. The following information indicates that the migration task is successful.
Time of task startup : 2019-04-26 11:18:07
Time of task end : 2019-04-26 11:18:17
Total time elapsed : 10s
Average traffic of the task : 253.91KB/s
Record writing speed : 10000rec/s
Total records read : 100000
Total read/write failures : 0
To view the recorded command lines, visit this web page: Quick start for data migration.
Step 4: Configure and start the task for migrating data from MySQL to TSDB
The sample migration task that uses Stream Reader and Stream Writer shows that DataX can migrate data as expected. You can now migrate data from MySQL to TSDB by using the MySQL Reader and TSDB Writer plug-ins.
Configure a migration task
Configure a task named mysql2tsdb.json to synchronize data from MySQL to TSDB. The complete configuration is as follows. For more information about each parameter, see the “Parameters” section.
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": [
              "name",
              "type",
              "create_time",
              "price"
            ],
            "connection": [
              {
                "jdbcUrl": [
                  "jdbc:mysql://127.0.0.1:3306/datax"
                ],
                "table": [
                  "book"
                ]
              }
            ],
            "password": "yuzhouwan",
            "splitPk": "id",
            "username": "root"
          }
        },
        "writer": {
          "name": "tsdbwriter",
          "parameter": {
            "column": [
              "name",
              "type",
              "create_time",
              "price"
            ],
            "columnType": [
              "tag",
              "tag",
              "timestamp",
              "metric_num"
            ],
            "sourceDbType": "RDB",
            "tsdbAddress": "localhost",
            "tsdbPort": 8242
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
Start the MySQL-to-TSDB migration task
$ cd ${DATAX_HOME}/..
$ ls
datax/ datax.tar.gz mysql2tsdb.json
$ python datax/bin/datax.py mysql2tsdb.json
Check whether the task is successful
You can view the summary after the migration task is complete. The following information indicates that the migration task is successful.
Time of task startup : 2019-05-21 18:25:16
Time of task end : 2019-05-21 18:25:27
Total time elapsed : 11s
Average traffic of the task : 3B/s
Record writing speed : 0rec/s
Total records read : 3
Total read/write failures : 0
To view the recorded command line, visit this web page: Migrate data from MySQL to TSDB.
Parameters
The following tables describe the relevant parameters.
MySQL Reader parameters
Parameter | Type | Required | Description | Default value | Example |
---|---|---|---|---|---|
jdbcUrl | String | Yes | The JDBC connection URL of the database. | N/A | jdbc:mysql://127.0.0.1:3306/datax |
username | String | Yes | The username that is used to connect to the data source. | N/A | root |
password | String | Yes | The password that is used to connect to the data source. | N/A | root |
table | String | Yes | The table to be synchronized. | N/A | book |
column | Array | Yes | The columns to be synchronized in the table. | [] | ["m"] |
splitPk | String | No | The primary key based on which the table is sharded. | N/A (By default, the parameter is not specified.) | id |
TSDB Writer parameters
Parameter | Type | Required | Description | Default value | Example |
---|---|---|---|---|---|
sourceDbType | String | No | The type of the source database. Valid values: TSDB and RDB. The value TSDB indicates database types such as OpenTSDB, InfluxDB, Prometheus, and TimescaleDB. The value RDB indicates database types such as MySQL, Oracle, PostgreSQL, and DRDS. | TSDB | RDB |
column | Array | Yes | The columns to be synchronized in the table. | [] | ["name","type","create_time","price"] |
columnType | Array | Yes | The TSDB column type of each column. The fields that are read from the relational database are mapped to TSDB based on these types. Valid values: timestamp, tag, metric_num (numeric values), and metric_string (string values). | [] | ["tag","tag","timestamp","metric_num"] |
tsdbAddress | String | Yes | The IP address of TSDB. | N/A | 127.0.0.1 |
tsdbPort | Integer | Yes | The port number of TSDB. | N/A | 8242 |
batchSize | Integer | No | The number of records that are written in each batch. The value must be greater than 0. | 100 | 100 |
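To illustrate how the column and columnType parameters work together, consider the sample configuration above: name and type are tags, create_time is the timestamp, and price is a numeric metric. Conceptually, TSDB Writer turns each MySQL row into a data point in which the metric_num column supplies the metric name and value, the timestamp column supplies the timestamp, and the tag columns become tags. The following data point is a hypothetical illustration of that mapping, not actual output:
{
  "metric": "price",
  "timestamp": 1558434316,
  "value": 21.5,
  "tags": {
    "name": "book-001",
    "type": "textbook"
  }
}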
Considerations
Network connection with TSDB
TSDB Writer writes data by using the HTTP API. The API endpoint is /api/put. Therefore, the migration task processes must be able to access the HTTP API that is provided by TSDB. Otherwise, a connection error occurs.
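To verify connectivity from the machine that runs the migration task, you can send a single test data point to this endpoint with curl. The host, port, and payload in the following command are placeholders:
$ curl -X POST "http://localhost:8242/api/put" -H "Content-Type: application/json" -d '[{"metric":"connectivity.check","timestamp":1556248687,"value":1,"tags":{"source":"datax-host"}}]'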
Network connection with MySQL
MySQL Reader reads data by using JDBC. Therefore, the migration task processes must be able to access the JDBC API that is provided by MySQL. Otherwise, a connection error occurs.
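Similarly, you can verify connectivity to MySQL from the same machine before you start the task, for example by using the MySQL command-line client. The host, port, and account in the following command are placeholders:
$ mysql -h 127.0.0.1 -P 3306 -u root -p -e "SELECT 1"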
Consistency of the column sequence between MySQL Reader and TSDB Writer
You must make sure that the sequence of columns in TSDB Writer is the same as that in MySQL Reader. Otherwise, the data becomes disordered.
FAQ
Question: Can I change the JVM memory size for a migration process?
Answer: Yes, you can change the JVM memory size for a migration process. For example, if you migrate data from MySQL to TSDB, run the following command to change the JVM memory size:
python datax/bin/datax.py mysql2tsdb.json -j "-Xms4096m -Xmx4096m"
Question: How can I set an IP address whitelist for TSDB?
Answer: To view the detailed procedure, you can navigate through Quick Start > Set the IP address whitelist in the TSDB documentation.
Question: How can I configure Virtual Private Cloud (VPC) settings if I run a migration task on an Elastic Compute Service (ECS) instance? What are the frequently asked questions about VPC?
Answer: For more information, see Cases for configuring ECS security groups and VPC FAQ.