Background information
This topic describes how to use DataX to migrate data from Prometheus Service (Prometheus) to a database powered by the time series engine of ApsaraDB for Lindorm (Lindorm). DataX is an open source tool provided by Alibaba Group. For information about how to use DataX, see README.
The following sections describe DataX and the Prometheus Reader and TSDB Writer plug-ins that DataX provides to migrate data.
DataX
DataX is an offline data synchronization tool that is widely used within Alibaba Group. You can use DataX to synchronize data between various disparate data sources such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB (ADS), HBase, Tablestore (OTS), MaxCompute, and Distributed Relational Database Service (DRDS).
Prometheus Reader
DataX provides the Prometheus Reader plug-in that is used to read data from Prometheus.
TSDB Writer
DataX provides the TSDB Writer that is used to write data to a Lindorm time series database.
Quick start
Make sure that your environment meets the following requirements:
- Operating system: Linux
- Java Development Kit (JDK): You can use JDK 1.8 or later. We recommend that you use JDK 1.8.
- Python: We recommend that you use Python 2.6.x.
- Prometheus: Only Prometheus 2.9.x is supported. Earlier versions are not fully compatible with DataX.
- Lindorm time series engine: Only V2.4.x and later are supported. Earlier versions are not fully compatible with DataX.
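You can quickly check the JDK and Python versions on your host before you install DataX. The following commands are a simple sanity check; the expected values in the comments reflect the requirements listed above.

```
$ java -version    # expect a 1.8.x version string
$ python -V        # expect Python 2.6.x (recommended)
```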
- Download DataX
Click DataX to download DataX and the Prometheus Reader and TSDB Writer plug-ins.
In the following test, the Stream Reader and Stream Writer plug-ins are used to simulate a data migration process. These plug-ins do not require external dependencies: Stream Reader generates random strings, and Stream Writer receives the strings and prints them on your CLI.
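The job/job.json file that is included in the DataX package implements this test. The following sketch shows what a typical stream-to-stream configuration looks like; the column value and the record count are illustrative and may differ from the file in your download.

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "type": "string", "value": "DataX" }
            ],
            "sliceRecordCount": 100000
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```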
- Install DataX
Decompress the installation package to a specified directory and then run the datax.py script to start the migration task.
```
$ cd ${DATAX_HOME}
$ python bin/datax.py job/job.json
```
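In the preceding commands, ${DATAX_HOME} refers to the directory that is produced when you decompress the package. For example, if you downloaded the datax.tar.gz archive, the decompress step (assuming GNU tar) looks like the following:

```
$ tar -zxvf datax.tar.gz    # produces the datax/ directory that serves as ${DATAX_HOME}
```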
- Check the migration result
If the following information is returned, the data is migrated:
```
Task start time                    : 2019-04-26 11:18:07
Task end time                      : 2019-04-26 11:18:17
Time consumed                      : 10s
Average traffic                    : 253.91KB/s
Write rate                         : 10000rec/s
Number of records obtained         : 100000
Number of write and read failures  : 0
```
For more information, watch Data migration quick start.
In the preceding test, the Stream Reader and Stream Writer plug-ins are used to verify the migration process of DataX. The test results indicate that DataX can migrate data. The following steps describe how to use the Prometheus Reader and TSDB Writer plug-ins to migrate data from Prometheus to a Lindorm time series database.
- Configure a migration task
Configure a task to migrate data from Prometheus to a Lindorm time series database. In this example, the task is named prometheus2tsdb.json. The following code provides an example of the task configuration. For more information about the parameters, see the "Parameter description" section.

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "prometheusreader",
          "parameter": {
            "endpoint": "http://localhost:9090",
            "column": [
              "up"
            ],
            "beginDateTime": "2019-05-20T16:00:00Z",
            "endDateTime": "2019-05-20T16:00:10Z"
          }
        },
        "writer": {
          "name": "tsdbhttpwriter",
          "parameter": {
            "endpoint": "http://localhost:8242"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```
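Before you start the task, you can confirm that the configured time range contains data. The following curl call is a sketch that reuses the endpoint, metric, and time range from the sample configuration; /api/v1/query_range is the Prometheus HTTP API operation that Prometheus Reader calls, and the step value of 1s is an arbitrary example.

```
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2019-05-20T16:00:00Z&end=2019-05-20T16:00:10Z&step=1s'
```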
- Start the task
```
$ cd ${DATAX_HOME}/..
$ ls
datax/  datax.tar.gz  prometheus2tsdb.json
$ python datax/bin/datax.py prometheus2tsdb.json
```
- Check the migration result
If the following information is returned, the data is migrated:
```
Task start time                    : 2019-05-20 20:22:39
Task end time                      : 2019-05-20 20:22:50
Time consumed                      : 10s
Average traffic                    : 122.07KB/s
Write rate                         : 1000rec/s
Number of records obtained         : 10000
Number of write and read failures  : 0
```
For more information, watch Migrate data from Prometheus to a Lindorm time series database.
Parameter description
The following table describes the parameters of Prometheus Reader.

Parameter | Type | Required | Description | Default value | Example |
---|---|---|---|---|---|
endpoint | String | Yes | The HTTP endpoint of Prometheus. | None | http://127.0.0.1:9090 |
column | Array | Yes | The columns that you want to migrate. | [] | ["m"] |
beginDateTime | String | Yes | This parameter is used together with the endDateTime parameter to specify a time range. The data that is generated within this time range is migrated. | None | 2019-05-13 15:00:00 |
endDateTime | String | Yes | This parameter is used together with the beginDateTime parameter to specify a time range. The data that is generated within this time range is migrated. | None | 2019-05-13 17:00:00 |
The following table describes the parameters of TSDB Writer.

Parameter | Type | Required | Description | Default value | Example |
---|---|---|---|---|---|
endpoint | String | Yes | The HTTP endpoint of your Lindorm time series database. | None | http://127.0.0.1:8242 |
batchSize | Integer | No | The number of data records that you want to migrate at a time. The value must be an integer greater than 0. | 100 | 100 |
maxRetryTime | Integer | No | The maximum number of retries allowed after a failed attempt. The value must be an integer greater than 1. | 3 | 3 |
ignoreWriteError | Boolean | No | Specifies whether to ignore write failures. Valid values: true and false. If this parameter is set to true, the system ignores write failures and keeps retrying the write task. If this parameter is set to false, the write task is terminated after the number of retries reaches the value of the maxRetryTime parameter. | false | false |
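For reference, the following sketch shows a TSDB Writer block that sets the optional parameters from this table. The endpoint value is taken from the sample configuration; the other values are the defaults listed above.

```json
"writer": {
  "name": "tsdbhttpwriter",
  "parameter": {
    "endpoint": "http://localhost:8242",
    "batchSize": 100,
    "maxRetryTime": 3,
    "ignoreWriteError": false
  }
}
```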
Note
TSDB Writer writes data by calling the /api/put HTTP API operation. Therefore, to ensure successful data migration, make sure that the processes of migration tasks can access the HTTP API provided by the Lindorm time series engine. Otherwise, a connection exception is reported.
Prometheus Reader reads data by calling the /api/v1/query_range operation. Therefore, to ensure successful data migration, make sure that the processes of migration tasks can access the HTTP API provided by Prometheus. Otherwise, a connection exception is reported.
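To confirm that the migration host can reach the /api/put operation before you start the task, you can write a single test data point. The following curl call is a sketch that assumes the endpoint from the sample configuration and an OpenTSDB-style JSON body; the metric name and tag are hypothetical. The Prometheus side can be checked with the query_range call shown earlier in this topic.

```
# Write one test data point to verify that /api/put is reachable from this host.
$ curl -X POST 'http://localhost:8242/api/put' \
    -H 'Content-Type: application/json' \
    -d '[{"metric":"connectivity.test","timestamp":1558368000,"value":1,"tags":{"src":"datax"}}]'
```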
FAQ
Can I adjust the Java virtual machine (JVM) memory size for a migration process?
Yes, you can adjust the JVM memory size for a migration process. If you want to adjust the JVM memory size for a task that migrates data from Prometheus to a Lindorm time series database, run the following command:
```
$ python datax/bin/datax.py prometheus2tsdb.json -j "-Xms4096m -Xmx4096m"
```
How do I configure an IP address whitelist for my Lindorm time series database?
If my migration task migrates data from Prometheus instances that are hosted on ECS to a Lindorm time series database in a virtual private cloud (VPC), how do I configure the VPC, and what problems may I encounter?
See Use cases of ECS security groups and VPC FAQ.