Background information

This topic describes how to use DataX to migrate data from Prometheus Service (Prometheus) to a database powered by the time series engine of ApsaraDB for Lindorm (Lindorm). DataX is an open source tool provided by Alibaba Group. For information about how to use DataX, see README.

The following sections describe DataX, the Prometheus Reader plug-in, and the TSDB Writer plug-in. Prometheus Reader and TSDB Writer are provided by DataX to migrate data.

DataX

DataX is an offline data synchronization tool that is widely used within Alibaba Group. You can use DataX to synchronize data between disparate data sources such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB (ADS), HBase, Tablestore (OTS), MaxCompute, and Distributed Relational Database Service (DRDS).

Prometheus Reader

DataX provides the Prometheus Reader plug-in, which reads data from Prometheus.

TSDB Writer

DataX provides the TSDB Writer plug-in, which writes data to a Lindorm time series database.

Quick Start

Step 1: Configure an environment
  • Linux
  • Java Development Kit (JDK): You can use JDK 1.8 or later. We recommend that you use JDK 1.8.
  • Python: We recommend that you use Python 2.6.x.
  • Prometheus: Only Prometheus versions 2.9.x are supported. Earlier versions are not fully compatible with DataX.
  • Lindorm time series engine: Only V2.4.x and later are supported. Earlier versions are not fully compatible with DataX.
Step 2: Download DataX and the plug-ins

Click DataX to download DataX and the Prometheus Reader and TSDB Writer plug-ins.

Step 3: Use the built-in script provided by DataX to test whether data can be migrated as expected

The test uses the Stream Reader and Stream Writer plug-ins, which require no external dependencies and simulate a data migration process: Stream Reader generates random strings, and Stream Writer receives the strings and prints them to your CLI.
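The built-in test runs the job/job.json file that ships with DataX. As a rough sketch of what such a Stream Reader/Stream Writer job looks like (field values here are illustrative and may differ from the exact shipped file):

```json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              { "value": "DataX", "type": "string" },
              { "value": 42, "type": "long" }
            ],
            "sliceRecordCount": 100000
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "print": false,
            "encoding": "UTF-8"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
```

With one channel and a sliceRecordCount of 100000, this configuration matches the record count shown in the sample output below.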

  • Install DataX

    Decompress the installation package to a specified directory and then run the datax.py script to start the migration task.

    $ cd ${DATAX_HOME}
    $ python bin/datax.py job/job.json        
  • Check the migration result

    If information similar to the following is returned, the data has been migrated successfully:

    Task start time                   : 2019-04-26 11:18:07
    Task end time                     : 2019-04-26 11:18:17
    Time consumed                     :                 10s
    Average traffic                   :          253.91KB/s
    Write rate                        :          10000rec/s
    Number of records obtained        :              100000
    Number of write and read failures :                   0
                            

    For more information, watch Data migration quick start.

Step 4: Configure and start a task to migrate data from Prometheus to a Lindorm time series database

In Step 3, the Stream Reader and Stream Writer plug-ins are used to test the migration process of DataX. The test results indicate that DataX can migrate data. The following parts describe how to use the Prometheus Reader and TSDB Writer plug-ins to migrate data from Prometheus to a Lindorm time series database.

  • Configure a migration task

    Configure a task to migrate data from Prometheus to a Lindorm time series database. In this example, the task configuration file is named prometheus2tsdb.json, which matches the file name used when the task is started below. The following code provides an example of the task configuration. For more information about the parameters, see the "Parameter description" section.

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "prometheusreader",
              "parameter": {
                "endpoint": "http://localhost:9090",
                "column": [
                  "up"
                ],
                "beginDateTime": "2019-05-20T16:00:00Z",
                "endDateTime": "2019-05-20T16:00:10Z"
              }
            },
            "writer": {
              "name": "tsdbhttpwriter",
              "parameter": {
                "endpoint": "http://localhost:8242"
              }
            }
          }
        ],
        "setting": {
          "speed": {
            "channel": 1
          }
        }
      }
    }
                            
  • Start the task
    $ cd ${DATAX_HOME}/..
    $ ls
      datax/  datax.tar.gz  prometheus2tsdb.json
    $ python datax/bin/datax.py prometheus2tsdb.json
                            
  • Check the migration result

    If information similar to the following is returned, the data has been migrated successfully:

    Task start time                   : 2019-05-20 20:22:39
    Task end time                     : 2019-05-20 20:22:50
    Time consumed                     :                 10s
    Average traffic                   :          122.07KB/s
    Write rate                        :          1000rec/s
    Number of records obtained        :               10000
    Number of write and read failures :                   0
                            

    For more information, watch Migrate data from Prometheus to a Lindorm time series database.

Parameter description

The following tables describe the parameters.

Prometheus Reader
Parameter     | Type   | Required | Description                                                                                            | Default value | Example
--------------|--------|----------|--------------------------------------------------------------------------------------------------------|---------------|----------------------
endpoint      | String | Yes      | The HTTP endpoint of Prometheus.                                                                       | None          | http://127.0.0.1:9090
column        | Array  | Yes      | The columns that you want to migrate.                                                                  | []            | ["m"]
beginDateTime | String | Yes      | Used together with the endDateTime parameter to specify a time range. The data that is generated within this time range is migrated. | None | 2019-05-13 15:00:00
endDateTime   | String | Yes      | Used together with the beginDateTime parameter to specify a time range. The data that is generated within this time range is migrated. | None | 2019-05-13 17:00:00
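Prometheus Reader retrieves each configured column through Prometheus's /api/v1/query_range HTTP API, with beginDateTime and endDateTime supplying the start and end of the range. The following is a minimal sketch of how one such request URL can be assembled; the helper name and the 15s step value are illustrative assumptions, not DataX internals:

```python
from urllib.parse import urlencode

def build_query_range_url(endpoint, metric, begin, end, step="15s"):
    """Assemble a /api/v1/query_range URL for one metric (hypothetical
    helper; Prometheus Reader performs an equivalent HTTP request)."""
    params = {"query": metric, "start": begin, "end": end, "step": step}
    return endpoint.rstrip("/") + "/api/v1/query_range?" + urlencode(params)

# Mirrors the reader configuration shown in Step 4.
url = build_query_range_url("http://localhost:9090", "up",
                            "2019-05-20T16:00:00Z", "2019-05-20T16:00:10Z")
print(url)
```

Requesting this URL directly, for example with curl, is also a quick way to confirm that the metric and time range actually return data before you start the migration task.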
TSDB Writer
Parameter        | Type    | Required | Description                                                                                          | Default value | Example
-----------------|---------|----------|------------------------------------------------------------------------------------------------------|---------------|----------------------
endpoint         | String  | Yes      | The HTTP endpoint of your Lindorm time series database.                                              | None          | http://127.0.0.1:8242
batchSize        | Integer | No       | The number of data records that you want to migrate at a time. The value must be an integer greater than 0. | 100     | 100
maxRetryTime     | Integer | No       | The maximum number of retries allowed after a failed attempt. The value must be an integer greater than 1. | 3        | 3
ignoreWriteError | Boolean | No       | Specifies whether to ignore write failures. Valid values: true and false. If this parameter is set to true, the system ignores write failures and keeps retrying. If this parameter is set to false, the write task is terminated after the number of retries reaches the value of the maxRetryTime parameter. | false | false
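The batchSize, maxRetryTime, and ignoreWriteError parameters interact during writes. The sketch below is an illustrative Python model of that behavior, not DataX source code; note that for brevity it skips a batch once retries are exhausted when ignore_write_error is true, whereas the real plug-in keeps retrying:

```python
def write_batches(records, write_fn, batch_size=100, max_retry_time=3,
                  ignore_write_error=False):
    """Illustrative model of TSDB Writer's batching and retry semantics.

    write_fn(batch) is expected to raise an exception on a failed write.
    """
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        attempts = 0
        while True:
            try:
                write_fn(batch)
                break
            except Exception:
                attempts += 1
                if attempts >= max_retry_time:
                    if ignore_write_error:
                        break  # skip the batch (the real plug-in keeps retrying)
                    raise      # terminate the task, as the parameter docs describe
```

For example, with batchSize=100 a run of 250 records is written as three batches of 100, 100, and 50 records, and a persistent write failure with ignoreWriteError=false stops the task after maxRetryTime attempts.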

Note

Make sure that DataX can access the Lindorm time series database.

TSDB Writer writes data by calling the /api/put HTTP API operation. Therefore, to ensure successful data migration, make sure that the processes of migration tasks can access the HTTP API provided by the Lindorm time series engine. Otherwise, a connection exception is reported.

Make sure that DataX can access Prometheus.

Prometheus Reader reads data by calling the /api/v1/query_range operation. Therefore, to ensure successful data migration, make sure that the processes of migration tasks can access the HTTP API provided by Prometheus. Otherwise, a connection exception is reported.
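Before starting a task, it can help to verify that the migration host can reach both HTTP endpoints. A small pre-flight check along these lines works; the helper names are illustrative, and an equivalent curl request against each endpoint serves the same purpose:

```python
import socket
from urllib.parse import urlparse

def parse_endpoint(endpoint):
    """Split an endpoint such as http://127.0.0.1:8242 into (host, port)."""
    parsed = urlparse(endpoint)
    default = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default

def can_connect(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint's host/port succeeds."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the Prometheus and Lindorm endpoints used in this topic.
for url in ("http://localhost:9090", "http://localhost:8242"):
    print(url, "reachable" if can_connect(url) else "unreachable")
```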

FAQ

Can I adjust the Java virtual machine (JVM) memory size for a migration process?

Yes, you can adjust the JVM memory size for a migration process. If you want to adjust the JVM memory size for a task that migrates data from Prometheus to a Lindorm time series database, run the following command:

python datax/bin/datax.py prometheus2tsdb.json -j "-Xms4096m -Xmx4096m"

How do I configure an IP address whitelist for my Lindorm time series database?

If my migration task migrates data from a Prometheus instance hosted on ECS to a Lindorm time series database in a virtual private cloud (VPC), how do I configure the VPC, and what problems may I encounter?

See Use cases of ECS security groups and VPC FAQ.