
Migrate data from OpenTSDB to TSDB

Last Updated: May 27, 2020

Background information

This topic describes how to use DataX to migrate data from OpenTSDB to Time Series Database (TSDB). DataX is an open source data synchronization tool provided by Alibaba Cloud.

For more information about how to use DataX, see README.

This topic also introduces DataX and the two plug-ins used in the sample migration task: OpenTSDB Reader and TSDB Writer.

DataX

DataX is an offline data synchronization tool that is widely used in Alibaba Group. DataX provides an efficient method to synchronize data between disparate data sources, such as MySQL, Oracle, SQL Server, PostgreSQL, Hadoop Distributed File System (HDFS), Hive, AnalyticDB for MySQL, HBase, Tablestore, MaxCompute, and Distributed Relational Database Service (DRDS).

OpenTSDB Reader

OpenTSDB Reader is a DataX plug-in that reads data from OpenTSDB.

TSDB Writer

TSDB Writer is a DataX plug-in that allows you to write data points into TSDB. TSDB is developed by Alibaba Cloud.

Quick start

Step 1: Prepare the environment

  • Linux
  • JDK (Version 1.8 or later is required. We recommend that you use version 1.8.)
  • Python (We recommend that you use Python 2.6.x.)
  • OpenTSDB (DataX is currently compatible only with OpenTSDB 2.3.x.)
  • TSDB (DataX is currently compatible only with TSDB 2.4.x and later.)
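You can check the JDK requirement above from a script before you install DataX. The following is a minimal sketch, written for Python 3.7 or later (independent of the Python version used to run DataX) and assuming that `java` is on the PATH. The function names are hypothetical helpers, not part of DataX.

```python
import re
import subprocess


def parse_java_version(version_output: str) -> tuple:
    """Extract (major, minor) from `java -version` text, e.g. '1.8.0_212' -> (1, 8)."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    return int(match.group(1)), int(match.group(2) or 0)


def jdk_is_supported(version_output: str) -> bool:
    """True for JDK 1.8 or later (legacy '1.x' scheme or the 9+ scheme)."""
    major, minor = parse_java_version(version_output)
    if major == 1:
        return minor >= 8  # legacy numbering: 1.8 is JDK 8
    return major >= 9  # JDK 9 and later report e.g. "11.0.2"


def local_java_version_output() -> str:
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr
```

Usage: `jdk_is_supported(local_java_version_output())` returns `True` on a host with JDK 1.8 installed.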

Step 2: Download DataX and its plug-ins

To download DataX and its plug-ins, click here.

Step 3: Use the default migration task of DataX to walk through the migration process

This topic uses an example to walk through the migration process. The example uses Stream Reader and Stream Writer, which do not depend on any external environment and are therefore well suited for testing. Stream Reader generates random strings, and Stream Writer receives the strings and prints them to the screen. This simulates a simple data migration.

Deploy the tool

Extract the downloaded installation package to a directory, for example, DATAX_HOME, and run the sample migration task:

  $ cd ${DATAX_HOME}
  $ python bin/datax.py job/job.json

Check whether the task is successful

After the migration task completes, DataX prints a summary. Output similar to the following indicates that the task succeeded:

  Time of task startup : 2019-04-26 11:18:07
  Time of task end : 2019-04-26 11:18:17
  Total time elapsed : 10s
  Average traffic of the task : 253.91KB/s
  Record writing speed : 10000rec/s
  Total records read : 100000
  Total read/write failures : 0
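The success check can be automated if you capture the summary as text. This is a minimal sketch that assumes the summary uses the English "label : value" lines shown above; the actual DataX console output may use different (for example, Chinese) labels, so adjust the keys accordingly. The function names are hypothetical.

```python
def parse_datax_summary(summary: str) -> dict:
    """Turn 'label : value' lines of a job summary into a dict."""
    fields = {}
    for line in summary.splitlines():
        if " : " not in line:
            continue  # skip banners and blank lines
        label, value = line.split(" : ", 1)
        fields[label.strip()] = value.strip()
    return fields


def job_succeeded(summary: str) -> bool:
    """True if the summary reports zero read/write failures."""
    failures = parse_datax_summary(summary).get("Total read/write failures")
    return failures is not None and int(failures) == 0


def records_per_second(summary: str) -> float:
    """Total records read divided by elapsed seconds ('10s' -> 10)."""
    fields = parse_datax_summary(summary)
    records = int(fields["Total records read"])
    seconds = int(fields["Total time elapsed"].rstrip("s"))
    return records / seconds
```

For the summary above, `records_per_second` gives 100000 / 10 = 10000.0, which matches the reported record writing speed.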

To view the recorded command lines, visit this web page: Quick start for data migration.

Step 4: Configure and start the task for migrating data from OpenTSDB to TSDB

The sample task that uses Stream Reader and Stream Writer confirms that DataX works as expected. You can now migrate data from OpenTSDB to TSDB by using OpenTSDB Reader and TSDB Writer.

Configure a migration task

Create a job configuration file named opentsdb2tsdb.json to migrate data from OpenTSDB to TSDB. The complete configuration is as follows. For more information about each parameter, see the “Parameters” section.

  {
    "job": {
      "content": [
        {
          "reader": {
            "name": "opentsdbreader",
            "parameter": {
              "endpoint": "http://192.168.1.100:4242",
              "column": [
                "m"
              ],
              "beginDateTime": "2019-01-01 00:00:00",
              "endDateTime": "2019-01-01 03:00:00"
            }
          },
          "writer": {
            "name": "tsdbhttpwriter",
            "parameter": {
              "endpoint": "http://192.168.1.101:8242"
            }
          }
        }
      ],
      "setting": {
        "speed": {
          "channel": 1
        }
      }
    }
  }
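If you generate job files for many metrics or time ranges, you can build the same configuration programmatically. This is a sketch under the assumption that the reader keys follow the OpenTSDB Reader parameters table (endpoint, column, beginDateTime, endDateTime) and that the plug-in names match the configuration above; `build_job_config` and `write_job_config` are hypothetical helpers.

```python
import json


def build_job_config(reader_endpoint, metrics, begin, end, writer_endpoint, channel=1):
    """Return a DataX job dict for an OpenTSDB-to-TSDB migration."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "opentsdbreader",
                        "parameter": {
                            "endpoint": reader_endpoint,
                            "column": list(metrics),
                            "beginDateTime": begin,
                            "endDateTime": end,
                        },
                    },
                    "writer": {
                        "name": "tsdbhttpwriter",
                        "parameter": {"endpoint": writer_endpoint},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job_config(config, path="opentsdb2tsdb.json"):
    """Serialize the job dict to a JSON file that datax.py can run."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

For example, `write_job_config(build_job_config("http://192.168.1.100:4242", ["m"], "2019-01-01 00:00:00", "2019-01-01 03:00:00", "http://192.168.1.101:8242"))` writes a file equivalent to the configuration above.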

Start the OpenTSDB-to-TSDB migration task

  $ cd ${DATAX_HOME}/..
  $ ls
  datax/ datax.tar.gz opentsdb2tsdb.json
  $ python datax/bin/datax.py opentsdb2tsdb.json

Check whether the task is successful

After the migration task completes, DataX prints a summary. Output similar to the following indicates that the task succeeded:

  Time of task startup : 2019-04-26 11:47:06
  Time of task end : 2019-04-26 11:47:16
  Total time elapsed : 10s
  Average traffic of the task : 98.92KB/s
  Record writing speed : 868rec/s
  Total records read : 8685
  Total read/write failures : 0

To view the recorded command line, visit this web page: Migrate data from OpenTSDB to TSDB.

Parameters

The following tables describe the relevant parameters.

OpenTSDB Reader parameters

Parameter | Type | Required | Description | Default value | Example
--- | --- | --- | --- | --- | ---
endpoint | String | Yes | The HTTP endpoint of the source OpenTSDB database. | N/A | http://127.0.0.1:4242
column | Array | Yes | The metrics to migrate. | [] | ["m"]
beginDateTime | String | Yes | The start of the time range whose data points are migrated. Use it together with endDateTime. | N/A | 2019-05-13 15:00:00
endDateTime | String | Yes | The end of the time range whose data points are migrated. Use it together with beginDateTime. | N/A | 2019-05-13 17:00:00
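Mistakes in these parameters usually surface only after the job starts. A pre-flight check can catch them earlier. This is a sketch based on the table above: it assumes the time format "YYYY-MM-DD HH:MM:SS" from the examples, and the checks are not necessarily the ones OpenTSDB Reader itself performs. `validate_reader_parameters` is a hypothetical helper.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the table's examples
REQUIRED = ("endpoint", "column", "beginDateTime", "endDateTime")


def validate_reader_parameters(params: dict) -> list:
    """Return a list of problems in an OpenTSDB Reader 'parameter' object."""
    problems = [f"missing required parameter: {key}" for key in REQUIRED if key not in params]
    if problems:
        return problems
    if not str(params["endpoint"]).startswith(("http://", "https://")):
        problems.append("endpoint should be an HTTP URL such as http://127.0.0.1:4242")
    if not isinstance(params["column"], list) or not params["column"]:
        problems.append("column must be a non-empty array of metric names")
    try:
        begin = datetime.strptime(params["beginDateTime"], TIME_FORMAT)
        end = datetime.strptime(params["endDateTime"], TIME_FORMAT)
    except ValueError as exc:
        problems.append(f"invalid time format: {exc}")
    else:
        if begin >= end:
            problems.append("beginDateTime must be earlier than endDateTime")
    return problems
```

An empty list means that no problems were found.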

TSDB Writer parameters

Parameter | Type | Required | Description | Default value | Example
--- | --- | --- | --- | --- | ---
endpoint | String | Yes | The HTTP endpoint of the destination TSDB database. | N/A | http://127.0.0.1:8242
batchSize | Integer | No | The number of records written in each batch. The value must be greater than 0. | 100 | 100
maxRetryTime | Integer | No | The number of retries after a failure occurs. The value must be greater than 1. | 3 | 3
ignoreWriteError | Boolean | No | Specifies whether to ignore write errors. If set to true, write errors are ignored and the write task continues. Otherwise, the write task stops. | false | false
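The following sketch illustrates how batchSize, maxRetryTime, and ignoreWriteError interact, as described in the table. It is a hypothetical model, not TSDB Writer's implementation. It assumes that maxRetryTime counts retries after the first attempt and that network failures surface as `OSError`; `send` stands in for whatever actually performs the write.

```python
from typing import Callable, List, Sequence


def write_in_batches(
    points: Sequence[dict],
    send: Callable[[List[dict]], None],
    batch_size: int = 100,
    max_retry_time: int = 3,
    ignore_write_error: bool = False,
) -> int:
    """Send points in batches and return how many batches were dropped."""
    if batch_size <= 0:
        raise ValueError("batchSize must be greater than 0")
    if max_retry_time <= 1:
        raise ValueError("maxRetryTime must be greater than 1")
    dropped = 0
    for start in range(0, len(points), batch_size):
        batch = list(points[start:start + batch_size])
        # One initial attempt plus up to max_retry_time retries.
        for attempt in range(max_retry_time + 1):
            try:
                send(batch)
                break
            except OSError:
                if attempt < max_retry_time:
                    continue  # retry the same batch
                if not ignore_write_error:
                    raise  # stop the write task
                dropped += 1  # skip this batch and keep going
    return dropped
```

With the defaults, 250 points are sent as batches of 100, 100, and 50, and a batch that keeps failing is attempted four times before the task either stops or, if ignore_write_error is true, skips it.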

Considerations

Ensure network connection with TSDB

TSDB Writer writes data through the TSDB HTTP API endpoint /api/put. Therefore, the hosts that run the migration task must be able to access the HTTP API provided by TSDB. Otherwise, a connection error occurs.
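To test access to /api/put from a migration host, you can send a single data point yourself. This sketch builds the request with the Python standard library and assumes that TSDB accepts the OpenTSDB-style JSON body (a list of objects with metric, timestamp, value, and tags). The metric and tag names are hypothetical placeholders.

```python
import json
import urllib.request


def build_put_request(endpoint: str, points: list) -> urllib.request.Request:
    """Build, but do not send, a POST request to <endpoint>/api/put."""
    body = json.dumps(points).encode("utf-8")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical test point; replace the metric and tags with your own.
SAMPLE_POINTS = [
    {"metric": "datax.connectivity.test", "timestamp": 1556250000,
     "value": 1, "tags": {"host": "migration-host"}},
]
```

Sending the request with `urllib.request.urlopen(build_put_request("http://127.0.0.1:8242", SAMPLE_POINTS), timeout=5)` and getting a 2xx response indicates that the endpoint is reachable. Note that this writes the test point into TSDB; a timeout or connection error points to the network problem described above.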

Ensure network connection with HBase

OpenTSDB Reader reads data by connecting directly to HBase, the underlying storage system of OpenTSDB. Therefore, make sure that the hosts that run the migration task can connect to the HBase cluster. Otherwise, a connection error occurs.
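A quick way to verify this is a plain TCP connection test from the migration host to the HBase-related services, typically the ZooKeeper quorum (commonly port 2181) and the RegionServers (commonly port 16020 in HBase 1.x and later). This is a minimal sketch; the hosts and ports you pass in are assumptions to replace with your cluster's actual addresses, and a successful TCP connection does not prove that HBase requests will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(endpoints, timeout: float = 3.0) -> list:
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in endpoints if not can_connect(host, port, timeout)]
```

For example, `unreachable([("zk1.example.internal", 2181), ("rs1.example.internal", 16020)])` lists the placeholder endpoints that the host cannot reach. A non-empty result usually means that a whitelist or security group still needs to be updated.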

Only the hour components of the start time and end time are used

The minute and second components of the specified start time and end time are ignored. For example, if you specify the range [3:35, 4:55) on 2019-04-18, the range [3:00, 4:00) is used.
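To predict which range will actually be migrated, you can apply the same truncation yourself. This sketch models the documented rule only and is not OpenTSDB Reader's code. It assumes the "YYYY-MM-DD HH:MM:SS" format used in the parameter examples, and the function names are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def truncate_to_hour(value: str) -> str:
    """Drop the minute and second components, e.g. 03:35:20 -> 03:00:00."""
    ts = datetime.strptime(value, TIME_FORMAT)
    return ts.replace(minute=0, second=0, microsecond=0).strftime(TIME_FORMAT)


def effective_range(begin: str, end: str) -> tuple:
    """Return the half-open [begin, end) range after truncation."""
    return truncate_to_hour(begin), truncate_to_hour(end)
```

Under this rule, a range that lies within a single hour, such as [3:10, 3:50), collapses to [3:00, 3:00), so it is safest to choose start and end times on hour boundaries.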

FAQ

Question: Can I change the JVM memory size for a migration process?

Answer: Yes. Pass JVM options to datax.py with the -j option. For example, to set the JVM memory size when you migrate data from OpenTSDB to TSDB, run the following command:

  python datax/bin/datax.py opentsdb2tsdb.json -j "-Xms4096m -Xmx4096m"
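If you start migration jobs from a script, you can build the same command line and pass the JVM options as a single argument, just as the quoted -j value does in the shell command. This is a sketch: `build_datax_command` and `run_datax` are hypothetical helpers, and the defaults assume the directory layout and `python` interpreter used in the commands in this topic.

```python
import os
import shlex
import subprocess


def build_datax_command(job_file, datax_home="datax", jvm_options=None, python="python"):
    """Return argv for: python datax/bin/datax.py <job> [-j "<JVM options>"]."""
    cmd = [python, os.path.join(datax_home, "bin", "datax.py"), job_file]
    if jvm_options:
        # One argv element, equivalent to the quoted -j value in the shell.
        cmd += ["-j", jvm_options]
    return cmd


def run_datax(job_file, **kwargs) -> int:
    """Run the job, echo the shell-quoted command, and return the exit code."""
    cmd = build_datax_command(job_file, **kwargs)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd).returncode
```

For example, `run_datax("opentsdb2tsdb.json", jvm_options="-Xms4096m -Xmx4096m")` runs the same command as above. Passing a list to `subprocess.run` avoids a shell, so the option string needs no extra quoting.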

Question: How can I set an IP address whitelist for TSDB?

Answer: For the detailed procedure, see Quick Start > Set the IP address whitelist in the TSDB documentation.

Question: How can I set an IP address whitelist for ApsaraDB for HBase?

Answer: For the detailed procedure, see Operation and Maintenance Guide > Configure the whitelist in the ApsaraDB for HBase documentation.

Question: How can I configure Virtual Private Cloud (VPC) settings if I run a migration task on an Elastic Compute Service (ECS) instance? What are the frequently asked questions about VPC?

Answer: For more information, see Cases for configuring ECS security groups and VPC FAQ.