Migrate data to TSDB by using DataWorks

Last Updated: Sep 04, 2020

Background information

This topic describes how to migrate data from OpenTSDB to Time Series Database (TSDB) by using the Data Integration service of DataWorks.

DataWorks is a platform as a service (PaaS) offering of Alibaba Cloud. It provides a range of services, including data aggregation, data development, DataService Studio, DataAnalysis, and data governance, together with a one-stop console for data development and management that helps enterprises mine their data and put it to use. The migration in this topic uses the data development service of DataWorks. If you are new to DataWorks, see the DataWorks documentation.

DataWorks currently supports migrating data to TSDB from the following types of data sources: TSDB, OpenTSDB, Prometheus, InfluxDB, and MySQL.

Quick start

Step 1: Log on to the DataWorks console

Log on to the DataWorks console. If no workspace is available in the console, create one. The new workspace then appears on the Workspaces page of the console, as shown in Figure 1.

Figure 1: Workspaces

Step 2: Create a sync node on the DataStudio page

In the upper-left corner of the page, right-click Business Flow, and then click Create Workflow. Figure 2 shows the position of the Create Workflow option.

Figure 2: Create a workflow

In the dialog box that appears, enter a workflow name, for example, migration_from_opentsdb_to_tsdb. Figure 3 shows the dialog box where you can create a workflow.

Figure 3: Enter a workflow name

Follow the three steps in Figure 4 to create a sync node.

Figure 4: Create a sync node

In the dialog box that appears, enter a name for the sync node, for example, node1. Figure 5 shows the dialog box where you can create a node.

Figure 5: Enter a name for the sync node

After the sync node is created, node1 is displayed in the blank section on the right of the page. Double-click node1. On the page that appears, configure the sync node. Figure 6 shows the page where the node is displayed.

Figure 6: Configure the sync node

By default, the sync node node1 is configured based on the codeless UI. If you want to configure the node by using the code editor, you can click the rightmost icon in the top toolbar. Figure 7 shows the page where you can configure the node.

Figure 7: Code editor

The default sync node synchronizes data from Stream Reader to Stream Writer. Stream Reader is the source that generates random strings, and Stream Writer is the target that receives and prints the generated random strings. For more information about how to configure Stream Reader and Stream Writer, click the corresponding topics at the top of the page. Figure 8 shows the sync node that is configured based on the code editor.

Figure 8: Default sync node

Stream Reader and Stream Writer can synchronize data without depending on external resources. To run the sync node, you can click the Run icon in the upper-left corner. Then, you can view the execution process in the section that appears at the bottom of the page. Figure 9 shows the execution process.

Figure 9: Run the default sync node
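
As a conceptual illustration only, the default node's behavior can be pictured with a few lines of Python. This is an analogy, not DataWorks code: the function names stream_reader and stream_writer, the string length, and the seed are all hypothetical choices made for the sketch.

```python
import random
import string


def stream_reader(record_count, length=8, seed=None):
    """Yield random lowercase strings, standing in for Stream Reader."""
    rng = random.Random(seed)
    for _ in range(record_count):
        yield "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def stream_writer(records):
    """Print each record and return the number written, standing in for Stream Writer."""
    written = 0
    for record in records:
        print(record)
        written += 1
    return written


# The pipeline needs no external data source or target, like the default node.
total = stream_writer(stream_reader(record_count=3, seed=42))
print(f"records written: {total}")
```

Because neither side touches an external system, a run like this is a quick way to confirm that the pipeline itself works before real endpoints are configured.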

Step 3: Modify configurations

Change the configurations of the default sync node to migrate data from OpenTSDB to TSDB.

Click the import icon to import a configuration template. Figure 10 shows the position of the icon.

Figure 10: Import a template

In the dialog box that appears, set the source connection type to OpenTSDB and the target connection type to TSDB. Figure 11 shows the dialog box where you can configure the template.

Figure 11: Configure the import template

Click OK. The values of the stepType parameters change to opentsdb and tsdb, and the other configuration items are updated automatically for an OpenTSDB-to-TSDB migration. The topic names in the help documentation change as well. Click the new topic names to learn how to configure OpenTSDB Reader and TSDB Writer. Figure 12 shows the new topic names.

Figure 12: Changes after the template is imported

Then, modify the configurations based on the help documentation. You must specify five parameters: endpoint, column, beginDateTime, and endDateTime for the reader, and endpoint for the writer. The reader's endpoint parameter specifies the OpenTSDB endpoint, and the writer's endpoint parameter specifies the TSDB endpoint. The column parameter specifies the metrics to migrate. The beginDateTime and endDateTime parameters specify the time range of the data to migrate. The following sample shows the configuration:

    {
      "type": "job",
      "steps": [
        {
          "stepType": "opentsdb",
          "parameter": {
            "endpoint": "http://host:4242",
            "column": [
              "m"
            ],
            "beginDateTime": "20190101000000",
            "endDateTime": "20190101030000"
          },
          "name": "Reader",
          "category": "reader"
        },
        {
          "stepType": "tsdb",
          "parameter": {
            "endpoint": "http://host:8242"
          },
          "name": "Writer",
          "category": "writer"
        }
      ],
      "version": "2.0",
      "order": {
        "hops": [
          {
            "from": "Reader",
            "to": "Writer"
          }
        ]
      },
      "setting": {
        "errorLimit": {
          "record": "0"
        },
        "speed": {
          "throttle": false,
          "concurrent": 1,
          "dmu": 1
        }
      }
    }
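
Before you paste an edited configuration into the code editor, it can help to check it locally. The following Python sketch is one possible way to do that and is not part of DataWorks. The helper names find_step and validate_job are hypothetical. The yyyyMMddHHmmss timestamp layout is inferred from the sample values, and the rule that beginDateTime must be earlier than endDateTime is an assumption.

```python
import json
from datetime import datetime

# Assumed layout of beginDateTime/endDateTime, inferred from "20190101000000".
TIME_FORMAT = "%Y%m%d%H%M%S"


def find_step(job, category):
    """Return the first step with the given category ("reader" or "writer")."""
    for step in job.get("steps", []):
        if step.get("category") == category:
            return step
    raise ValueError(f"no {category} step found")


def validate_job(job):
    """Return a list of problems found in an OpenTSDB-to-TSDB job configuration."""
    problems = []
    reader = find_step(job, "reader").get("parameter", {})
    writer = find_step(job, "writer").get("parameter", {})

    # The parameters that the topic says must be specified.
    for name in ("endpoint", "column", "beginDateTime", "endDateTime"):
        if not reader.get(name):
            problems.append(f"reader parameter '{name}' is missing or empty")
    if not writer.get("endpoint"):
        problems.append("writer parameter 'endpoint' is missing or empty")

    try:
        begin = datetime.strptime(str(reader.get("beginDateTime") or ""), TIME_FORMAT)
        end = datetime.strptime(str(reader.get("endDateTime") or ""), TIME_FORMAT)
    except ValueError:
        problems.append("beginDateTime/endDateTime must look like 20190101000000")
    else:
        # Assumption: an empty or reversed time range is a configuration mistake.
        if begin >= end:
            problems.append("beginDateTime must be earlier than endDateTime")
    return problems


sample_job = json.loads("""
{
  "type": "job",
  "steps": [
    {"stepType": "opentsdb", "category": "reader", "name": "Reader",
     "parameter": {"endpoint": "http://host:4242", "column": ["m"],
                   "beginDateTime": "20190101000000",
                   "endDateTime": "20190101030000"}},
    {"stepType": "tsdb", "category": "writer", "name": "Writer",
     "parameter": {"endpoint": "http://host:8242"}}
  ]
}
""")

print(validate_job(sample_job))  # an empty list means no problems were found
```

The check covers only the five parameters named in this topic. Other fields, such as setting and order, are left to the template.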

Step 4: Modify the whitelist

If you use the default resource group of DataWorks, you must add the CIDR blocks for the region of your DataWorks workspace to the whitelists of both the source and the target. To migrate data from OpenTSDB to TSDB, configure a whitelist for OpenTSDB and a whitelist for TSDB.

  1. Find the CIDR blocks that must be added to the whitelist based on the region where the DataWorks workspace resides. For more information, see User Guide > Data Integration > Common configurations > Configure a whitelist in the DataWorks V2.0 documentation. This topic uses the China (Shanghai) region as an example.

  2. If your user-created OpenTSDB instances are hosted on an ECS instance, add the corresponding CIDR blocks to the security groups of the ECS instance. The added CIDR blocks must include those for the HBase nodes and TSD nodes. HBase is the underlying data storage system for OpenTSDB. For more information, see Cases for configuring ECS security groups and VPC FAQ.

  3. Then, add the corresponding CIDR blocks to the whitelist of the TSDB instance that runs on the cloud. For more information, see Quick Start > Set the IP address whitelist in the TSDB documentation.
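
After you update the security groups and the whitelist, you may want to confirm that every required CIDR block is covered. The following Python sketch shows one way to compare two lists by using the standard ipaddress module. The CIDR blocks in it are placeholders from documentation-reserved ranges, not the real DataWorks ranges, and the function name missing_blocks is hypothetical.

```python
import ipaddress

# Placeholder CIDR blocks (documentation-reserved ranges), NOT the real
# DataWorks ranges. Copy the actual blocks for your region from the
# DataWorks whitelist topic.
REQUIRED_BLOCKS = ["192.0.2.0/24", "198.51.100.0/24"]


def missing_blocks(required, whitelist):
    """Return the required CIDR blocks that no whitelist entry fully covers."""
    allowed = [ipaddress.ip_network(entry, strict=False) for entry in whitelist]
    missing = []
    for block in required:
        network = ipaddress.ip_network(block, strict=False)
        # Compare versions first: subnet_of raises TypeError for IPv4 vs IPv6.
        covered = any(
            network.version == entry.version and network.subnet_of(entry)
            for entry in allowed
        )
        if not covered:
            missing.append(block)
    return missing


# Example whitelist as it might be exported from a security group or instance.
current_whitelist = ["192.0.2.0/24", "203.0.113.7"]
print(missing_blocks(REQUIRED_BLOCKS, current_whitelist))
```

With the placeholder values above, the second block is reported as missing because no whitelist entry contains it. A single IP address in the whitelist is treated as a /32 network.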

Step 5: Synchronize data

Click the Run icon to run the sync node. Figure 13 shows an example of the execution process.

Figure 13: Run the sync node to migrate data from OpenTSDB to TSDB
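
After the node finishes, you may want to spot-check the migrated time range by sending the same query to the source and the target and comparing the results. The following Python sketch only builds the request body and sends nothing. It assumes that both endpoints accept an OpenTSDB-style /api/query request and that the beginDateTime and endDateTime values are interpreted as UTC. Neither assumption is confirmed by this topic. The helper names to_epoch_seconds and build_query are hypothetical.

```python
import json
from datetime import datetime, timezone


def to_epoch_seconds(stamp, tz=timezone.utc):
    """Convert a yyyyMMddHHmmss string to epoch seconds (the time zone is an assumption)."""
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=tz)
    return int(parsed.timestamp())


def build_query(metric, begin, end, aggregator="sum"):
    """Build an OpenTSDB-style /api/query request body for one metric."""
    return {
        "start": to_epoch_seconds(begin),
        "end": to_epoch_seconds(end),
        "queries": [{"aggregator": aggregator, "metric": metric}],
    }


# Same metric and time range as the sample configuration in Step 3.
body = build_query("m", "20190101000000", "20190101030000")
print(json.dumps(body, indent=2))
```

You could send the same body to the /api/query path of each endpoint with any HTTP client and compare the returned data points. Adjust the time zone if your timestamps are not in UTC.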

Step 6: Create exclusive resource groups

By default, nodes run on the shared resource groups of DataWorks. Shared resources may be preempted, which can degrade data migration performance. If you have high performance requirements, we recommend that you create exclusive resource groups. For more information, see DataWorks exclusive resources. To purchase and use exclusive resource groups, see Exclusive resource mode.