Background information

This topic describes how to use the Data Integration feature of DataWorks to migrate data from OpenTSDB to Lindorm TSDB. DataWorks is a platform as a service (PaaS) provided by Alibaba Cloud. It offers a range of services, such as Data Integration, DataStudio, DataService Studio, and Data Analysis, and provides an end-to-end data development and management console that helps enterprises mine and explore the value of their data. The example in this topic uses DataStudio.

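You can use DataWorks to migrate data to TSDB through TSDB Writer from various data sources, such as TSDB, OpenTSDB, Prometheus, InfluxDB, and MySQL. For the corresponding reader configurations, see TSDB Reader, Configure OpenTSDB Reader, Configure Prometheus Reader, Configure InfluxDB Reader, and Configure MySQL Reader.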

Quick Start

  1. Log on to the DataWorks console

    Log on to the DataWorks console. For more information about the console, see Overview of the DataWorks console. If no workspace exists, create one. For more information, see Create a workspace. After the workspace is created, you can view its information on the Workspaces page.

  2. Use DataStudio to create a sync node

    In the left-side navigation pane of the homepage of DataStudio, right-click Business Flow and click Create Workflow.

    In the dialog box that appears, specify a workflow name, such as migration_from_opentsdb_to_tsdb.

    Create a sync node in the workflow.

    In the dialog box that appears, specify a sync node name, such as node1.

    After the sync node is created, it appears in the blank area on the right side of the page. Double-click the sync node to open its configuration page.

    By default, the sync node is configured in the codeless UI. To configure it in the code editor instead, click the Switch to Code Editor icon on the rightmost side of the toolbar.

    The default source data store for the sync node is Stream Reader and the default destination data store is Stream Writer. Stream Reader creates random character strings. Stream Writer receives the strings and prints them. For more information, click the links provided at the top of the code editor to view the documentation.

    No external resources are required to migrate data from Stream Reader to Stream Writer, so you can click the Run icon in the upper-left corner of the code editor to run the script and view its execution process.

  3. Modify the configuration of the sync node

    Change the source data store to OpenTSDB and the destination data store to TSDB.

    Click the Apply Template icon to apply a configuration template.

    In the dialog box that appears, select OpenTSDB for the Source Connection Type parameter and TSDB for the Target Connection Type parameter.

    Click OK. The value of the first stepType parameter changes to opentsdb and the value of the second stepType parameter changes to tsdb. In addition, the two links at the top of the code editor are renamed Configure OpenTSDB Reader and How to Configure tsdb Writer.

    Modify the configuration based on the instructions in the topics provided by the two links. Five parameters are required. In the reader step, endpoint specifies the endpoint of OpenTSDB, column specifies the list of metrics to migrate, and beginDateTime and endDateTime specify the time range in which the data to be migrated was generated. In the writer step, endpoint specifies the endpoint of Lindorm TSDB. The following code provides an example:
    {
        "type": "job",
        "steps": [
            {
                "stepType": "opentsdb",
                "parameter": {
                    "endpoint": "http://host:4242",
                    "column": [
                        "m"
                    ],
                    "beginDateTime": "20190101000000",
                    "endDateTime": "20190101030000"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "tsdb",
                "parameter": {
                    "endpoint": "http://host:8242"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        },
        "setting": {
            "errorLimit": {
                "record": "0"
            },
            "speed": {
                "throttle": false,
                "concurrent": 1,
                "dmu": 1
            }
        }
    }
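    If you create sync nodes for several metric lists or time ranges, building the job JSON programmatically can be less error-prone than editing it by hand. The following Python sketch reproduces the example configuration and checks two basic constraints before emitting it. The function name build_migration_job is hypothetical and not part of DataWorks, and the yyyyMMddHHmmss timestamp layout is inferred from the example values such as 20190101000000, so confirm it against the OpenTSDB Reader documentation.

```python
import json
from datetime import datetime

# Assumption: beginDateTime/endDateTime use the 14-digit layout seen in the
# example ("20190101000000"), i.e. yyyyMMddHHmmss.
DATETIME_FORMAT = "%Y%m%d%H%M%S"


def build_migration_job(opentsdb_endpoint, tsdb_endpoint, metrics, begin, end):
    """Return a job dict mirroring the example OpenTSDB -> TSDB configuration.

    Hypothetical helper: `begin` and `end` are datetime objects that are
    rendered as 14-digit strings for the reader's time range.
    """
    if not metrics:
        raise ValueError("column must list at least one metric")
    if begin >= end:
        raise ValueError("beginDateTime must be earlier than endDateTime")
    return {
        "type": "job",
        "steps": [
            {
                "stepType": "opentsdb",
                "parameter": {
                    "endpoint": opentsdb_endpoint,  # source: OpenTSDB
                    "column": list(metrics),
                    "beginDateTime": begin.strftime(DATETIME_FORMAT),
                    "endDateTime": end.strftime(DATETIME_FORMAT),
                },
                "name": "Reader",
                "category": "reader",
            },
            {
                "stepType": "tsdb",
                "parameter": {"endpoint": tsdb_endpoint},  # destination: Lindorm TSDB
                "name": "Writer",
                "category": "writer",
            },
        ],
        "version": "2.0",
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": False, "concurrent": 1, "dmu": 1},
        },
    }


if __name__ == "__main__":
    job = build_migration_job(
        "http://host:4242",
        "http://host:8242",
        ["m"],
        datetime(2019, 1, 1, 0, 0, 0),
        datetime(2019, 1, 1, 3, 0, 0),
    )
    # The printed JSON can be pasted into the code editor of the sync node.
    print(json.dumps(job, indent=4))
```

    The endpoints and the metric name m are the placeholders from the example configuration; replace them with your own values.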

  4. Edit a whitelist

    If you use the default resource group of DataWorks, you must add the CIDR blocks and IP addresses of the region where the resource group resides to the whitelists of both the source and the destination. In this migration, add them to the whitelist of OpenTSDB and the whitelist of Lindorm TSDB.
    1. Obtain the CIDR blocks and IP addresses that correspond to the region in which your DataWorks workspace is deployed. For more information, see Configure a whitelist (DataWorks 2.0 > Data Integration > Network connectivity > Configure a whitelist).
    2. If your self-managed OpenTSDB instances are hosted on Elastic Compute Service (ECS) instances, add these CIDR blocks and IP addresses to the security groups of those ECS instances. The security groups must also allow the CIDR blocks and IP addresses of the ApsaraDB for HBase instances that OpenTSDB uses for data storage and of the Lindorm TSDB instances. For more information, see Use cases of ECS security groups and FAQ.
    3. Add the CIDR blocks and IP addresses to the whitelists of your Lindorm TSDB instances. For more information, see Configure a whitelist.
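
    Because the required entries mix single IP addresses and CIDR blocks, one of them is easy to miss. The following Python sketch checks which required entries are not yet covered by a whitelist you have configured. The function names is_covered and missing_entries are hypothetical and not part of DataWorks, ECS, or Lindorm, and the addresses are documentation placeholders from RFC 5737, not real DataWorks addresses.

```python
import ipaddress


def is_covered(required_entry, whitelist):
    """Return True if an IP address or CIDR block lies inside any whitelist entry.

    A bare address such as "192.0.2.15" is treated as a single-host network
    (/32 for IPv4, /128 for IPv6), so addresses and CIDR blocks are handled alike.
    """
    required = ipaddress.ip_network(required_entry, strict=False)
    for entry in whitelist:
        allowed = ipaddress.ip_network(entry, strict=False)
        # subnet_of() raises TypeError across IPv4/IPv6, so compare versions first.
        if required.version == allowed.version and required.subnet_of(allowed):
            return True
    return False


def missing_entries(required_entries, whitelist):
    """List the required entries that the whitelist does not cover yet."""
    return [e for e in required_entries if not is_covered(e, whitelist)]


if __name__ == "__main__":
    # Placeholder values: substitute the entries obtained in substep 1 and the
    # whitelist configured on OpenTSDB or Lindorm TSDB.
    required = ["192.0.2.15", "192.0.2.128/25", "203.0.113.0/24"]
    configured = ["192.0.2.0/24", "198.51.100.7"]
    print(missing_entries(required, configured))
```

    Run the check once for each whitelist or security group, because OpenTSDB and Lindorm TSDB are configured separately.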

  5. Synchronize data

    Click the Run icon to run the sync node.

  6. Configure exclusive resource groups

    By default, the sync node runs on the shared resource groups of DataWorks. Because these resources are shared, resource preemption may occur, and migration performance cannot be guaranteed. If you require high performance, we recommend that you configure an exclusive resource group for the sync node. For more information about exclusive resource groups, see DataWorks exclusive resources. For more information about how to purchase and use them, see Exclusive resource group mode.