This topic describes how to use DataWorks to connect MaxCompute to LindormDFS of ApsaraDB for Lindorm (Lindorm). This allows you to synchronize data in both directions: from MaxCompute to LindormDFS and from LindormDFS to MaxCompute.

Note MaxCompute, LindormDFS, and DataWorks must be deployed in the same region.

Before you begin

  1. Activate LindormDFS. For more information, see Activate the LindormDFS service.
  2. Activate MaxCompute. For more information, see Activate MaxCompute and DataWorks.
  3. Activate DataWorks. For more information, see DataWorks overview.

Procedure

  1. Configure an exclusive resource group that is used for data integration in DataWorks. For more information, see Exclusive resource group mode. When you configure DataWorks, take note of the following points:
    • The exclusive resource group for data integration in DataWorks and LindormDFS must be deployed in the same zone.
    • If one of the following issues occurs, submit a ticket to obtain DataWorks technical support: no DataWorks resources are available in the zone in which LindormDFS is deployed, or the exclusive resource group for data integration in DataWorks and LindormDFS are deployed in the same region but in different zones.
    • DataWorks and LindormDFS must use the same virtual private cloud (VPC) and vSwitch. For more information, see Exclusive resource group mode.
  2. Configure a task to synchronize data.

    For more information about how to configure a task to synchronize data, see Configure a sync node by using the codeless UI. You can perform the following steps to configure the parameters:

    1. Configure a data source and a data destination.

      If you synchronize data from MaxCompute to LindormDFS, configure HDFS as the data destination.

      If you synchronize data from LindormDFS to MaxCompute, configure HDFS as the data source.

    2. Alternatively, you can use the script mode to configure the synchronization task. Perform the operations as instructed in the DataWorks console.
    3. Add the configuration parameters of LindormDFS to the configuration script.
      • If you synchronize data to LindormDFS, configure HDFS Writer. For more information, see HDFS Writer.
      • If you synchronize data from LindormDFS, configure HDFS Reader. For more information, see HDFS Reader.

        When you configure HDFS Writer or HDFS Reader, you must also configure a data source. For more information, see Configure an HDFS connection. Only an exclusive resource group for data integration can be used to access an HDFS data source. Therefore, create an exclusive resource group for data integration before you configure the data source. For more information, see Create and use an exclusive resource group for Data Integration.
        Note
        • The default resource group of MaxCompute does not support the parameters that are used to configure high availability (HA) for Hadoop. If you need to connect to LindormDFS in HA mode, create a custom resource group. For more information, see Create a custom resource group for Data Integration.
        • When you specify a data source, configure the defaultFS parameter and the Hadoop configuration. To obtain information about defaultFS and the Hadoop configuration, click Generate Configuration Items in the Lindorm console.
        • If an error that indicates a network connection timeout occurs when you configure a data source, troubleshoot the issue based on the description in Select a network connectivity solution.
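
        The defaultFS parameter and the Hadoop configuration that you obtain typically follow the standard Hadoop HA client settings. The following sketch illustrates only the expected shape. The nameservice ID, hostnames, and port in it are hypothetical placeholders; use the values that are generated by Generate Configuration Items in the Lindorm console instead.
        {
            "defaultFS": "hdfs://${Instance ID}",
            "hadoopConfig": {
                "dfs.nameservices": "${Instance ID}",
                "dfs.ha.namenodes.${Instance ID}": "nn1,nn2",
                "dfs.namenode.rpc-address.${Instance ID}.nn1": "master1-host:8020",
                "dfs.namenode.rpc-address.${Instance ID}.nn2": "master2-host:8020",
                "dfs.client.failover.proxy.provider.${Instance ID}": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
            }
        }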

Verify that data is synchronized from MaxCompute to LindormDFS

The following example is provided to verify that data is synchronized from MaxCompute to LindormDFS.

  1. Create a test table in MaxCompute. For more information, see Create a table in MaxCompute.
    CREATE TABLE IF NOT EXISTS maxcompute2lindormstore
    (
     name             STRING COMMENT 'Name',
     gender           STRING COMMENT 'Gender',
     age              INT COMMENT 'Age'
    );
  2. Insert test data into the test table.
    insert into maxcompute2lindormstore values('User 1','Male',20);
    insert into maxcompute2lindormstore values('User 2','Male',20);
    insert into maxcompute2lindormstore values('User 3','Female',20);
    insert into maxcompute2lindormstore values('User 4','Female',20);
  3. Create a directory on LindormDFS.
    hadoop fs -mkdir hdfs://${Instance ID}/maxcompute2lindormstore

    In the command, ${Instance ID} specifies the ID of the Lindorm instance for which LindormDFS is activated.

  4. Write a script in DataWorks to synchronize data.

    Write a script to configure MaxCompute Reader and HDFS Writer. For more information, see MaxCompute Reader and HDFS Writer.

    {
        "type": "job",
        "steps": [
            {
                "stepType": "odps",
                "parameter": {
                    "partition": [],
                    "datasource": "odps_first",
                    "column": [
                        "*"
                    ],
                    "emptyAsNull": false,
                    "table": "maxcompute2lindormstore"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "hdfs",
                "parameter": {
                    "path": "/maxcompute2lindormstore",
                    "fileName": "maxcompute2lindormstore",
                    "datasource": "xxxx",
                    "column": [
                        {
                            "name": "name",
                            "type": "string"
                        },
                        {
                            "name": "gender",
                            "type": "string"
                        },
                        {
                            "name": "age",
                            "type": "int"
                        }
                    ],
                    "writeMode": "append",
                    "encoding": "UTF-8",
                    "fieldDelimiter": ",",
                    "fileType": "text"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        },
        "setting": {
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "throttle": false,
                "concurrent": 2
            }
        }
    }
  5. Use the exclusive resource group for data integration that you created to run the script.
  6. Check whether data is synchronized from MaxCompute to LindormDFS.
    hadoop fs -cat /maxcompute2lindormstore/*
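
    Based on the test data and the comma that is specified as fieldDelimiter in HDFS Writer, the command output should resemble the following lines. The row order may differ.
    User 1,Male,20
    User 2,Male,20
    User 3,Female,20
    User 4,Female,20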

Verify that data is synchronized from LindormDFS to MaxCompute

The following example is provided to verify that data is synchronized from LindormDFS to MaxCompute.

Note The test data used in this section is the data that you synchronize from MaxCompute to LindormDFS in the Verify that data is synchronized from MaxCompute to LindormDFS section. In this section, you synchronize the test data from LindormDFS to another table in MaxCompute.
  1. Create a test table in MaxCompute. For more information, see Create a table in MaxCompute.
    CREATE TABLE IF NOT EXISTS lindormstore2maxcompute
    (
     name             STRING COMMENT 'Name',
     gender           STRING COMMENT 'Gender',
     age              INT COMMENT 'Age'
    );
  2. Write a script in DataWorks to synchronize data. In the script, configure HDFS Reader and MaxCompute Writer. For more information, see HDFS Reader and MaxCompute Writer.
    {
        "type": "job",
        "steps": [
            {
                "stepType": "hdfs",
                "parameter": {
                    "path": "/maxcompute2lindormstore",
                    "fileName": "maxcompute2lindormstore*",
                    "datasource": "xxxx",
                    "column": [
                        {
                            "index": 0,
                            "type": "string"
                        },
                        {
                            "index": 1,
                            "type": "string"
                        },
                        {
                            "index": 2,
                            "type": "long"
                        }
                    ],
                    "encoding": "UTF-8",
                    "fieldDelimiter": ",",
                    "fileType": "text"
                },
                "name": "Reader",
                "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter":{
                    "partition":"",
                    "truncate":true,
                    "compress":false,
                    "datasource":"odps_first",
                    "column": [
                            "name",
                            "gender",
                            "age"
                    ],
                    "guid": null,
                    "emptyAsNull": false,
                    "table": "lindormstore2maxcompute"
                },
                "name": "Writer",
                "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {
            "hops": [
                {
                    "from": "Reader",
                    "to": "Writer"
                }
            ]
        },
        "setting": {
            "errorLimit": {
                "record": ""
            },
            "speed": {
                "concurrent": 2,
                "throttle": false
            }
        }
    }
  3. Use the exclusive resource group for data integration that you created to run the script.
  4. Check whether the data is synchronized from LindormDFS to MaxCompute.
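
    For example, you can run the following query in MaxCompute. If the synchronization succeeded, the query returns the four test rows that were originally inserted into maxcompute2lindormstore.
    SELECT * FROM lindormstore2maxcompute;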