
Data Integration when the network of the data source (one side only) is disconnected

Last Updated: Apr 04, 2018

Scenarios

Complex network environments fall into the following two categories.

  • Either the data source or the data target is in a private network environment:

    • VPC environment (excluding RDS) <—> Public network environment

    • Financial Cloud environment <—> Public network environment

    • Local self-built environment without public network access <—> Public network environment

  • Both the data source and the data target are in private network environments:

    • VPC environment (excluding RDS) <—> VPC environment (excluding RDS)

    • Financial Cloud environment <—> Financial Cloud environment

    • Local self-built environment without public network access <—> Local self-built environment without public network access

    • Local self-built environment without public network access <—> VPC environment (excluding RDS)

    • Local self-built environment without public network access <—> Financial Cloud environment

Data Integration can synchronize data across complex network environments. By deploying a Data Integration agent, you can transfer data between the network environments listed above. This topic describes the implementation logic and procedure for the case where the network of the data source on one side cannot be reached directly. For scenarios where neither side is reachable, see Data sync when the network of data source (both sides) is disconnected.

Implementation logic

When either the data source or the data target is in a private network, deploy the Data Integration agent on a machine in the same network as the private end, and connect to the public network through that agent. A private network environment is typically one of the following:

  • A database built on an ECS instance that was purchased without a public IP address or an Elastic IP address.

  • A local IDC with no public IP address.
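The placement rule above can be expressed as a small decision function. This is a minimal sketch for illustration only: the `Placement` labels and `choose_placement` are hypothetical names, not part of DataWorks, and the "no agent needed" branch is an assumption for the case where both ends are reachable over the public network.

```python
from enum import Enum


class Placement(Enum):
    """Where to deploy the Data Integration agent (hypothetical labels)."""
    AGENT_NEAR_SOURCE = "agent on a machine in the source's network"
    AGENT_NEAR_TARGET = "agent on a machine in the target's network"
    BOTH_SIDES_PRIVATE = "both ends private: see the two-sided scenario"
    NO_AGENT_NEEDED = "both ends public (assumption: no agent required)"


def choose_placement(source_is_private: bool, target_is_private: bool) -> Placement:
    """Apply the rule from this section: when exactly one end is private,
    put the agent in that end's network on a machine that can also reach
    the public network."""
    if source_is_private and target_is_private:
        return Placement.BOTH_SIDES_PRIVATE
    if source_is_private:
        return Placement.AGENT_NEAR_SOURCE
    if target_is_private:
        return Placement.AGENT_NEAR_TARGET
    # Assumption: with both ends on the public network, no agent is required.
    return Placement.NO_AGENT_NEEDED
```

For the scenario in this topic (a MySQL source without a public IP address and a MaxCompute target on the public network), `choose_placement(True, False)` returns `Placement.AGENT_NEAR_SOURCE`.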

Database on ECS without a public IP address

The data synchronization method in this scenario is shown in the following figure.

[Figure 1]

  • Because the ECS2 server cannot access the public network, you need another machine, ECS1, that is in the same network segment as ECS2 and can access the public network. Deploy the agent on ECS1.

  • Configure ECS1 as the resource group, and run the synchronization task on that machine.

    Note:

    On ECS2, you must grant database permissions that allow the database to be accessed and its data read to ECS1. For example:

    grant all privileges on *.* to 'demo_test'@'%' identified by 'password'; -- '%' authorizes all IP addresses

    This syntax applies to MySQL 5.x. MySQL 8.0 and later no longer accept IDENTIFIED BY in a GRANT statement, so create the user with CREATE USER first and then grant the privileges.

The synchronization task for the self-built data source on ECS2 runs in the custom resource group. To authorize the machine in the custom resource group, add the internal and external IP addresses and the port of ECS2 to the security group of ECS1. For details, see Add security group.
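Because connectivity tests are not available for data sources without a public IP address, you may want to confirm by hand that the agent machine can reach the database port once the security group is configured. The following is a minimal Python sketch of a plain TCP check to run on ECS1; `port_is_reachable` is a hypothetical helper, and it only shows that the port accepts connections, not that the MySQL account or its privileges are correct.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Intended to run on the agent machine (ECS1) against the database on
    ECS2. It does not log in to MySQL or verify the granted privileges.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, `port_is_reachable("192.168.0.10", 3306)`, where the internal IP address of ECS2 is hypothetical and 3306 is the default MySQL port. A `False` result usually points to a security group rule or a database that is not listening on that address.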

Local IDCs with no public IP address

The data synchronization method in this scenario is shown in the following figure.

[Figure 2]

  • Because machine 1 cannot access the public network, you need machine 2, which is in the same network segment as machine 1 and can access the public network. Deploy the agent on machine 2.

  • Configure machine 2 as the scheduling resource group, and run the synchronization task on that machine.

Configure the data source

Procedure

  1. Log on to the DataWorks console as a developer and click Enter Project to enter the relevant project.

  2. Click Data Integration in the top menu and go to the Offline Sync > Data Sources page.

  3. Click New Source to show the supported data source types.

  4. Under the relational database MySQL, select the data source type without a public IP address.

    • Source data source (with no public IP):

    [Figure 3]

    The configuration items are as follows.

    • Type: Data source without a public IP address.

    • Name: A combination of letters, numbers, and underscores (_). It must begin with a letter or an underscore (_) and must not exceed 60 characters.

    • Description: A brief description of the data source, up to 80 characters.

    • Resource group: The machine on which the agent is deployed to connect to the public network. Synchronization tasks for data sources in special network environments run in this resource group. To add a resource group, see Add scheduling resources.

    • JDBCUrl: The JDBC URL. Format: jdbc:mysql://ServerIP:Port/database.

    • Username/Password: The username and password used to connect to the database.

    • Test Connectivity: Data sources without public IP addresses do not support connectivity tests. Click Complete to finish the source-end configuration.

    • Target data source (with public network access):

    [Figure 4]

    Configuration item description:

    • Name: A combination of letters, numbers, and underscores (_). It must begin with a letter or an underscore (_) and must not exceed 60 characters.

    • Description: A brief description of the data source, up to 80 characters.

    • ODPS endpoint: Read-only by default. The value is read automatically from the system configuration.

    • ODPS project name: The name of the corresponding MaxCompute project.

    • Access ID: The Access ID corresponding to the MaxCompute project owner's cloud account.

    • AccessKey: The AccessKey of the MaxCompute project owner's cloud account, used together with the Access ID. The AccessKey (AK) serves as the password.

    • Test Connectivity: The connectivity test is supported.
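The naming rules and the JDBC URL format above are easy to check before filling in the form. The following Python sketch encodes them; the helper names are hypothetical, and "letters" is assumed to mean ASCII letters only.

```python
import re

# Rules stated above: letters, digits, and underscores only; must start with
# a letter or an underscore; at most 60 characters (ASCII letters assumed).
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,59}")
MAX_DESCRIPTION_LENGTH = 80


def is_valid_source_name(name: str) -> bool:
    """Return True if name satisfies the data source naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Return True if the description fits the 80-character limit."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


def mysql_jdbc_url(server_ip: str, port: int, database: str) -> str:
    """Build a URL in the documented format jdbc:mysql://ServerIP:Port/database."""
    return f"jdbc:mysql://{server_ip}:{port}/{database}"
```

The data source names used in the example script later in this topic, `private_source` and `odps_mrtest2222`, both pass `is_valid_source_name`, while a name such as `1source` or `my-source` does not.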

Configure a synchronization task

  1. Select the source. Because the data source has no public IP address, its network cannot be reached directly, so you must configure the synchronization task in script mode. Click Convert Scripts.

    [Figure 5]

  2. Import a template.

    [Figure 6]

    Parameter description:

    • Source type: The data source is selected automatically based on the data source chosen in wizard mode.

    • Target type: Select the target data source from the drop-down list.

      Note:

      If the database supports adding data sources on this page, you can select the data source in the template. Otherwise, edit the relevant data source information in the JSON code section of the template and click Add Data Source.

  3. The following is an example of a task after switching to script mode.

    [Figure 7]

    Task resource group: View or change the resource group that runs the synchronization task. This panel is collapsed by default.

    {
        "type": "job",
        "configuration": {
            "setting": {
                "speed": {
                    "concurrent": "1", // Number of concurrent tasks
                    "mbps": "1" // Maximum task speed (MB/s)
                },
                "errorLimit": {
                    "record": "0" // Maximum number of error records
                }
            },
            "reader": {
                "parameter": {
                    "splitPk": "id", // Split key used to shard data for concurrent reads
                    "column": [ // Column names on the source end
                        "name",
                        "tag",
                        "age",
                        "balance",
                        "gender",
                        "birthday"
                    ],
                    "table": "source", // Table name on the source end
                    "where": "ds = '20171218'", // Filter condition
                    "datasource": "private_source" // Data source name; must match the name of the added data source
                },
                "plugin": "mysql"
            },
            "writer": {
                "parameter": {
                    "partition": "ds='${bdp.system.bizdate}'", // Partition information
                    "truncate": true,
                    "column": [ // Column names on the target end
                        "name",
                        "tag",
                        "age",
                        "balance",
                        "gender",
                        "birthday"
                    ],
                    "table": "random_generated_data", // Table name on the target end
                    "datasource": "odps_mrtest2222" // Data source name; must match the name of the added data source
                },
                "plugin": "odps"
            }
        },
        "version": "1.0"
    }
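Because the task has to be written in script mode, generating the JSON can be less error-prone than editing it by hand. The following Python sketch builds a dictionary with the same structure as the example above and serializes it with `json.dumps`. `build_sync_job` and `columns_match` are hypothetical helpers, not a DataWorks API. Standard JSON has no comments, so the output omits the `//` annotations. The column check assumes that reader and writer columns are mapped by position, as in the example, where both lists are identical.

```python
import json


def build_sync_job(source_name, source_table, target_name, target_table,
                   columns, where=None, split_pk=None):
    """Assemble a MySQL-to-ODPS job shaped like the example script."""
    reader_param = {
        "datasource": source_name,
        "table": source_table,
        "column": list(columns),
    }
    if where is not None:
        reader_param["where"] = where
    if split_pk is not None:
        reader_param["splitPk"] = split_pk
    return {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "setting": {
                "speed": {"concurrent": "1", "mbps": "1"},
                "errorLimit": {"record": "0"},
            },
            "reader": {"plugin": "mysql", "parameter": reader_param},
            "writer": {
                "plugin": "odps",
                "parameter": {
                    "datasource": target_name,
                    "table": target_table,
                    "partition": "ds='${bdp.system.bizdate}'",
                    "truncate": True,
                    "column": list(columns),
                },
            },
        },
    }


def columns_match(job):
    """Assumed rule: reader and writer must list the same number of columns."""
    cfg = job["configuration"]
    reader_cols = cfg["reader"]["parameter"]["column"]
    writer_cols = cfg["writer"]["parameter"]["column"]
    return len(reader_cols) == len(writer_cols)


if __name__ == "__main__":
    job = build_sync_job(
        "private_source", "source", "odps_mrtest2222", "random_generated_data",
        ["name", "tag", "age", "balance", "gender", "birthday"],
        where="ds = '20171218'", split_pk="id",
    )
    assert columns_match(job)
    print(json.dumps(job, indent=4))
```

Building both column lists from one Python list keeps the reader and writer aligned; if you need different column names on each side, pass two lists and keep the `columns_match` check.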

Run the synchronization task

You can run the synchronization task in either of the following ways:

  • Click Run on the Data Integration page.

  • Run the task by scheduling. For more information, see Scheduling configuration.
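The writer in the example script uses the scheduling parameter `${bdp.system.bizdate}` in its partition. In DataWorks this parameter is generally resolved to the business date, which is the day before the scheduled run date, in yyyymmdd format. The following Python sketch imitates that substitution locally so you can preview which partition a run writes to; it is an approximation under that assumption, not the DataWorks scheduler, and both function names are hypothetical.

```python
from datetime import date, timedelta


def bizdate(run_date: date) -> str:
    """Assumed business date: the day before the run date, as yyyymmdd."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


def resolve_partition(template: str, run_date: date) -> str:
    """Replace ${bdp.system.bizdate} in a partition spec with the business date."""
    return template.replace("${bdp.system.bizdate}", bizdate(run_date))
```

Under this assumption, a run scheduled on December 19, 2017 resolves `ds='${bdp.system.bizdate}'` to `ds='20171218'`, which matches the date used in the reader's `where` filter in the example.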
