
Data sync when the network of data source (both sides) is disconnected

Last Updated: Apr 04, 2018

Scenarios

A complex network environment is one that matches either of the following two conditions.

  • Either the data source or the data target is in a private network environment.

    VPC environment (except the RDS) <—> Public network environment

    Financial Cloud environment <—> Public network environment

    Local user-created environment without the public network <—> Public network environment

  • Both the data source and the data target are in private network environments.

    VPC environment (except the RDS) <—> VPC environment (except the RDS)

    Financial Cloud environment <—> Financial Cloud environment

    Local user-created environment without the public network <—> Local user-created environment without the public network

    Local user-created environment without the public network <—> VPC environment (except the RDS)

    Local user-created environment without the public network <—> Financial Cloud environment

Data Integration can traverse complex network environments. By deploying Data Integration agents, you can synchronize data between any two network environments. This topic describes the implementation logic and procedure, and assumes that neither the source nor the target data source is reachable over the network. For scenarios where only one end is unreachable, see Database data sync (without public network IP).

Implementation logic

When both the source and the target data sources are in private network environments, deploy a Data Integration agent at each end, in the same network environment as the data source at that end. The source agent pushes data to the Data Integration server, and the target agent pulls the data from the server to the local machine. During transmission, data is split into blocks, compressed, and encrypted to keep the transfer timely and secure.

The data synchronization method in this scenario is shown in the following figure.

[Figure 02]
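The block-and-compress part of this transfer can be illustrated with a minimal Python sketch. It is not the agent's actual implementation: the block size, the function names, and the choice of zlib are all assumptions made for illustration. Encryption is omitted here because the Python standard library has no symmetric cipher; in practice it would be applied to each compressed block or handled by the transport.

```python
import zlib
from typing import Iterable, Iterator, List

# Hypothetical block size; the value used by the real agent is not documented here.
BLOCK_SIZE = 64 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive blocks of the payload; the last block may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def pack_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Source side: split the payload into blocks and compress each block."""
    return [zlib.compress(block) for block in split_into_blocks(data, block_size)]


def unpack_blocks(blocks: Iterable[bytes]) -> bytes:
    """Target side: decompress each block and reassemble the payload in order."""
    return b"".join(zlib.decompress(block) for block in blocks)


# Round trip over a repetitive CSV-like payload.
payload = b"id,name\n" + b"1,alice\n" * 50_000
blocks = pack_blocks(payload)
restored = unpack_blocks(blocks)
```

Working on independent blocks means a failed block can be resent without restarting the whole transfer, and each block can be compressed as soon as it is read.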

Procedure

Configure the data sources

  1. Log on to the DataWorks console as a developer and click Enter Project to enter the project management page.

  2. Click Data Integration from the upper menu and navigate to the Offline Sync > Data Sources page.

  3. Click New Source to show the supported data source types. See the following figure.

  4. Under the FTP data source types, select the data source without a public IP address.

    Add a source data source

    [Figure 04]

    Configuration item description:

    • Type: Data source without a public IP address.

    • Name: A combination of letters, numbers, and underscores (_). It must begin with a letter or an underscore (_) and cannot exceed 60 characters.

    • Description: A brief description of the data source, up to 80 characters.

    • Select resources group: The machine on which the agent is deployed. The source agent pushes data to the Data Integration server. To add a resource group, see Add scheduling resources.

    • Protocol: FTP or SFTP.

    • Host and port: The address and port of the FTP server. The default FTP port is 21 and the default SFTP port is 22.

    • Username/Password: The username and password used to connect to the FTP server.

    • Test Connectivity: Data sources without public IP addresses do not support connectivity tests. Click Finish to complete the source-end configuration.
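    The naming and length rules above can be checked before the form is submitted. The following Python sketch is only an illustration: the function name and messages are hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Letters, digits, and underscores; begins with a letter or underscore; at most 60 characters.
# Assumption: "letters" is read as ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,59}")
MAX_DESCRIPTION_LENGTH = 80
DEFAULT_PORTS = {"ftp": 21, "sftp": 22}


def check_data_source(name: str, description: str, protocol: str) -> list:
    """Return a list of problems; an empty list means the fields follow the rules."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must start with a letter or underscore, use only "
                        "letters, digits, and underscores, and be at most 60 characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description must not exceed 80 characters")
    if protocol.lower() not in DEFAULT_PORTS:
        problems.append("protocol must be FTP or SFTP")
    return problems


ok = check_data_source("lzz_test3", "FTP source in a private network", "SFTP")
bad = check_data_source("3_bad_name", "x" * 81, "http")
```

    Here `ok` is an empty list, while `bad` reports one problem for each of the three fields.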

    Add a target data source

    [Figure 05]

    Resource group: The machine on which the target agent is deployed. The target agent pulls data to the local machine. To add a resource group, see Add scheduling resources.

Select the script mode

  1. Click Data Integration from the upper menu, and go to the Sync Tasks page.

  2. Choose New > Script Mode on the page.

    [Figure 06]

    On the script mode page, select an appropriate template that contains key parameters of synchronization tasks, and enter the required information. Note that the script mode cannot be switched to the wizard mode.

  3. Select the FTP-to-FTP import template.

    [Figure 07]

    • Source type: The data source name is automatically selected based on the data source selected in the wizard mode.

    • Type of objective: You can select a target data source from the drop-down list.

    Note: If the data source type can be added on the Data Sources page, you can select the data source from the template. If not, edit the relevant data source information in the JSON code section of the template, and then click Add Data Source directly.

  4. Configure a synchronization task, as shown in the following figure.

    [Figure 08]

    Configure the resource group: You can view and change the resource groups for the synchronization task. The default source and target groups are the resource groups that you selected when you added the data sources.

    {
      "configuration": {
        "setting": {
          "speed": {
            "concurrent": "1", // Number of concurrent tasks
            "mbps": "1" // Maximum task speed
          },
          "errorLimit": {
            "record": "0" // Maximum number of error records
          }
        },
        "reader": {
          "parameter": {
            "fieldDelimiter": ",", // Delimiter
            "encoding": "UTF-8", // Encoding format
            "column": [ // Data source column
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              }
            ],
            "path": [ // File path
              "/home/wb-zww354475/ww.txt"
            ],
            "datasource": "lzz_test3" // Data source name, which must be consistent with the name of the added data source
          },
          "plugin": "ftp"
        },
        "writer": {
          "parameter": {
            "writeMode": "truncate", // Writing mode
            "fieldDelimiter": ",", // Delimiter
            "fileName": "ww", // File name
            "path": "/home/wb-zww354475/ww_test", // File path
            "dateFormat": "yyyy-MM-dd HH:mm:ss",
            "datasource": "lzz_test4", // Data source name, which must be consistent with the name of the added data source
            "fileFormat": "csv" // File type
          },
          "plugin": "ftp"
        }
      },
      "type": "job",
      "version": "1.0"
    }
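
If many similar tasks are needed, the script above can be generated rather than typed by hand. The following Python sketch builds a dictionary with the same fields and values as the sample script and serializes it to JSON. The function `build_ftp_to_ftp_job` is a hypothetical helper, not part of any DataWorks API, and it assumes that every column is read as a string, as in the sample.

```python
import json
from typing import List


def build_ftp_to_ftp_job(source_name: str, source_paths: List[str],
                         target_name: str, target_path: str,
                         file_name: str, column_count: int) -> dict:
    """Build an FTP-to-FTP job definition that mirrors the sample script."""
    # One string column per index, as in the sample (an assumption for other data).
    columns = [{"index": i, "type": "string"} for i in range(column_count)]
    return {
        "configuration": {
            "setting": {
                "speed": {"concurrent": "1", "mbps": "1"},
                "errorLimit": {"record": "0"},
            },
            "reader": {
                "parameter": {
                    "fieldDelimiter": ",",
                    "encoding": "UTF-8",
                    "column": columns,
                    "path": list(source_paths),
                    # Must match the name of the source data source added earlier.
                    "datasource": source_name,
                },
                "plugin": "ftp",
            },
            "writer": {
                "parameter": {
                    "writeMode": "truncate",
                    "fieldDelimiter": ",",
                    "fileName": file_name,
                    "path": target_path,
                    "dateFormat": "yyyy-MM-dd HH:mm:ss",
                    # Must match the name of the target data source added earlier.
                    "datasource": target_name,
                    "fileFormat": "csv",
                },
                "plugin": "ftp",
            },
        },
        "type": "job",
        "version": "1.0",
    }


job = build_ftp_to_ftp_job(
    source_name="lzz_test3",
    source_paths=["/home/wb-zww354475/ww.txt"],
    target_name="lzz_test4",
    target_path="/home/wb-zww354475/ww_test",
    file_name="ww",
    column_count=2,
)
print(json.dumps(job, indent=2))
```

The printed JSON has no comments, so it can be pasted into the script editor as is.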

Run the synchronization task

You can run the synchronization task in either of the following ways.

  • Go to the Data Integration page and click Run.

  • Run the task by scheduling. For more information about the steps for submitting a task to scheduling, see Scheduling Configuration.
