
Configure a data synchronization task in script mode

Last Updated: Apr 17, 2018

Create a task in script mode

  1. Log on to the DataWorks console as a developer, and click Enter Project.

  2. Click Data Integration in the upper menu and navigate to the Sync Tasks page.

  3. Choose New > Script Mode on the page.

    Script-Mode

  4. Select a Source Type and a Type of objective in the Import Templates dialog box, as shown in the following figure.

    ImportTemplates

  5. Click OK to enter the script mode configuration page and complete the configuration as needed. For more information, see the following figure.

    HelpManual

    If you have any questions, click Help Manual in the upper-right corner.

  6. Click Save.

    Note:

    • To select a different template, click Import Templates in the toolbar. Importing a new template overwrites the existing content, so proceed with caution.

    • To convert a task that was created in wizard mode to script mode, click Switch to Script.

Basic configurations of the script mode

At the top level, a Data Integration job is configured as a JSON object with the following structure.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "setting": {
        "key": "value"
      },
      "reader": {
        "plugin": "Enter the storage type of source data",
        "parameter": {
          "key": "value"
        }
      },
      "writer": {
        "plugin": "Enter the storage type of target data",
        "parameter": {
          "key": "value"
        }
      }
    }
  }

  • type: The type of the submitted task. Only synchronization jobs are supported, so the value must be job.

  • version: Only version 1.0 is currently supported for all jobs, so the value must be 1.0.
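
As a rough illustration, the skeleton above can be assembled and sanity-checked programmatically before it is pasted into the script editor. The following Python sketch is not part of DataWorks: the helpers build_job and check_job are hypothetical, and the plugin names mysql and odps are assumptions used only as sample values.

```python
import json


def build_job(reader_plugin, reader_params, writer_plugin, writer_params, setting=None):
    """Assemble the top-level job structure described above (hypothetical helper)."""
    return {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "setting": setting or {},
            "reader": {"plugin": reader_plugin, "parameter": reader_params},
            "writer": {"plugin": writer_plugin, "parameter": writer_params},
        },
    }


def check_job(job):
    """Return a list of problems found in the top-level fields (empty if none)."""
    problems = []
    if job.get("type") != "job":
        problems.append("type must be 'job'")
    if job.get("version") != "1.0":
        problems.append("version must be '1.0'")
    configuration = job.get("configuration", {})
    for section in ("setting", "reader", "writer"):
        if section not in configuration:
            problems.append("configuration." + section + " is missing")
    return problems


# "mysql" and "odps" are assumed sample plugin names, not confirmed by this page.
job = build_job("mysql", {"key": "value"}, "odps", {"key": "value"})
print(json.dumps(job, indent=2))
print(check_job(job))
```

A check of this kind reports a misspelled section name as a missing section, which is easier to spot than a job that silently ignores its settings.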

System tuning configuration

The setting field of a job holds the global configuration parameters that are independent of the source and the target, such as job throttling and job type conversion.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "setting": {
        "errorLimit": {},
        "speed": {}
      }
    }
  }

  • configuration.setting.errorLimit (Dirty data control)

    You can customize dirty data monitoring and alerting, including a threshold on the number of dirty data records. If the number of dirty data records exceeds the threshold while the job is transmitting data, the job is aborted with an error. For example:

    {
      "type": "job",
      "version": "1.0",
      "configuration": {
        "setting": {
          "errorLimit": {
            "record": 1024
          }
        }
      }
    }

    In the preceding configuration, the errorLimit threshold is set to 1024, so the job is aborted with an error when the number of dirty data records exceeds 1,024 during transmission.

  • configuration.setting.speed (Throttling)

    This field supports channel throttling: you can specify a maximum bandwidth for each job.

    To limit the transmission bandwidth to 1 Mbit/s, configure the job as follows:

    {
      "type": "job",
      "version": "1.0",
      "configuration": {
        "setting": {
          "speed": {
            "mbps": 1
          }
        }
      }
    }
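
Both settings can appear in the same setting block. The Python sketch below builds such a block and models the dirty data rule described above, where the job aborts once the count exceeds the threshold. The functions build_setting and should_abort are hypothetical and only mirror the documented behavior; they are not the Data Integration implementation.

```python
import json


def build_setting(max_dirty_records, max_mbps):
    """Build a setting block with a dirty data threshold and a bandwidth cap."""
    return {
        "errorLimit": {"record": max_dirty_records},
        "speed": {"mbps": max_mbps},
    }


def should_abort(setting, dirty_records_seen):
    """Model the documented rule: abort when dirty records exceed the threshold."""
    return dirty_records_seen > setting["errorLimit"]["record"]


setting = build_setting(max_dirty_records=1024, max_mbps=1)
job = {"type": "job", "version": "1.0", "configuration": {"setting": setting}}
print(json.dumps(job, indent=2))
print(should_abort(setting, 1024))  # count equal to the threshold
print(should_abort(setting, 1025))  # count one above the threshold
```

In this model, a count equal to the threshold does not abort the job; only a count above it does, which matches the "exceeds" wording of the description.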

Note:

  • The measured traffic is a Data Integration metric and does not represent the actual NIC traffic. In most cases, the NIC traffic is two to three times the channel traffic, depending on how the data storage system serializes data.

  • The splitting key does not apply to a single semi-structured file. For multiple files, you can set a maximum job rate to improve the synchronization speed, and the effective rate is capped by the number of files. For example, with n files the rate can be at most n Mbit/s: if the maximum rate is set to n+1 Mbit/s, synchronization still runs at n Mbit/s, and if it is set to n-1 Mbit/s, synchronization runs at n-1 Mbit/s.

  • For a relational database, table splitting is performed according to the maximum job rate only when both a maximum job rate and a splitting key are configured. Relational databases generally support only numeric splitting keys; Oracle databases support both numeric and string splitting keys.
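
The multi-file rule in the notes above amounts to taking the smaller of the configured maximum rate and the number of files. A minimal Python sketch of that arithmetic, where effective_rate is a hypothetical helper and not a Data Integration function:

```python
def effective_rate(file_count, configured_max_rate):
    """Rate applied when synchronizing multiple files, per the note above.

    The rate is capped by the number of files: with n files, at most n is used.
    """
    return min(file_count, configured_max_rate)


n = 5  # example number of files (an arbitrary sample value)
print(effective_rate(n, n + 1))  # configured rate above the file count
print(effective_rate(n, n - 1))  # configured rate below the file count
```

Setting the maximum rate above the number of files therefore brings no additional speed in this model.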
