Object Storage Service (OSS) supports the following batch synchronization solutions: One-click batch synchronization to OSS (Cyclical Full), One-click batch synchronization to OSS (Cyclical Increment), One-click batch synchronization to OSS (Once Full), One-click batch synchronization to OSS (Once Increment), and One-click batch synchronization to OSS (Once Full then cyclical increment). This topic describes how to create a batch synchronization solution to synchronize all data in a database to OSS. In this example, One-click batch synchronization to OSS (Once Full then cyclical increment) is used.

Prerequisites

  1. The required data sources are configured. Before you configure a data synchronization solution, you must configure the data sources from which you want to read data and to which you want to write data. This way, you can select the data sources when you configure a data synchronization solution. For information about the data source types that support the solution-based synchronization feature and the configuration of a data source, see Supported data source types and read and write operations.
    Note: For information about the items that you need to understand before you configure a data source, see Overview.
  2. An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
  3. Network connections between the exclusive resource group for Data Integration and data sources are established. For more information, see Establish a network connection between a resource group and a data source.
  4. The data source environments are prepared. Before you configure a data synchronization solution, you must create an account that can be used to access a database and grant the account the permissions required to perform specific operations on the database based on your configurations for data synchronization. For more information, see Overview.

Procedure

  1. Step 1: Select a synchronization solution
  2. Step 2: Configure network connections for data synchronization
  3. Step 3: Configure the source and synchronization rules
  4. Step 4: Configure destination objects
  5. Step 5: Configure synchronization rules
  6. Step 6: Configure the resources required by the synchronization solution
  7. Step 7: Run the synchronization solution

Step 1: Select a synchronization solution

Go to the Data Integration page in the DataWorks console and click Create Data Synchronization Solution. On the Create Data Synchronization Solution page, select a source and a destination for data synchronization from the drop-down lists. Then, select One-click batch synchronization to OSS (Once Full then cyclical increment). For more information, see Create a synchronization solution.

Step 2: Configure network connections for data synchronization

Select a source, a destination, and a resource group that is used to run nodes. Test the network connectivity to make sure that the resource group is connected to the source and destination. For more information, see Configure network connections for data synchronization.

Step 3: Configure the source and synchronization rules

  1. In the Basic Configuration section, configure the parameters, such as the Solution Name and Location parameters, based on your business requirements.
  2. In the Data Source section, confirm the information about the source.
  3. In the Source Table section, select the tables from which you want to read data in the Source Table list. Then, click the add icon to move the tables to the Selected Source Table list.

    The Source Table list displays all tables in the source. You can select all or specific tables.

  4. In the Conversion Rule for Table Name section, click Add Rule, select a rule type, and then configure a mapping rule of the selected type.
    By default, data in a source table is written to an OSS object that has the same name as the source table. You can specify a destination object name in a mapping rule to write data in multiple source tables to the same OSS object. You can use regular expressions to convert the names of the destination objects. You can also use built-in variables to add prefixes and suffixes to the names of destination objects. For more information, see Step 4: Select the source databases and tables and configure mapping rules.
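
The effect of a name conversion rule can be sketched in Python as follows. This is only an illustration of the idea, not how DataWorks implements it: the destination_object function, the rule list, the table names, and the ods_ prefix are all hypothetical.

```python
import re

def destination_object(table, rules, prefix="", suffix=""):
    """Derive a destination object name from a source table name.

    rules is a list of (regex pattern, replacement) pairs applied in order.
    With no rules, prefix, or suffix, the table name is returned unchanged,
    which mirrors the default behavior described above.
    """
    name = table
    for pattern, replacement in rules:
        name = re.sub(pattern, replacement, name)
    return f"{prefix}{name}{suffix}"

# Hypothetical rule: merge sharded tables such as orders_01 and orders_02
# into a single destination object named orders.
rules = [(r"^orders_\d+$", "orders")]
tables = ["orders_01", "orders_02", "customers"]

mapping = {t: destination_object(t, rules, prefix="ods_") for t in tables}
for table, obj in mapping.items():
    print(f"{table} -> {obj}")
```

In this sketch, both sharded tables map to the same object name, while the table that matches no rule keeps its own name and receives only the prefix.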

Step 4: Configure destination objects

  1. Configure the following parameters.
    Destination Path: The directory in which the synchronized data is stored. If the directory that you specify does not exist, the system automatically creates it.
      Note: If you want to store the data of each day in a separate OSS folder, you can use the ${bdp.system.bizdate} variable to specify the directory.
    Object Type: The format of the files that you want to write to the destination. Valid values: csv, text, and parquet.
      Note:
      • If a file is written as a CSV file, the file must follow CSV specifications. If the data in the file contains column delimiters, the column delimiters are escaped by double quotation marks (").
      • If a file is written as a text file, the data in the file is separated by column delimiters. In this case, the column delimiters are not escaped.
    Column separator: The column delimiter that is used in the files that you want to write to OSS. By default, commas (,) are used as column delimiters.
    Row Delimiter: The row delimiter that is used in the files that you want to write to OSS. Default value: \n.
    Encoding: The encoding format of the files that you want to write to OSS.
    null value: The string that represents a null value. Text files have no standard way to represent null, so Data Integration provides this parameter to define the string that stands for it. For example, if you set the null value parameter to null, Data Integration treats the string null as a null value.
    Time format: The format in which data of a time data type is written to OSS.
    Write Single Object: Specifies whether to write the data in a source table to OSS as a single file. If you turn off the switch, the data in a source table is written to OSS as multiple files, and random strings are used as suffixes for the names of the files.
      Note: This parameter is available only for CSV files and text files. If you turn on the switch, you can select Replace the original file or Exit report error for the Solution to Prefix Conflicts parameter. If you turn on the switch and the volume of data in a single source table exceeds 10 GB, an error occurs.
    Solution to Prefix Conflicts: The solution to prefix conflicts. Valid values:
      • Replace the original file: All existing objects whose names start with the specified prefix are deleted from the destination directory before data is written to OSS. For example, if you set the object parameter to abc, all objects whose names start with abc are deleted before data is written to OSS.
      • Keep original files: OSS Writer writes all files to OSS and adds random UUIDs to file names as suffixes to ensure that the names of the files are different from the names of existing objects. For example, if you set the object parameter to DI, the names of the files written to OSS are in the DI_****_****_**** format.
        Note: This option is not available when you write the data in a source table as a single file.
      • Exit report error: If an OSS object whose name starts with the specified prefix exists, an error occurs. For example, if you set the object parameter to abc and an OSS object named abc123 exists, an error occurs.
  2. Configure mappings between source tables and destination OSS objects.
    Click Refresh source table and OSS Object mapping to generate destination OSS objects based on the rules you configured in the Conversion Rule for Table Name section in Step 3. If no mapping rule is configured in Step 3, data in the source tables is written to the destination objects that have the same names as the source tables by default. You can also change the method of generating an OSS object and add additional fields to a destination object.
    Note: If you turn off the Write Single Object switch, data in a single source table is written as multiple files to OSS. In this case, the prefix of the names of the destination files is displayed as the OSS object name in this step, but the actual names of the destination objects are suffixed with a random string.
    Object creation method: The method of generating an object. You can select Use Existing Object or Create Object from the drop-down list in the Object creation method column.
    Edit additional fields: You can click Edit additional fields in the Actions column to add additional fields to a destination object and assign values to the fields. The values can be constants or variables.
      Note: You can add additional fields only if you select Create Object from the drop-down list in the Object creation method column.
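
The difference between the csv and text object types can be illustrated with a short Python sketch. It uses Python's standard csv module as a stand-in and is not the OSS Writer implementation. The write_csv and write_text functions and the sample row are hypothetical, and writing None as the configured null value string is an assumption about how that parameter is applied.

```python
import csv
import io

def write_csv(rows, delimiter=",", null_value="null"):
    """CSV output: a field that contains the column delimiter is wrapped in double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow([null_value if v is None else v for v in row])
    return buf.getvalue()

def write_text(rows, delimiter=",", row_delimiter="\n", null_value="null"):
    """Text output: fields are joined by the column delimiter with no escaping."""
    return "".join(
        delimiter.join(null_value if v is None else str(v) for v in row) + row_delimiter
        for row in rows
    )

# Hypothetical row: the second field contains the column delimiter,
# and the third field is null.
rows = [["1", "Hangzhou, China", None]]

csv_out = write_csv(rows)
text_out = write_text(rows)
print(csv_out, end="")   # 1,"Hangzhou, China",null
print(text_out, end="")  # 1,Hangzhou, China,null
```

In the text output, the unescaped comma makes the row look like it has four columns. If the data can contain the default delimiter, either the csv object type or a column separator that does not occur in the data is the safer choice.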

Step 5: Configure synchronization rules

  1. Configure rules for incremental data synchronization.
    You can use a WHERE clause to extract incremental data from the source tables. In the Condition for Incremental Synchronization field, enter only the condition itself and omit the WHERE keyword. You can also use built-in system variables in the condition. For example, you can use the ${bdp.system.bizdate} variable to specify the data timestamp in the yyyymmdd format and use the ${bdp.system.cyctime} variable to specify the scheduling time in the yyyymmddhh24miss format.
    Note: You can use scheduling parameters to specify the scope of the data that you want to synchronize and the location to which you want to write the data. For more information about how to use scheduling parameters, see Description for using scheduling parameters in data synchronization.
  2. Configure scheduling settings for data synchronization.
    Configure the parameters in the Recurrence section, such as Recurrence, Scheduling Period, and Pausing Scheduling. For more information about the parameters, see Configure time properties.
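
How the built-in variables resolve at run time can be approximated with the following Python sketch. It is not DataWorks code. The render function, the modified_day column, and the sync/ path are hypothetical, and the sketch assumes that the data timestamp is the day before the scheduled run time.

```python
from datetime import datetime, timedelta

def render(template: str, scheduled: datetime) -> str:
    """Replace the two built-in variables with concrete values."""
    values = {
        # Assumption: the data timestamp is the day before the scheduled run.
        "${bdp.system.bizdate}": (scheduled - timedelta(days=1)).strftime("%Y%m%d"),
        # Scheduling time in the yyyymmddhh24miss format.
        "${bdp.system.cyctime}": scheduled.strftime("%Y%m%d%H%M%S"),
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template

scheduled = datetime(2024, 3, 15, 0, 30, 0)

# Condition as it would be entered in the field, without the WHERE keyword.
condition = render("modified_day = '${bdp.system.bizdate}'", scheduled)
# Destination path that places each day's data in its own folder.
path = render("sync/${bdp.system.bizdate}/", scheduled)

print(condition)  # modified_day = '20240314'
print(path)       # sync/20240314/
```

Under these assumptions, each scheduled run reads only the rows for the previous day and writes them to a folder named after that day.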

Step 6: Configure the resources required by the synchronization solution

This synchronization solution generates batch synchronization nodes for full data synchronization and incremental data synchronization. You can specify the names for the batch synchronization nodes and select resource groups for scheduling and resource groups for Data Integration. You can also view the maximum number of connections and parallel nodes allowed for the source database. If you want to perform fine-grained configurations for nodes, you can modify the related parameters in the Advanced Configuration section.

Note: Batch synchronization nodes in DataWorks can run only after resource groups for scheduling issue them to resource groups for Data Integration. Therefore, resource groups for scheduling are also required. If you run nodes on exclusive resource groups for scheduling, you are charged for scheduling instances. For more information, see Mechanism for issuing nodes.

Step 7: Run the synchronization solution

  1. Go to the Tasks page in Data Integration and find the created data synchronization solution.
  2. Click Submit and Run in the Actions column to run the data synchronization solution.
  3. Click Execution details in the Actions column to view the execution details of the data synchronization solution.

What to do next

After a data synchronization solution is configured, you can manage the solution. For example, you can add tables to or remove tables from the solution, configure alerting and monitoring settings for the nodes that are generated by the solution, and view information about the running of the nodes. For more information, see Perform O&M for a data synchronization solution.