MaxCompute supports the following batch synchronization solutions: One-click batch synchronization to MaxCompute (Cyclical Full), One-click batch synchronization to MaxCompute (Cyclical Increment), One-click batch synchronization to MaxCompute (Once Full), One-click batch synchronization to MaxCompute (Once Increment), and One-click batch synchronization to MaxCompute (Once Full then cyclical increment). This topic describes how to create a batch synchronization solution to synchronize all data in a database to MaxCompute. In this example, One-click batch synchronization to MaxCompute (Once Full then cyclical increment) is used.

Prerequisites

  1. The required data sources are configured. Before you configure a data synchronization solution, you must configure the data sources from which you want to read data and to which you want to write data. This way, you can select the data sources when you configure a data synchronization solution. For information about the data source types that support the solution-based synchronization feature and the configuration of a data source, see Supported data source types and read and write operations.
    Note: For information about the items that you need to understand before you configure a data source, see Overview.
  2. An exclusive resource group for Data Integration that meets your business requirements is purchased. For more information, see Create and use an exclusive resource group for Data Integration.
  3. Network connections between the exclusive resource group for Data Integration and data sources are established. For more information, see Establish a network connection between a resource group and a data source.
  4. The data source environments are prepared. Before you configure a data synchronization solution, you must create an account that can be used to access a database and grant the account the permissions required to perform specific operations on the database based on your configurations for data synchronization. For more information, see Overview.

Background information

For more information about the data synchronization solutions and how to write data to a partitioned table, see Overview of the solution-based synchronization feature.

Procedure

  1. Step 1: Select a synchronization solution
  2. Step 2: Configure network connections for data synchronization
  3. Step 3: Configure the source and synchronization rules
  4. Step 4: Configure destination tables
  5. Step 5: Configure synchronization rules
  6. Step 6: Configure the resources required by the synchronization solution
  7. Step 7: Run the synchronization solution

Step 1: Select a synchronization solution

Go to the Data Integration page in the DataWorks console and click Create Data Synchronization Solution. On the Create Data Synchronization Solution page, select a source and a destination for data synchronization from the drop-down lists. Then, select One-click batch synchronization to MaxCompute (Once Full then cyclical increment). For more information, see Create a synchronization solution.

Step 2: Configure network connections for data synchronization

Select a source, a destination, and a resource group that is used to run nodes. Test the network connectivity to make sure that the resource group is connected to the source and destination. For more information, see Configure network connections for data synchronization.

Step 3: Configure the source and synchronization rules

  1. In the Basic Configuration section, configure the parameters, such as the Solution Name and Location parameters, based on your business requirements.
  2. In the Data Source section, confirm the information about the source.
  3. In the Source Table section, select the tables from which you want to read data in the Source Table list. Then, click the icon between the two lists to add the tables to the Selected Source Table list.

    The Source Table list displays all tables in the source. You can select all or specific tables.

  4. In the Mapping Rules for Table Names section, click Add Rule, select a rule type, and then configure a mapping rule of the selected type.
    By default, data in a source table is written to a MaxCompute table that has the same name as the source table. You can configure a mapping rule to synchronize data from multiple tables in the source to the same table in the destination. You can also configure a mapping rule to synchronize data from source tables whose names start with a specified prefix to the destination tables whose names start with another specified prefix. You can use regular expressions to convert the names of the destination tables. You can also use built-in variables to add prefixes and suffixes to the names of destination tables. For more information, see Step 4: Select the source databases and tables and configure mapping rules.
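
    The effect of mapping rules on table names can be sketched in a few lines of Python. This is only an illustration of the idea, not how DataWorks implements mapping rules: the rule list, the table names, and the map_table_name function are hypothetical. The first rule merges several source tables into one destination table, the second replaces one prefix with another, and a name that matches no rule keeps the default behavior of mapping to a table with the same name.

```python
import re

# Hypothetical rules: (regular expression on the source name, destination name).
RULES = [
    (r"^order_\d+$", "order_all"),  # merge order_0, order_1, ... into one table
    (r"^src_(.+)$", r"ods_\1"),     # replace the prefix src_ with ods_
]


def map_table_name(source_name, rules=RULES):
    """Return the destination table name; the first matching rule wins."""
    for pattern, replacement in rules:
        if re.match(pattern, source_name):
            return re.sub(pattern, replacement, source_name)
    # No rule matched: the destination table has the same name as the source table.
    return source_name


for name in ["order_3", "src_users", "customers"]:
    print(name, "->", map_table_name(name))
```

    Checking a few representative table names against your rules in this way before you refresh the mappings in Step 4 can help you catch a rule that merges more tables than intended.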

Step 4: Configure destination tables

  1. Configure the Automatic Partitioning by Time parameter.
    You can set the Automatic Partitioning by Time parameter only to Partitioned Table. You can click the Edit icon to specify the name of the partition field for the destination table.
  2. Configure mappings between source tables and destination MaxCompute tables.
    Click Refresh source table and MaxCompute Table mapping to generate destination tables based on the rules you configured in the Mapping Rules for Table Names section in Step 3. If no mapping rule is configured in Step 3, data in the source tables is written to the destination tables that have the same names as the source tables. If no destination tables that have the same names as the source tables exist, the system automatically creates such destination tables. You can also change the method of generating a table and add additional fields to a source table.
    Note: The name of the destination table is generated based on the mapping rules that you configured in the Mapping Rules for Table Names section.
    Operation: Select a primary key for a source table
    Description: Source tables that do not have primary keys cannot be synchronized. If a source table does not have a primary key, you can click the Edit icon in the Synchronized Primary Key column to specify one or more fields in the table as the primary key of the table.

    Operation: Select the method of generating a destination table
    Description: You can select Create Table or Use Existing Table from the drop-down list in the Table creation method column.
    • Use Existing Table: If you select this method, you must select the desired table from the drop-down list in the MaxComputeTable name column.
    • Create Table: If you select this method, the name of the MaxCompute table that is automatically created appears in the MaxComputeTable name column. You can click the table name to view and modify the table creation statement.

    Operation: Edit a destination table
    Description: By default, the lifecycle of MaxCompute tables that are created by using the Create Table method is 30 days and field type conversion may occur. For example, if the data types of the fields in a destination table are different from the data types of the fields in a source table, the data synchronization solution converts the data types of the fields in the source table to the data types that can be written to the destination table. You can click the name of a destination table in the MaxComputeTable name column to modify the lifecycle or field types of the table.
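
    To give a rough idea of what a table creation statement for a partitioned destination table with a 30-day lifecycle may look like, the following Python sketch assembles one. It is an assumption-laden illustration, not the statement that DataWorks generates: the build_create_table function, the TYPE_MAP conversion table, the partition field name ds, and the column names are all hypothetical. Review the actual statement by clicking the table name in the console.

```python
# Hypothetical source-to-destination type conversion, for illustration only.
TYPE_MAP = {"varchar": "STRING", "int": "BIGINT", "datetime": "DATETIME"}


def build_create_table(table, source_columns, partition_field="ds", lifecycle_days=30):
    """Assemble a CREATE TABLE statement for a time-partitioned table."""
    lines = []
    for name, source_type in source_columns:
        # Fall back to STRING when the sketch has no mapping for a source type.
        dest_type = TYPE_MAP.get(source_type.lower(), "STRING")
        lines.append(f"    {name} {dest_type}")
    columns_sql = ",\n".join(lines)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n{columns_sql}\n)\n"
        f"PARTITIONED BY ({partition_field} STRING)\n"
        f"LIFECYCLE {lifecycle_days};"
    )


statement = build_create_table(
    "ods_users",
    [("id", "int"), ("name", "varchar"), ("gmt_modified", "datetime")],
)
print(statement)
```

    Passing a different lifecycle_days value corresponds to modifying the lifecycle of the table before the solution creates it.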

Step 5: Configure synchronization rules

  1. Configure rules for full data synchronization.
    You can configure the Clear Original Table Before Data Write parameter based on your business requirements. If you select Yes, all existing data in the MaxCompute table is deleted before data is written to the table. Proceed with caution.
  2. Configure rules for incremental data synchronization.
    You can use an SQL WHERE condition to extract incremental data from the source tables. In the Condition for Incremental Synchronization field, enter only the filter condition and omit the keyword WHERE. You can use built-in system variables in the condition. For example, you can use the ${bdp.system.bizdate} variable to specify the data timestamp and use the ${bdp.system.cyctime} variable to specify the scheduling time.
    Note: You can use scheduling parameters to specify the scope of the data that you want to synchronize and the location to which you want to write the data. For more information about how to use scheduling parameters, see Description for using scheduling parameters in data synchronization.
  3. Configure scheduling settings for data synchronization.
    Configure the parameters in the Recurrence section, such as Recurrence, Scheduling Period, and Pausing Scheduling. For more information about the parameters, see Configure time properties.
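
    The following Python sketch shows how a condition for incremental synchronization might resolve at run time. It rests on several assumptions that you should verify for your environment: that ${bdp.system.bizdate} resolves to the day before the scheduled run in yyyymmdd format, that ${bdp.system.cyctime} resolves to the scheduled time in yyyymmddhh24miss format, and that the source is a MySQL-style database with a hypothetical gmt_modified column. The render_condition function is not part of DataWorks; it only mimics the variable substitution.

```python
from datetime import datetime, timedelta

# Text entered in the Condition for Incremental Synchronization field.
# The keyword WHERE is omitted; gmt_modified is a hypothetical column.
CONDITION = "DATE_FORMAT(gmt_modified, '%Y%m%d') = '${bdp.system.bizdate}'"


def render_condition(condition, scheduled_at):
    """Substitute the built-in variables for one scheduled run (assumed formats)."""
    bizdate = (scheduled_at - timedelta(days=1)).strftime("%Y%m%d")
    cyctime = scheduled_at.strftime("%Y%m%d%H%M%S")
    return (
        condition
        .replace("${bdp.system.bizdate}", bizdate)
        .replace("${bdp.system.cyctime}", cyctime)
    )


run_time = datetime(2024, 3, 2, 0, 30, 0)
print(render_condition(CONDITION, run_time))
```

    Under these assumptions, a run scheduled for March 2, 2024 would read the rows that were modified on March 1, 2024, which matches a daily recurrence that synchronizes the data of the previous day.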

Step 6: Configure the resources required by the synchronization solution

This synchronization solution generates batch synchronization nodes for full data synchronization and incremental data synchronization. You can specify the names for the batch synchronization nodes and select resource groups for scheduling and resource groups for Data Integration. You can also view the maximum number of connections and parallel nodes allowed for the source database. If you want to perform fine-grained configurations for the nodes, you can modify the related parameters in the Advanced Configuration section.
Note: Batch synchronization nodes in DataWorks can run only after resource groups for scheduling issue them to resource groups for Data Integration. Therefore, resource groups for scheduling are also required. If you run nodes on exclusive resource groups for scheduling, you are charged for scheduling instances. For more information, see Mechanism for issuing nodes.

Step 7: Run the synchronization solution

  1. Go to the Tasks page in Data Integration and find the created data synchronization solution.
  2. Click Submit and Run in the Actions column to run the data synchronization solution.
  3. Click Execution details in the Actions column to view the execution details of the data synchronization solution.

What to do next

After a data synchronization solution is configured, you can manage the solution. For example, you can add tables to or remove tables from the solution, configure alerting and monitoring settings for the nodes that are generated by the solution, and view information about the running of the nodes. For more information, see Perform O&M for a data synchronization solution.