After the Change Data Capture (CDC) feature of SQL Server is enabled, SQLServer CDC Reader allows Data Integration to monitor and collect logs about inserting, updating, and deleting tables in SQL Server. Then, Data Integration converts the logs to real-time messages. This way, data is synchronized from SQL Server in real time.

Prerequisites

The CDC feature is enabled before you configure SQLServer CDC Reader.
-- ====
-- Enable Database for CDC template, databaseName: MyDB
-- ====
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the required workspace resides, find the workspace, and then click Data Analytics.
  2. Move the pointer over the Create icon and choose Data Integration > Real-time synchronization.
    Alternatively, you can click the required workflow, right-click Data Integration, and then choose Create > Real-time synchronization.
  3. In the Create Node dialog box, set the Sync Method parameter to End-to-end ETL and set the Node Name and Location parameters.
    Create Node
    Note The node name must be 1 to 128 characters in length. It can contain letters, digits, underscores (_), and periods (.).
  4. Click Submit.
  5. On the configuration tab of the real-time sync node, drag SqlServer under Input to the canvas.
  6. Click SqlServer and set the parameters in the Node configuration section.
    Parameter Description
    Data source The connection to the SQL Server data store. In this example, you must select an SQL Server connection.

    If no connection is available, click New data source on the right to add one on the Data Source page.

    Table The name of the table from which data is to be synchronized in the specified data store. You can click Data preview on the right to preview the selected table.
    In the case of database and table sharding, data can be synchronized from multiple databases and tables in real time.
    Notice To avoid errors, the tables in database shards and table shards must use the same schema.
    Output field The fields to be synchronized, including management fields and data fields.
    • Management fields: additional fields that are automatically added to the destination tables when data is synchronized. These fields facilitate data management, sorting, and deduplication.
    • Data fields: the fields in the source tables to be synchronized.

    For more information, see Fields used for real-time synchronization.

    SQLServer CDC Reader supports database and table sharding. You can click Add a database and table data source and select the required data stores and tables from drop-down lists to synchronize data from multiple tables and databases.
    Notice To avoid errors, the selected tables must use the same schema.
  7. Click the Save icon icon in the toolbar.