
DataWorks:LogHub (SLS) single table real-time synchronization to Data Lake Formation

Last Updated: Nov 14, 2025

Data Integration supports real-time synchronization of single table data from sources such as LogHub (SLS) and Kafka to Data Lake Formation through ETL. This topic describes how to synchronize single table data in real time from LogHub (SLS) to Data Lake Formation.

Limitations

Only Serverless resource groups are supported.

Prerequisites

Procedure

1. Select a synchronization task type

  1. Go to the Data Integration page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Integration > Data Integration. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Sync Task. Then, click Create Sync Task at the top of the page to go to the sync task creation page. Configure the following basic information:

    • Data Source And Destination: LogHub (SLS) to Data Lake Formation.

    • Task Name: Customize a name for the synchronization task.

    • Sync Type: Single Table Real-time.

2. Configure network and resources

  1. In the Network And Resources section, select a Resource Group for the synchronization task. You can allocate the number of compute units (CUs) for Task Resource Usage.

  2. For Source Data Source, select the added LogHub data source. For Destination Data Source, select the added Data Lake Formation data source. Then, click Test Connectivity.

  3. After confirming that both the source and destination data sources are connected successfully, click Next.

3. Configure the synchronization link

1. Configure LogHub (SLS) source

Click the SLS data source at the top of the page to edit the SLS Source Information.


  1. In the SLS Source Information section, select the logstore in LogHub (SLS) that you want to synchronize.

    Retain default values for other parameters, or modify their configurations based on your business requirements.

  2. Click Data Sampling in the upper-right corner.

    In the dialog box that appears, specify the Start Time and Number Of Samples, and then click Start Collection. The sampled logstore data can be previewed and serves as input for the visual configuration of subsequent data processing nodes.

  3. After you select a logstore, the data in the logstore is automatically loaded in the Output Field Configuration section, and corresponding field names are generated. You can adjust the Data Type of each field, delete fields, and manually add output fields.

    Note

    If a configuration or field does not exist in the Simple Log Service data source, NULL is written to the destination.
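The note above can be sketched in code: each configured output field is looked up in the incoming log record, and a missing field is written as NULL. The following is a minimal Python sketch with hypothetical field names, not the actual synchronization engine:

```python
# Sketch: how output fields are populated from an SLS log record.
# The field names ("level", "request_id") are hypothetical examples.

def build_output_row(record: dict, output_fields: list) -> dict:
    row = {}
    for field in output_fields:
        # A field missing from the SLS record is written as NULL (None).
        row[field] = record.get(field)
    return row

record = {"__time__": "1700000000", "level": "INFO"}
fields = ["__time__", "level", "request_id"]
print(build_output_row(record, fields))
# "request_id" is absent from the record, so it becomes None (NULL)
```
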

2. Edit data processing nodes

You can add data processing nodes between the source and the destination. The following data processing methods are supported: Data Masking, Replace String, Data Filtering, JSON Parsing, and Edit Field and Assign Value. You can arrange the data processing methods based on your business requirements. When the synchronization task runs, data is processed in the order that you specify.
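The ordered execution of processing nodes can be sketched as a simple pipeline. The functions below are simplified stand-ins for three of the built-in methods (Data Masking, Replace String, Data Filtering), using hypothetical field names; the real nodes are configured visually in the console:

```python
# Sketch: data processing nodes applied in the configured order.

def mask_phone(row):
    # Data Masking stand-in: keep the first 3 digits, mask the rest.
    if "phone" in row:
        row["phone"] = row["phone"][:3] + "****"
    return row

def replace_env(row):
    # Replace String stand-in: normalize a value.
    if row.get("env") == "prod":
        row["env"] = "production"
    return row

def keep_errors(row):
    # Data Filtering stand-in: None means the row is filtered out.
    return row if row.get("level") == "ERROR" else None

pipeline = [mask_phone, replace_env, keep_errors]  # runs in this order

def process(row, steps):
    for step in steps:
        row = step(row)
        if row is None:
            return None  # row dropped by a filter node
    return row

sample = {"level": "ERROR", "phone": "1380001111", "env": "prod"}
print(process(sample, pipeline))
```

Reordering the `pipeline` list changes the result, which mirrors how rearranging the nodes on the canvas changes the processing order.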


After you configure a data processing node, you can click Data Output Preview in the upper-right corner. In the dialog box that appears, click Retrieve Upstream Output Again to simulate how the sampled logstore data is processed by the current data processing node.


Note

The data output preview depends on the Data Sampling of the LogHub (SLS) source. Before you run a data output preview, complete data sampling in the LogHub (SLS) source configuration.

3. Configure Data Lake Formation destination information

Click the Data Lake Formation data destination at the top of the page to edit the Data Lake Formation destination information.


  1. In the Data Lake Formation Destination Information section, select whether to Automatically Create Table or Use Existing Table for the Data Lake Formation table to which you want to write data.

    • If you choose to automatically create a table, a table with the same name as the source table is created by default. You can manually modify the destination table name.

    • If you select to use an existing table, select the destination table to which you want to synchronize data from the drop-down list.

  2. (Optional) Modify the schema of a destination table.

    If you select Automatically Create Table for the Destination Table parameter, click Edit Table Schema. In the dialog box that appears, edit the schema of the destination table that will be automatically created. You can also click Re-generate Table Schema Based on Output Column of Ancestor Node to re-generate a schema based on the output columns of an ancestor node. You can select a column from the generated schema and configure the column as the primary key.

    Note

    The destination table must have a primary key. Otherwise, the configurations cannot be saved.

  3. Configure mappings between fields in the source and fields in the destination.

    After you complete the preceding configuration, the system automatically establishes mappings between fields in the source and fields in the destination based on the same-name mapping principle. You can modify the mappings based on your business requirements. One field in the source can map to multiple fields in the destination. Multiple fields in the source cannot map to the same field in the destination. If a field in the source has no mapped field in the destination, data in the field in the source is not synchronized to the destination.
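The mapping rules above can be sketched as a small check: one source field may feed multiple destination fields, no destination field may be mapped more than once, and unmapped source fields are dropped. The following is a Python sketch with hypothetical field names, not the console's actual mapping logic:

```python
# Sketch: field-mapping rules as (source_field, destination_field) pairs.

def validate_mappings(mappings):
    seen_dest = set()
    for _, dest in mappings:
        # Multiple source fields must not map to the same destination field.
        if dest in seen_dest:
            raise ValueError(f"destination field '{dest}' is mapped more than once")
        seen_dest.add(dest)

def apply_mappings(row, mappings):
    validate_mappings(mappings)
    # Source fields without a mapping are simply not synchronized.
    return {dest: row.get(src) for src, dest in mappings}

row = {"ip": "10.0.0.1", "msg": "ok", "extra": "not synchronized"}
# One source field ("ip") may map to multiple destination fields.
mappings = [("ip", "client_ip"), ("ip", "ip_raw"), ("msg", "message")]
print(apply_mappings(row, mappings))
```
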

4. Configure alert rules

To prevent synchronization task failures from delaying business data synchronization, you can configure different alert rules for the synchronization task.

  1. In the upper-right corner of the page, click Configure Alert Rule to go to the Configure Alert Rule panel.

  2. In the Configure Alert Rule panel, click Add Alert Rule. In the Add Alert Rule dialog box, configure the parameters to create an alert rule.

    Note

    The alert rules that you configure in this step take effect for the real-time synchronization subtask that will be generated by the synchronization task. After the configuration of the synchronization task is complete, you can refer to Manage real-time synchronization tasks to go to the Real-time Synchronization Task page and modify alert rules configured for the real-time synchronization subtask.

  3. Manage alert rules.

    You can enable or disable alert rules that are created. You can also specify different alert recipients based on the severity levels of alerts.

5. Configure advanced parameters

DataWorks allows you to modify the configurations of specific parameters. You can change the values of these parameters based on your business requirements.

Note

To prevent unexpected errors or data quality issues, we recommend that you understand the meanings of the parameters before you change the values of the parameters.

  1. In the upper-right corner of the configuration page, click Configure Advanced Parameters.

  2. In the Configure Advanced Parameters panel, change the values of the desired parameters.

6. Configure resource groups

You can click Configure Resource Group in the upper-right corner of the page to view and change the resource groups that are used to run the current synchronization task.

7. Perform a test on the synchronization task

After the preceding configuration is complete, you can click Perform Simulated Running in the upper-right corner of the configuration page to synchronize the sampled data to the destination table, and then view the result in the destination table. If specific configurations of the task are invalid, an exception occurs during the test run, or dirty data is generated, the system reports an error in real time. This helps you verify the task configurations and confirm early whether the expected results can be obtained.

  1. In the dialog box that appears, configure the parameters for data sampling from the source, including the Start At and Sampled Data Records parameters.

  2. Click Start Collection to enable the synchronization task to sample data from the source.

  3. Click Preview to enable the synchronization task to synchronize the sampled data to the destination.

8. Run the synchronization task

  1. After the configuration of the synchronization task is complete, click Complete in the lower part of the page.

  2. In the Tasks section of the Synchronization Task page, find the created synchronization task and click Start in the Operation column.

  3. Click the name or ID of the synchronization task in the Tasks section and view the detailed running process of the synchronization task.

Synchronization task operation and maintenance

View task running status

After the synchronization task is created, you can go to the Synchronization Task page to view all synchronization tasks created in the workspace and the basic information of each task.


  • You can Start or Stop a synchronization task in the Operation column. In the More menu, you can Edit, View, and perform other operations on the synchronization task.

  • For tasks that have been started, you can see the basic running status in Execution Overview, or click the corresponding overview area to view execution details.


The synchronization task from LogHub (SLS) to Data Lake Formation consists of two steps: Schema Migration and Real-time Data Synchronization.

  • Schema Migration: Includes the creation method of the destination table (existing table or automatic table creation). If automatic table creation is selected, the DDL for creating the table will be displayed.

  • Real-time Data Synchronization: Shows statistics for real-time synchronization, including real-time running information, DDL records, alert information, and more.

Rerun the synchronization task

If you want to modify the fields to synchronize, the fields in the destination table, or the table name, you can click Rerun in the Operation column of the desired synchronization task. The system then synchronizes the changes to the destination. Tables that are already synchronized and are not modified are not synchronized again.

  • Directly click Rerun without modifying the configurations of the synchronization task to enable the system to rerun the synchronization task.

  • Modify the configurations of the synchronization task and then click Complete. Click Apply Updates that is displayed in the Operation column of the synchronization task to rerun the synchronization task for the latest configurations to take effect.