DataWorks: Real-time synchronization from a single LogHub (SLS) Logstore to an OSS-HDFS data lake

Last Updated: Mar 26, 2026

Ingesting logs in real time from sources such as Kafka and LogHub into cloud object storage is a common requirement for data lake architectures. Data Integration lets you synchronize data from a single Simple Log Service (SLS) Logstore to an OSS-HDFS data lake in real time. The task supports the Hudi, Paimon, and Iceberg write formats and optional inline data transformations.

Prerequisites

Before you begin, ensure that you have:

Create and configure a real-time sync task

The following steps walk you through creating a synchronization task that reads from an SLS Logstore and writes to an OSS-HDFS destination. The configuration flow has nine steps: select the task type, configure network and resources, configure the synchronization link (source, data processing, and destination), set alert rules, set advanced parameters, configure DDL handling, assign resource groups, run a simulated test, and start the task.

Step 1: Select a synchronization task type

  1. Log on to the DataWorks console. In the top navigation bar, select the region. In the left-side navigation pane, choose Data Integration > Data Integration. Select the workspace from the drop-down list and click Go to Data Integration.

  2. In the left-side navigation pane, click Synchronization Task, then click Create Synchronization Task. Configure the following settings:

    Setting | Value
    Source And Destination | Source: LogHub; Destination: OSS-HDFS
    New Node Name | A name you specify for the task
    Synchronization Method | Single Logstore Realtime Sync

Step 2: Configure network settings and resources

  1. In the Network And Resource Configuration section, select a Resource Group for the synchronization task. Allocate compute units (CUs) under Task Resource Usage as needed.

  2. For Source, select the LogHub data source. For Destination, select the OSS-HDFS data source. Click Test Connectivity.

  3. After connectivity is confirmed, click Next.

Step 3: Configure the synchronization link

Configure the SLS source

In the wizard at the top of the page, click SLS to open SLS Source Information.

  1. In the SLS Source Information section, select the Logstore to synchronize data from.

  2. Click Data Sampling in the upper-right corner. Specify the Start Time and Sampled Data Records parameters, then click Start Collection. The system collects sample data from the Logstore for use in data preview and visual configuration of downstream processing nodes.

  3. The system automatically loads data from the Logstore and generates field names in the Output Field Configuration section. Adjust Data Type, delete fields, or click Manually Add Output Fields as needed.

    Note

    If an output field does not exist in the SLS data source, NULL is written to the destination.
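The NULL-fill behavior described in the note can be illustrated with a minimal Python sketch. The function name `project_output_fields` and the sample field names are hypothetical and are not part of DataWorks; the sketch only models the documented outcome, not how the service implements it.

```python
from typing import Any, Dict, List, Optional


def project_output_fields(
    log_record: Dict[str, Any], output_fields: List[str]
) -> Dict[str, Optional[Any]]:
    """Project one log record onto the configured output fields.

    A field that is configured but absent from the record becomes None,
    mirroring the NULL that is written to the destination.
    """
    return {field: log_record.get(field) for field in output_fields}


# Hypothetical sample: "user_agent" was added manually and is not in the log.
record = {"request_id": "r-001", "status": "200"}
row = project_output_fields(record, ["request_id", "status", "user_agent"])
print(row)
```

In this sketch, `user_agent` is carried through with the value `None`, which corresponds to the NULL written for an output field that the source does not provide.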

Edit data processing nodes

Data processing nodes let you transform data between the source and destination. The following methods are supported:

Method | Description
Data Masking | Mask sensitive field values before writing to the destination
Replace String | Find and replace string values in a field
Data Filtering | Filter records based on field conditions
JSON Parsing | Parse JSON-formatted fields into structured columns
Edit Field and Assign Value | Add or modify field values

Click the image icon to add a processing method. Arrange the methods in the order in which you want them applied. When the task runs, data is processed in the order you specify.
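Because processing methods run in the configured order, the same set of methods can produce different output when rearranged. The following Python sketch illustrates this with three simplified stand-ins for JSON Parsing, Data Masking, and Data Filtering. All function names, the masking rule (keep the last four characters), and the sample record are assumptions made for illustration; they do not reflect the actual DataWorks implementation.

```python
import json
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
# A step returns the transformed record, or None to drop it (filtering).
Step = Callable[[Record], Optional[Record]]


def parse_json(name: str) -> Step:
    """Expand a JSON-formatted string field into top-level columns."""
    def step(rec: Record) -> Optional[Record]:
        if name not in rec:
            return rec
        parsed = json.loads(str(rec[name]))
        rest = {k: v for k, v in rec.items() if k != name}
        return {**rest, **parsed}
    return step


def mask_field(name: str) -> Step:
    """Replace all but the last four characters of a field with '*'."""
    def step(rec: Record) -> Optional[Record]:
        if name not in rec:
            return rec  # nothing to mask yet
        value = str(rec[name])
        masked = "*" * max(len(value) - 4, 0) + value[-4:]
        return {**rec, name: masked}
    return step


def keep_if(predicate: Callable[[Record], bool]) -> Step:
    """Keep a record only when the predicate holds."""
    def step(rec: Record) -> Optional[Record]:
        return rec if predicate(rec) else None
    return step


def run_pipeline(rec: Record, steps: List[Step]) -> Optional[Record]:
    """Apply the steps in list order; stop as soon as a step drops the record."""
    current: Optional[Record] = rec
    for step in steps:
        current = step(current)
        if current is None:
            return None
    return current


raw = {"payload": '{"phone": "13800001234", "level": "ERROR"}'}

# Parse first, then mask: the phone column exists when masking runs.
parsed_then_masked = run_pipeline(
    raw,
    [parse_json("payload"), mask_field("phone"),
     keep_if(lambda r: r.get("level") == "ERROR")],
)

# Mask first, then parse: the phone column does not exist yet, so it stays unmasked.
masked_then_parsed = run_pipeline(raw, [mask_field("phone"), parse_json("payload")])

print(parsed_then_masked)
print(masked_then_parsed)
```

In the second pipeline the masking step runs before the `phone` column exists, so the value reaches the output unmasked. This is the kind of ordering mistake that a data preview on sampled records helps catch.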

After configuring a processing node, click Preview Data Output in the upper-right corner, then click Retrieve Upstream Output Again to simulate the result after sample data passes through the current node.

Note

Preview Data Output works only on sampled data. Complete Data Sampling in the SLS source form before you use this feature.

Configure OSS-HDFS destination information

In the wizard at the top of the page, click OSS-HDFS to open OSS-HDFS Destination Information.

  1. Configure the destination settings:

    Note

    Cross-region metadatabase and metatable creation is not supported.

    Setting | Description
    Write Format | Select Hudi, Paimon, or Iceberg
    Select Metadatabase Auto-build Location | If Data Lake Formation (DLF) is activated for your account, the system can automatically create metadatabases and metatables in DLF when synchronizing data
    Storage Path Selection | Select the OSS path where synchronized data will be stored
    Destination Database | Select an existing database, or select Create Database and specify a Database Name to create a new DLF metadatabase
    Destination Table | Select Auto Create Table or Use Existing Table, then enter or select a Table Name
  2. (Optional) If you selected Auto Create Table, click Edit Table Schema to modify the destination table schema. Click Re-generate Table Schema Based on Output Column of Ancestor Node to regenerate the schema from upstream output columns. Select a column to configure it as the primary key.

  3. Review the field mappings between source and destination. The system maps fields automatically using the Map Fields with Same Name principle. Modify mappings as needed:

    • One source field can map to multiple destination fields.

    • Multiple source fields cannot map to the same destination field.

    • Source fields with no mapped destination field are not synchronized.
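The three mapping rules above can be checked mechanically. The following Python sketch is one way to do so, assuming mappings are represented as (source field, destination field) pairs. The function name `check_field_mappings` and the sample field names are hypothetical and are not part of DataWorks.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def check_field_mappings(
    source_fields: List[str], mappings: List[Tuple[str, str]]
) -> Tuple[List[str], List[str]]:
    """Return (conflicting destination fields, unsynchronized source fields).

    A destination field conflicts when more than one source field maps to it.
    A source field is unsynchronized when it appears in no mapping.
    One source field mapping to several destinations is allowed.
    """
    sources_by_dest: Dict[str, Set[str]] = defaultdict(set)
    for source, dest in mappings:
        sources_by_dest[dest].add(source)

    conflicts = sorted(
        dest for dest, sources in sources_by_dest.items() if len(sources) > 1
    )
    mapped_sources = {source for source, _ in mappings}
    unsynced = [f for f in source_fields if f not in mapped_sources]
    return conflicts, unsynced


source_fields = ["ip", "status", "latency", "ua"]
mappings = [
    ("ip", "client_ip"),   # one source field ...
    ("ip", "ip_raw"),      # ... mapped to two destination fields: allowed
    ("status", "code"),
    ("latency", "code"),   # second source field on "code": not allowed
]
conflicts, unsynced = check_field_mappings(source_fields, mappings)
print(conflicts, unsynced)
```

Here `code` is reported as a conflict because two source fields map to it, and `ua` is reported as unsynchronized because it has no mapped destination field.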

Step 4: Configure alert rules

  1. In the upper-right corner of the page, click Configure Alert Rule to open the Alert Rule Configurations for Real-time Synchronization Subnode panel.

  2. Click Add Alert Rule. In the dialog box, configure the alert parameters.

    Note

    These alert rules apply to the real-time synchronization subtask generated by this task. After completing the task configuration, you can modify alert rules on the Real-time Synchronization Task page. For more information, see Run and manage real-time synchronization tasks.

  3. Enable or disable rules as needed. Set different alert recipients based on alert severity.

Step 5: Configure advanced parameters

  1. In the upper-right corner of the configuration page, click Configure Advanced Parameters.

  2. In the Configure Advanced Parameters panel, modify parameter values as needed.

    Note

    Understand each parameter's meaning before changing its value to avoid unexpected errors or data quality issues.

Step 6: Configure DDL capabilities

DDL operations may be performed on the source. Click Configure DDL Capability in the upper-right corner to configure rules for processing DDL messages from the source.

Note

For more information, see Configure rules to process DDL messages.

Step 7: Configure resource groups

Click Configure Resource Group in the upper-right corner to view and change the resource groups used to run the synchronization task.

Step 8: Run a simulated test

Run a simulated test to verify the task configuration before going live. The system synchronizes sampled data to the destination table and reports errors or dirty data in real time if configuration issues are detected.

  1. Click Perform Simulated Running in the upper-right corner of the configuration page.

  2. In the dialog box, configure the sampling parameters:

    Parameter | Description
    Start At | Start time for data sampling from the SLS Logstore
    Sampled Data Records | Number of records to sample
  3. Click Start Collection to sample data from the source.

  4. Click Preview to synchronize the sampled data to the destination table and review the result.
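The role of the two sampling parameters can be pictured with a short Python sketch. It assumes that sampling takes the first records at or after Start At, up to the Sampled Data Records limit, from a time-ordered stream; the source does not state the exact selection rule, so treat this as an illustration only. The function name `sample_records` and the record layout are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List


def sample_records(
    records: Iterable[Dict], start_at: datetime, max_records: int
) -> List[Dict]:
    """Take up to max_records records whose timestamp is at or after start_at.

    Assumes each record carries a "time" key and that records arrive in
    time order (an assumption of this sketch, not a documented guarantee).
    """
    sampled: List[Dict] = []
    for rec in records:
        if len(sampled) >= max_records:
            break
        if rec["time"] >= start_at:
            sampled.append(rec)
    return sampled


logs = [
    {"time": datetime(2026, 3, 26, 9, 59), "msg": "a"},
    {"time": datetime(2026, 3, 26, 10, 0), "msg": "b"},
    {"time": datetime(2026, 3, 26, 10, 1), "msg": "c"},
    {"time": datetime(2026, 3, 26, 10, 2), "msg": "d"},
]
sample = sample_records(logs, start_at=datetime(2026, 3, 26, 10, 0), max_records=2)
print([rec["msg"] for rec in sample])
```

If the Logstore holds fewer matching records than the requested count, a sampler like this returns only what is available, so a small sample may simply mean that Start At is set too late.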

Step 9: Start the synchronization task

  1. Click Complete at the bottom of the page to save the task configuration.

  2. On the Data Integration > Synchronization Task page, find the task and click Start in the Operation column.

  3. Click the task Name or ID in the Tasks section to view the detailed execution process.

Manage the synchronization task

View running status

After the task starts, go to the Synchronization Task page to view all tasks in the workspace and their basic information.

  • In the Actions column, click Start or Stop to control the task. Click More to edit or view the task, or to perform other operations.

  • In the Execution Overview column, view the running status of a started task. Click the overview area for execution details.

Execution stages

The SLS-to-OSS-HDFS synchronization task has two stages:

Stage | Description
Schema Migration | Shows whether the destination table is newly created or an existing table. For new tables, the DDL statement used to create the table is displayed.
Real-time Data Synchronization | Shows real-time synchronization statistics, DDL records, and alert information.

Rerun the synchronization task

If you need to modify synchronized fields, destination table fields, or table names, click Rerun in the Operation column to synchronize the changes to the destination. Data in already-synchronized, unmodified tables is not re-synchronized.

  • Click Rerun directly (without changing the task configuration) to rerun the task with current settings.

  • Modify the task configuration and click Complete, then click Apply Updates in the Operation column to rerun the task with the updated settings.

Limitations

  • Cross-region metadatabase and metatable creation in Data Lake Formation (DLF) is not supported.

  • Multiple source fields cannot map to the same destination field.

  • The Preview Data Output feature requires data sampling to be completed in the SLS source form first.

  • Alert rules configured during task setup apply to the real-time synchronization subtask. Modify them after task creation on the Real-time Synchronization Task page.