All Products
Search
Document Center

Hologres:Ingest sharded MySQL data to Hologres

Last Updated:Mar 10, 2026

Sharded MySQL databases split data across multiple databases and tables for write scalability, but querying scattered shards is slow and complex. Consolidate all shards into a single Hologres table to enable fast, unified analytics. Two methods are available: DataWorks Data Integration and Flink.

Synchronization process

The synchronization follows two phases:

  1. Full load: All existing data from the sharded MySQL tables is read and written to a single Hologres table.

  2. Real-time synchronization: After the full load completes, incremental data is synchronized to Hologres in real time.

MySQL shard 1 (DB: hmtest1) ──┐
  ├─ table: product_20220420  │
  └─ table: product_20220421  ├──▶ DataWorks Data Integration ──▶ Hologres table (product1)
MySQL shard 2 (DB: hmtest2) ──┘      or Flink
  └─ table: product_20220422

To track the origin of each row after consolidation, add db_name and table_name columns to the Hologres table. These columns record which source database and table each row came from.

Prerequisites

Before you begin, make sure that you have:

Important

All Alibaba Cloud services must be in the same region.

Synchronize sharded MySQL data with DataWorks

This method consolidates three sharded MySQL tables into a single Hologres table using the DataWorks Sync Task feature.

Prepare MySQL data

This example uses two databases with three tables:

Database name

Table name

hmtest1

product_20220420

hmtest1

product_20220421

hmtest2

product_20220422

All three tables share the same schema. Some data is duplicated across the tables. The Data Definition Language (DDL) is:

CREATE TABLE product_20220420 (
 value_id int8,
 attribute_id int8 not null,
 id_card int8,
 name text,
 potion text,
    ds text,
PRIMARY KEY (`value_id`)
);

Create a synchronization task

  1. In the DataWorks Data Integration navigation pane, click Sync Task to create a synchronization task. Select MySQL as the data source and Hologres as the data destination. For more information, see Select a synchronization solution.

    image

  2. Click Create Sync Task.

Configure the network connection

  1. Enter a New Task Name.

  2. Set Sync Type to Real-time synchronization for the entire database.

  3. Select the resource group, the source data source name, and the destination data source name. Click Test All Connections. If this is your first synchronization task, create a new resource group and data source first.

    image

  4. After the status for both the data source and data destination shows Connected, click OK, and then click Next.

    image

Configure the synchronization source and rules

  1. Configure the basic settings, including the configuration name, source, time zone, and source table.

    Basic configuration for synchronization source

  2. Set table name mapping rules. Choose one of the following approaches:

    • Use existing Hologres tables: If mapping tables already exist in the Hologres database, select the corresponding table name in the Destination Table Name column and the corresponding schema in the Target Schema Name column. Select existing destination table

    • Create new mapping tables: If no mapping tables exist, define new ones through Customize Target Table Name Mapping. The system automatically creates the tables and completes the mapping.

      1. Click Edit to the right of Customize Target Table Name Mapping.

      2. In the Design Table Name Mapping Rules dialog box, click Add.

      3. In the Edit Rule dialog box, enter a Rule Name, select the Source Table Name, click Actions, enter the source table name in the Source field, and enter the new table name in the Target field. Edit table name mapping rule

      4. Click OK to apply the rule. In the row for the source table, select the rule and refresh the mapping. In the Target Schema Name column, select the corresponding schema. Apply mapping rule and select schema

Configure the target table

  1. Click Refresh Source and Hologres Table Mappings.

    Note

    The mapping shows the relationship between each source table and its target table. If multiple source tables map to the same target table, they are consolidated into that single table.

  2. Add extra fields to track the source of each row: This example uses DB_NAME_SRC to represent the source database name and TABLE_NAME_SRC to represent the source table name.

    1. Select all tasks, click Batch Modify, and select Target Table Schema - Batch Modify and Add Fields.

    2. In the Target Table Schema - Batch Modify and Add Fields dialog box, click Add Field to add two fields: db_name and table_name.

    3. Click Apply and Refresh Mapping.

    Add extra fields for source tracking

  3. (Optional) Set the extra fields as the primary key (PK) If you have a large amount of upstream data and many tables, set the extra fields as part of a composite primary key with the source table's primary key. This prevents primary key conflicts from different source tables. Also, set the extra fields as the distribution key. This ensures that related data is written to the same shard, which improves performance.

    1. Click image to the right of Destination Table Name to open the SQL Table Creation Statement Preview dialog box. SQL Table Creation Statement Preview

    2. Modify the table creation statement to add table_name as part of the primary key and distribution key.

      Note
      • We recommend adding the source table name `tablename` to the composite primary key. You can add more fields as needed.

      • You can also set more indexes for the table to improve performance. For more information, see CREATE TABLE.

      Modified table creation statement

      Example Hologres DDL:

      BEGIN;
      CREATE TABLE IF NOT EXISTS hologres1.product1 (
         value_id     BIGINT NOT NULL,
         attribute_id BIGINT NOT NULL,
         id_card      BIGINT,
         "name"       TEXT,
         potion       TEXT,
         ds           TEXT,
         table_name   TEXT NOT NULL,
         db_name      TEXT,
         PRIMARY KEY(value_id,table_name)
      );
      CALL SET_TABLE_PROPERTY('hologres1.product1', 'distribution_key', '"table_name","value_id"');
      CALL SET_TABLE_PROPERTY('hologres1.product1', 'time_to_live_in_seconds', '3153600000');
      CALL SET_TABLE_PROPERTY('hologres1.product1', 'orientation', 'column');
      CALL SET_TABLE_PROPERTY('hologres1.product1', 'binlog.level', 'none');
      CALL SET_TABLE_PROPERTY('hologres1.product1', 'bitmap_columns', '"name","potion","ds"');
      CALL SET_TABLE_PROPERTY('hologres1.product1', 'dictionary_encoding_columns', '"name":auto,"potion":auto,"ds":auto,"table_name":auto,"db_name":auto');
      COMMIT;

    3. Click Apply and Refresh Mapping. Apply and refresh mapping after PK modification

Set the DML policy

Configure the DML policy for the synchronization task. Apply the policy to a single table or to multiple tables in a batch.

  1. Select all tasks, click Batch Modify, and then select DML Rule Configuration.

  2. In the DML Rule Configuration dialog box, set Processing Policy to Normal.

    DML Rule Configuration

  3. Click OK.

Configure DDL settings

  1. Configure the DDL settings for the synchronization task as needed.

    DDL message processing policy

  2. Click OK.

Configure advanced parameters

  1. Configure advanced parameters as needed, including Read-side configuration, Write-side configuration, and Runtime configuration.

    Advanced parameter settings

  2. Click OK.

Complete and run the task

  1. Click Complete Configuration to finish the synchronization task setup.

  2. In the Actions column for the task, click Start. To view the running details, choose More > View.

    Start the synchronization task

Verify the results

The synchronization task first runs a full data synchronization, then switches to real-time synchronization. After the full load completes, query data in Hologres.

Query results showing consolidated data with source tracking columns

The db_name and table_name columns show which source database and table each row came from. This confirms that data from all shards has been consolidated into a single Hologres table.

After the initial full load, incremental data written to the source MySQL tables is synchronized to Hologres in real time.

Synchronize sharded MySQL data with Flink

For detailed instructions on synchronizing sharded MySQL data to Hologres using Flink, see Quick Start for real-time data warehousing.

Related topics