
Synchronize full and incremental data from ApsaraDB RDS

Last Updated: Mar 26, 2021

This topic describes how to synchronize full and incremental data from an ApsaraDB RDS database to ApsaraDB for HBase.

Synchronize full and incremental data from an ApsaraDB RDS database to ApsaraDB for HBase

  1. Scenarios:

    • Historical data from an ApsaraDB RDS database is stored at low cost in ApsaraDB for HBase.

    • Full data is migrated from an ApsaraDB RDS database to ApsaraDB for HBase.

  2. Features:

    • Synchronizes full and incremental data from an ApsaraDB RDS database to ApsaraDB for HBase in one task.

    • Transforms data from an ApsaraDB RDS database. For more information, see the "Configuration description" section.

    • Synchronizes data in multiple tables between ApsaraDB RDS and ApsaraDB for HBase.

    • Filters and processes data based on custom rules. This feature will be available soon.

    • Automatically identifies Data Definition Language (DDL) changes. This feature will be available soon.

    • Processes dirty data. This feature will be available soon.

  3. Limits:

    • Only ApsaraDB RDS and Data Transmission Service (DTS) data sources can be specified as sources.

    • Only HBase data sources are supported as destinations.

    • Phoenix is not supported. If you need Phoenix, use an earlier version.

Before you begin

  1. Purchase a Lindorm Tunnel Service (LTS) cluster, set the username and the password for logging on to the LTS web UI, and then log on to the web UI.

  2. Make sure that your LTS cluster and your ApsaraDB for HBase (Lindorm) cluster can connect to your ApsaraDB RDS instance over the network. You can skip this step if all of them are in the same virtual private cloud (VPC).

Create a task

Choose Data Import > RDS Migration. On the page that appears, click Create New Job.

  1. Select the ApsaraDB RDS data source, the DTS data source, and the destination data source.

  2. Select the table to synchronize and click Generate Configuration.

  3. Click Create to create the task.

Notes

  1. When LTS synchronizes full and incremental data from an ApsaraDB RDS database to HBase, it first migrates the full historical data and then synchronizes the incremental data.

  2. When data is migrated to the destination HBase cluster, the system automatically generates a column family named f. Each column of the source ApsaraDB RDS table is mapped to one column in f. The HBase rowkey is a string formed by concatenating the primary key values of the source ApsaraDB RDS table.

  3. With the default configuration (skipDelete is set to true), delete operations in the ApsaraDB RDS database are not synchronized, so rows deleted from ApsaraDB RDS are kept in HBase. If you want deletions to be synchronized, modify the configuration. For more information, see the "Configuration description" section.
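The default mapping described in the notes above can be sketched in Python as follows. This is a minimal illustration, not the LTS implementation: the helper name to_hbase_row is hypothetical, and it assumes that primary key values are concatenated as plain strings with no separator, that values are written as strings, and that NULL columns are simply not written. The exact encoding LTS uses is not documented here.

```python
from typing import Dict, List, Optional, Tuple

# Column family used by the default mapping described in the notes.
DEFAULT_FAMILY = "f"


def to_hbase_row(
    row: Dict[str, Optional[object]], primary_keys: List[str]
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: map one source row to (rowkey, {"f:column": value}).

    Assumptions (not confirmed by LTS documentation):
    - the rowkey is the primary key values joined with no separator;
    - every value is stored as its string form;
    - columns whose value is None (SQL NULL) are skipped.
    """
    rowkey = "".join(str(row[pk]) for pk in primary_keys)
    cells = {
        f"{DEFAULT_FAMILY}:{column}": str(value)
        for column, value in row.items()
        if value is not None
    }
    return rowkey, cells


if __name__ == "__main__":
    source_row = {"id": 42, "cluster_id": "c7", "name": "demo", "note": None}
    rowkey, cells = to_hbase_row(source_row, primary_keys=["id", "cluster_id"])
    print(rowkey)
    print(cells)
```

If plain concatenation is indeed used, different composite keys can produce the same rowkey: (1, "23") and (12, "3") both become "123". When a source table has a composite primary key, consider defining the rowkey explicitly in the configuration, for example with a Jtwig expression that inserts a separator.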

Add a data source

  1. RDS data sources

  2. DTS subscription channels

  3. HBase data sources

  4. ApsaraDB for HBase Performance-enhanced Edition

Configuration description

Click Edit to view and modify the default configuration.

Configuration description for using the ApsaraDB for HBase API

{
    "reader": {
        "querySql": [
            "select * from dts.cluster where id < 1000", // The query statement used to synchronize full data. Each statement is run by one read thread.
            "select * from dts.cluster where id >= 1000" // Splitting the query increases the read speed and reduces the cost of retries.
        ]
    },
    "writer": {
        "columns": [
            {
                "name": "f:id", // The name of the column in the destination table.
                "value": "id", // The name of the column in the source table.
                "isPk": false // This parameter does not affect the synchronization process. You can ignore it.
            },
            {
                "name": "f:cluster_id",
                "value": "cluster_id",
                "isPk": false
            },
            {
                "name": "f:id_and_cluster",
                "value": "{{concat(id, cluster_id)}}" // Jtwig expressions can be used to transform data.
            }
        ],
        "rowkey": {
            "value": "id" // The columns of the ApsaraDB RDS table that form the rowkey in the HBase model. Jtwig syntax is supported.
        },
        "config": {
            "skipDelete": true // Skip delete operations.
        },
        "table": {
            "name": "dts:cluster", // The name of the table in the ApsaraDB for HBase (Lindorm) cluster.
            "parameter": {
                "compression": "ZSTD", // We recommend Zstandard (zstd) as the compression algorithm for the new table in the ApsaraDB for HBase (Lindorm) cluster.
                "split": ["1", "5", "9", "b"] // The split keys used to pre-partition the new table.
            }
        },
        "sourceTable": "dts.cluster"
    }
}
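Two parts of the configuration above lend themselves to a quick sketch: the querySql list, where each range query is read by one thread, and the split keys that pre-partition the new table. The Python below is a minimal illustration under stated assumptions. The helpers split_queries and region_index are hypothetical and are not part of LTS. split_queries assumes a numeric split column and reproduces the style of the querySql example. region_index assumes standard HBase pre-splitting, where N split keys create N + 1 regions and each region covers the byte range from its start key (inclusive) to the next split key (exclusive).

```python
from bisect import bisect_right
from typing import List, Optional


def split_queries(table: str, column: str, boundaries: List[int]) -> List[str]:
    """Hypothetical helper: build one range query per read thread.

    boundaries=[1000] yields "id < 1000" and "id >= 1000", matching the
    querySql example. Boundaries must be sorted in ascending order.
    """
    queries: List[str] = []
    lower: Optional[int] = None
    for upper in boundaries:
        if lower is None:
            cond = f"{column} < {upper}"
        else:
            cond = f"{column} >= {lower} and {column} < {upper}"
        queries.append(f"select * from {table} where {cond}")
        lower = upper
    if lower is None:
        # No boundaries: a single full-table query read by one thread.
        queries.append(f"select * from {table}")
    else:
        queries.append(f"select * from {table} where {column} >= {lower}")
    return queries


def region_index(rowkey: bytes, split_keys: List[bytes]) -> int:
    """Hypothetical helper: index of the pre-split region that holds rowkey.

    With sorted split keys k0 < k1 < ..., region 0 covers keys below k0,
    region i covers [k(i-1), k(i)), and the last region covers keys >= the
    last split key. Rowkeys are compared byte by byte, as in HBase.
    """
    return bisect_right(split_keys, rowkey)


if __name__ == "__main__":
    for query in split_queries("dts.cluster", "id", [1000]):
        print(query)
    splits = [b"1", b"5", b"9", b"b"]
    for key in (b"0042", b"42", b"a1", b"c"):
        print(key, region_index(key, splits))
```

Two practical points follow from this sketch. Rows whose split column is NULL match none of the range conditions, so a fully range-split querySql would also need a query for those rows. Because rowkeys are compared as bytes, the decimal id rowkey in the example never starts with "b" or above, so the last pre-split region would stay empty; split keys such as ["1", "5", "9", "b"] appear better suited to rowkeys that begin with hexadecimal characters.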

Jtwig syntax description