Full data synchronization and incremental data synchronization from ApsaraDB RDS to ApsaraDB for HBase

Last Updated: May 19, 2022

This topic describes how to synchronize full data or incremental data from an ApsaraDB RDS instance to an ApsaraDB for HBase instance.

Scenarios

  • Synchronize historical data from ApsaraDB RDS.

  • Synchronize full data from ApsaraDB RDS.

Prerequisites

  • You have logged on to the Lindorm Tunnel Service (LTS) console. For more information, see Create a synchronization task.

  • LTS, the destination ApsaraDB for HBase cluster, and the source ApsaraDB RDS instance are deployed in the same virtual private cloud (VPC) or can otherwise connect to one another.

Features

  • Full data and incremental data synchronization from ApsaraDB RDS to ApsaraDB for HBase.

  • Data transformation. For more information, see Create a task.

  • Multi-table synchronization.

Limits

  • Full data synchronization from ApsaraDB RDS supports MySQL data sources.

  • Incremental data synchronization from ApsaraDB RDS supports Data Transmission Service (DTS) data sources.

  • The following destination data sources are supported: ApsaraDB for HBase Basic Edition and ApsaraDB for HBase Performance-enhanced Edition.

Create a task

  1. In the LTS console, choose Data Import > RDS Migration.

  2. Click create new job.

  3. Select the ApsaraDB RDS data source, the DTS data source, and the destination data source.

  4. In the configuration section, click edit to view the default configuration. You can modify the configuration as needed. For more information about the expression syntax, see Jtwig syntax.

    Configuration description for using the ApsaraDB for HBase API

    {
        "reader": {
            "querySql": [
                "select * from dts.cluster where id < 1000", // The query statement that is executed to synchronize full data. One statement is associated with one read thread.
                "select * from dts.cluster where id >= 1000" // Split queries to increase the query speed and reduce the retry cost.
            ]
        },
        "writer": {
            "columns": [
                {
                    "name": "f:id", // The name of the column in the destination table.
                    "value": "id", // The name of the column in the source table.
                    "isPk": false // This parameter does not affect the synchronization process. You can ignore it.
                },
                {
                    "name": "f:cluster_id",
                    "value": "cluster_id",
                    "isPk": false
                },
                {
                    "name": "f:id_and_cluster",
                    "value": "{{concat(id, cluster_id)}}" // Jtwig syntax can be used to transform data.
                }
            ],
            "rowkey": {
                // The columns in the ApsaraDB RDS database table constitute the rowkey in the ApsaraDB for HBase model. The Jtwig syntax is supported. The fields used in the rowkey can be configured only in the columns.
                "value": "id"
            },
            "config": {
                "skipDelete": true // Skip the delete operation.
            },
            "table": {
                "name": "dts:cluster", // The name of the table in the Lindorm or ApsaraDB for HBase cluster.
                "parameter": {
                    "compression": "ZSTD", // In the Lindorm or ApsaraDB for HBase cluster, we recommend that you use Zstandard (zstd) as the compression algorithm for the new table.
                    "split": ["1", "5", "9", "b"] // The split keys that are used to pre-partition the new table.
                }
            },
            "sourceTable": "dts.cluster"
        }
    }
  5. Select the table that you want to synchronize and click generate configuration.

    Note
    • When a task covers both full and incremental data, LTS first migrates the full historical data from the ApsaraDB RDS instance to the ApsaraDB for HBase instance and then migrates the incremental data.

    • The system automatically generates ApsaraDB for HBase column families after the data is migrated to the destination ApsaraDB for HBase cluster. Each column of the specified ApsaraDB RDS database table is mapped to one column in the column family f. The ApsaraDB for HBase rowkey is the concatenation of the primary key columns of the specified ApsaraDB RDS database table.

    • If the default configuration is used, delete operations are skipped during synchronization (skipDelete is set to true in the sample configuration). If you want delete operations to be synchronized, modify the configuration. For more information, see the "Configuration description" section.

  6. Click create.
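
The split-query pattern and the default column mapping described in the configuration section can be sketched in Python. This is a minimal illustration, not part of LTS: the helper names split_queries and build_config are hypothetical, the range boundaries are values you choose yourself, and it is an assumption that the console accepts the comment-free JSON that this script prints. The field names (querySql, columns, rowkey, skipDelete, sourceTable) are taken from the sample configuration above.

```python
import json


def split_queries(table, key, boundaries):
    """Build one SELECT per key range (hypothetical helper).

    Each statement is associated with one read thread, so more ranges
    mean more parallel readers and a smaller unit of work to retry.
    """
    if not boundaries:
        return [f"select * from {table}"]
    queries = []
    lower = None
    for upper in boundaries:
        conditions = []
        if lower is not None:
            conditions.append(f"{key} >= {lower}")
        conditions.append(f"{key} < {upper}")
        queries.append(f"select * from {table} where " + " and ".join(conditions))
        lower = upper
    # The last range is open-ended so that no rows are missed.
    queries.append(f"select * from {table} where {key} >= {lower}")
    return queries


def build_config(source_table, hbase_table, columns, rowkey, key, boundaries):
    """Assemble a configuration shaped like the sample above (hypothetical helper)."""
    return {
        "reader": {"querySql": split_queries(source_table, key, boundaries)},
        "writer": {
            # Default mapping: each source column goes to column family f.
            "columns": [
                {"name": f"f:{col}", "value": col, "isPk": False} for col in columns
            ],
            "rowkey": {"value": rowkey},
            "config": {"skipDelete": True},
            "table": {"name": hbase_table},
            "sourceTable": source_table,
        },
    }


if __name__ == "__main__":
    config = build_config(
        source_table="dts.cluster",
        hbase_table="dts:cluster",
        columns=["id", "cluster_id"],
        rowkey="id",
        key="id",
        boundaries=[1000],
    )
    print(json.dumps(config, indent=4))
```

With boundaries=[1000], the generated querySql list contains the same two statements as the sample configuration. Adding more boundaries produces more ranges, which trades a larger number of read threads for smaller retry units.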