Archive incremental data to MaxCompute

Last Updated: Feb 22, 2021

This topic describes how to archive incremental data of HBase clusters to MaxCompute.

Prerequisites

  1. Lindorm Tunnel Service (LTS) is enabled.

  2. An HBase data source is added.

  3. A MaxCompute data source is added.

Supported versions

  • User-created HBase V1.x and V2.x

  • EMR HBase

  • ApsaraDB for HBase Standard Edition and Performance-enhanced Edition in cluster mode, and Lindorm

Limits

  • Real-time data is archived based on HBase logs. Therefore, data that is imported by using BulkLoad cannot be exported.

Submit an archiving task

  1. Log on to LTS and select Data Export > Incremental Archive to MaxCompute in the left-side navigation pane.

  2. Click create new job. On the page that appears, select a source HBase cluster and a destination MaxCompute resource package, and specify the HBase tables to export. The preceding figure shows an example of how to archive data from the wal-test HBase table to MaxCompute in real time.

    • The columns to be archived are cf1:a, cf1:b, cf1:c, and cf1:d.

    • The mergeInterval parameter specifies the archiving interval in milliseconds. The default value is 86400000 (one day).

    • The mergeStartAt parameter specifies the start time in the yyyyMMddHHmmss format. In this example, the start time is 00:00 on September 30, 2019. You can specify a time in the past.

  3. Log on to LTS and view the archiving progress of the tables. Real-time Synchronization Channel shows the latency and offset of the synchronized logs. Table Merge shows the merging tasks of the tables. After table merging is complete, you can query the tables in the latest partitions in MaxCompute.

  4. Query data in the MaxCompute tables.
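
As a rough illustration of how mergeStartAt and mergeInterval relate, the following Python sketch lists the first few time windows that these two values imply. It assumes that windows are consecutive and begin exactly at mergeStartAt, which this topic does not state explicitly. The function name merge_windows is hypothetical, and the time zone is left unspecified because the topic does not define one.

```python
from datetime import datetime, timedelta


def merge_windows(merge_start_at: str, merge_interval_ms: int, count: int):
    """Return the first `count` (begin, end) windows as naive datetimes.

    Assumption: windows are back-to-back and the first one starts at
    mergeStartAt. This is an illustration, not documented LTS behavior.
    """
    # mergeStartAt uses the yyyyMMddHHmmss format described in this topic.
    start = datetime.strptime(merge_start_at, "%Y%m%d%H%M%S")
    step = timedelta(milliseconds=merge_interval_ms)
    return [(start + i * step, start + (i + 1) * step) for i in range(count)]


# Values from the example in this topic: start at 00:00 on September 30, 2019,
# with the default interval of 86400000 ms (one day).
windows = merge_windows("20190930000000", 86400000, 3)
for begin, end in windows:
    print(begin.isoformat(), "->", end.isoformat())
```

With a past mergeStartAt, several windows may already have elapsed by the time the task is submitted.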

Parameter description

The following code shows examples of exported table expressions:

hbaseTable/odpsTable {"cols": ["cf1:a|string", "cf1:b|int", "cf1:c|long", "cf1:d|short","cf1:e|decimal", "cf1:f|double","cf1:g|float","cf1:h|boolean","cf1:i"], "mergeInterval": 86400000, "mergeStartAt": "20191008100547"}
hbaseTable/odpsTable {"cols": ["cf1:a", "cf1:b", "cf1:c"],  "mergeStartAt": "20191008000000"}
hbaseTable {"mergeEnabled": false} // Specifies not to perform merging operations on the tables.

The expression for an exported table consists of three parts: hbaseTable, odpsTable, and tbConf. The hbaseTable part specifies the source HBase table. The odpsTable part is optional and specifies the name of the destination MaxCompute table. By default, the MaxCompute table has the same name as the HBase table. A MaxCompute table name cannot contain periods (.) or hyphens (-); these characters are converted to underscores (_). The tbConf part specifies the archiving behavior. The following table lists the parameters supported in the tbConf part.
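
The three-part structure can be sketched in Python as follows. This is only an illustrative parser, not the one LTS uses. The function names are hypothetical, and two behaviors are assumptions: the tbConf part is treated as optional, and the underscore conversion is applied to the final MaxCompute table name whether or not it was given explicitly.

```python
import json


def to_odps_name(name: str) -> str:
    """Replace periods and hyphens with underscores, as described above."""
    return name.replace(".", "_").replace("-", "_")


def parse_table_expression(line: str):
    """Split one expression into (hbase_table, odps_table, tb_conf)."""
    line = line.strip()
    brace = line.find("{")
    # Everything before the first "{" holds the table name(s).
    names = (line if brace == -1 else line[:brace]).strip()
    hbase_table, _, odps_table = names.partition("/")
    tb_conf = {}
    if brace != -1:
        # raw_decode parses the JSON object and ignores trailing text
        # such as a "// ..." remark at the end of the line.
        tb_conf, _ = json.JSONDecoder().raw_decode(line[brace:])
    # Without an explicit odpsTable, fall back to the HBase table name.
    odps_table = to_odps_name(odps_table or hbase_table)
    return hbase_table, odps_table, tb_conf


print(parse_table_expression('wal-test {"mergeEnabled": false} // no merging'))
print(parse_table_expression(
    'hbaseTable/odpsTable {"cols": ["cf1:a"], "mergeStartAt": "20191008000000"}'
))
```

In the first call, the table name wal-test contains a hyphen, so the derived MaxCompute table name becomes wal_test.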

| Parameter | Description | Example |
| --- | --- | --- |
| cols | Specifies the columns to be exported and the data type of each column. If no data type is specified for a column, its values are converted to the HexString format. | "cols": ["cf1:a", "cf1:b", "cf1:c"] |
| mergeEnabled | Specifies whether to convert key-value (KV) tables to wide tables. Default value: true. | "mergeEnabled": false |
| mergeStartAt | The start time of table merging, in the yyyyMMddHHmmss format. You can specify a time in the past. | "mergeStartAt": "20191008000000" |
| mergeInterval | The interval of table merging. Unit: ms. Default value: 86400000 (one day), which indicates that data is archived on a daily basis. | "mergeInterval": 86400000 |