
Lindorm: Archive incremental data to MaxCompute

Last Updated: Mar 28, 2026
Important

This feature is no longer available for Lindorm Tunnel Service (LTS) instances purchased after June 16, 2023. If your LTS instance was purchased before June 16, 2023, you can continue to use this feature.

LTS archives HBase incremental data to MaxCompute by reading HBase write-ahead logs (WAL). Archived data is merged from key-value (KV) format into partitioned wide tables that you can query directly in MaxCompute.
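
Conceptually, the merge step pivots KV cells into one wide row per row key. The following Python sketch is illustrative only (it is not LTS internals):

```python
from collections import defaultdict

# Each WAL entry is a key-value cell: (row key, "family:qualifier", value).
kv_cells = [
    ("row1", "cf1:a", "1"),
    ("row1", "cf1:b", "x"),
    ("row2", "cf1:a", "2"),
]

# The merge step groups cells by row key, producing one wide row per key
# that maps each archived column to its value.
wide_rows = defaultdict(dict)
for row_key, column, value in kv_cells:
    wide_rows[row_key][column] = value

print(dict(wide_rows))
```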

Supported versions

HBase source
  Self-managed HBase V1.x and HBase V2.x
  E-MapReduce HBase
  ApsaraDB for HBase Standard Edition
  ApsaraDB for HBase Performance-enhanced Edition (cluster mode only)
  Lindorm

Limits

  • Data archived through LTS is read from HBase logs. Data imported by bulk loading bypasses the WAL and therefore cannot be exported.

Log data lifecycle

  • If log data is not consumed after you enable the archiving feature, LTS retains the data for 48 hours by default. After 48 hours, the subscription is automatically canceled and the retained data is automatically deleted.

  • Log data may fail to be consumed if your LTS cluster is released while a task is running, or if a synchronization task is suspended.

Prerequisites

Before you begin, ensure that you have:

  • LTS activated

  • An HBase data source added

  • A MaxCompute data source added

Archive incremental data

  1. Log on to the LTS web UI. In the left-side navigation pane, choose Data Export > Incremental Archive to MaxCompute.

    LTS Data Export navigation

  2. Click Create New Job. Select a source HBase cluster and a destination MaxCompute resource package, then specify the HBase tables to export.

    Create new job page

    The example archives data from the wal-test HBase table to MaxCompute in real time:

    • Columns archived: cf1:a, cf1:b, cf1:c, cf1:d

    • mergeInterval: set to 86400000 milliseconds (the default), so data is merged once per day

    • mergeStartAt: set to 20190930000000, specifying September 30, 2019 at 00:00 as the start time. You can specify a past point in time.

  3. Monitor the archiving progress. The Real-time Synchronization Channel section shows the latency and start offset of the log synchronization task. The Table Merge section shows table merging tasks. After a merge completes, the new partitioned tables are queryable in MaxCompute.

  4. Query data in MaxCompute.

    MaxCompute query view
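
Given the example values above, the merge schedule implied by mergeStartAt and mergeInterval can be computed as follows (a sketch under the documented semantics, not an LTS API):

```python
from datetime import datetime, timedelta

# mergeStartAt uses the yyyyMMddHHmmss format; mergeInterval is in
# milliseconds. Compute when the first few merge tasks run.
merge_start_at = "20190930000000"
merge_interval_ms = 86400000  # one day

start = datetime.strptime(merge_start_at, "%Y%m%d%H%M%S")
schedule = [start + timedelta(milliseconds=merge_interval_ms * i) for i in range(3)]
print([t.strftime("%Y%m%d%H%M%S") for t in schedule])
```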

Parameters

Each line in the job configuration specifies one table export using the following format:

<hbaseTable>[/<odpsTable>] <tbConf>
  • <hbaseTable>: Source HBase table name.

  • <odpsTable>: Destination MaxCompute table name (optional). Defaults to the same name as the HBase table. Hyphens (-) in HBase table names are converted to underscores (_) in MaxCompute.

  • <tbConf>: JSON object containing the archiving configuration for the table.
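
The default name mapping can be expressed as a one-line helper (a hypothetical function for illustration, not part of LTS):

```python
def default_odps_table_name(hbase_table: str) -> str:
    # Hyphens are valid in HBase table names but not in MaxCompute table
    # names, so LTS maps "-" to "_" when no destination name is given.
    return hbase_table.replace("-", "_")

print(default_odps_table_name("wal-test"))
```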

Examples:

hbaseTable/odpsTable {"cols": ["cf1:a|string", "cf1:b|int", "cf1:c|long", "cf1:d|short", "cf1:e|decimal", "cf1:f|double", "cf1:g|float", "cf1:h|boolean", "cf1:i"], "mergeInterval": 86400000, "mergeStartAt": "20191008100547"}
hbaseTable/odpsTable {"cols": ["cf1:a", "cf1:b", "cf1:c"], "mergeStartAt": "20191008000000"}
hbaseTable {"mergeEnabled": false}
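
The line format above splits mechanically into its three parts. The following Python sketch (a hypothetical parser, not part of LTS) shows how:

```python
import json

def parse_job_line(line: str):
    # The table spec contains no spaces, so the first space separates it
    # from the JSON configuration object.
    tables, _, conf = line.partition(" ")
    hbase_table, _, odps_table = tables.partition("/")
    # The destination name defaults to the source name ("-" becomes "_").
    return hbase_table, odps_table or hbase_table.replace("-", "_"), json.loads(conf)

print(parse_job_line('hbaseTable {"mergeEnabled": false}'))
```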

The tbConf object supports the following parameters:

cols
  Required: No
  Default: HexString (per column)
  Description: Columns to export and their data types. Specify each column as <columnFamily>:<qualifier> or <columnFamily>:<qualifier>|<type>. If no type is specified, the value is exported in HexString format.
  Example: "cols": ["cf1:a|string", "cf1:b|int"]

mergeEnabled
  Required: No
  Default: true
  Description: Whether to convert KV tables into wide tables. Set to false to skip the merge step.
  Example: "mergeEnabled": false

mergeStartAt
  Required: No
  Default: None
  Description: Start time for table merging, in yyyyMMddHHmmss format. Specify a past point in time to backfill historical data.
  Example: "mergeStartAt": "20191008000000"

mergeInterval
  Required: No
  Default: 86400000
  Description: Interval between table merge tasks, in milliseconds. The default value archives data once per day.
  Example: "mergeInterval": 86400000
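
A column entry such as "cf1:a|string" decomposes into family, qualifier, and type. A small sketch of that parsing (a hypothetical helper for illustration):

```python
def parse_col(spec: str):
    # "cf1:a|string" -> family "cf1", qualifier "a", type "string".
    # Without an explicit "|<type>", values fall back to HexString format.
    column, _, col_type = spec.partition("|")
    family, _, qualifier = column.partition(":")
    return family, qualifier, col_type or "HexString"

print(parse_col("cf1:a|string"))
```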