This topic describes the auto-merge feature provided by AnalyticDB for PostgreSQL.

Overview

Auto-merge is a process that runs in the backend of AnalyticDB for PostgreSQL to automatically merge and sort data. This feature checks the status of data in a table on a regular basis. It sorts new unordered data and merges the data with the existing ordered data.

By default, the auto-merge feature is enabled for all tables. You can manually disable this feature for tables that do not require automatic data merging and sorting. For more information, see Enable or disable auto-merge on individual tables.

To disable this feature for all your tables, Submit a ticket.

Note Auto-merge improves query performance, but consumes resources, especially I/O resources, in the merging and sorting process.

Precautions

  • Auto-merge supports only append-optimized (AO) tables that are configured with sort keys. You can execute the following statement to check whether a table is an AO table:
    SELECT reloptions FROM pg_class WHERE relname = 'table_name';

    The following results may be returned. appendonly=true indicates that the queried table is an AO table.

                                  reloptions
    ----------------------------------------------------------------------
     {appendonly=true,orientation=column,compresstype=lz4,compresslevel=9}
    (1 row)
  • The auto-vacuum feature must be enabled, because auto-merge is dependent on auto-vacuum. For more information about auto-vacuum, see Configure scheduled maintenance tasks to clear junk data.
  • The minor version of the AnalyticDB for PostgreSQL instance must be 6.3.5.0 or later. For information about how to view and update the minor version of an instance, see Query the minor engine version and Upgrade the engine version.

Define sort keys

Before you use the auto-merge feature for a table, you must define sort keys for the table by using DDL statements.

  • The following example shows how to define sort keys when you create a table:
    CREATE TABLE table_name(
    col_name type,
    ...
    )
    WITH(appendonly = true, orientation = row/column)
    DISTRIBUTED BY(distributed_keys)
    ORDER BY(sort_keys)
    ;
  • The following example shows how to add or modify sort keys for an existing table:
    ALTER TABLE table_name SET ORDER BY(sort_keys);

Enable or disable auto-merge on individual tables

AnalyticDB for PostgreSQL allows you to enable or disable auto-merge on individual tables.

  • Enable auto-merge on a table.
    ALTER TABLE table_name SET (automerge = on);
  • Disable auto-merge on a table.
    ALTER TABLE table_name SET (automerge = off);
  • The following example shows how to check whether auto-merge is enabled on a table:
    SELECT relname, reloptions FROM pg_class WHERE relname = 'table_name';

    Sample query result:

      relname   |                     reloptions
    ------------+----------------------------------------------------
     table_name | {appendonly=true,orientation=column,automerge=on}

    automerge=on indicates that auto-merge is enabled. If the status of this feature on the specified table is the same as that on all tables of the instance, the automerge parameter is not returned. In this case, you can execute the following statement to query the value of this parameter:

    SHOW automerge;

Test the effect of auto-merge

In the following example, queries are performed to demonstrate the performance improvement brought by the auto-merge feature.

  1. Check whether the auto-vacuum and auto-merge features are enabled.
    SHOW autovacuum;
    SHOW automerge;

    If both features are enabled, on is returned.

  2. Disable the Laser computing engine.
    SET laser.enable = off;
    Note This operation is performed only to demonstrate the effect of the auto-merge feature. We recommend that you enable the Laser computing engine in real-case scenarios. For more information about the Laser computing engine, see Use the Laser computing engine.
  3. Create a table.
    CREATE TABLE test_automerge(a bigint, b bigint)
    WITH(appendonly = true, orientation = column, compresstype = lz4, compresslevel = 9)
    DISTRIBUTED BY(a)
    ORDER BY(b);
  4. Specify the number of added rows that triggers the auto-merge feature and import random data. To compare the performance before and after auto-merge is triggered, make sure that the number of rows imported in this step is lower than the threshold.
    ALTER TABLE test_automerge set (automerge_unsorted_row_num_threshold = 10000000);
    
    INSERT INTO test_automerge SELECT i, round(random() * 100) FROM generate_series(1, 9000000) as i;
  5. Perform a query.
    SELECT count(*) FROM test_automerge WHERE b = 5;

    Sample query result:

     count
    -------
     89713
    (1 row)
    
    Time: 204.918 ms
  6. Import more data to trigger auto-merge.
    INSERT INTO test_automerge SELECT i, round(random() * 100) FROM generate_series(1, 1000000) as i;
  7. After the data is merged and sorted, perform a query again.
    SELECT count(*) FROM test_automerge WHERE b = 5;

    Sample query result:

     count
    -------
     99683
    (1 row)
    
    Time: 15.289 ms
    Note You can check the progress of auto-merge in the pg_stat_activity view.

The preceding example shows that auto-merge can significantly reduce the amount of time required for queries. In addition, auto-merge significantly improves performance in scenarios where the AGGREGATE, JOIN, or ORDER BY clauses are used. To maximize the effect of the auto-merge feature, we recommend that you select a sort key for a table based on the following rules:

  • Specify one or more fields that filter out a large amount of data in queries as the sort key.
  • If the AGGREGATE, JOIN, or ORDER BY clauses are frequently used on a table, select the group key, join key, or order key as the sort key. The distribution key and the sort key must be the same.

References

Configure sorting acceleration