All Products
Search
Document Center

AnalyticDB:Data bloat diagnostics

Last Updated:Mar 07, 2024

AnalyticDB for PostgreSQL provides the intelligent data bloat diagnostics feature. This feature automatically performs diagnostics on all database tables on a regular basis and generates a diagnostic information table. You can use the diagnostic information table to check the data bloat status of tables and obtain optimization suggestions, such as VACUUM and VACUUM FULL statements.

Usage notes

  • The intelligent data bloat diagnostics feature is supported only for AnalyticDB for PostgreSQL instances in elastic storage mode that meet the following minor version requirements:

    • AnalyticDB for PostgreSQL V6.0: V6.3.10.0 or later.

    • AnalyticDB for PostgreSQL V7.0: V7.0.4.0 or later.

    For information about how to view and update the minor version, see View the minor engine version and Update the minor engine version.

  • The intelligent data bloat diagnostics feature is implemented for every database, except for the five system databases: postgres, template0, template1, adbpgadmin, and aurora. To ensure that your data is diagnosed for data bloat, we recommend that you do not store your data in the five system databases.

  • The intelligent data bloat diagnostics feature scans all database tables except for temporary tables and unlogged tables. By default, tables that are smaller than 1 GB in size are not scanned to balance speed and efficacy. For information about how to modify the data volume threshold, see the "Specify the data volume threshold" section of this topic.

  • The intelligent data bloat diagnostics feature automatically performs diagnostics on every user database at the beginning of each hour. For example, after the intelligent data bloat diagnostics feature performs diagnostics at 17:00:00, the feature performs the next diagnostics at 18:00:00.

Causes of data bloat

Data bloat is a measure of the unused space and dead tuple space in a table. AnalyticDB for PostgreSQL uses Multiversion concurrency control (MVCC) to manage concurrent transactions. Data of underlying tables is divided into 32-KB pages. Each page contains a header, an item pointer array, unused space, and tuples.

  • When you execute the INSERT statement, new tuples are assigned from the unused space of pages. If the unused space of the current page is insufficient, a new page is assigned to store the written data.1.png

  • When you execute the DELETE statement, data is not deleted from pages. The data that you want to delete is marked as dead tuples. Dead tuples still occupy space and may cause data bloat.2.png

  • When you execute the UPDATE statement, the MVCC policy does not directly update data. Original tuples are marked as dead tuples and new tuples are inserted to update data. In this case, UPDATE operations may also cause data bloat.3.png

Check data bloat

The diagnostic information generated by the intelligent data bloat diagnostics feature is stored in the adbpg_toolkit.diag_bloat_tables table.

Important
  • The diagnostic information in the diag_bloat_tables table is sorted by using the ORDER BY bloat_coeff desc, real_size DESC clause. The table that has a larger bloat coefficient is placed in an earlier position. If two tables have the same bloat coefficient, the table that has a larger amount of data is placed in an earlier position.

  • The diagnostic information table is generated based on the statistics that are collected by the statistic collector process of PostgreSQL. In the unlikely event that the PostgreSQL server fails to respond, the statistic collector process resets the statistics. If the diagnostic information is unreasonable, you can execute the ANALYZE statement to collect statistics. For more information, see Use the ANALYZE statement to collect statistics on AnalyticDB for PostgreSQL.

You can use one of the following methods to view diagnostic information:

  • View diagnostic information in the AnalyticDB for PostgreSQL console. For more information, see Data bloat, data skew, and index statistics.

  • Execute the following SQL statement to query diagnostic information:

    SELECT * FROM adbpg_toolkit.diag_bloat_tables;

    The following table describes the fields that are contained in the diag_bloat_tables table.

    Field

    Data type

    Description

    schema_name

    name (63-byte type for storing system identifiers)

    The name of the schema to which the table belongs.

    table_name

    name

    The name of the table.

    storage_type

    text

    The storage type of the table. Example: heap table or append-optimized (AO) table.

    expect_size

    bigint

    The expected size of the table. Unit: bytes.

    real_size

    bigint

    The actual size of the table. Unit: bytes.

    bloat_size

    bigint

    The bloat size of the table. Unit: bytes.

    bloat_coeff

    bigint

    The bloat coefficient of the table. Unit: %. Valid values: 0 to 100.

    suggest_action

    text

    The suggested action on the table. Valid values:

    • Empty (no action)

    • VACUUM

    • VACUUM FULL

    last_vacuum

    timestamp with time zone

    The time when the table was last vacuumed. VACUUM FULL operations are not included.

    diagnose_time

    timestamp with time zone

    The time when the diagnostic information was generated.

You can specify filter conditions to check a schema or table for data bloat.

  • Check all tables within a specific schema for data bloat.

    SELECT * FROM adbpg_toolkit.diag_bloat_tables WHERE schema_name = '<Schema name>';
  • Check a specific table for data bloat.

    SELECT * FROM adbpg_toolkit.diag_bloat_tables WHERE table_name = '<Table name>';

Manually initiate data bloat diagnostics

By default, intelligent data bloat diagnostics is initiated at the beginning of each hour. If you want to immediately check the effect of the VACUUM or VACUUM FULL statement, you can manually initiate data bloat diagnostics. Execute the following statement to manually initiate data bloat diagnostics:

SELECT adbpg_toolkit.diagnose_bloat_tables();

Specify the data volume threshold

By default, the intelligent data bloat diagnostics feature filters out tables that are smaller than 1 GB in size. If you want to change the data volume threshold, you can execute the following statement:

ALTER DATABASE <Database name> SET adb_diagnose_table_threshold_size to <Table size in bytes>;

For example, to perform diagnostics on tables that are larger than 500 MB in size, execute the following statement:

ALTER DATABASE diagnose SET adb_diagnose_table_threshold_size to 536870912;

Eliminate data bloat

When INSERT, DELETE, and UPDATE operations are performed on a table, a large number of dead tuples are generated. In this case, the unused space is reduced and more pages are assigned to store data. When pages are scanned, a large number of dead tuples are also scanned. This increases I/O time overheads. Data bloat causes the following side-effects:

  • A large table storage space is assigned, which results in a waste of storage space.

  • Scanning I/O overheads increase, which decreases query performance.

You can execute the VACUUM or VACUUM FULL statement to eliminate data bloat.

  • Execute the VACUUM statement to delete dead tuples to increase the unused space available for new tuples.4.png

    This method has the following advantages and disadvantages:

    • Advantages: Compared with the VACUUM FULL statement, the VACUUM statement is less limited by table locks. The VACUUM FULL statement uses the ACCESS EXCLUSIVE lock mode, which blocks all other types of operations on tables. For example, when you execute the VACUUM FULL statement on a table, the SELECT statement is blocked. You cannot use the table before the VACUUM FULL statement is complete.

    • Disadvantages: The VACUUM statement cannot organize the storage space of a table across pages. The table size on disks cannot be reduced.

  • Execute the VACUUM FULL statement to delete dead tuples to increase the unused space available for new tuples, and organize the storage space of a table across pages.5.png

    This method has the following advantages and disadvantages:

    • Advantages: Table data is organized to become more compact. The table size on disks can be reduced.

    • Disadvantages: The VACUUM FULL statement uses the ACCESS EXCLUSIVE lock mode, which blocks all other types of operations on tables. When the VACUUM FULL statement organizes pages, an additional storage space is used to store the organized data.