The AUTO OPTIMIZE statement can be used to start a streaming optimization task to automatically optimize tables in external data lakes. This topic describes the background information, prerequisites, limits, precautions, basic syntax, and parameter configurations of the AUTO OPTIMIZE statement.

Background information

  • Description

    You can use the AUTO OPTIMIZE statement only to rewrite small files into a large file. This statement helps you reduce the metadata reading load and improve the query efficiency. After you execute the AUTO OPTIMIZE statement to start an optimization task, the task regularly checks the specified Iceberg table. If multiple small files exist in a table partition, the small files are rewritten into a large file. If a read performance bottleneck of your Iceberg table occurs due to excessive small files, we recommend that you execute the AUTO OPTIMIZE statement to start an optimization task. This improves the read performance of your Iceberg table.

  • Features
    Feature Description
    Optimize a single table Optimizes a specified table.
    Optimize a single database Optimizes all tables in a specified database.
  • Optimization process
    When you execute the AUTO OPTIMIZE statement, fully managed Flink performs the following operations:
    1. Checks whether the table or database that you want to optimize exists. If the table or database does not exist, an error is returned.
    2. Continuously monitors the specified table or the tables in the specified database. If a table meets the optimization conditions, fully managed Flink optimizes the table.

Prerequisites

The table or database that you want to optimize is registered in the console of fully managed Flink. For more information about the registration method, see Create an Apache Iceberg result table.

Limits

  • Only fully managed Flink whose engine version is vvr-4.0.12-flink-1.13 or later supports the AUTO OPTIMIZE statement.
  • You can use the AUTO OPTIMIZE statement only to rewrite small files into a large file.
  • Only databases that contain Iceberg tables can be optimized. If a database contains Iceberg tables and other types of tables, the database cannot be optimized.
    Note You can register the Iceberg tables that need to be optimized into a new database and then optimize the new database.

Usage notes

The optimization task that is started by using the AUTO OPTIMIZE statement occupies network bandwidth and some compute units (CUs) of fully managed Flink. When the optimization task is running, the small files that are rewritten into a large file are not deleted. If you want to delete the small files, you must use the snapshot expiration mechanism of Iceberg.

Syntax

  • Optimize a single table
    AUTO OPTIMIZE TABLE <target_table>
    <target_table>:
      [catalog_name.][db_name.]table_name
  • Optimize a single database
    AUTO OPTIMIZE DATABASE <target_database>
    <target_database>:
      [catalog_name.]db_name

Parameter configuration

  • Configuration method
    You can run the following command to configure the parameters related to AUTO OPTIMIZE:
    ALTER TABLE <target_table> SET ('auto-optimize.rewrite.enable' = 'false');
    <target_table>:
      [catalog_name.][db_name.]table_name
    Note You cannot directly use the ALTER command to modify the parameters of an Iceberg table in a memory-based catalog in the console of fully managed Flink. However, you can modify the parameters of tables in other types of catalogs.
  • Parameters
    Category Parameter Description Unit and default value
    Table monitoring parameter auto-optimize.table-monitor.interval Specifies the interval at which the system monitors the specified table to determine whether to optimize the table.
    You can configure this parameter based on your expectations. Recommended configuration:
    • If you want to optimize the table at earliest opportunity, you can set this value to a small value.
    • If you want to reduce the frequency of optimizing the table, you can set this value to a large value.
    • Unit: milliseconds.
    • Default value: 600000. This value is equivalent to 10 minutes.
    Table optimization parameter auto-optimize.rewrite.enable Specifies whether to enable file rewriting. Valid values:
    • true: File rewriting is enabled. This is the default value.
    • false: File rewriting is disabled.

      If you want to optimize a single database but do not want to optimize some tables in the database, you can set this parameter to false for the tables that you do not want to optimize.

    N/A.
    auto-optimize.rewrite.target-file-size-bytes The size of the file into which small files are written.
    • Unit: bytes.
    • Default value: 536870912. This value is equivalent to 512 MB.