All Products
Search
Document Center

Realtime Compute for Apache Flink:AUTO OPTIMIZE statement

Last Updated:Aug 15, 2023

The AUTO OPTIMIZE statement can be used to start a streaming optimization task to automatically optimize tables in external data lakes. This topic describes the background information, prerequisites, limits, precautions, basic syntax, and parameter configurations of the AUTO OPTIMIZE statement.

Background information

  • Description

    You can use the AUTO OPTIMIZE statement only to rewrite small files into a large file. This statement helps you reduce the metadata reading load and improve the query efficiency. After you execute the AUTO OPTIMIZE statement to start an optimization task, the task regularly checks the specified Iceberg table. If multiple small files exist in a table partition, the small files are rewritten into a large file. If a read performance bottleneck of your Iceberg table occurs due to excessive small files, we recommend that you execute the AUTO OPTIMIZE statement to start an optimization task. This improves the read performance of your Iceberg table.

  • Features

    Feature

    Description

    Optimize a single table

    Optimizes a specified table.

    Optimize a single database

    Optimizes all tables in a specified database.

  • Optimization process

    When you execute the AUTO OPTIMIZE statement, fully managed Flink performs the following operations:

    1. Checks whether the table or database that you want to optimize exists. If the table or database does not exist, an error is returned.

    2. Continuously monitors the specified table or the tables in the specified database. If a table meets the optimization conditions, fully managed Flink optimizes the table.

Prerequisites

The table or database that you want to optimize is registered in the console of fully managed Flink. For more information about the registration method, see Create an Apache Iceberg result table.

Limits

  • Only fully managed Flink whose engine version is vvr-4.0.12-flink-1.13 or later supports the AUTO OPTIMIZE statement.

  • You can use the AUTO OPTIMIZE statement only to rewrite small files into a large file.

  • Only databases that contain Iceberg tables can be optimized. If a database contains Iceberg tables and other types of tables, the database cannot be optimized.

    Note

    You can register the Iceberg tables that need to be optimized into a new database and then optimize the new database.

Usage notes

The optimization task that is started by using the AUTO OPTIMIZE statement occupies network bandwidth and some compute units (CUs) of fully managed Flink. When the optimization task is running, the small files that are rewritten into a large file are not deleted. If you want to delete the small files, you must use the snapshot expiration mechanism of Iceberg.

Syntax

  • Optimize a single table

    AUTO OPTIMIZE TABLE <target_table>
    <target_table>:
      [catalog_name.][db_name.]table_name
  • Optimize a single database

    AUTO OPTIMIZE DATABASE <target_database>
    <target_database>:
      [catalog_name.]db_name

Parameter configuration

  • Configuration method

    You can run the following command to configure the parameters related to AUTO OPTIMIZE:

    ALTER TABLE <target_table> SET ('auto-optimize.rewrite.enable' = 'false');
    <target_table>:
      [catalog_name.][db_name.]table_name
    Note

    You cannot directly use the ALTER command to modify the parameters of an Iceberg table in a memory-based catalog in the console of fully managed Flink. However, you can modify the parameters of tables in other types of catalogs.

  • Parameters

    Category

    Parameter

    Description

    Unit and default value

    Table monitoring parameter

    auto-optimize.table-monitor.interval

    Specifies the interval at which the system monitors the specified table to determine whether to optimize the table.

    You can configure this parameter based on your expectations. Recommended configuration:

    • If you want to optimize the table at the earliest opportunity, you can set this value to a small value.

    • If you want to reduce the frequency of optimizing the table, you can set this value to a large value.

    • Unit: milliseconds.

    • Default value: 600000. This value is equivalent to 10 minutes.

    Table optimization parameter

    auto-optimize.rewrite.enable

    Specifies whether to enable file rewriting. Valid values:

    • true: File rewriting is enabled. This is the default value.

    • false: File rewriting is disabled.

      If you want to optimize a single database but do not want to optimize some tables in the database, you can set this parameter to false for the tables that you do not want to optimize.

    N/A.

    auto-optimize.rewrite.target-file-size-bytes

    The size of the file into which small files are written.

    • Unit: bytes.

    • Default value: 536870912. This value is equivalent to 512 MB.