The AUTO OPTIMIZE statement can be used to start a streaming optimization task to automatically optimize tables in external data lakes. This topic describes the background information, prerequisites, limits, precautions, basic syntax, and parameter configurations of the AUTO OPTIMIZE statement.
Background information
Description
You can use the AUTO OPTIMIZE statement only to rewrite small files into a large file. This statement helps you reduce the metadata reading load and improve the query efficiency. After you execute the AUTO OPTIMIZE statement to start an optimization task, the task regularly checks the specified Iceberg table. If multiple small files exist in a table partition, the small files are rewritten into a large file. If a read performance bottleneck of your Iceberg table occurs due to excessive small files, we recommend that you execute the AUTO OPTIMIZE statement to start an optimization task. This improves the read performance of your Iceberg table.
Features
Feature
Description
Optimize a single table
Optimizes a specified table.
Optimize a single database
Optimizes all tables in a specified database.
Optimization process
When you execute the AUTO OPTIMIZE statement, fully managed Flink performs the following operations:
Checks whether the table or database that you want to optimize exists. If the table or database does not exist, an error is returned.
Continuously monitors the specified table or the tables in the specified database. If a table meets the optimization conditions, fully managed Flink optimizes the table.
Prerequisites
The table or database that you want to optimize is registered in the console of fully managed Flink. For more information about the registration method, see Create an Apache Iceberg result table.
Limits
Only fully managed Flink whose engine version is vvr-4.0.12-flink-1.13 or later supports the AUTO OPTIMIZE statement.
You can use the AUTO OPTIMIZE statement only to rewrite small files into a large file.
Only databases that contain Iceberg tables can be optimized. If a database contains Iceberg tables and other types of tables, the database cannot be optimized.
NoteYou can register the Iceberg tables that need to be optimized into a new database and then optimize the new database.
Usage notes
The optimization task that is started by using the AUTO OPTIMIZE statement occupies network bandwidth and some compute units (CUs) of fully managed Flink. When the optimization task is running, the small files that are rewritten into a large file are not deleted. If you want to delete the small files, you must use the snapshot expiration mechanism of Iceberg.
Syntax
Optimize a single table
AUTO OPTIMIZE TABLE <target_table> <target_table>: [catalog_name.][db_name.]table_name
Optimize a single database
AUTO OPTIMIZE DATABASE <target_database> <target_database>: [catalog_name.]db_name
Parameter configuration
Configuration method
You can run the following command to configure the parameters related to AUTO OPTIMIZE:
ALTER TABLE <target_table> SET ('auto-optimize.rewrite.enable' = 'false'); <target_table>: [catalog_name.][db_name.]table_name
NoteYou cannot directly use the ALTER command to modify the parameters of an Iceberg table in a memory-based catalog in the console of fully managed Flink. However, you can modify the parameters of tables in other types of catalogs.
Parameters
Category
Parameter
Description
Unit and default value
Table monitoring parameter
auto-optimize.table-monitor.interval
Specifies the interval at which the system monitors the specified table to determine whether to optimize the table.
You can configure this parameter based on your expectations. Recommended configuration:
If you want to optimize the table at the earliest opportunity, you can set this value to a small value.
If you want to reduce the frequency of optimizing the table, you can set this value to a large value.
Unit: milliseconds.
Default value: 600000. This value is equivalent to 10 minutes.
Table optimization parameter
auto-optimize.rewrite.enable
Specifies whether to enable file rewriting. Valid values:
true: File rewriting is enabled. This is the default value.
false: File rewriting is disabled.
If you want to optimize a single database but do not want to optimize some tables in the database, you can set this parameter to false for the tables that you do not want to optimize.
N/A.
auto-optimize.rewrite.target-file-size-bytes
The size of the file into which small files are written.
Unit: bytes.
Default value: 536870912. This value is equivalent to 512 MB.