This topic describes the features released in ApsaraDB for ClickHouse Enterprise Edition V24.12.
New features
Adds support for loading the primary index of all partitions (for a specified table or all tables) via the `SYSTEM LOAD PRIMARY KEY` command. This helps with benchmarking and prevents extra latency during query execution.
Adds the `CHECK GRANT` query to check whether the current user or role has specific permissions and whether the corresponding table or column exists in memory.
Adds SQL syntax descriptions for workloads and resource management.
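As an illustration, the two commands above might be invoked as follows; the database, table, and column names are hypothetical:

```sql
-- Preload the primary index of a table so the first queries avoid load latency.
SYSTEM LOAD PRIMARY KEY my_db.my_table;

-- Check whether the current user holds SELECT on a specific column.
CHECK GRANT SELECT(id) ON my_db.my_table;
```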
Implements Iceberg table read support with schema evolution (column order/name changes and simple type extensions).
Adds support to set an independent expiration time for each authentication method, and removes related settings from the user entity.
Adds support to push external user roles from the query initiator node to other nodes in the cluster. This applies to scenarios where only the initiator node can access external authentication services, such as LDAP.
Adds support for altering the data type from String to JSON. Upgrades the serialization for JSON and Dynamic types to V2. You can revert to V1 using the `merge_tree_use_v1_object_and_dynamic_serialization` setting.
Adds the `toUnixTimestamp64Second` function to convert a DateTime64 value to an Int64 value with fixed second precision. This function supports negative values for dates before January 1, 1970.
Adds the `enforce_index_structure_match_on_partition_manipulation` setting. This setting allows ATTACH operations when the projections and secondary indexes of the source table are a subset of the target table.
Adds support for the Spark text output format. This feature is disabled by default.
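A minimal sketch of the String-to-JSON conversion and the new timestamp function above; the table and column names are hypothetical:

```sql
-- Convert a String column that stores JSON documents to the JSON type in place.
ALTER TABLE events MODIFY COLUMN payload JSON;

-- Second-precision UNIX timestamp; negative for dates before 1970-01-01.
SELECT toUnixTimestamp64Second(toDateTime64('1969-12-31 23:59:59', 3, 'UTC'));
```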
Adds the access_header authentication header type for S3 endpoints. This header has the lowest priority and can be overwritten by other configurations.
Implements the layered settings feature.
Supports the STALENESS clause in `ORDER BY WITH FILL`.
Implements simple CAST conversions from Map, Tuple, and Object to JSON using JSON string serialization and deserialization.
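For example, STALENESS limits how far gap filling continues past each original row; the metrics table below is hypothetical:

```sql
-- Stop filling once the distance from the previous original row
-- exceeds the STALENESS interval.
SELECT ts, value
FROM metrics
ORDER BY ts WITH FILL STALENESS INTERVAL 1 MINUTE;
```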
Adds camelCase aliases, such as `anyLastRespectNulls`, for the `any` and `anyLast` aggregate functions.
Adds the `date_time_utc` configuration parameter. This parameter allows JSON logs to use UTC in RFC 3339 or ISO 8601 format.
Adds the `query_plan_join_swap_table` setting. This setting specifies the inner table (build table) for a join. In auto mode, the table with fewer rows is automatically selected.
Optimizes memory usage for index granularity values when partition granularity is constant. Adds the new `use_const_adaptive_granularity` setting to ensure this memory optimization.
Adds the `allowed_feature_tier` global switch to disable all experimental or beta features.
Adds Cluster table functions for Iceberg, Delta Lake, and Hudi.
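A minimal sketch of how the join-swap setting above might be used; the table names are hypothetical:

```sql
-- 'auto' lets the planner pick the table with fewer rows as the
-- build (inner) side of the hash join.
SET query_plan_join_swap_table = 'auto';

SELECT count()
FROM big_table AS b
JOIN small_table AS s ON b.key = s.key;
```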
Adds the syntax for modifying `SETTING` or `PROFILE` in `ALTER USER`, `ALTER ROLE`, and `ALTER PROFILE` statements.
Adds the `arrayPrAUC` function to calculate the area under the precision-recall curve.
Implements a primary index cache for MergeTree tables, which is enabled using the `use_primary_key_cache` setting. This cache supports on-demand loading, similar to the mark cache, and prefetching, which is enabled using the `prewarm_primary_key_cache` setting.
Adds the `indexOfAssumeSorted` array function for optimized search in non-decreasing sorted arrays.
The `groupConcat` aggregate function now supports a delimiter as an optional second argument.
Adds the `http_response_headers` setting to support custom HTTP response headers.
Adds the `fromUnixTimestamp64Second` function to convert an Int64 UNIX timestamp to a DateTime64 value.
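Hedged examples of three of the functions above; the users table is hypothetical:

```sql
-- Area under the precision-recall curve, given scores and binary labels.
SELECT arrayPrAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);

-- Delimiter passed as the optional second argument.
SELECT groupConcat(name, ', ') FROM users;

-- Faster lookup when the array is known to be sorted in non-decreasing order.
SELECT indexOfAssumeSorted([1, 3, 3, 7, 9], 7);
```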
Performance optimizations
Adds the `short_circuit_function_evaluation_for_nulls` setting. When the proportion of NULL values in a Nullable column exceeds a threshold, functions are executed only on non-NULL rows.
Optimizes memory usage for the `--recursive` delete operation on Object Storage Service disks.
Avoids copying input block columns for parallel processing when `join_algorithm='parallel_hash'`.
Enables JIT compilation for more expressions, such as abs, bitCount, comparison functions, and logical functions.
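The NULL short-circuit setting above might be enabled as follows; the companion threshold setting name and value here are assumptions for illustration:

```sql
-- Evaluate functions only on the non-NULL rows of Nullable columns.
SET short_circuit_function_evaluation_for_nulls = 1;
-- Assumed companion setting: apply the optimization once at least
-- half of a column's values are NULL.
SET short_circuit_function_evaluation_for_nulls_threshold = 0.5;
```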
The default `join_algorithm` behavior now favors parallel_hash.
Optimizes the Replacing merge algorithm for disjoint partitions.
Improves the collection performance of `system.query_metric_log` by reducing the critical section.
Adds the `optimize_extract_common_expressions` setting. This setting supports extracting common expressions from WHERE or ON conditions to reduce the number of hash tables during joins.
Adds support for using indexes on LowCardinality(String) columns.
In parallel replica queries, worker nodes skip index analysis, which is now handled by the client node.
Optimizes the reading of single-column subcolumns in Compact Parts.
Optimizes sorting performance for LowCardinality(String) columns.
Optimizes the performance of argMin and argMax functions for simple data types.
Optimizes shared lock contention for the memory tracker.
Adds the `use_async_executor_for_materialized_views` setting. This setting enables asynchronous and multi-threaded execution for materialized views, which improves INSERT performance but increases memory consumption.
Increases the default threshold for pre-allocated memory for aggregations and joins to 10^12.
Optimizes deserialization performance for AggregateFunction states (in AggregateFunction data type and distributed queries), with minor improvements to RowBinary format parsing.
Breaking changes
The `greatest` and `least` functions now ignore NULL input values, whereas previously they returned NULL if any argument was NULL. For example, `SELECT greatest(1, 2, NULL)` now returns 2. This behavior is consistent with PostgreSQL.
The Variant and Dynamic types are now disallowed by default in ORDER BY, GROUP BY, PARTITION BY, and PRIMARY KEY clauses because they can cause unexpected results.
Removes the `generate_series` and `generateSeries` system tables, which were added by mistake.
Fixes "file does not exist" errors in JSON subcolumn files that were caused by unescaped special characters.
The Kafka, NATS, and RabbitMQ table engines are now categorized under separate permission items in the SOURCES level. CREATE permissions for these engines must be granted to non-default database users.
Mutation queries, including subqueries, are now fully checked before execution. This prevents the accidental execution of invalid queries or mutations.
Renames the file system cache setting `skip_download_if_exceeds_query_cache` to `filesystem_cache_skip_download_if_exceeds_per_query_cache_write_limit`.
Disallows the use of Dynamic and Variant types in min and max functions to avoid confusion.
Removes support for Enum, UInt128, and UInt256 arguments in the deltaSumTimestamp function. It also removes support for Int8, UInt8, Int16, and UInt16 for its second argument, timestamp.
Adds a validation feature for dictionary source queries when ClickHouse is used as a dictionary data source.
Improvements
Higher-order functions that contain constant arrays now return a constant value.
Optimizes ordered reads by generating virtual rows. This is especially useful in multi-partition scenarios.
Query plan step names and pipeline processor names now include a unique ID suffix. This makes it easier to associate them with performance analytics tools.
The write buffer is now explicitly canceled or terminated. If an exception occurs, the client is notified of the interruption through the HTTP protocol.
Removes the allow_experimental_join_condition setting. Non-equi-join conditions are now allowed by default.
Enables parallel_replicas_local_plan by default. This builds a complete local plan on the query initiator node to improve performance.
The http_handlers setting now supports setting a user and password for dynamic_query_handler and predefined_query_handler.
S3Queue storage now supports ALTER TABLE MODIFY/RESET SETTING to modify specific settings.
The Object Storage Service API is no longer called when listing folders. Instead, a list of file names is stored in memory. This trades initial load time for memory usage.
Adds the `prewarm_mark_cache` setting to support prefetching the mark cache when inserting, merging, or fetching partitions.
The native Parquet reader now supports the Boolean type.
Adds more S3 error types for retries, such as "Malformed message".
Lowers the log level for some S3-related logs.
Supports writing to HDFS files with paths that contain spaces.
Fixes RIGHT and FULL joins in parallel replica queries. The right-side table can now be read in a distributed manner.
Adds a setting to limit the number of replicated tables, dictionaries, and views.
Automatically enables external sorting for GROUP BY and ORDER BY based on memory usage. This is controlled by max_bytes_ratio_before_external_group_by and max_bytes_ratio_before_external_sort.
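For example, the two ratio settings above might be set as follows so that a query spills to disk once it uses half of its memory limit:

```sql
-- Switch GROUP BY and ORDER BY to external (on-disk) processing once
-- memory usage reaches 50% of the query memory limit.
SET max_bytes_ratio_before_external_group_by = 0.5;
SET max_bytes_ratio_before_external_sort = 0.5;
```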
The translate function now supports character deletion when the `from` argument is longer than the `to` argument.
Adds the parseDateTime64 series of functions, which return a DateTime64 type.
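A quick sketch of the deletion behavior of translate described above:

```sql
-- 'l' maps to 'L'; 'o' and ' ' have no counterpart in the 'to'
-- argument and are deleted, yielding 'heLLwrLd'.
SELECT translate('hello world', 'lo ', 'L');
```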
Reduces the memory footprint of the index_granularity array for MergeTree family engines.
The command-line application now supports syntax highlighting for multi-statement queries.
The command-line application now returns a non-zero exit code on error.
The Vertical format, which is activated when a query ends with `\G`, now includes features from the Pretty format, such as a thousands separator for numbers.
Allows disabling the growth of the file system cache memory buffer using the filesystem_cache_prefer_bigger_buffer_size setting.
Adds the background_download_max_file_segment_size setting to control the file segment size for background downloads in the file system cache.
Enables HTTP compression by default (enable_http_compression=1).
Supports altering the data type from Object to JSON.
Improves JSON type parsing. When a path corresponds to values of multiple types, the system now tries them in a best-match order.
Reading from system.asynchronous_metrics no longer waits for concurrent updates to complete.
Sets polling_max_timeout_ms to 10 minutes and polling_backoff_ms to 30 seconds.
Simple queries, such as `SELECT * FROM t LIMIT 1`, no longer load partition indexes.
Enables allow_reorder_prewhere_conditions by default under old compatibility settings.
Direct dictionary queries now only require SELECT or dictGet permissions. This fixes an ACL bypass issue.
Adds a selector for the system.dashboards table to the advanced dashboard page.
The prefer_localhost_replica setting is now respected during distributed INSERT...SELECT operations.
Upgrades JSON, Dynamic, and Variant types from experimental features to Beta.
Allows the use of UNION in materialized view queries. Only the first table triggers an insert.
Optimizes MergeTree write performance for batch inserts with a single partition key value.
Adds the MergeTreeIndexGranularityInternalArraysTotalSize metric to system.metrics to help detect high memory usage issues.
Recognizes all spelling variations of Null in `FORMAT Null` queries.
Allows unknown values in a set for the Enum type.
Adds the total_bytes_with_inactive column to system.tables to count the size of inactive partitions.
Adds MergeTreeSettings to system.settings_changes.
Supports string search operations, such as like, on the Enum type.
Supports the notEmpty function for the JSON type.
Supports parsing the AuthenticationRequired error from GCS S3.
Supports the use of the Dynamic type in ifNull and coalesce functions.
Adds the JoinBuildTableRowCount, JoinProbeTableRowCount, and JoinResultRowCount performance events.
Supports the use of the Dynamic type in functions such as toFloat64 and toUInt32.
Bug fixes
Fixes an issue where duplicate partitions in an ATTACH PART query would become stuck in the attaching_ state.
Fixes an issue where DateTime64 values lost precision in the IN function.
Fixes a logic error in the IGNORE NULLS and RESPECT NULLS functions within ORDER BY ... WITH FILL.
Fixes a logic error that occurred when asynchronous inserts in Native format reached the memory limit.
Fixes an issue with comments in CREATE TABLE for EPHEMERAL columns.
Fixes a type error between JSONExtract and LowCardinality(Nullable).
Fixes a behavior issue that occurred when table names were too long.
The URL engine supports overriding the Content-Type header using a custom user header.
Fixes a "Cannot create persistent node at /processed" error in StorageS3Queue.
Fixes an issue where the _row_exists column was not considered when rebuilding projections for lightweight deletes.
Fixes an issue where a race condition caused incorrect values in system.query_metric_log.
Fixes a name mismatch for the quantileExactWeightedInterpolated function.
Fixes a bad_weak_ptr exception in comparison functions for the Dynamic type.
Fixes an issue where blobs were not deleted during zero-copy replication if they were still in use by a node.
Fixes an issue where Native format settings were ignored in HTTP and asynchronous inserts.
Fixes an issue where queries containing system table literals were rejected when use_query_cache=1.
Fixes a memory growth issue in disk storage that was not configured with a cache.
Fixes a "Cannot read all data" error when deserializing LowCardinality dictionaries within Dynamic columns.
Fixes an issue with incomplete cleanup of parallel output formats on the client.
Fixes a missing escape character issue in named collections that could prevent the service from starting.
Fixes an issue with asynchronous inserts of empty blocks in the native protocol.
Fixes an AST formatting inconsistency that occurred with incorrect wildcard character grants.
Fixes an incorrect row count in Chunks that contain a Variant column.
Fixes a crash that occurred when the MongoDB table function received invalid arguments, such as NULL.
Fixes a crash caused by optimize_rewrite_array_exists_to_has.
Fixes a transaction rollback error that occurred when creating a directory fails for the plain_rewritable disk.
Fixes a high memory usage issue caused by max_insert_delayed_streams_for_parallel_write when writing to multiple partitions.
Fixes a "Function argument must be a constant" error that occurred when arrayJoin appeared in a WHERE condition with the old analyzer.
Fixes a crash in SortCursor with zero columns when using the old analyzer.
Fixes a date32 out-of-bounds issue caused by uninitialized ORC data.
Fixes the size calculation for Dynamic and JSON types in wide parts.
Fixes an analyzer issue with IN clauses that use CTEs in queries within materialized views.
Fixes an issue where the bitShift function returned 0 or a default character instead of throwing an exception for out-of-bounds shift values.
Fixes a server crash when using materialized views with specific engines.
Fixes a null pointer dereference in ARRAY JOIN on a nested struct with a constant array alias.
Fixes a LOGICAL_ERROR that occurred when altering an empty tuple.
Fixes an issue with converting a constant set in predicates over partition columns when the NOT IN operator is used.
Fixes a CAST error from LowCardinality(Nullable) to Dynamic.
Fixes an exception thrown by toDayOfWeek in a WHERE condition on a primary key of the DateTime64 type.
Fixes an issue with default value padding after parsing sparse columns.
Fixes an error with the GROUPING function when its input is an ALIAS column on a distributed table.
Fixes an issue where the WITH TIES clause could return an insufficient number of rows.
Fixes a TOO_LARGE_ARRAY_SIZE exception caused by arrayWithConstant misjudging the array size limit.
Fixes a data race between the progress indicator and the progress table in the clickhouse-client. This issue is visible when using FROM INFILE.
Fixes a serialization issue with Dynamic values in Pretty JSON format.
Fixes an issue where the s3 and s3Cluster functions returned incomplete results or threw an exception when encountering an empty object, such as `pattern/`, in glob mode, such as `pattern/*`.
Fixes a crash in clickhouse-client syntax highlighting.
Fixes an "Illegal type" error for a binary monotonic function in `ORDER BY` when the first argument is a constant.
EXPLAIN AST now allows only SELECT subqueries. Other query types previously led to a logical error.
Fixes a formatting issue with MOVE PARTITION when format_alter_commands_with_parentheses is enabled.
Adds inferred format names to CREATE queries in the File, S3, URL, HDFS, and Azure engines, preventing the error caused by data file deletion during service restarts.
Fixes an issue where min_age_to_force_merge_on_partition_only repeatedly operated on partitions that were already merged into a single part.
Fixes a rare crash in SimpleSquashingChunksTransform when processing sparse columns.
Fixes a data race in GraceHashJoin that caused missing rows in the join output.
Fixes an issue with ALTER DELETE queries when enable_block_number_column is enabled.
Fixes a data race when `ColumnDynamic::dumpStructure()` is called concurrently, such as during ConcurrentHashJoin construction.
Fixes a LOGICAL_ERROR issue with duplicate columns in `ORDER BY ... WITH FILL`.
Fixes a type mismatch issue after applying optimize_functions_to_subcolumns.
Fixes a parsing failure of BACKUP DATABASE db EXCEPT TABLES db.table queries.
Disallows creating an empty Variant type.
Fixes an invalid formatting issue for result_part_path in system.merges.
Fixes a parsing issue with single-element globs.
Fixes a query generation issue on secondary servers for distributed queries that contain ARRAY JOIN.
Fixes an error where a DateTime64 IN DateTime64 check returned no results.
Fixes a "No such key" error in S3Queue unordered mode when tracked_files_limit is smaller than the rate at which files appear.
Fixes high context mutex lock contention that occurred when deleting a large mark cache.
Fixes an issue where the primary key cache underestimated the dictionary size for LowCardinality columns.
Fixes an exception that was thrown in RemoteQueryExecutor when a local user does not exist.
Fixes an issue with mutation operations when enable_block_number_column is enabled.
Fixes a backup and restore issue for plain rewritable disks when the backup contains empty files.
Fixes an insert cancellation issue in DistributedAsyncInsertDirectoryQueue.
Fixes a crash that occurred when parsing incorrect data into sparse columns. This could happen when enable_parsing_to_custom_serialization is enabled.
Fixes a potential crash during backup and restore.
Fixes a potential issue in parallel_hash JOIN when the ON clause contains complex inequality conditions.
Uses default format settings during JSON parsing to avoid interrupting deserialization.
Fixes a crash related to transactions with unsupported storage engines.
Fixes a missing check for duplicate JSON keys during Tuple parsing. This could previously lead to the logic error "Invalid number of rows in Chunk".