This topic describes the features released in ApsaraDB for ClickHouse Enterprise Edition V24.12.
New features
Adds support for loading the primary index of all partitions (for a specified table or all tables) via the `SYSTEM LOAD PRIMARY KEY` command. This helps with benchmarking and prevents extra latency during query execution.
Adds the `CHECK GRANT` query to check whether the current user or role has specific permissions and whether the corresponding table or column exists in memory.
Adds SQL syntax descriptions for workloads and resource management.
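As an illustration, the two commands above might be invoked as follows; the database, table, and column names are hypothetical:

```sql
-- Preload the primary index of a table so the first queries avoid load latency.
SYSTEM LOAD PRIMARY KEY my_db.my_table;

-- Check whether the current user holds SELECT on a specific column.
CHECK GRANT SELECT(id) ON my_db.my_table;
```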
Implements Iceberg table read support with schema evolution (column order/name changes and simple type extensions).
Adds support to set an independent expiration time for each authentication method, and removes related settings from the user entity.
Adds support to push external user roles from the query initiator node to other nodes in the cluster. This applies to scenarios where only the initiator node can access external authentication services, such as LDAP.
Adds support for altering the data type from String to JSON. Upgrades the serialization for JSON and Dynamic types to V2. You can revert to V1 using the `merge_tree_use_v1_object_and_dynamic_serialization` setting.
Adds the `toUnixTimestamp64Second` function to convert a DateTime64 value to an Int64 value with fixed second precision. This function supports negative values for dates before January 1, 1970.
Adds the `enforce_index_structure_match_on_partition_manipulation` setting. This setting allows ATTACH operations when the projections and secondary indexes of the source table are a subset of the target table.
Adds support for the Spark text output format. This feature is disabled by default.
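A minimal sketch of the String-to-JSON conversion and the new timestamp function above; the table and column names are hypothetical:

```sql
-- Convert a String column that stores JSON documents to the JSON type in place.
ALTER TABLE events MODIFY COLUMN payload JSON;

-- Second-precision UNIX timestamp; negative for dates before 1970-01-01.
SELECT toUnixTimestamp64Second(toDateTime64('1969-12-31 23:59:59', 3, 'UTC'));
```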
Adds the access_header authentication header type for S3 endpoints. This header has the lowest priority and can be overwritten by other configurations.
Implements the layered settings feature.
Supports the STALENESS clause in `ORDER BY WITH FILL`.
Implements simple CAST conversions from Map, Tuple, and Object to JSON using JSON string serialization and deserialization.
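For example, STALENESS limits how far gap filling continues past each original row; the metrics table below is hypothetical:

```sql
-- Stop filling once the distance from the previous original row
-- exceeds the STALENESS interval.
SELECT ts, value
FROM metrics
ORDER BY ts WITH FILL STALENESS INTERVAL 1 MINUTE;
```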
Adds camelCase aliases, such as `anyLastRespectNulls`, for the `any` and `anyLast` aggregate functions.
Adds the `date_time_utc` configuration parameter. This parameter allows JSON logs to use UTC in RFC 3339 or ISO 8601 format.
Adds the `query_plan_join_swap_table` setting. This setting specifies the inner table (build table) for a join. In auto mode, the table with fewer rows is automatically selected.
Optimizes memory usage for index granularity values when partition granularity is constant. Adds the new `use_const_adaptive_granularity` setting to ensure this memory optimization.
Adds the `allowed_feature_tier` global switch to disable all experimental or beta features.
Adds Cluster table functions for Iceberg, Delta Lake, and Hudi.
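A minimal sketch of how the join-swap setting above might be used; the table names are hypothetical:

```sql
-- 'auto' lets the planner pick the table with fewer rows as the
-- build (inner) side of the hash join.
SET query_plan_join_swap_table = 'auto';

SELECT count()
FROM big_table AS b
JOIN small_table AS s ON b.key = s.key;
```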
Adds the syntax for modifying `SETTING` or `PROFILE` in `ALTER USER`, `ALTER ROLE`, and `ALTER PROFILE` statements.
Adds the `arrayPrAUC` function to calculate the area under the precision-recall curve.
Implements a primary index cache for MergeTree tables, which is enabled using the `use_primary_key_cache` setting. This cache supports on-demand loading, similar to the mark cache, and prefetching, which is enabled using the `prewarm_primary_key_cache` setting.
Adds the `indexOfAssumeSorted` array function for optimized search in non-decreasing sorted arrays.
The `groupConcat` aggregate function now supports a delimiter as an optional second argument.
Adds the `http_response_headers` setting to support custom HTTP response headers.
Adds the `fromUnixTimestamp64Second` function to convert an Int64 UNIX timestamp to a DateTime64 value.
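Hedged examples of three of the functions above; the users table is hypothetical:

```sql
-- Area under the precision-recall curve, given scores and binary labels.
SELECT arrayPrAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);

-- Delimiter passed as the optional second argument.
SELECT groupConcat(name, ', ') FROM users;

-- Faster lookup when the array is known to be sorted in non-decreasing order.
SELECT indexOfAssumeSorted([1, 3, 3, 7, 9], 7);
```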
Performance optimizations
Adds the `short_circuit_function_evaluation_for_nulls` setting. When the proportion of NULL values in a Nullable column exceeds a threshold, functions are executed only on non-NULL rows.
Optimizes memory usage for the `--recursive` delete operation on Object Storage Service disks.
Avoids copying input block columns for parallel processing when `join_algorithm='parallel_hash'`.
Enables JIT compilation for more expressions, such as abs, bitCount, comparison functions, and logical functions.
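The NULL short-circuit setting above might be enabled as follows; the companion threshold setting name and value here are assumptions for illustration:

```sql
-- Evaluate functions only on the non-NULL rows of Nullable columns.
SET short_circuit_function_evaluation_for_nulls = 1;
-- Assumed companion setting: apply the optimization once at least
-- half of a column's values are NULL.
SET short_circuit_function_evaluation_for_nulls_threshold = 0.5;
```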
The default `join_algorithm` behavior now favors parallel_hash.
Optimizes the Replacing merge algorithm for disjoint partitions.
Improves the collection performance of `system.query_metric_log` by reducing the critical section.
Adds the `optimize_extract_common_expressions` setting. This setting supports extracting common expressions from WHERE or ON conditions to reduce the number of hash tables during joins.
Adds support for using indexes on LowCardinality(String) columns.
In parallel replica queries, worker nodes skip index analysis, which is now handled by the client node.
Optimizes the reading of single-column subcolumns in Compact Parts.
Optimizes sorting performance for LowCardinality(String) columns.
Optimizes the performance of argMin and argMax functions for simple data types.
Optimizes shared lock contention for the memory tracker.
Adds the `use_async_executor_for_materialized_views` setting. This setting enables asynchronous and multi-threaded execution for materialized views, which improves INSERT performance but increases memory consumption.
Increases the default threshold for pre-allocated memory for aggregations and joins to 10^12.
Optimizes deserialization performance for AggregateFunction states (in AggregateFunction data type and distributed queries), with minor improvements to RowBinary format parsing.
Breaking changes
The `greatest` and `least` functions now ignore NULL input values, whereas previously they returned NULL if any argument was NULL. For example, `SELECT greatest(1, 2, NULL)` now returns 2. This behavior is consistent with PostgreSQL.
The Variant and Dynamic types are now disallowed by default in ORDER BY, GROUP BY, PARTITION BY, and PRIMARY KEY clauses because they can cause unexpected results.
Removes the `generate_series` and `generateSeries` system tables, which were added by mistake.
Fixes "file does not exist" errors in JSON subcolumn files that were caused by unescaped special characters.
The Kafka, NATS, and RabbitMQ table engines are now categorized under separate permission items in the SOURCES level. CREATE permissions for these engines must be granted to non-default database users.
Mutation queries, including subqueries, are now fully checked before execution. This prevents the accidental execution of invalid queries or mutations.
Renames the file system cache setting `skip_download_if_exceeds_query_cache` to `filesystem_cache_skip_download_if_exceeds_per_query_cache_write_limit`.
Disallows the use of Dynamic and Variant types in min and max functions to avoid confusion.
Removes support for Enum, UInt128, and UInt256 arguments in the deltaSumTimestamp function. It also removes support for Int8, UInt8, Int16, and UInt16 for its second argument, timestamp.
Adds a validation feature for dictionary source queries when ClickHouse is used as a dictionary data source.
Improvements
Higher-order functions that contain constant arrays now return a constant value.
Optimizes ordered reads by generating virtual rows. This is especially useful in multi-partition scenarios.
Query plan step names and pipeline processor names now include a unique ID suffix. This makes it easier to associate them with performance analytics tools.
The write buffer is now explicitly canceled or terminated. If an exception occurs, the client is notified of the interruption through the HTTP protocol.
Removes the allow_experimental_join_condition setting. Non-equi-join conditions are now allowed by default.
Enables parallel_replicas_local_plan by default. This builds a complete local plan on the query initiator node to improve performance.
The http_handlers setting now supports setting a user and password for dynamic_query_handler and predefined_query_handler.
S3Queue storage now supports ALTER TABLE MODIFY/RESET SETTING to modify specific settings.
The Object Storage Service API is no longer called when listing folders. Instead, a list of file names is stored in memory. This trades initial load time for memory usage.
Adds the `prewarm_mark_cache` setting to support prefetching the mark cache when inserting, merging, or fetching partitions.
The native Parquet reader now supports the Boolean type.
Adds more S3 error types for retries, such as "Malformed message".
Lowers the log level for some S3-related logs.
Supports writing to HDFS files with paths that contain spaces.
Fixes RIGHT and FULL joins in parallel replica queries. The right-side table can now be read in a distributed manner.
Adds a setting to limit the number of replicated tables, dictionaries, and views.
Automatically enables external sorting for GROUP BY and ORDER BY based on memory usage. This is controlled by max_bytes_ratio_before_external_group_by and max_bytes_ratio_before_external_sort.
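For example, the two ratio settings above might be set as follows so that a query spills to disk once it uses half of its memory limit:

```sql
-- Switch GROUP BY and ORDER BY to external (on-disk) processing once
-- memory usage reaches 50% of the query memory limit.
SET max_bytes_ratio_before_external_group_by = 0.5;
SET max_bytes_ratio_before_external_sort = 0.5;
```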
The translate function now supports character deletion when the `from` argument is longer than the `to` argument.
Adds the parseDateTime64 series of functions, which return a DateTime64 type.
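A quick sketch of the deletion behavior of translate described above:

```sql
-- 'l' maps to 'L'; 'o' and ' ' have no counterpart in the 'to'
-- argument and are deleted, yielding 'heLLwrLd'.
SELECT translate('hello world', 'lo ', 'L');
```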
Reduces the memory footprint of the index_granularity array for MergeTree family engines.
The command-line application now supports syntax highlighting for multi-statement queries.
The command-line application now returns a non-zero exit code on error.
The Vertical format, which is activated when a query ends with `\G`, now includes features from the Pretty format, such as a thousands separator for numbers.
Allows disabling the growth of the file system cache memory buffer using the filesystem_cache_prefer_bigger_buffer_size setting.
Adds the background_download_max_file_segment_size setting to control the file segment size for background downloads in the file system cache.
Enables HTTP compression by default (enable_http_compression=1).
Supports altering the data type from Object to JSON.
Improves JSON type parsing. When a path corresponds to values of multiple types, the system now tries them in a best-match order.
Reading from system.asynchronous_metrics no longer waits for concurrent updates to complete.
Sets polling_max_timeout_ms to 10 minutes and polling_backoff_ms to 30 seconds.
Simple queries, such as `SELECT * FROM t LIMIT 1`, no longer load partition indexes.
Enables allow_reorder_prewhere_conditions by default under old compatibility settings.
Direct dictionary queries now only require SELECT or dictGet permissions. This fixes an ACL bypass issue.
Adds a selector for the system.dashboards table to the advanced dashboard page.
The prefer_localhost_replica setting is now respected during distributed INSERT...SELECT operations.
Upgrades JSON, Dynamic, and Variant types from experimental features to Beta.
Allows the use of UNION in materialized view queries. Only the first table triggers an insert.
Optimizes MergeTree write performance for batch inserts with a single partition key value.
Adds the MergeTreeIndexGranularityInternalArraysTotalSize metric to system.metrics to help detect high memory usage issues.
Recognizes all spelling variations of Null in `FORMAT Null` queries.
Allows unknown values in a set for the Enum type.
Adds the total_bytes_with_inactive column to system.tables to count the size of inactive partitions.
Adds MergeTreeSettings to system.settings_changes.
Supports string search operations, such as like, on the Enum type.
Supports the notEmpty function for the JSON type.
Supports parsing the AuthenticationRequired error from GCS S3.
Supports the use of the Dynamic type in ifNull and coalesce functions.
Adds the JoinBuildTableRowCount, JoinProbeTableRowCount, and JoinResultRowCount performance events.
Supports the use of the Dynamic type in functions such as toFloat64 and toUInt32.
Bug fixes
Fixes an issue where duplicate partitions in an ATTACH PART query would become stuck in the attaching_ state.
Fixes an issue where DateTime64 values lost precision in the IN function.
Fixes a logic error in the IGNORE NULLS and RESPECT NULLS functions within ORDER BY ... WITH FILL.
Fixes a logic error that occurred when asynchronous inserts in Native format reached the memory limit.
Fixes an issue with comments in CREATE TABLE for EPHEMERAL columns.
Fixes a type error between JSONExtract and LowCardinality(Nullable).
Fixes a behavior issue that occurred when table names were too long.
The URL engine supports overriding the Content-Type header using a custom user header.
Fixes a "Cannot create persistent node at /processed" error in StorageS3Queue.
Fixes an issue where the _row_exists column was not considered when rebuilding projections for lightweight deletes.
Fixes an issue where a race condition caused incorrect values in system.query_metric_log.
Fixes a name mismatch for the quantileExactWeightedInterpolated function.
Fixes a bad_weak_ptr exception in comparison functions for the Dynamic type.
Fixes an issue where blobs were not deleted during zero-copy replication if they were still in use by a node.
Fixes an issue where Native format settings were ignored in HTTP and asynchronous inserts.
Fixes an issue where queries containing system table literals were rejected when use_query_cache=1.
Fixes a memory growth issue in disk storage that was not configured with a cache.
Fixes a "Cannot read all data" error when deserializing LowCardinality dictionaries within Dynamic columns.
Fixes an issue with incomplete cleanup of parallel output formats on the client.
Fixes a missing escape character issue in named collections that could prevent the service from starting.
Fixes an issue with asynchronous inserts of empty blocks in the native protocol.
Fixes an AST formatting inconsistency that occurred with incorrect wildcard character grants.
Fixes an incorrect row count in Chunks that contain a Variant column.
Fixes a crash that occurred when the MongoDB table function received invalid arguments, such as NULL.
Fixes a crash caused by optimize_rewrite_array_exists_to_has.
Fixes a transaction rollback error that occurred when creating a directory fails for the plain_rewritable disk.
Fixes a high memory usage issue caused by max_insert_delayed_streams_for_parallel_write when writing to multiple partitions.
Fixes a "Function argument must be a constant" error that occurred when arrayJoin appeared in a WHERE condition with the old analyzer.
Fixes a crash in SortCursor with zero columns when using the old analyzer.
Fixes a date32 out-of-bounds issue caused by uninitialized ORC data.
Fixes the size calculation for Dynamic and JSON types in wide parts.
Fixes an analyzer issue with IN clauses that use CTEs in queries within materialized views.
Fixes an issue where the bitShift function returned 0 or a default character instead of throwing an exception for out-of-bounds shift values.
Fixes a server crash when using materialized views with specific engines.
Fixes a null pointer dereference in ARRAY JOIN on a nested struct with a constant array alias.
Fixes a LOGICAL_ERROR that occurred when altering an empty tuple.
Fixes an issue with converting a constant set in predicates over partition columns when the NOT IN operator is used.
Fixes a CAST error from LowCardinality(Nullable) to Dynamic.
Fixes an exception thrown by toDayOfWeek in a WHERE condition on a primary key of the DateTime64 type.
Fixes an issue with default value padding after parsing sparse columns.
Fixes an error with the GROUPING function when its input is an ALIAS column on a distributed table.
Fixes an issue where the WITH TIES clause could return an insufficient number of rows.
Fixes a TOO_LARGE_ARRAY_SIZE exception caused by arrayWithConstant misjudging the array size limit.
Fixes a data race between the progress indicator and the progress table in the clickhouse-client. This issue is visible when using FROM INFILE.
Fixes a serialization issue with Dynamic values in Pretty JSON format.
Fixes an issue where the s3 and s3Cluster functions returned incomplete results or threw an exception when encountering an empty object, such as `pattern/`, in glob mode, such as `pattern/*`.
Fixes a crash in clickhouse-client syntax highlighting.
Fixes an "Illegal type" error for a binary monotonic function in `ORDER BY` when the first argument is a constant.
EXPLAIN AST now allows only SELECT subqueries. Other query types previously led to a logical error.
Fixes a formatting issue with MOVE PARTITION when format_alter_commands_with_parentheses is enabled.
Adds inferred format names to CREATE queries in the File, S3, URL, HDFS, and Azure engines, preventing the error caused by data file deletion during service restarts.
Fixes an issue where min_age_to_force_merge_on_partition_only repeatedly operated on partitions that were already merged into a single part.
Fixes a rare crash in SimpleSquashingChunksTransform when processing sparse columns.
Fixes a data race in GraceHashJoin that caused missing rows in the join output.
Fixes an issue with ALTER DELETE queries when enable_block_number_column is enabled.
Fixes a data race when `ColumnDynamic::dumpStructure()` is called concurrently, such as during ConcurrentHashJoin construction.
Fixes a LOGICAL_ERROR issue with duplicate columns in `ORDER BY ... WITH FILL`.
Fixes a type mismatch issue after applying optimize_functions_to_subcolumns.
Fixes a parsing failure of BACKUP DATABASE db EXCEPT TABLES db.table queries.
Disallows creating an empty Variant type.
Fixes an invalid formatting issue for result_part_path in system.merges.
Fixes a parsing issue with single-element globs.
Fixes a query generation issue on secondary servers for distributed queries that contain ARRAY JOIN.
Fixes an error where a DateTime64 IN DateTime64 check returned no results.
Fixes a "No such key" error in S3Queue unordered mode when tracked_files_limit is smaller than the rate at which files appear.
Fixes high context mutex lock contention that occurred when deleting a large mark cache.
Fixes an issue where the primary key cache underestimated the dictionary size for LowCardinality columns.
Fixes an exception that was thrown in RemoteQueryExecutor when a local user does not exist.
Fixes an issue with mutation operations when enable_block_number_column is enabled.
Fixes a backup and restore issue for plain rewritable disks when the backup contains empty files.
Fixes an insert cancellation issue in DistributedAsyncInsertDirectoryQueue.
Fixes a crash that occurred when parsing incorrect data into sparse columns. This could happen when enable_parsing_to_custom_serialization is enabled.
Fixes a potential crash during backup and restore.
Fixes a potential issue in parallel_hash JOIN when the ON clause contains complex inequality conditions.
Uses default format settings during JSON parsing to avoid interrupting deserialization.
Fixes a crash related to transactions with unsupported storage engines.
Fixes a missing check for duplicate JSON keys during Tuple parsing. This could previously lead to the logic error "Invalid number of rows in Chunk".