MaxCompute: Release notes for engine versions

Last Updated: Feb 28, 2025

This topic describes the release notes for MaxCompute and provides links to the relevant references.

February 2025

MaxCompute SQL V50 was officially released in February 2025. The following features are added or enhanced in this version:

Data warehouse engine

  • Data formats

    The DECIMAL data type supports a higher scale. In DECIMAL(precision,scale), the scale value indicates the number of digits in the fractional part of a DECIMAL value. You can run the SET odps.sql.decimal2.extended.scale.enable=true; command to extend the scale range from [0, 18] to [0, 38]. For more information, see MaxCompute V2.0 data type edition.
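
    A minimal sketch of how the extended scale might be used. The table and values are illustrative; the odps.sql.type.system.odps2 flag enables the MaxCompute V2.0 data types that DECIMAL(precision,scale) requires.

      -- Enable MaxCompute V2.0 data types and the extended DECIMAL scale range.
      SET odps.sql.type.system.odps2=true;
      SET odps.sql.decimal2.extended.scale.enable=true;
      -- With the flag enabled, a scale greater than 18 can be declared.
      CREATE TABLE IF NOT EXISTS decimal_demo (val DECIMAL(38, 20));
      INSERT INTO decimal_demo VALUES (CAST('1.23456789012345678901' AS DECIMAL(38, 20)));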

  • Syntax and function enhancements

    • Enhanced STRUCT syntax: The STRUCT expression syntax is supported. This syntax allows you to use named expressions to construct data of the STRUCT type, which provides a new way to construct this complex data type.
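
      The exact syntax is not spelled out in these notes; the following sketch assumes that the STRUCT expression accepts named expressions as described above, with the existing NAMED_STRUCT built-in function shown for comparison.

        -- Sketch: construct a STRUCT value from named expressions (assumed spelling).
        SELECT STRUCT(1 AS id, 'Alice' AS name, 99.5 AS score) AS user_info;
        -- For comparison, the existing NAMED_STRUCT built-in function:
        SELECT NAMED_STRUCT('id', 1, 'name', 'Alice', 'score', 99.5) AS user_info;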

    • Enhanced FIND_IN_SET function: This function searches for a substring within a string that consists of substrings separated by a delimiter and returns the position of the substring. In earlier MaxCompute SQL versions, only a comma (,) could be used as the delimiter for this function. After the enhancement, you can specify a custom delimiter of the STRING type, which enables more flexible string searches. For more information, see FIND_IN_SET.
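
      A short illustration, assuming the three-argument form described above (the two-argument form keeps the comma delimiter):

        -- Returns 2: 'b' is the second element of the underscore-delimited list.
        SELECT FIND_IN_SET('b', 'a_b_c', '_');
        -- Earlier behavior: only the comma delimiter is supported.
        SELECT FIND_IN_SET('b', 'a,b,c');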

    • New built-in function GET_DATA_FROM_OSS: This function can be used to download Object Storage Service (OSS) object data from Object Tables and return binary data for subsequent computing.

  • Feature enhancements

    • MATERIALIZED CTE: When you define a common table expression (CTE), you can add a MATERIALIZE hint to the SELECT statement to cache the CTE calculation result in a temporary table. Subsequent accesses to the CTE read the result directly from the cache. This avoids out-of-memory (OOM) issues in multi-layer CTE nesting scenarios and improves the performance of CTE statements.
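
      A minimal sketch, assuming the hint is written as a /*+ MATERIALIZE */ comment in the CTE's SELECT statement (the exact spelling may differ; table and column names are illustrative):

        -- Cache the CTE result so that both references below read from the cache.
        WITH t AS (
          SELECT /*+ MATERIALIZE */ region, SUM(amount) AS total
          FROM sales
          GROUP BY region
        )
        SELECT a.region, a.total
        FROM t a
        JOIN t b ON a.region = b.region;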

    • Enhanced observability of Bloom filter indexes: The time taken to merge Bloom filter indexes is displayed on the SubStatusHistory tab of LogView. For more information, see Generate a Bloom filter index.

    • Enhanced materialized views: The query rewrite feature of materialized views is improved to support more operators, including DISTRIBUTED BY, ORDER BY, ORDER BY + LIMIT, and LIMIT.
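
      A sketch of a query shape that the rewrite can now serve, with illustrative names:

        -- A materialized view that pre-aggregates sales by region.
        CREATE MATERIALIZED VIEW mv_sales AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
        -- The optimizer may now rewrite an ORDER BY + LIMIT query such as this one to read from mv_sales.
        SELECT region, SUM(amount) AS total FROM sales
        GROUP BY region
        ORDER BY total DESC
        LIMIT 10;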

  • Performance and parameter upgrade

    • The execution performance of the ARRAY_CONTAINS function is improved. In string search scenarios, the optimizer automatically identifies input parameters that involve the SPLIT operation and optimizes the function into an equivalent FIND_IN_SET operation by default. This optimization also supports a wider range of delimiter scenarios. For example, ARRAY_CONTAINS(SPLIT(c1, '_'), c2) is automatically optimized to FIND_IN_SET(c2, c1, '_'), which improves the execution performance of ARRAY_CONTAINS.

    • By default, the Reshuffle Split capability is enabled for dynamic partitioning to optimize scenarios involving dynamic partition reshuffling. This optimization splits the data flow of dynamic partitions and performs reshuffling only on a single path, thereby reducing the overhead of dynamic partition reshuffling while avoiding the issue of excessive small files.

    • The Shuffle Removal capability is further enhanced to eliminate unnecessary shuffles in MAPJOIN and PARTITIONED HASH JOIN scenarios and improve job performance.

Near real-time data warehousing

  • Data writing from Flink to Delta tables

    MaxCompute Delta tables support multiple data write methods. MaxCompute provides a new version of the Flink connector plug-in, which can be used to write data from Flink to MaxCompute standard tables and Delta tables. This facilitates data writing from Flink to MaxCompute. In addition, Flink CDC data can be directly written to Delta tables. For more information, see Use Flink to write data to a Delta table.

Lakehouse and the external table feature

  • When you create an external table to parse data in the PARQUET format, implicit conversions are supported for some data types, such as TINYINT, SMALLINT, and DATETIME.

  • The MAX_PT function can be used to query the latest partition of an external table. You can use this function to query the latest partition that contains data in an OSS external table. For more information, see MAX_PT.
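
    A short illustration with hypothetical table and partition column names:

      -- Read only the latest partition that contains data in the OSS external table.
      SELECT * FROM oss_ext_table WHERE pt = MAX_PT('oss_ext_table');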

November 2024

MaxCompute SQL V49 was officially released in November 2024. The following features are added or enhanced in this version:

Data warehouse engine

  • New features

    • Bitmap index: You can create bitmap indexes for low-cardinality columns that contain a large number of duplicate values. In range filtering scenarios, bitmap indexes can filter out more than 50% of the data in the best case. This helps accelerate queries. For more information, see Bitmap index (beta).

    • Bloom filter index: A Bloom filter is an efficient probabilistic data structure. MaxCompute allows you to use Bloom filter indexes to perform large-scale point queries. This reduces unnecessary data scanning during queries and improves the overall query efficiency and performance. For more information, see Bloom filter index (beta).

  • Built-in functions

    • The built-in function JSON_EXPLODE is added. This function is used to expand each element in a JSON array or JSON object into multiple rows. For more information, see JSON_EXPLODE.
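
      A hedged sketch of a typical call; the LATERAL VIEW pattern and the output column alias below are assumptions, so check the JSON_EXPLODE reference for the exact signature:

        -- Expand each element of a JSON array column into its own row
        -- (the alias list is an assumption; table and column names are illustrative).
        SELECT t.id, e.value
        FROM src_table t
        LATERAL VIEW JSON_EXPLODE(t.json_col) e AS value;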

  • Syntax enhancement

    • You can use the OR REPLACE clause in the CREATE TABLE statement to update the metadata of a table. If the destination table already exists, its metadata is directly updated. You do not need to delete an existing table before you re-create it. This simplifies the use of SQL statements and improves ease of use. For more information, see Create and drop tables.
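
      A minimal illustration with a hypothetical table:

        -- Re-creates the table definition in place; no prior DROP TABLE is needed.
        CREATE OR REPLACE TABLE orders (order_id BIGINT, amount DECIMAL(18, 2));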

    • Single-line and multi-line comments can be used in MaxCompute SQL scripts to improve code readability. For more information, see SQL comments.
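
      For example:

        -- This is a single-line comment.
        SELECT 1;
        /* This is a
           multi-line comment. */
        SELECT 2;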

    • The SUBQUERY_MAPJOIN hint is supported. Specific subqueries, such as SCALAR, IN, and EXISTS subqueries, are transformed into JOIN operations during execution. You can explicitly specify the hint to use the MAPJOIN algorithm and improve execution efficiency. For more information, see SUBQUERY_MAPJOIN HINT.
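
      A hedged sketch; the hint placement below is an assumption based on the description above, and the names are illustrative (see SUBQUERY_MAPJOIN HINT for the exact syntax):

        -- Ask the optimizer to execute the IN subquery as a MAPJOIN.
        SELECT *
        FROM orders o
        WHERE o.customer_id IN (
          SELECT /*+ SUBQUERY_MAPJOIN */ customer_id FROM vip_customers
        );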

  • New parameters

    • The original odps.stage.mapper.split.size parameter supports only overall settings in the map stage. A new flag lets you flexibly split tables by row count or by concurrency, which can improve task concurrency in scenarios where each row in a table is small and the workloads of subsequent computing operations are heavy. For more information, see Flag parameters.

    • If a query repeatedly accesses the same partitioned table, you can run the SET odps.optimizer.merge.partitioned.table=true; command to enable the system to merge the access operations on the partitioned table. This minimizes the I/O operations on the partitioned table and improves query performance. For more information, see Flag parameters.

  • Behavior changes

    • The feature to convert dynamic partitions into static partitions is enabled by default for all DML operations that are performed in MaxCompute. This feature helps improve query performance. The implementation changes the behavior of UPDATE, DELETE, and MERGE INTO operations. For more information, see the relevant notice in Service notices in 2024.

Big data AI (MaxFrame)

  • LogView 2.0 is compatible with MaxFrame and supports the following MaxFrame-related features. For more information, see Use LogView 2.0 to view MaxFrame jobs.

    • Allows you to view the execution records and running durations of all directed acyclic graphs (DAGs) submitted in MaxFrame sessions.

    • Allows you to interactively view the execution sequence, running time, operator topology, and status relationships of sub-DAGs in each DAG.

    • Allows you to view the settings, status, memory usage, and CPU utilization of each child instance.

  • MaxFrame offers an automatic packaging service to simplify the management of third-party packages in Python-based job development. This service allows you to declare required external dependency files during job development. When the job runs, the dependency files are automatically packaged and integrated into the environment in which the job runs, so you do not need to manually upload the packages. For more information, see Automatic packaging service.

Lakehouse and the external table feature

  • ZSTD compression is supported during data writes to a Parquet external table based on the JNI interface.

    Before this feature was introduced, only uncompressed files and Snappy-compressed files could be written when you created a Parquet external table. Now, files that are compressed by using the ZSTD compression algorithm can also be written in this scenario. This improves the compression ratio and read and write performance and achieves greater cost-effectiveness. For more information, see Create an OSS external table.
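
    A hedged sketch of writing ZSTD-compressed Parquet files; the mcfed.parquet.compression property name follows the OSS external table documentation, and the bucket path and names are illustrative:

      -- An OSS external table whose Parquet output is ZSTD-compressed.
      CREATE EXTERNAL TABLE parquet_zstd_ext (id BIGINT, name STRING)
      STORED AS PARQUET
      LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/parquet-dir/'
      TBLPROPERTIES ('mcfed.parquet.compression'='zstd');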

  • CsvStorageHandler and OpenCsvSerde are provided to support more data types for data reads and writes.

    • MaxCompute provides OpenCsvSerde (referred to as CsvSerde), a Hive-compatible read and write standard whose supported data types are Hive-compatible data types. MaxCompute also provides the custom read and write standard CsvStorageHandler (referred to as CsvHandler), whose supported data types are those of the MaxCompute V2.0 data type edition. The two standards support overlapping sets of basic data types, such as INT and FLOAT, but their parsing behaviors differ in many cases, and a unified standard has not yet been formulated. For example, for the FLOAT type, CsvSerde defines how special values such as INF are processed, whereas CsvHandler does not handle special values and only attempts to parse with the parseFloat method. As a result, the parsing behavior for basic data types may differ when you use both CsvHandler and CsvSerde.

    • CsvStorageHandler supports multiple basic data types, such as BOOLEAN, TIMESTAMP, DATE, and DATETIME. This allows you to export data of all data types from MaxCompute to OSS and store the data in the CSV format. After cross-region replication is performed based on OSS, data can be restored to MaxCompute.
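
      A minimal sketch of an external table that uses CsvStorageHandler; the bucket path and column names are illustrative:

        -- CSV data in OSS read and written with the MaxCompute CsvStorageHandler.
        CREATE EXTERNAL TABLE csv_ext (
          id BIGINT,
          flag BOOLEAN,
          ts TIMESTAMP,
          dt DATETIME
        )
        STORED BY 'com.aliyun.odps.CsvStorageHandler'
        LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/csv-dir/';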

  • OSS external tables allow you to perform STS authentication by assuming a RAM role.

    This feature is optimized so that you can access external tables by assuming a RAM role (no AccessKey pair required), or from other cloud services that assume a RAM role. Before the optimization, external tables carried a RAM role in their table properties for MaxCompute to access peer services, so when you assumed a role in the preceding scenarios, the system could not obtain the user information of that RAM role and you could not access the external tables. After the optimization, the use of RAM roles and seamless access based on RAM roles work as expected even when external tables are involved. For more information, see Create an OSS external table.
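
    A hedged sketch of binding a RAM role to an OSS external table; the odps.properties.rolearn property follows the OSS external table documentation, and the role ARN, path, and names are illustrative:

      -- MaxCompute assumes the specified RAM role to read the OSS data.
      CREATE EXTERNAL TABLE oss_ext_role (id BIGINT, name STRING)
      STORED BY 'com.aliyun.odps.CsvStorageHandler'
      WITH SERDEPROPERTIES ('odps.properties.rolearn'='acs:ram::<your-uid>:role/aliyunodpsdefaultrole')
      LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/my-bucket/dir/';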

  • The optimizer can use statistics from temporary statistics tables during queries to identify small tables and optimize query plans.

    Because data queried through external tables is stored in an external data lake, the system does not maintain metadata locally, in order to preserve data openness. In this case, if you do not collect statistics in advance, the optimizer falls back to conservative policies, which decreases query efficiency. The optimizer can now use statistics from temporary statistics tables during queries to identify small tables, which allows it to proactively optimize query plans in various ways, including performing hash join operations, optimizing the join order, reducing a large number of shuffle operations, and shortening the executed pipelines. For more information, see Read OSS data.