E-MapReduce: Release notes for EMR Serverless Spark - November 12, 2025

Last Updated: Mar 26, 2026

EMR Serverless Spark released a new version on November 12, 2025. This release adds AI-assisted data processing capabilities, expands gateway and resource management options, and upgrades engine performance across all supported runtime versions.

Platform updates

AI Center (Beta)

Data Development

  • Streaming job log rotation — Streaming jobs now support log rotation, preventing unbounded log accumulation for long-running jobs. See Develop batch or streaming jobs.

  • Streaming job retry — Configure automatic retry policies and retry intervals for failed streaming jobs. See Develop batch or streaming jobs.

  • SparkSQL editor run history — The SparkSQL job editor now displays run records and execution results from the last three days. See Develop with SparkSQL.
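
As a rough illustration of what a retry policy with a retry interval means for a failed job, the following Python sketch retries a callable a bounded number of times with a fixed wait between attempts. The names `run_with_retry`, `max_retries`, and `interval_seconds` are hypothetical and are not EMR Serverless Spark settings; the actual policy options are described in Develop batch or streaming jobs.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    job: Callable[[], T],
    max_retries: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job`, retrying up to `max_retries` times after a failure.

    Hypothetical helper: it waits `interval_seconds` between attempts and
    re-raises the last error once the retry budget is used up.
    """
    retries_used = 0
    while True:
        try:
            return job()
        except Exception:
            if retries_used >= max_retries:
                raise  # retry budget exhausted, surface the failure
            retries_used += 1
            sleep(interval_seconds)  # fixed interval between attempts


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_job() -> str:
        # Fails on the first attempt, succeeds on the second.
        attempts["count"] += 1
        if attempts["count"] < 2:
            raise RuntimeError("transient failure")
        return "finished"

    print(run_with_retry(flaky_job, max_retries=3, interval_seconds=0))
```

The `sleep` argument is injectable only so that the sketch can be exercised without real waiting.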

Data catalog

  • Multi-catalog support — A single workspace can now add and use HMS (Hive Metastore Service), DLF 1.0, and DLF (formerly DLF 2.5) data catalogs simultaneously. See Manage data catalogs.

Resource Management

  • Hybrid billing mode — Queues now support allocating both pay-as-you-go and subscription quotas in the same queue, combining the flexibility of pay-as-you-go with the cost savings of subscription pricing. See Manage resource queues.

  • Decrease subscription quotas — Subscription quotas can now be decreased. See Manage workspaces.

  • Resource observation daily granularity — Resource observation now supports daily-granularity queries, letting you view resource usage trends for the last 7 and 30 days. See Resource observation.

Gateway

  • Kyuubi Application job overview and logs — Kyuubi Application now supports viewing job overviews and exploring logs directly in the console. See Manage Kyuubi Gateway.

  • Livy Gateway session limits — Livy Gateway now supports limiting the number of sessions a single user can create, helping prevent resource exhaustion from runaway clients. See Livy Gateway configuration examples.

Configuration management

  • Timeout configuration in Spark configuration templates — Spark configuration templates now include a timeout configuration item. See Manage Spark configuration templates.

  • Gateway template loading — Kyuubi Gateway and Livy Gateway now support loading configurations from Spark configuration templates, reducing duplicated configuration across gateway types. See Manage Spark configuration templates.

Best practices

  • Large-scale text deduplication with MinHash-LSH — A new best practice guide covers using Serverless Spark to perform text deduplication at scale with the MinHash-LSH algorithm. See Large-scale text deduplication based on MinHash-LSH.

  • Python UDFs in SparkSQL — Register and use Python UDFs in SparkSQL jobs. See Use UDFs.
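
For readers unfamiliar with MinHash-LSH, the idea behind the deduplication guide can be sketched in plain, single-process Python. This is not the implementation from the best practice guide, which runs on Serverless Spark; the function names, the shingle size, and the 64-hash, 16-band layout are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Split text into a set of overlapping k-word shingles."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(seed, item):
    """One member of a family of 64-bit hash functions, selected by seed."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value per hash function as the signature."""
    return tuple(min(_hash(seed, item) for item in items) for seed in range(num_hashes))


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """LSH step: documents sharing any identical band become candidate pairs."""
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        rows = len(sig) // bands
        for band in range(bands):
            chunk = sig[band * rows:(band + 1) * rows]
            buckets[(band, chunk)].add(doc_id)
    return {
        pair
        for ids in buckets.values()
        for pair in combinations(sorted(ids), 2)
    }


if __name__ == "__main__":
    docs = {
        "doc1": "the quick brown fox jumps over the lazy dog",
        "doc2": "the quick brown fox jumps over the lazy dog",
        "doc3": "serverless spark runs batch and streaming jobs today",
    }
    sigs = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
    print(candidate_pairs(sigs))
```

Only candidate pairs need a full similarity check, which is what makes the approach practical at scale: most document pairs are never compared.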

Engine updates

The following updates apply to engine versions esr-5.0.0 (Spark 4.0.1, Scala 2.13), esr-4.6.0 (Spark 3.5.2, Scala 2.12), esr-3.5.0 (Spark 3.4.4, Scala 2.12), and esr-2.9.0 (Spark 3.3.1, Scala 2.12), unless noted otherwise.

Fusion acceleration

  • `shiftrightunsigned` support — The shiftrightunsigned function is now supported in Fusion acceleration.

  • `str_to_map` last-win mode — str_to_map now supports the last_win parameter for handling duplicate keys.

  • Parquet write optimization — Improved write performance for Parquet-format outputs.

  • Commit optimization — Reduced overhead during job commit.

  • JSON Datasource optimization — Improved parsing and processing performance for JSON data sources.

  • Sort operator optimization — Improved performance for sort-heavy workloads.
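
The semantics of the two newly accelerated functions can be sketched in plain Python. This is an illustration only, not Fusion code. The helper names and the `last_win` keyword are hypothetical, and the delimiters are treated as literal strings, whereas Spark treats them as regular expressions. In open-source Spark, duplicate map keys are governed by `spark.sql.mapKeyDedupPolicy` (`EXCEPTION` by default, `LAST_WIN` as the alternative); how Fusion exposes the last-win option is not specified in these notes.

```python
def str_to_map(text, pair_delim=",", kv_delim=":", last_win=False):
    """Parse 'k1:v1,k2:v2' into a dict.

    With last_win=False a duplicate key raises an error; with last_win=True
    the value from the last occurrence of the key is kept.
    """
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        if key in result and not last_win:
            raise ValueError(f"duplicate map key: {key!r}")
        # A pair without the key-value delimiter maps the key to None.
        result[key] = value if sep else None
    return result


def shift_right_unsigned(value, num_bits, width=32):
    """Logical (zero-filling) right shift on a fixed-width signed integer."""
    mask = (1 << width) - 1
    shifted = (value & mask) >> (num_bits % width)
    # Reinterpret the result as a signed integer of the same width.
    if shifted >= 1 << (width - 1):
        shifted -= 1 << width
    return shifted


if __name__ == "__main__":
    print(str_to_map("a:1,b:2,a:3", last_win=True))
    print(shift_right_unsigned(-8, 1))
```

An unsigned shift differs from an arithmetic shift only for negative inputs, where the sign bit is replaced with zero instead of being copied.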

Lakehouse formats

DLF (Data Lake Formation):

  • Optimized table reads and writes — Improved throughput for DLF table operations.

  • Password-free access to pvfs — DLF now supports password-free access to pvfs storage.

  • Lance file format support — DLF now supports reading and writing the Lance file format.

Paimon:

  • Password-free access for Parquet — Paimon now supports password-free access when reading Parquet files.

  • Row-level lineage — Paimon now supports tracking row-level lineage for data governance use cases.

  • MERGE INTO optimization — Improved performance for the MERGE INTO statement on Paimon tables.

  • Compaction optimization — Improved compaction efficiency, reducing background resource usage during table maintenance.

Spark framework

  • Spark 4.0 support — Spark 4.0 is now supported through engine version esr-5.0.0 (Spark 4.0.1, Scala 2.13).

  • Python UDF support — SparkSQL jobs can now define and register Python UDFs at the engine level.

  • MC Connector `max_pt` and `map_agg` — The MaxCompute Connector (MC Connector) now supports the max_pt partition function and the map_agg aggregation function.

  • Fast Fail — The Spark framework now supports the Fast Fail feature.

  • Improved Hive compatibility — Expanded compatibility with Apache Hive metastore behaviors and query semantics.

  • DistCp — The DistCp utility is now available for large-scale distributed data copy operations.
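
To make the two MC Connector functions concrete, here is a plain-Python sketch of what they compute. It assumes that max_pt follows the MaxCompute MAX_PT behavior (the largest first-level partition that contains data) and that map_agg collects key-value pairs into one map per group. The input shapes, the helper signatures, the `None` result for a table with no populated partition, and the handling of duplicate keys (the last value seen is kept) are assumptions made for illustration, not documented connector behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def max_pt(partition_row_counts: Dict[str, int]) -> Optional[str]:
    """Return the largest partition value that actually holds rows.

    Assumption: partitions are compared as strings, and empty partitions
    are skipped. Returns None when no partition contains data.
    """
    non_empty = [pt for pt, rows in partition_row_counts.items() if rows > 0]
    return max(non_empty) if non_empty else None


def map_agg(rows: Iterable[dict], group_col: str, key_col: str, value_col: str) -> dict:
    """Aggregate (key, value) columns into one dict per group.

    Assumption: when a key repeats within a group, the last value wins.
    """
    groups = defaultdict(dict)
    for row in rows:
        groups[row[group_col]][row[key_col]] = row[value_col]
    return dict(groups)


if __name__ == "__main__":
    partitions = {"20251110": 5, "20251111": 9, "20251112": 0}
    print(max_pt(partitions))

    metrics = [
        {"user": "u1", "metric": "clicks", "value": 3},
        {"user": "u1", "metric": "views", "value": 10},
        {"user": "u2", "metric": "clicks", "value": 1},
    ]
    print(map_agg(metrics, "user", "metric", "value"))
```

A typical use of max_pt is to read only the newest populated partition of a daily-partitioned table without hard-coding the date.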

DataWorks

  • RDD lineage — Supports RDD lineage.

DuckDB

  • OSS read/write support — DuckDB running within Serverless Spark can now read from and write to Alibaba Cloud OSS (Object Storage Service).

Celeborn

  • Shuffle read retry optimization — Improved the shuffle read retry mechanism to reduce job failures caused by transient shuffle service errors.

  • Shuffle resource allocation optimization — Improved shuffle resource allocation to reduce contention under high-concurrency workloads.