EMR Serverless Spark released a new version on November 12, 2025. This release adds AI-assisted data processing capabilities, expands gateway and resource management options, and upgrades engine performance across all supported runtime versions.
Platform updates
AI Center (Beta)
-
AI Function — Provides out-of-the-box Qwen large model capabilities, so you can call the model directly to process large-scale data without managing model infrastructure. See Introduction to AI Function.
-
Model service management — Register external model services and call them from Serverless Spark jobs. See Tutorial: Integrate external model services with EMR Serverless Spark.
Data Development
-
Streaming job log rotation — Streaming jobs now support log rotation, preventing unbounded log accumulation for long-running jobs. See Develop batch or streaming jobs.
-
Streaming job retry — Configure automatic retry policies and retry intervals for failed streaming jobs. See Develop batch or streaming jobs.
-
SparkSQL editor run history — The SparkSQL job editor now displays run records and execution results from the last three days. See Develop with SparkSQL.
Data catalog
-
Multi-catalog support — A single workspace can now add and use HMS (Hive Metastore Service), DLF 1.0, and DLF (formerly DLF 2.5) data catalogs simultaneously. See Manage data catalogs.
Resource Management
-
Hybrid billing mode — Queues now support allocating both pay-as-you-go and subscription quotas in the same queue, combining the flexibility of pay-as-you-go with the cost savings of subscription pricing. See Manage resource queues.
-
Decrease subscription quotas — Subscription quotas can now be decreased. See Manage workspaces.
-
Resource observation daily granularity — Resource observation now supports daily-granularity queries, letting you view resource usage trends over the last 7 or 30 days. See Resource observation.
Gateway
-
Kyuubi Application job overview and logs — Kyuubi Application now supports viewing job overviews and exploring logs directly in the console. See Manage Kyuubi Gateway.
-
Livy Gateway session limits — Livy Gateway now supports limiting the number of sessions a single user can create, helping prevent resource exhaustion from runaway clients. See Livy Gateway configuration examples.
Configuration management
-
Timeout configuration in Spark configuration templates — Spark configuration templates now include a timeout configuration item. See Manage Spark configuration templates.
-
Gateway template loading — Kyuubi Gateway and Livy Gateway now support loading configurations from Spark configuration templates, reducing duplicated configuration across gateway types. See Manage Spark configuration templates.
Best practices
-
Large-scale text deduplication with MinHash-LSH — A new best practice guide covers using Serverless Spark to perform text deduplication at scale with the MinHash-LSH algorithm. See Large-scale text deduplication based on MinHash-LSH.
-
Python UDFs in SparkSQL — Register and use Python UDFs in SparkSQL jobs. See Use UDFs.
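To make the MinHash-LSH item above more concrete, the following is a minimal pure-Python sketch of the general technique: shingle each text, compute a MinHash signature, bucket signatures by band, and drop one member of each near-duplicate pair. It is not the code from the best practice guide and does not use any Serverless Spark API. The function names, the character 3-gram shingling, the signature length, the band count, and the 0.7 threshold are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_PERM = 64   # signature length (assumed value)
BANDS = 16      # number of LSH bands; rows per band = NUM_PERM // BANDS
SHINGLE = 3     # character n-gram size (assumed value)


def shingles(text, k=SHINGLE):
    """Return the set of character k-grams of the lowercased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    return {norm[i:i + k] for i in range(max(len(norm) - k + 1, 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted by the seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(tokens, num_perm=NUM_PERM):
    """MinHash signature: for each seed, the minimum hash over all tokens."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=BANDS):
    """Return id pairs whose signatures agree on every row of at least one band."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def deduplicate(docs, threshold=0.7):
    """Drop the larger id of each near-duplicate pair; return the kept ids in input order."""
    sigs = {doc_id: minhash(shingles(text)) for doc_id, text in docs.items()}
    dropped = set()
    for a, b in sorted(candidate_pairs(sigs)):
        if a in dropped or b in dropped:
            continue
        if estimate_jaccard(sigs[a], sigs[b]) >= threshold:
            dropped.add(b)
    return [doc_id for doc_id in docs if doc_id not in dropped]


if __name__ == "__main__":
    sample = {
        "a": "Serverless Spark makes large scale text deduplication simple",
        "b": "Serverless Spark makes large scale text deduplication simple!",
        "c": "DuckDB can read and write objects stored in OSS buckets",
    }
    print(deduplicate(sample))
```

Banding is what keeps the approach scalable: only documents that collide in at least one band are compared, so the pairwise check runs on a small candidate set instead of on every pair. In a Spark job the same steps would normally be distributed rather than run in one process as here.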
Engine updates
The following updates apply to engine versions esr-5.0.0 (Spark 4.0.1, Scala 2.13), esr-4.6.0 (Spark 3.5.2, Scala 2.12), esr-3.5.0 (Spark 3.4.4, Scala 2.12), and esr-2.9.0 (Spark 3.3.1, Scala 2.12), unless noted otherwise.
Fusion acceleration
-
`shiftrightunsigned` support — The `shiftrightunsigned` function is now supported in Fusion acceleration.
-
`str_to_map` last-win mode — `str_to_map` now supports the `last_win` parameter for handling duplicate keys.
-
Parquet write optimization — Improved write performance for Parquet-format outputs.
-
Commit optimization — Reduced overhead during job commit.
-
JSON Datasource optimization — Improved parsing and processing performance for JSON data sources.
-
Sort operator optimization — Improved performance for sort-heavy workloads.
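The two function items in the Fusion acceleration list can be illustrated with a small pure-Python model of their semantics. This is a sketch under assumptions, not the engine's implementation: `shift_right_unsigned` models a logical right shift on a 32-bit two's-complement integer, and `str_to_map` uses literal delimiters plus a hypothetical boolean `last_win` argument to stand in for the last-win mode. Open-source Spark exposes a comparable choice through the `spark.sql.mapKeyDedupPolicy` setting (`EXCEPTION` or `LAST_WIN`); whether the Fusion parameter maps onto that setting is not stated in these notes.

```python
def shift_right_unsigned(value, bits, width=32):
    """Logical (zero-filling) right shift of a two's-complement integer."""
    mask = (1 << width) - 1
    # Reinterpret the value as unsigned, then shift; the shift distance wraps at width.
    shifted = (value & mask) >> (bits % width)
    # Convert back to a signed integer of the same width.
    if shifted >= 1 << (width - 1):
        shifted -= 1 << width
    return shifted


def str_to_map(text, pair_delim=",", kv_delim=":", last_win=False):
    """Split text into key/value pairs.

    With last_win=False a duplicate key raises ValueError; with last_win=True
    the value from the last occurrence of the key is kept.
    """
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        if key in result and not last_win:
            raise ValueError(f"duplicate map key: {key!r}")
        # A pair without a key/value delimiter gets a missing (None) value.
        result[key] = value if sep else None
    return result


if __name__ == "__main__":
    print(shift_right_unsigned(-1, 1))
    print(str_to_map("a:1,b:2,a:3", last_win=True))
```

The difference from an arithmetic shift is visible on negative inputs: the sign bit is not copied into the vacated positions, so a negative 32-bit value shifted by one or more bits becomes non-negative.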
Lakehouse formats
DLF (Data Lake Formation):
-
Optimized table reads and writes — Improved throughput for DLF table operations.
-
Password-free access to pvfs — DLF now supports password-free access to pvfs storage.
-
Lance file format support — DLF now supports reading and writing the Lance file format.
Paimon:
-
Password-free access for Parquet — Paimon now supports password-free access when reading Parquet files.
-
Row-level lineage — Paimon now supports tracking row-level lineage for data governance use cases.
-
MERGE INTO optimization — Improved performance for the MERGE INTO statement on Paimon tables.
-
Compaction optimization — Improved compaction efficiency, reducing background resource usage during table maintenance.
Spark framework
-
Spark 4.0 support — Spark 4.0 is now supported through engine version esr-5.0.0 (Spark 4.0.1, Scala 2.13).
-
Python UDF support — SparkSQL jobs can now define and register Python UDFs at the engine level.
-
MC Connector `max_pt` and `map_agg` — The MaxCompute Connector (MC Connector) now supports the `max_pt` partition function and the `map_agg` aggregation function.
-
Fast Fail — The Spark framework now supports Fast Fail.
-
Improved Hive compatibility — Expanded compatibility with Apache Hive metastore behaviors and query semantics.
-
DistCp — The DistCp utility is now available for large-scale distributed data copy operations.
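For readers unfamiliar with the two MaxCompute functions named in the MC Connector item, the following pure-Python sketch models what they are generally understood to compute. It is an illustration only and does not call the connector. The in-memory inputs are hypothetical stand-ins for table metadata and rows, partition values are assumed to compare as strings, and the handling of duplicate keys in `map_agg` (later pairs overwrite earlier ones) is an assumption rather than documented behavior.

```python
def max_pt(partition_row_counts):
    """Return the largest partition value among partitions that contain data.

    `partition_row_counts` is a hypothetical mapping of partition value to row count.
    """
    non_empty = [pt for pt, rows in partition_row_counts.items() if rows > 0]
    if not non_empty:
        raise ValueError("no partition contains data")
    # Partition values are compared as strings (assumption).
    return max(non_empty)


def map_agg(pairs):
    """Aggregate (key, value) pairs from a group of rows into a single map.

    Assumption: when a key repeats, the value from the later pair is kept.
    """
    result = {}
    for key, value in pairs:
        result[key] = value
    return result


if __name__ == "__main__":
    partitions = {"20251110": 120, "20251111": 98, "20251112": 0}
    print(max_pt(partitions))
    print(map_agg([("cpu", 4), ("memory", 16)]))
```

A typical use of `max_pt` is to read only the latest populated partition of a daily-partitioned table without hard-coding the date in the query.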
DataWorks
-
RDD lineage — RDD lineage is now supported with DataWorks.
DuckDB
-
OSS read/write support — DuckDB running within Serverless Spark can now read from and write to Alibaba Cloud OSS (Object Storage Service).
Celeborn
-
Shuffle read retry optimization — Improved the shuffle read retry mechanism to reduce job failures caused by transient shuffle service errors.
-
Shuffle resource allocation optimization — Improved shuffle resource allocation to reduce contention under high-concurrency workloads.