This topic summarizes the features and fixes in each minor version of ApsaraMQ for Kafka.
V3 (3.3.1 series)
v3.6.0.2
Release date: 2025-12-15
Added Prometheus monitoring metrics to provide more comprehensive monitoring of cluster load and runtime status.
Introduced a high availability (HA) backoff retry mechanism to improve system stability and recovery reliability during network jitter or brief node failures.
Added automatic transaction cleanup and historical transaction skip-loading mechanisms to resolve abnormal transaction states caused by expired transactions.
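The backoff retry mechanism above follows a standard resilience pattern: retry the failed operation with exponentially growing, capped, jittered delays so transient faults recover quickly without retry storms. A minimal sketch of that pattern, assuming capped exponential backoff with jitter (the actual ApsaraMQ for Kafka implementation is internal and may differ):

```python
import random
import time

def backoff_retry(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry `operation` with capped exponential backoff and jitter.

    Illustrative only; parameter names and defaults are hypothetical,
    not ApsaraMQ for Kafka configuration options.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Exponential delay, capped at max_delay, with random jitter
            # so concurrent retries do not synchronize into a storm.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

During brief network jitter, the first one or two attempts absorb the fault; only a persistent failure exhausts the attempt budget and surfaces the error.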
v3.5.0.2
Release date: 2025-08-25
Fixed a memory pool lifecycle management issue to prevent resource leaks and ensure long-term stability.
Optimized HA failover to further reduce fault recovery time.
Significantly improved throughput performance under high I/O workloads to better support high-throughput write and consumption scenarios.
Added a topic-level control to disable writes, which lets you flexibly pause writes during O&M or in emergency scenarios.
Added support for dynamic adjustment of the prefetch cache size to optimize read performance across different workloads without requiring a restart.
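Adjusting a cache size without a restart typically means readers consult the live configuration value on every fetch cycle instead of caching it at startup. A minimal illustration of that hot-reload pattern (all class and method names are hypothetical; ApsaraMQ's internal mechanism is not exposed):

```python
import threading

class PrefetchConfig:
    """Hypothetical hot-adjustable prefetch size holder."""

    def __init__(self, prefetch_bytes=1 << 20):
        self._lock = threading.Lock()
        self._prefetch_bytes = prefetch_bytes

    @property
    def prefetch_bytes(self):
        with self._lock:
            return self._prefetch_bytes

    def update(self, new_size):
        # Called by the config-push path; readers pick up the new
        # value on their next fetch cycle, with no restart needed.
        with self._lock:
            self._prefetch_bytes = new_size

def next_fetch_size(cfg, remaining_bytes):
    # Each fetch reads at most the currently configured prefetch size.
    return min(cfg.prefetch_bytes, remaining_bytes)
```

Because the value is re-read per cycle, an update takes effect within one fetch cycle rather than at the next process start.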
v3.4.2.4
Release date: 2025-05-12
Optimized the handling logic for storage file creation timeouts to improve fault tolerance in extreme scenarios.
Optimized the Kafka kernel startup process to accelerate instance initialization.
Optimized the storage tier prefetch mechanism to improve the response time for the first read or write operation after a cold start.
v3.4.2.3
Release date: 2025-04-29
Fixed a leader election failure in specific concurrent scenarios to ensure high availability.
Improved the accuracy of ZooKeeper session heartbeat detection to reduce unnecessary failovers caused by false positives.
Enhanced the state reporting mechanism to improve the real-time awareness of in-sync replica (ISR) changes in the monitoring system.
Optimized the performance of the underlying file List interface.
v3.4.2.2
Release date: 2025-04-22
Fixed multiple critical issues, including log loading failures and high disk pressure caused by an excessive number of open index files during service shutdown.
Upgraded underlying dependency components to improve overall reliability.
Enhanced server-side monitoring metrics to accelerate issue detection.
Optimized traffic steering policies during cluster scaling to minimize the impact on online services.
v3.4.2.1
Release date: 2025-03-31
Added support for the dynamic creation of internal system topics to increase runtime flexibility.
Added support for full-trace TraceID pass-through to simplify request tracing and troubleshooting in distributed environments.
Optimized the topic deletion process to improve metadata cleanup efficiency.
v3.4.0.5
Release date: 2025-01-10
Improved observability during HA failover to accelerate fault diagnosis.
Improved I/O scheduling policies in the storage tier to reduce tail latency under high-concurrency read and write workloads.
Fixed the asynchronous task timeout handling mechanism to prevent unexpected request blocking.
Fixed an issue where the leader epoch grew abnormally when the leader did not change.
v3.4.0.3
Release date: 2024-11-05
Added support for high-availability channels to improve network transmission efficiency and stability.
Fixed a memory leak that occurred during fast HA recovery.
Added the ability to specify a default storage class for newly created topics to simplify configuration.
Optimized multiple server-side health check metrics to improve the early detection of cluster abnormalities.
v3.4.0.1
Release date: 2024-09-26
Added fast HA recovery to significantly reduce leader failover time.
Isolated cold reads from hot reads to improve read performance and stability.
Fixed an off-heap memory leak to ensure long-term stability.
Added an adaptive throttling policy to better handle high-load scenarios.
Enhanced key alert log identifiers to improve emergency response efficiency.
v3.2.0.3
Release date: 2024-04-15
Improved observability during HA failover to accelerate fault diagnosis.
Fixed multiple boundary issues in the snapshot file cleanup logic to ensure metadata consistency.
Fixed an issue where the leader epoch grew abnormally when the leader did not change.
Added support for the dynamic creation of internal system topics to increase runtime flexibility.
Optimized the performance of the underlying file List interface.
Fixed the asynchronous task timeout handling mechanism to prevent unexpected request blocking.
V2 (2.6.2, 2.2.0, and 0.10.x series)
v5.2.4.1
Release date: 2025-10-17
Optimized code logic and improved memory performance.
Refactored and optimized core module logic to reduce memory overhead, improve overall resource utilization, and increase runtime efficiency.
Added a minimum consumer offset cache.
Added a cache for minimum consumer offsets to support fast queries and responses. This feature significantly improves offset retrieval performance and reduces backend storage pressure.
Weakened the consumer offset interface's dependencies on other components.
Optimized the consumer offset query flow to use weak dependencies on other components. This change improves system availability and stability when other components fail or experience network jitter.
Fixed a Socket memory pool leak.
Identified and fixed a memory leak in the Socket memory pool that occurred when memory was not released correctly in certain scenarios. This fix further improves system reliability and stability during long-term operation.
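The minimum consumer offset cache can be pictured as an in-memory map from partition to per-group committed offsets, so a minimum-offset query is answered from memory instead of a backend storage round trip. A simplified sketch (class and method names are hypothetical, not the actual internal API):

```python
class MinOffsetCache:
    """Illustrative in-memory cache of minimum committed offsets."""

    def __init__(self):
        # (topic, partition) -> {consumer group: committed offset}
        self._commits = {}

    def on_commit(self, topic, partition, group, offset):
        # Updated on each offset commit, so queries stay current.
        self._commits.setdefault((topic, partition), {})[group] = offset

    def min_offset(self, topic, partition):
        # Served entirely from memory; no backend storage access.
        groups = self._commits.get((topic, partition))
        return min(groups.values()) if groups else None
```

Keeping the per-group map (rather than only the minimum) lets the minimum move forward correctly when the slowest group catches up.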
v5.2.3.1
Release date: 2025-01-15
Feature optimizations
Optimized kernel logic to reduce the frequency of Fetch requests.
Refactored the message pull flow and optimized the Fetch request trigger mechanism to reduce unnecessary Fetch requests. This change lowers broker load and network overhead.
Optimized read and write queues to improve system isolation.
Improved queue scheduling policies for read and write requests to isolate cold data reads from other core API requests. This change significantly reduces interference from cold data reads on critical path performance.
Enhanced kernel observability.
Added monitoring metrics and instrumentation for key paths to improve visibility into the system runtime status. This change helps with troubleshooting and performance tuning.
Bug fixes
Fixed consumer offset rollback after unexpected downtime.
Optimized the offset persistence mechanism to ensure correct offset recovery after broker crashes. This fix prevents duplicate message consumption.
Fixed write failures caused by duplicate topic names.
Fixed the topic metadata management logic to resolve write failures caused by naming conflicts. This fix improves cluster stability and compatibility.
Fixed transaction exceptions caused by ZooKeeper session expiration (KAFKA-9307).
Optimized ZooKeeper session management to improve the transaction state machine's tolerance of session timeouts. This fix prevents transaction interruptions caused by brief connection jitter.
Fixed LocalTopic memory leaks (KAFKA-8448).
Identified and fixed a LocalTopic memory leak caused by unreleased references during long-term operation. This fix improves long-term stability and resource management.
v5.2.2.9
Release date: 2024-12-02
Fixed the uncontrolled growth of the __consumer_offsets internal topic caused by transaction marker messages (KAFKA-8335).
Identified and fixed an issue where transaction markers were not cleaned up in time, which prevented segment log compaction and expiration in the __consumer_offsets topic. Optimized transaction state writing and cleanup to keep the storage growth of the internal topic within acceptable limits. This fix avoids abnormal disk space usage and improves long-term stability.
v5.2.2.8
Release date: 2024-07-04
Further optimized the kernel Time-to-Live (TTL) deletion mechanism to reduce the impact on disk read performance.
Refactored the TTL data expiration logic to decouple cleanup operations from the read path. This change reduces competition for disk I/O between background deletion tasks and read operations, which significantly lowers the impact on read latency. It also improves system stability and response performance under high load.
Enhanced kernel log observability.
Improved the log output for key paths, standardized log formats, and added context, such as request type and duration. These changes accelerate troubleshooting and improve O&M monitoring, which helps diagnose anomalies faster.
v5.2.2.5
Release date: 2024-03-28
Optimized the kernel Time-to-Live (TTL) deletion logic to reduce the impact on disk read performance.
Refactored the TTL data cleanup mechanism. Optimized the scheduling and I/O handling for background expiration tasks to reduce resource contention with disk reads during bulk deletions in large-scale scenarios. Introduced fine-grained throttling controls for cleanup tasks to effectively mitigate read latency jitter caused by concentrated deletions. This optimization significantly improves system stability and response performance under high load.
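Fine-grained throttling of cleanup tasks is commonly implemented as a token bucket: each segment deletion consumes a token, tokens refill at a fixed rate, and excess deletions are deferred to a later cycle instead of bursting onto the disk alongside reads. An illustrative sketch under that assumption (parameters hypothetical, not ApsaraMQ's actual implementation):

```python
import time

class DeletionThrottle:
    """Token-bucket sketch of throttling for bulk TTL deletions."""

    def __init__(self, deletes_per_sec, burst):
        self.rate = deletes_per_sec   # steady-state deletions per second
        self.capacity = burst         # short burst allowance
        self.tokens = burst
        self.last = time.monotonic()

    def try_acquire(self):
        # Refill tokens based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # Out of tokens: caller defers this segment deletion to a
        # later cleanup cycle instead of competing with disk reads.
        return False
```

Because refused deletions are merely deferred, concentrated expirations are smoothed out over time rather than causing read-latency spikes.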
v5.2.2.4
Release date: 2023-08-14
Fixed LocalTopic deletion failures caused by partition skew.
Identified and resolved an issue where LocalTopic deletion stalled or failed due to inconsistent metadata states in partition-skewed deployments. Strengthened the fault-tolerant deletion logic and state validation to ensure that LocalTopic can be reclaimed reliably across all deployment scenarios. This fix improves resource management reliability and system robustness.
v5.2.2.2
Release date: 2023-03-29
Fixed incorrect deletion of metadata.
Identified and fixed a scenario where critical metadata was incorrectly purged. Strengthened metadata lifecycle management and deletion condition checks to ensure the integrity of topic, partition, and replica configurations during unintended operations. This fix improves system stability and data security.
Added support for Sarama clients to retrieve offset lists during node downtime.
Enhanced broker high availability by optimizing the metadata return logic. This change allows Sarama and other clients to retrieve consumer offset lists even when some nodes fail, which improves client fault tolerance and availability during cluster anomalies.
Fixed error messages for the Add Partitions API.
Optimized the exception feedback mechanism for the Add Partitions API. Standardized and clarified error codes and response messages to accelerate O&M diagnostics and help users identify the causes of operation failures more accurately.
v5.2.2.1
Release date: 2022-10-09
Enhanced kernel observability.
Improved monitoring metrics and instrumentation for key kernel paths. Added fine-grained observation capabilities for message reads and writes, partition state, and resource usage. This change increases system transparency and supports faster troubleshooting and performance tuning.
Optimized the performance of auto-created topics to improve creation speed.
Refactored the auto-create topic flow to reduce metadata initialization and synchronization overhead. This change significantly shortens the topic creation response time and improves system responsiveness and user experience under high concurrency.
Added support for filtering auto-created topics by internal management clients.
Added fine-grained control over auto-topic creation by internal management clients. You can use whitelists or policies to prevent unintended or unauthorized topic creation. This feature enhances cluster security governance and O&M controllability.
v5.2.2.0
Release date: 2022-03-15
Fixed concurrency safety issues in abnormal read scenarios.
Identified and resolved resource access conflicts caused by multi-threaded competition in abnormal read paths. Introduced fine-grained locking and state validation to ensure thread-safe reading and system stability under high concurrency.
Added a balanced rebalancing policy to optimize resource distribution.
Introduced a smarter rebalancing policy to improve the uniformity of partition and replica distribution across brokers. This policy reduces load skew, improves overall resource utilization, and boosts service stability.
Allowed only the actual leader node to perform remote reads.
Strengthened replica role validation to restrict remote read requests to the actual leader of the current partition. This change prevents data read anomalies caused by inconsistent role states and improves data consistency and cluster security.
Fixed hostname resolution failures.
Optimized the hostname resolution logic to improve robustness in containerized or special network environments. This fix ensures correct node registration and avoids registration failures or communication issues caused by empty or invalid hostnames.
Added read and write support for a specified ZooKeeper instance in synchronous mode to prevent data inconsistency.
Added explicit read and write support for a fixed ZooKeeper node. This feature forces synchronous access to the primary ZooKeeper instance for critical metadata operations. This change avoids brief data inconsistencies caused by cross-ZooKeeper reads and improves the reliability of configuration management.
Optimized mapping compression and improved monitoring metric reporting.
Improved the traffic routing data compression logic in specific scenarios to reduce memory and network overhead. Fixed and enhanced the accuracy and timeliness of core monitoring metrics to improve traffic statistics and observability.
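A balanced distribution of partitions across brokers, as targeted by the rebalancing policy above, can be approximated greedily: assign each partition to the currently least-loaded broker. A toy sketch of that idea (illustrative only, not the actual policy):

```python
import heapq

def balance_partitions(partitions, brokers):
    """Greedily spread partitions across brokers to minimize skew.

    Uses a min-heap keyed by current broker load so each partition
    lands on the least-loaded broker at that moment.
    """
    heap = [(0, broker) for broker in brokers]
    heapq.heapify(heap)
    assignment = {}
    for partition in partitions:
        load, broker = heapq.heappop(heap)
        assignment[partition] = broker
        # Push the broker back with its incremented load.
        heapq.heappush(heap, (load + 1, broker))
    return assignment
```

For equal-weight partitions this yields a near-perfectly uniform spread; a production policy would additionally weight partitions by traffic and respect replica placement constraints.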
v5.1.1.2
Release date: 2025-10-10
Fixed leader epoch rollback.
Fixed an abnormal leader epoch rollback in specific failback scenarios. Strengthened monotonic leader epoch increments to avoid duplicate consumption or data loss caused by metadata inconsistency. This fix improves the reliability of the replica state machine.
Fixed replica resource leaks.
Identified and resolved replica object leaks that occurred during broker offline events or partition migration. Optimized resource reclamation to prevent the continuous accumulation of memory and handles. This fix improves long-term system stability.
v5.1.1.1
Release date: 2025-08-10
Added separate read and default API queues, which are enabled by default.
Introduced a dedicated read request queue to isolate consumer read traffic from regular API requests. This change prevents mutual interference under high load and improves overall scheduling efficiency and service stability.
Fixed configuration loss during dynamic scaling.
Optimized hot configuration updates to ensure correct state synchronization during parameter adjustments. This fix prevents service anomalies caused by lost configurations and improves runtime maintainability.
Fixed inaccurate throttling metric collection.
Improved monitoring data collection in the throttling module. Corrected counting errors in multi-threaded environments to ensure that Throttle metrics accurately reflect the current traffic control status. This fix improves observability and operational decision-making.
Fixed write failures during topic state changes.
Resolved write blocking caused by metadata validation failures during topic state transitions. This fix ensures continuity and write availability throughout the process.
Optimized the ListOffsets mechanism to return special offsets in abnormal scenarios.
Enhanced the fault tolerance of the ListOffsets interface to return preset or cached offsets even when partitions are unavailable. This change improves client compatibility and availability during anomalies for clients such as Sarama.
Optimized the log format and content.
Standardized log output, added key context, such as request type and duration, and removed redundant logs. These changes improve troubleshooting efficiency and system observability.
Enhanced HA to prevent failover failures during transient ZooKeeper disconnections.
Improved HA failover fault tolerance by adding tolerance for brief ZooKeeper session interruptions. This change prevents failover failures caused by network jitter or temporary ZooKeeper unavailability and ensures fast cluster recovery.
Optimized NameServer connection management.
Adjusted client-to-NameServer connection strategies to reduce the blocking effects on network threads during failures. This change improves system robustness when the NameServer fails or experiences delays.
Merged fixes for four key Apache Kafka community issues related to transactions and idempotence.
Incorporated multiple core fixes from the Apache Kafka community, including the following:
KAFKA-8448: Fixed LocalTopic memory leaks
KAFKA-9307: Fixed transaction exceptions caused by ZooKeeper session expiration
KAFKA-9839: Optimized the transaction coordinator state machine
KAFKA-8764: Fixed sequence number reset issues for idempotent producers
These fixes significantly improve the stability and compatibility of transactional and idempotent features.