
ApsaraMQ for Kafka: Use scenarios

Last Updated: Mar 11, 2026

ApsaraMQ for Kafka serves as the streaming data backbone for real-time analytics, log pipelines, stream extract, transform, and load (ETL), and cross-system data distribution.

When to use ApsaraMQ for Kafka

Scenario | Problem it solves
Website activity tracking | Capture user actions (sign-ins, purchases, payments) in real time and route them to analytics or data warehouse systems.
Log aggregation | Collect logs from distributed hosts and applications into a single pipeline for real-time and offline analysis.
Stream ETL | Process high-volume data in flight instead of landing it in a database first, then analyze or redistribute the results.
Data routing hub | Fan out a single dataset to multiple downstream systems (search, OLAP, time-series databases) through one pipeline.

Website activity tracking

ApsaraMQ for Kafka provides a publish-subscribe pipeline that captures user behavior data as it happens and routes it to analytics systems.

Data flow

  1. Publish user action events (registration, logon, top-up, payment, purchase) to separate topics, one per business data type.

  2. Subscribers consume the message streams in real time for live dashboards and monitoring.

  3. Load the same streams into offline data warehouse systems such as Hadoop and MaxCompute for batch analysis.
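The three steps above can be sketched with a toy, in-memory model in Python. This is not the ApsaraMQ for Kafka or Apache Kafka client API: `InMemoryBroker`, `track`, the topic names, and the partition count are hypothetical, and `zlib.crc32` stands in for the murmur2 hash that the Apache Kafka Java client's default partitioner applies to record keys.

```python
import zlib
from collections import defaultdict

# Hypothetical topic names: one topic per business data type.
TOPIC_BY_ACTION = {
    "logon": "user-logon",
    "top-up": "user-payment",
    "payment": "user-payment",
    "purchase": "user-purchase",
}
NUM_PARTITIONS = 3  # assumed partition count for every topic


class InMemoryBroker:
    """Toy stand-in for a cluster: each topic is a list of append-only partitions."""

    def __init__(self):
        self.topics = defaultdict(lambda: [[] for _ in range(NUM_PARTITIONS)])

    def publish(self, topic, key, value):
        # The same key always maps to the same partition, which keeps one user's
        # events in order. crc32 stands in for the murmur2 hash used by Kafka's Java client.
        partition = zlib.crc32(key.encode()) % NUM_PARTITIONS
        self.topics[topic][partition].append((key, value))
        return partition

    def read_all(self, topic):
        # Reading copies records out; it does not remove them, so every subscriber sees them.
        return [record for partition in self.topics[topic] for record in partition]


def track(broker, user_id, action, **details):
    """Step 1: publish a user action event to the topic for its business data type."""
    topic = TOPIC_BY_ACTION[action]
    partition = broker.publish(topic, user_id, {"action": action, **details})
    return topic, partition


broker = InMemoryBroker()
track(broker, "u1", "logon")
track(broker, "u1", "purchase", sku="A1", amount=30)
track(broker, "u2", "payment", amount=12)
track(broker, "u1", "top-up", amount=50)

# Steps 2 and 3: a live dashboard and an offline warehouse loader read the same stream.
dashboard_view = broker.read_all("user-payment")
warehouse_load = broker.read_all("user-payment")
```

Keying records by user ID is a common choice because Kafka guarantees ordering only within a partition; routing all of one user's events to the same partition keeps them in order for every subscriber.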

[Figure: Website activity tracking architecture]

Advantages for this scenario

  • High throughput -- Handles the volume of user action data that production websites generate at peak traffic.

  • Auto scaling -- During promotions or flash sales, you can scale out brokers as needed to absorb traffic spikes.

  • Big data integration -- Connects to real-time engines (Storm, Spark) and offline data warehouse systems (Hadoop, MaxCompute).

Log aggregation

Platforms such as Taobao and Tmall generate large volumes of logs every day, most of which are streaming data such as page views and queries. Compared with log-centric tools such as Scribe and Flume, ApsaraMQ for Kafka delivers higher performance, stronger data persistence, and lower end-to-end latency, which makes it a natural fit for centralized log collection.

Data flow

  1. Abstract logs from multiple hosts and applications into unified message streams. ApsaraMQ for Kafka ignores file-level details and treats log data as a stream of messages.

  2. Send those streams asynchronously to an ApsaraMQ for Kafka cluster. Clients can batch and compress messages to keep producer overhead low.

  3. Consumers read the streams and route them to offline warehouse systems (Hadoop, MaxCompute) for batch analytics or to real-time engines (Storm, Spark) for live analysis.
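Step 2 relies on producer-side batching and compression. The sketch below mimics that behavior with the Python standard library: it groups log lines into batches bounded by a byte budget and gzip-compresses each batch. The property names in `PRODUCER_CONFIG` (`compression.type`, `linger.ms`, `batch.size`) come from the Apache Kafka producer; the values are illustrative assumptions, and `batch_and_compress` is a hypothetical helper, not part of any client library. A real producer batches per partition and also sends a batch when `linger.ms` expires, which this sketch ignores.

```python
import gzip
import json

# Apache Kafka producer property names; the values here are illustrative only.
PRODUCER_CONFIG = {
    "compression.type": "gzip",  # other codecs: snappy, lz4, zstd
    "linger.ms": 50,             # how long a producer may wait to fill a batch
    "batch.size": 65536,         # upper bound, in bytes, for one batch
}


def batch_and_compress(lines, batch_size):
    """Hypothetical helper: group lines into batches of at most batch_size
    uncompressed bytes (newline-joined), then gzip-compress each batch.
    A single line larger than batch_size ends up in a batch of its own."""
    batches, current, current_bytes = [], [], 0
    for line in lines:
        size = len(line.encode()) + 1  # +1 for the newline separator
        if current and current_bytes + size > batch_size:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(line)
        current_bytes += size
    if current:
        batches.append(current)
    return [gzip.compress("\n".join(batch).encode()) for batch in batches]


# Synthetic, repetitive JSON access logs from four hypothetical hosts.
logs = [
    json.dumps({"host": f"web-{i % 4}", "path": "/item", "status": 200})
    for i in range(5000)
]
compressed = batch_and_compress(logs, PRODUCER_CONFIG["batch.size"])
raw_bytes = sum(len(line.encode()) for line in logs)
sent_bytes = sum(len(batch) for batch in compressed)
```

Comparing `raw_bytes` with `sent_bytes` shows how much traffic compression saves for this synthetic input; real savings depend on the data and the codec.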

[Figure: Log aggregation architecture]

Advantages for this scenario

  • System decoupling -- Acts as a buffer between application systems and analytics systems, so neither depends directly on the other.

  • Horizontal scalability -- Add nodes to rapidly scale out as log volume grows.

  • Online and offline analysis -- Feeds both real-time engines and offline systems like Hadoop from the same pipeline.

Stream ETL

In domains like stock market analysis, weather monitoring, and user behavior analytics, data arrives continuously and in high volume. Storing everything in a database before processing is impractical, and traditional ETL architectures cannot keep up. ApsaraMQ for Kafka, paired with stream processing engines (Storm, Samza, Spark), processes data in flight instead.

Data flow

  1. Ingest raw data into ApsaraMQ for Kafka topics as it is generated.

  2. A stream processing engine reads from the topics, transforms the data in real time, and runs analytics based on your business logic.

  3. Save processed results to downstream storage or distribute them to other components.
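The three steps can be sketched as a chain of Python generators, with plain lists standing in for the input topic and the output sink. The record format (`SYMBOL,epoch_seconds,price`), the 60-second tumbling window, and the function names are assumptions for illustration. The sketch also assumes records arrive in timestamp order, which a production stream processing engine cannot assume.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window length


def parse(raw_records):
    """Transform: turn 'SYMBOL,epoch_seconds,price' strings into typed tuples."""
    for raw in raw_records:
        try:
            symbol, ts, price = raw.split(",")
            yield symbol, int(ts), float(price)
        except ValueError:
            continue  # malformed row; a real pipeline might send it to a dead-letter topic


def _flush(window_start, sums):
    # Emit one average per symbol for a closed window, in a stable order.
    for symbol in sorted(sums):
        total, count = sums[symbol]
        yield symbol, window_start, total / count


def windowed_average(ticks):
    """Analyze: emit (symbol, window_start, average_price) for each tumbling window.
    Assumes ticks arrive in timestamp order; late or out-of-order data is not handled."""
    current_window = None
    sums = defaultdict(lambda: [0.0, 0])  # symbol -> [price total, tick count]
    for symbol, ts, price in ticks:
        window_start = ts - ts % WINDOW_SECONDS
        if current_window is not None and window_start != current_window:
            yield from _flush(current_window, sums)  # window closed: emit its results
            sums.clear()
        current_window = window_start
        sums[symbol][0] += price
        sums[symbol][1] += 1
    if current_window is not None:
        yield from _flush(current_window, sums)


# Step 1: raw records as they would arrive on an input topic.
raw_topic = ["ABC,0,10.0", "ABC,30,12.0", "XYZ,45,5.0", "bad-row", "ABC,61,20.0"]
# Steps 2 and 3: transform and aggregate in flight, then hand results to a sink.
output_sink = list(windowed_average(parse(raw_topic)))
# [('ABC', 0, 11.0), ('XYZ', 0, 5.0), ('ABC', 60, 20.0)]
```

Because each stage is a generator, records flow through one at a time and nothing is stored before processing, which is the point of in-flight ETL.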

[Figure: Stream ETL architecture]

Advantages for this scenario

  • In-flight processing -- Captures and transforms data during transit rather than after storage.

  • High scalability -- Scales horizontally to handle real-time data volumes that traditional ETL cannot.

  • Broad engine support -- Integrates with open-source engines (Storm, Samza, Spark) and Alibaba Cloud services (E-MapReduce, Blink, Realtime Compute).

Data routing hub

Over the past 10 years, modern architectures have come to rely on purpose-built systems for specific workloads: HBase for key-value storage, Elasticsearch for search, Storm/Spark/Samza for stream processing, and OpenTSDB for time-series data. Each system targets a single goal, and that simplicity makes it easier and more cost-efficient to build distributed systems on commodity hardware. However, the same dataset often needs to reach multiple systems. For example, application logs may feed offline analysis while also needing to be searchable for individual entries. Building a separate ingestion pipeline for each system is impractical.

ApsaraMQ for Kafka acts as a centralized data routing hub: ingest data once and fan it out to every downstream system that needs it.
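Fan-out works because reading does not remove records: each consumer group tracks its own offset into the same log. The toy `Topic` class below models a single partition with per-group committed offsets. It is a hypothetical illustration, not a client API; new groups start from the earliest offset (similar to setting `auto.offset.reset=earliest`), and offsets are committed on every poll for simplicity.

```python
class Topic:
    """Toy single-partition log with per-consumer-group committed offsets."""

    def __init__(self):
        self.log = []        # records are appended once and never removed by reads
        self.committed = {}  # consumer group id -> next offset to read

    def append(self, record):
        self.log.append(record)

    def poll(self, group, max_records=100):
        # New groups start at offset 0, similar to auto.offset.reset=earliest.
        start = self.committed.get(group, 0)
        batch = self.log[start:start + max_records]
        # Simplification: commit immediately after reading.
        self.committed[group] = start + len(batch)
        return batch


# Ingest once.
app_logs = Topic()
for seq in range(5):
    app_logs.append({"seq": seq, "msg": f"request {seq} served"})

# Fan out: hypothetical downstream sinks, each reading through its own group.
search_index = {}  # e.g. a search system: look up a single log entry
olap_rows = []     # e.g. an OLAP store: bulk analytics

for record in app_logs.poll("search-indexer"):
    search_index[record["seq"]] = record["msg"]
olap_rows.extend(app_logs.poll("olap-loader", max_records=2))  # a slower batch consumer
```

Because the search indexer and the OLAP loader keep separate offsets, the slower batch consumer can catch up later without affecting the real-time one.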

[Figure: Data routing hub architecture]

Advantages for this scenario

  • High-capacity storage -- Stores large volumes of data on commodity hardware and supports horizontally scalable distributed architectures.

  • One-to-many consumption -- The publish-subscribe model lets multiple consumers independently read the same dataset without interference.

  • Real-time and batch processing -- Persists data locally and serves reads through the page cache, so real-time and batch consumers can read the same data concurrently without a performance penalty.