DataHub: Product Overview

Last Updated: Nov 20, 2025

DataHub is an Alibaba Cloud platform for processing streaming data. It enables you to publish, subscribe to, and distribute data streams to build data analytics and applications.

Product overview

DataHub is a streaming data processing platform provided by Alibaba Cloud. You can use its core features of publishing, subscribing to, and distributing data streams to build analytics and applications for streaming data.

Key capabilities

  • Data collection: DataHub continuously collects, stores, and processes large volumes of streaming data from various sources, such as mobile devices, applications, web services, and sensors.

  • Real-time processing: Streaming data written to DataHub, such as web access logs and application events, can be processed by compute engines such as Flink or by custom applications. This processing generates real-time results, including real-time charts, alert messages, and statistics.
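
As an illustration of the kind of custom application described above, the following is a minimal sketch that turns a stream of web access events into a rolling statistic and alert signals. It does not use the DataHub SDK: the `SlidingWindowCounter` class, the sample `access_log` events, and the alert threshold are all hypothetical, and the events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key over the most recent `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, key) pairs, oldest first
        self.counts = Counter()

    def add(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self):
        return dict(self.counts)


# Hypothetical access-log events: (epoch seconds, HTTP status code).
access_log = [(0, 200), (10, 200), (20, 500), (65, 200), (70, 500), (75, 500)]

window = SlidingWindowCounter(window_seconds=60)
alerts = []
for ts, status in access_log:
    window.add(ts, status)
    # Record an alert when two or more 500 responses fall inside the window.
    if window.snapshot().get(500, 0) >= 2:
        alerts.append(ts)

print(alerts)
print(window.snapshot())
```

In a real deployment the loop would read records from a DataHub subscription instead of a list, and the snapshot would feed a chart or an alerting channel.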

Overall architecture

  • DataHub is built on Apsara, Alibaba Cloud's proprietary distributed platform. It features high availability, low latency, high scalability, and high throughput.

  • It provides a unified REST API. Applications can interact with the API using software development kits (SDKs) for multiple languages.

  • DataHub also connects seamlessly with other cloud products and compute engines, such as MaxCompute and Flink. This lets you use SQL to analyze streaming data.

  • DataHub can also distribute streaming data to various cloud products. It currently supports distribution to MaxCompute (formerly ODPS) and OSS.

Benefits

  • High throughput: Supports up to 160 million writes per day on a single shard.

  • Timeliness: DataHub collects and processes data from various sources in real time, which lets you quickly respond to your business needs.

  • Ease of use

    • It provides SDKs for languages such as C++, Java, Python, and Go.

    • It provides RESTful API specifications that support custom access interfaces.

    • It provides common client plug-ins, such as Fluentd, Logstash, and Flume.

    • It supports both structured data with a strong schema (by creating a topic of the TUPLE type) and unstructured data (by creating a topic of the BLOB type).

  • High availability

    • Service availability is at least 99.9%.

    • Data durability is at least 99.999%.

    • It scales automatically without affecting external services.

    • DataHub supports automatic backup for redundancy.

  • Dynamic scaling

    The throughput capacity of each topic can be scaled up or down dynamically, up to a maximum of 256,000 records/s per topic.

  • High security

    • It provides enterprise-grade, multilayer security protection and multi-user resource isolation.

    • DataHub supports multiple authentication and authorization methods. For example, you can configure a whitelist or grant permissions using Resource Access Management (RAM).
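
To put the throughput and availability figures above into more familiar units, the following back-of-the-envelope sketch converts the per-shard daily write limit into an average per-second rate, estimates a shard count for a given daily volume, and converts the availability percentage into yearly downtime. The helper names and the one-billion-writes example are hypothetical, and the shard estimate assumes that load is spread evenly across the day and across shards, which real traffic rarely is.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24
SHARD_WRITES_PER_DAY = 160_000_000  # per-shard figure quoted in the list above
SERVICE_AVAILABILITY = 0.999        # "at least 99.9%" from the list above


def average_writes_per_second(writes_per_day=SHARD_WRITES_PER_DAY):
    """Average rate implied by a daily write count."""
    return writes_per_day / SECONDS_PER_DAY


def shards_needed(expected_writes_per_day, per_shard=SHARD_WRITES_PER_DAY):
    """Lower bound on shard count, assuming perfectly even load."""
    return max(1, math.ceil(expected_writes_per_day / per_shard))


def max_downtime_hours_per_year(availability=SERVICE_AVAILABILITY):
    """Yearly downtime permitted by an availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


print(round(average_writes_per_second()))     # about 1852 writes/s per shard
print(shards_needed(1_000_000_000))           # 7 shards for 1 billion writes/day
print(round(max_downtime_hours_per_year(), 2))  # 8.76 hours
```

Because peaks are usually well above the daily average, a production topic would typically be provisioned with more shards than this lower bound suggests.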

Scenarios

As a streaming data processing service, DataHub can be combined with other Alibaba Cloud products to build a complete data processing solution.

Stream computing (StreamCompute)

Real-time Compute for Apache Flink is an Alibaba Cloud stream computing engine that provides an SQL-like language for stream processing. DataHub seamlessly integrates with Flink and can serve as both a data source and an output destination. For more information, see Real-time Computing (Stream Computing).

Stream processing applications

Custom applications can subscribe to data in DataHub, process it in real time, and output the results. The results from one application can be sent back to DataHub for another application to process. This lets you build a directed acyclic graph (DAG) for your data processing workflow.
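
The chaining described above can be sketched with in-memory stand-ins for topics. Nothing here calls DataHub: `InMemoryTopic`, `run_stage`, the topic names, and the sample log lines are hypothetical and only illustrate how the output of one application becomes the input of the next.

```python
from collections import deque


class InMemoryTopic:
    """Stand-in for a topic: a first-in, first-out queue of records."""

    def __init__(self, name):
        self.name = name
        self.records = deque()

    def publish(self, record):
        self.records.append(record)

    def drain(self):
        # Yield and remove every record currently in the topic.
        while self.records:
            yield self.records.popleft()


def run_stage(source, sink, transform):
    """Consume `source`, apply `transform`, and publish non-None results to `sink`."""
    for record in source.drain():
        result = transform(record)
        if result is not None:
            sink.publish(result)


def parse(line):
    # "GET /home 200" -> structured event; malformed lines are dropped.
    parts = line.split()
    if len(parts) != 3:
        return None
    method, path, status = parts
    return {"method": method, "path": path, "status": int(status)}


def flag_errors(event):
    # Keep only the paths of requests that returned a server error.
    return event["path"] if event["status"] >= 500 else None


raw = InMemoryTopic("raw_logs")
parsed = InMemoryTopic("parsed_events")
errors = InMemoryTopic("error_paths")

for line in ["GET /home 200", "POST /pay 502", "garbage", "GET /cart 500"]:
    raw.publish(line)

# Two applications chained through an intermediate topic: raw -> parsed -> errors.
run_stage(raw, parsed, parse)
run_stage(parsed, errors, flag_errors)

error_paths = list(errors.drain())
print(error_paths)
```

Each stage only knows its input and output topics, so further stages can be attached to `parsed_events` or `error_paths` without changing the existing ones, which is what makes the overall workflow a DAG.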

Streaming data archiving

Streaming data can be archived to MaxCompute (formerly ODPS). You can create a DataHub connector and specify the required configurations to set up a sync task that periodically archives streaming data from DataHub.
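
The idea of a periodic sync task can be sketched as a buffer that is flushed to an archive at fixed intervals. This is not the DataHub connector and does not reflect its configuration options: `PeriodicArchiver`, the 300-second interval, and the list used as an archive are hypothetical, and records are assumed to arrive in timestamp order.

```python
class PeriodicArchiver:
    """Buffers records and writes them out as a batch once per interval."""

    def __init__(self, interval_seconds, write_batch):
        self.interval_seconds = interval_seconds
        self.write_batch = write_batch  # callable that receives a list of records
        self.buffer = []
        self.last_flush = None

    def on_record(self, timestamp, record):
        if self.last_flush is None:
            self.last_flush = timestamp
        # Flush the previous batch once the interval has elapsed.
        if timestamp - self.last_flush >= self.interval_seconds and self.buffer:
            self.flush(timestamp)
        self.buffer.append(record)

    def flush(self, timestamp):
        if self.buffer:
            self.write_batch(list(self.buffer))
            self.buffer.clear()
        self.last_flush = timestamp


archive = []  # stand-in for the archive table; each entry is one synced batch
archiver = PeriodicArchiver(interval_seconds=300, write_batch=archive.append)

# Hypothetical stream: (epoch seconds, record payload).
stream = [(0, "a"), (100, "b"), (200, "c"), (310, "d"), (400, "e"), (650, "f")]
for ts, record in stream:
    archiver.on_record(ts, record)
archiver.flush(650)  # final flush when the stream ends

print(archive)
```

Batching in this way trades a bounded delay (at most one interval) for fewer, larger writes to the archive.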