DataHub: Overview

Last Updated: Aug 02, 2021

Data synchronization in DataHub

DataHub can synchronize data from a topic (the data source) to a destination service (the data sink). Topic data can be synchronized to other Alibaba Cloud services in real time or near real time, which makes it easy to move data among Alibaba Cloud services. The following Alibaba Cloud services are supported: MaxCompute, AnalyticDB for MySQL, ApsaraDB RDS, Tablestore (OTS), Object Storage Service (OSS), Elasticsearch, Function Compute, and Hologres. After you write data to DataHub, you can configure data synchronization to deliver the data to these services for later use. This closes the loop from data ingestion to downstream use.

Supported Alibaba Cloud services

Note:

  • If you create a DataConnector that synchronizes data from DataHub to MaxCompute, incremental data is synchronized every 5 minutes or whenever 64 MB of data accumulates, whichever comes first.
  • If you create a DataConnector that synchronizes data from DataHub to OTS, OSS, Elasticsearch, Function Compute, or Hologres, incremental data is synchronized every 20 seconds or whenever 4 MB of data accumulates, whichever comes first.
  • If you create a DataConnector that synchronizes data from DataHub to ApsaraDB RDS (including ApsaraDB RDS for MySQL) or AnalyticDB for MySQL V3.0, incremental data is synchronized every 20 seconds or whenever 512 KB of data accumulates, whichever comes first.
Destination           Supported topic types  Timeliness                         VPC
MaxCompute            TUPLE and BLOB         Near real time (5-minute latency)  Not supported
AnalyticDB for MySQL  TUPLE                  Real time                          Not supported
ApsaraDB RDS          TUPLE                  Real time                          Supported
Tablestore (OTS)      TUPLE                  Real time                          Not supported
OSS                   TUPLE and BLOB         Real time                          Not supported
Elasticsearch         TUPLE                  Real time                          Supported
Function Compute      TUPLE and BLOB         Real time                          Not supported
Hologres              TUPLE                  Real time                          Not supported
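The thresholds in the note above describe a "flush on time or size" batching policy. The following Python sketch models that policy to make the trade-off concrete; it is an illustration under stated assumptions, not DataHub's implementation. The `BatchBuffer` class, the `FLUSH_POLICY` mapping, and its destination keys are hypothetical, and treating 1 MB as 1,024 x 1,024 bytes is an assumption.

```python
import time
from typing import Callable, List, Optional

KB = 1024
MB = 1024 * KB  # assumption: the note's "MB" means 1,024 x 1,024 bytes

# Hypothetical mapping built from the note above:
# destination -> (maximum seconds between flushes, maximum buffered bytes)
FLUSH_POLICY = {
    "MaxCompute": (5 * 60, 64 * MB),
    "Tablestore (OTS)": (20, 4 * MB),
    "OSS": (20, 4 * MB),
    "Elasticsearch": (20, 4 * MB),
    "Function Compute": (20, 4 * MB),
    "Hologres": (20, 4 * MB),
    "ApsaraDB RDS": (20, 512 * KB),
    "AnalyticDB for MySQL V3.0": (20, 512 * KB),
}


class BatchBuffer:
    """Buffers records and flushes on a size or time threshold, whichever is hit first."""

    def __init__(self, destination: str,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.interval_s, self.max_bytes = FLUSH_POLICY[destination]
        self._now = now  # injectable clock, so the policy can be tested without sleeping
        self._records: List[bytes] = []
        self._size = 0
        self._last_flush = now()

    def add(self, record: bytes) -> Optional[List[bytes]]:
        """Buffer one record; return the flushed batch if a threshold was reached."""
        self._records.append(record)
        self._size += len(record)
        return self._maybe_flush()

    def poll(self) -> Optional[List[bytes]]:
        """Call periodically so buffered data is flushed even if no new records arrive."""
        if not self._records:
            return None
        return self._maybe_flush()

    def _maybe_flush(self) -> Optional[List[bytes]]:
        elapsed = self._now() - self._last_flush
        if self._size >= self.max_bytes or elapsed >= self.interval_s:
            batch = self._records
            self._records, self._size = [], 0
            self._last_flush = self._now()
            return batch
        return None
```

For example, with the ApsaraDB RDS thresholds a single 600 KB record triggers a flush immediately, while a trickle of small records is flushed roughly every 20 seconds. In this model, `poll()` has to be called on a timer; without it, a topic that stops receiving records would keep a partial batch buffered indefinitely.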

Usage notes

  1. We recommend that you synchronize data between Alibaba Cloud services in the same region. This reduces the risk of network disconnection and lowers network latency.
  2. Because Alibaba Cloud services are network-isolated, you must specify the internal endpoint of the destination service when you configure data synchronization. Public endpoints are not supported.
  3. DataHub guarantees only at-least-once semantics for data synchronization. In some scenarios, such as when a network error occurs, a small number of duplicate records may be written to the destination service. You must deduplicate these records yourself.
  4. When you create a DataConnector, DataHub automatically associates a subscription ID with the DataConnector. The subscription records the point from which DataHub reads data for the DataConnector; this point is not the point at which synchronization was started. Do not change the subscription ID.
  5. Only the owner or creator of a topic can create DataConnectors for that topic.
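One common way at-least-once behavior arises is when a read offset is committed only after a batch has been written: if the process fails between the write and the commit, the next run re-reads from the old offset and writes the same batch again. The following Python sketch is a toy model of that effect and of deduplication in the destination; it is not DataHub's internal logic. `Record`, its `key` field, `Sink`, `sync`, and `deduplicate` are hypothetical names, and the sketch assumes every record carries a unique business key.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Record:
    key: str    # hypothetical unique business key carried by every record
    value: str


class Sink:
    """Toy destination that appends every write, like a table without a unique key."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, batch: List[Record]) -> None:
        self.rows.extend(batch)


def sync(topic: List[Record], sink: Sink, offset: int,
         batch_size: int, fail_before_commit: bool = False) -> int:
    """Write one batch starting at `offset` and return the offset to resume from.

    The offset advances only after the write succeeds. If the process fails
    between the write and the commit, the same batch is read and written again
    on the next run, which is how duplicates appear under at-least-once delivery.
    """
    batch = topic[offset:offset + batch_size]
    sink.write(batch)
    if fail_before_commit:
        return offset  # simulated failure: the new offset is never committed
    return offset + len(batch)


def deduplicate(rows: List[Record]) -> List[Record]:
    """Keep the first occurrence of each key, preserving arrival order."""
    seen: Set[str] = set()
    unique: List[Record] = []
    for row in rows:
        if row.key not in seen:
            seen.add(row.key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    topic = [Record(f"id-{i}", f"value-{i}") for i in range(4)]
    sink = Sink()
    offset = sync(topic, sink, 0, batch_size=2, fail_before_commit=True)
    offset = sync(topic, sink, offset, batch_size=2)  # replays id-0 and id-1
    offset = sync(topic, sink, offset, batch_size=2)
    print(len(sink.rows), len(deduplicate(sink.rows)))  # prints: 6 4
```

In a real destination, deduplication is often handled with a primary key or an upsert rather than in application code; which option is available depends on the destination service.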