You can use federated analytics together with the AnalyticDB Pipeline Service (APS) feature of AnalyticDB for MySQL to synchronize data from PolarDB for MySQL to AnalyticDB for MySQL Data Lakehouse Edition (V3.0) in real time, which simplifies data synchronization and management. This topic describes how to use federated analytics to synchronize data from a PolarDB for MySQL cluster to an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.
You can join the DingTalk group 33600023146 to learn more about the federated analytics feature.
Prerequisites
A PolarDB for MySQL cluster and an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster are created in the same region. For more information, see Purchase a pay-as-you-go cluster and Create a Data Lakehouse Edition cluster.
Binary logging is enabled for the PolarDB for MySQL cluster. For more information, see Enable binary logging.
Limits
Federated analytics can synchronize data from PolarDB for MySQL only to AnalyticDB for MySQL Data Lakehouse Edition (V3.0) clusters.
Federated analytics is supported only in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Shenzhen), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), US (Silicon Valley), US (Virginia), Germany (Frankfurt), and UK (London).
You can create up to three synchronization jobs for each PolarDB for MySQL cluster and up to 30 synchronization jobs in each region.
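You can check the quotas above before you open the Create Job panel. The following Python code is a minimal sketch, assuming that you already know how many jobs exist for the source cluster and in the region (for example, by counting the jobs listed on the Federated Analytics page). JobCounts and can_create_job() are hypothetical names used for illustration. They are not part of any Alibaba Cloud SDK or API.

```python
"""Hypothetical pre-check for the federated analytics job quotas.

Assumption: the current job counts are known in advance. This is an
illustration only, not an Alibaba Cloud API.
"""
from dataclasses import dataclass

# Quotas stated in the Limits section.
MAX_JOBS_PER_CLUSTER = 3
MAX_JOBS_PER_REGION = 30


@dataclass
class JobCounts:
    cluster_jobs: int  # existing jobs for the source PolarDB for MySQL cluster
    region_jobs: int   # existing jobs in the selected region


def can_create_job(counts: JobCounts) -> tuple[bool, str]:
    """Return whether one more job fits within both quotas, plus a reason."""
    if counts.cluster_jobs >= MAX_JOBS_PER_CLUSTER:
        return False, f"cluster already has {counts.cluster_jobs} of {MAX_JOBS_PER_CLUSTER} jobs"
    if counts.region_jobs >= MAX_JOBS_PER_REGION:
        return False, f"region already has {counts.region_jobs} of {MAX_JOBS_PER_REGION} jobs"
    return True, "ok"


if __name__ == "__main__":
    print(can_create_job(JobCounts(cluster_jobs=2, region_jobs=12)))  # (True, 'ok')
    print(can_create_job(JobCounts(cluster_jobs=3, region_jobs=12)))
```

Both limits apply at the same time. A cluster that has fewer than three jobs can still be blocked if the region already has 30 jobs.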
Create a synchronization job
Log on to the PolarDB console.
In the upper-left corner of the console, select a region.
In the left-side navigation pane, click Federated Analytics.
Click Create Job. In the Create Job panel, configure the following parameters.
Job Name: The name of the job. Default value: data-sync-<Time>.
PolarDB for MySQL Cluster: The ID of the source PolarDB for MySQL cluster.
Database Account Name: The database account that federated analytics automatically creates in the PolarDB for MySQL cluster to synchronize data. The account name starts with sync. Do not delete the account or change its name.
AnalyticDB for MySQL Data Lakehouse Edition Cluster: The ID of the destination AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster. You can select an existing cluster, or click Click to create an AnalyticDB for MySQL cluster to create a new AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.
Advanced Settings: Advanced settings are disabled by default. In this case, the entire source cluster is synchronized. After you enable advanced settings, you can configure the Select the database or table to synchronize and Large Table Partition Key Settings parameters.
Select the database or table to synchronize: The databases and tables that you want to synchronize. By default, all databases and tables are synchronized.
Important: Tables that do not have primary keys cannot be synchronized and are automatically filtered out.
Each AnalyticDB for MySQL cluster can contain up to 2,048 databases. For more information, see Limits.
Large Table Partition Key Settings: To improve data write and query performance, we recommend that you specify partition keys for tables. For more information, see Schema design. The following partition formats are supported:
value: partitioned by value.
yyyyMMdd: partitioned by year, month, and day.
yyyyMM: partitioned by year and month.
yyyy: partitioned by year.
Click OK. The job starts automatically.
The created job is displayed on the Federated Analytics page. In the Actions column, you can click View, Edit, Delete, Suspend, or Start.
Important: Deleted jobs cannot be recovered.
To analyze the synchronized data, click the ID of the destination cluster to go to the AnalyticDB for MySQL console. For more information, see SQL editor.
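You can check the Advanced Settings rules described in this topic offline before you create a job. The rules are: tables without primary keys are filtered out, a cluster holds at most 2,048 databases, and four partition formats are supported. The following Python code is a minimal sketch under stated assumptions. TableInfo, split_by_primary_key(), check_database_limit(), and partition_value() are hypothetical helpers. The table metadata is assumed to be collected from the source cluster beforehand. The mapping of yyyyMMdd, yyyyMM, and yyyy to the strftime patterns %Y%m%d, %Y%m, and %Y is an assumption made to illustrate what each format means. APS may derive partition values differently.

```python
"""Hypothetical pre-check for the Advanced Settings of a synchronization job.

This is an illustration only. It is not an Alibaba Cloud API, and APS may
apply these rules differently.
"""
from dataclasses import dataclass, field
from datetime import date

# Limit on databases per AnalyticDB for MySQL cluster, as stated in this topic.
MAX_ADB_DATABASES = 2048

# Assumed mapping from the date-based partition formats to strftime patterns.
DATE_FORMATS = {"yyyyMMdd": "%Y%m%d", "yyyyMM": "%Y%m", "yyyy": "%Y"}
SUPPORTED_FORMATS = {"value", *DATE_FORMATS}


@dataclass
class TableInfo:
    """Metadata for one source table (assumed to be collected beforehand)."""
    database: str
    name: str
    primary_key: list[str] = field(default_factory=list)


def split_by_primary_key(tables):
    """Split tables into (kept, dropped); tables without a primary key are dropped."""
    kept = [t for t in tables if t.primary_key]
    dropped = [t for t in tables if not t.primary_key]
    return kept, dropped


def check_database_limit(tables, existing_databases):
    """Return True if existing plus newly selected databases stay within the limit."""
    all_databases = set(existing_databases) | {t.database for t in tables}
    return len(all_databases) <= MAX_ADB_DATABASES


def partition_value(fmt, column_value):
    """Return the partition that a column value maps to under the given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported partition format: {fmt}")
    if fmt == "value":
        # Partitioned by value: the column value itself identifies the partition.
        return str(column_value)
    # Date-based formats: keep only the date parts that the format names.
    return column_value.strftime(DATE_FORMATS[fmt])


if __name__ == "__main__":
    tables = [
        TableInfo("shop", "orders", ["order_id"]),
        TableInfo("shop", "audit_log"),  # no primary key, so it is filtered out
    ]
    kept, dropped = split_by_primary_key(tables)
    print([t.name for t in kept], [t.name for t in dropped])  # ['orders'] ['audit_log']
    print(check_database_limit(kept, existing_databases=["analytics"]))  # True
    print(partition_value("yyyyMM", date(2024, 3, 15)))  # 202403
```

The sketch reports tables without primary keys instead of failing. This mirrors the behavior described above, where such tables are automatically filtered out.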