This topic describes how to synchronize data from ApsaraDB RDS for PostgreSQL to AnalyticDB for PostgreSQL by using the one-click data warehousing feature.
Overview
AnalyticDB for PostgreSQL provides the one-click data warehousing feature that can be used to synchronize data from ApsaraDB RDS for PostgreSQL to AnalyticDB for PostgreSQL. Specifically, you can create a publication on ApsaraDB RDS for PostgreSQL, and then subscribe to the publication by creating a one-click data warehousing task on AnalyticDB for PostgreSQL.
Precautions
This feature is supported only for AnalyticDB for PostgreSQL instances of V6.3.8.0 or later in elastic storage mode.
This feature is supported only for ApsaraDB RDS for PostgreSQL instances that run PostgreSQL 10.0 or later.
The ApsaraDB RDS for PostgreSQL and AnalyticDB for PostgreSQL instances must reside within the same virtual private cloud (VPC).
The synchronization objects must be tables.
Only the INSERT, UPDATE, DELETE, and TRUNCATE operations can be synchronized. Schema changes cannot be synchronized.
The destination table must have the same schema as the source table.
The source table on ApsaraDB RDS for PostgreSQL must contain a primary key.
Only the following data types are supported for synchronization:
Numeric, character, BOOLEAN, time and date, enumeration, GEO, INTERNET, RANGE, JSON, BYTEA, BIT, and an array of the preceding types.
NoteIf you synchronize data of other types, the one-click data warehousing task may be interrupted.
If data of the source table on ApsaraDB RDS for PostgreSQL is updated by using a statement like
UPDATE/DELETE WHERE CTID = ******
, the update is not synchronized to the destination table on AnalyticDB for PostgreSQL.
Prerequisites
AnalyticDB for PostgreSQL
An AnalyticDB for PostgreSQL instance is created. For more information, see Create an instance.
A privileged account is created. For more information, see Create a database account.
ApsaraDB RDS for PostgreSQL
An ApsaraDB RDS for PostgreSQL instance is created. For more information, see Create an ApsaraDB RDS for PostgreSQL instance.
A privileged account is created. For more information, see Create an account.
The configuration parameter wal_level is set to logical. For more information about how to modify configuration parameters, see Modify the parameters of an ApsaraDB RDS for PostgreSQL instance.
NoteThe modification to this configuration parameter takes effect only after you restart the instance. We recommend that you restart the instance during off-peak hours.
The internal endpoint of the AnalyticDB for PostgreSQL instance is added to a whitelist of the ApsaraDB RDS for PostgreSQL instance.
SQL syntax
This section describes the SQL syntax used to create, modify, and delete a one-click data warehousing task.
Create a one-click data warehousing task
Syntax
CREATE SUBSCRIPTION subscription_name
CONNECTION 'conninfo'
PUBLICATION publication_name [, ...]
[ WITH ( subscription_parameter [= value] [, ... ] ) ]
The following table describes the parameters.
Parameter | Description |
subscription_name | The name of the one-click data warehousing task. |
conninfo | The connection information of the ApsaraDB RDS for PostgreSQL instance. It consists of the endpoint, port number, database name, account, and password.
Example: |
publication_name | The name of the publication created on the ApsaraDB RDS for PostgreSQL instance. |
subscription_parameter | The optional parameter of the one-click data warehousing task. Only copy_data (boolean) is supported. copy_data (boolean): specifies whether to synchronize existing data from the source table to the destination table when the one-click data warehousing task is created. Valid values:
|
Modify a one-click data warehousing task
Refresh the synchronization objects of a one-click data warehousing task
When a new source table is added to the publication on the ApsaraDB RDS for PostgreSQL instance, you must perform a REFRESH operation to allow the destination table to subscribe to data of the new source table.
Syntax
ALTER SUBSCRIPTION subscription_name REFRESH PUBLICATION [ WITH ( refresh_option [= value] [, ... ] ) ]
The following table describes the parameters.
Parameter
Description
subscription_name
The name of the one-click data warehousing task.
refresh_option
The optional parameter of the one-click data warehousing task. Only copy_data (boolean) is supported.
copy_data (boolean): specifies whether to synchronize existing data from the source table to the destination table when the one-click data warehousing task is created. Valid values:
true: synchronizes existing data from the source table to the destination table.
false: does not synchronize existing data from the source table to the destination table.
Start a one-click data warehousing task
Syntax
ALTER SUBSCRIPTION subscription_name ENABLE
The following table describes the parameters.
Parameter
Description
subscription_name
The name of the one-click data warehousing task.
Stop an ongoing one-click data warehousing task
Syntax
ALTER SUBSCRIPTION subscription_name DISABLE
The following table describes the parameters.
Parameter
Description
subscription_name
The name of the one-click data warehousing task.
Modify the owner of a one-click data warehousing task
Syntax
ALTER SUBSCRIPTION subscription_name OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
The following table describes the parameters.
Parameter
Description
subscription_name
The name of the one-click data warehousing task.
new_owner
The new owner of the one-click data warehousing task.
Modify the name of a one-click data warehousing task
Syntax
ALTER SUBSCRIPTION subscription_name RENAME TO new_name
The following table describes the parameters.
Parameter
Description
subscription_name
The name of the one-click data warehousing task.
new_name
The new name of the one-click data warehousing task.
Delete a one-click data warehousing task
Syntax
DROP SUBSCRIPTION [ IF EXISTS ] subscription_name
The following table describes the parameters.
Parameter | Description |
subscription_name | The name of the one-click data warehousing task to be deleted. |
For more information about the syntax, see PostgreSQL official documentation.
Start data synchronization
The following example demonstrates how to synchronize a source table named logical_tb1 from the ApsaraDB RDS for PostgreSQL instance to the AnalyticDB for PostgreSQL instance:
Create a source table named logical_tb1 on the ApsaraDB RDS for PostgreSQL instance.
CREATE TABLE logical_tb1(a int primary key, b varchar(20));
Create a publication named pub1 on the ApsaraDB RDS for PostgreSQL instance and add the logical_tb1 table to the pub1 publication.
CREATE PUBLICATION pub1 FOR TABLE logical_tb1;
Create a destination table on the AnalyticDB for PostgreSQL instance that has the same schema as the source table.
CREATE TABLE logical_tb1(a int primary key, b varchar(20));
Create a one-click data warehousing task on the AnalyticDB for PostgreSQL instance.
Execute the following statement if you want to synchronize existing data and new data from the source table:
CREATE SUBSCRIPTION sub1 CONNECTION 'host=****** port=5432 user=*** dbname=*** password=***' PUBLICATION pub1;
Execute the following statement if you want to synchronize only new data from the source table:
CREATE SUBSCRIPTION sub1 CONNECTION 'host=****** port=5432 user=*** dbname=*** password=***' PUBLICATION pub1 WITH(copy_data = false);
Modify the table to be synchronized
Assume that a one-click data warehousing task named sub1 has been created on the AnalyticDB for PostgreSQL instance to synchronize the logical_tb1 table contained in the pub1 publication on the ApsaraDB RDS for PostgreSQL instance. The following example demonstrates how to synchronize another table named logical_tb2 to the AnalyticDB for PostgreSQL instance:
Create a source table named logical_tb2 on the ApsaraDB RDS for PostgreSQL instance.
CREATE TABLE logical_tb2(id int primary key, name varchar(20), age int);
Add the logical_tb2 table to the pub1 publication on the ApsaraDB RDS for PostgreSQL instance.
ALTER PUBLICATION pub1 ADD TABLE logical_tb2;
Create a destination table on the AnalyticDB for PostgreSQL instance that has the same schema as the source table.
CREATE TABLE logical_tb2(id int primary key, name varchar(20), age int);
Refresh the publication information to which you have subscribed on the AnalyticDB for PostgreSQL instance. The logical_tb2 table is synchronized to the AnalyticDB for PostgreSQL instance.
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
Stop data synchronization
Assume that a publication named pub1 has been created on the ApsaraDB RDS for PostgreSQL instance and the pub1 publication contains the logical_tb1 and logical_tb2 tables. The following example demonstrates how to continue synchronizing the logical_tb1 table and stop synchronizing the logical_tb2 table:
Remove the logical_tb2 table from the pub1 publication on the ApsaraDB RDS for PostgreSQL instance.
ALTER PUBLICATION pub1 DROP TABLE logical_tb2;
Refresh the publication information to which you have subscribed on the AnalyticDB for PostgreSQL instance. The logical_tb1 table is synchronized to the AnalyticDB for PostgreSQL instance but the logical_tb1 table is not.
ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION;
Query the state of a one-click data warehousing task
Query whether a one-click data warehousing task is completed
You can execute the following statement on the ApsaraDB RDS for PostgreSQL instance to query whether a one-click data warehousing task is completed:
SELECT confirmed_flush_lsn = pg_current_wal_lsn() FROM pg_replication_slots WHERE slot_name = '<subscription_name>';
subscription_name specifies the name of the one-click data warehousing task that you want to query.
Description of the returned result:
t: The one-click data warehousing task is completed.
f: The one-click data warehousing task is in progress.
null: The one-click data warehousing task does not exist or is abnormal.
Query the state of a one-click data warehousing task
You can execute the following statement on the ApsaraDB RDS for PostgreSQL instance to query the state of a one-click data warehousing task:
SELECT active FROM pg_replication_slots WHERE slot_name='<subscription_name>';
subscription_name specifies the name of the one-click data warehousing task that you want to query.
Description of the returned result:
t: The one-click data warehousing task is normal.
f: The one-click data warehousing task is abnormal.
null: The one-click data warehousing task does not exist.
Query the number of one-click data warehousing tasks on an AnalyticDB for PostgreSQL instance
You can execute the following statement on the AnalyticDB for PostgreSQL instance to query the number of one-click data warehousing tasks on the instance:
SELECT count(*) FROM pg_subscription WHERE subenabled = true;