When you use DataWorks to synchronize data from a PostgreSQL database to Hologres, you can refer to the operations in this topic to configure the network, whitelists, and permissions for data sources to implement data synchronization.
- Data sources are purchased. A source PostgreSQL data source and a destination Hologres data source are purchased.
- Plan and prepare resources: An exclusive resource group for Data Integration is purchased and configured. For more information, see Plan and configure resources.
- Evaluate and plan the network environment: Before you perform data integration, connect
data sources to exclusive resource groups for Data Integration based on your business
requirements. After data sources and exclusive resource groups for Data Integration
are connected, you can refer to the operations in this topic to configure access settings
such as vSwitches and whitelists.
- If data sources and exclusive resource groups for Data Integration reside in the same region and virtual private cloud (VPC), they are automatically connected.
- If data sources and exclusive resource groups for Data Integration reside in different network environments, you must connect data sources and resource groups by using methods such as a VPN gateway.
- Configure whitelists for the data sources
If the data sources and the exclusive resource group for Data Integration reside in the same VPC, you must add the CIDR block of the vSwitch that is bound to the exclusive resource group for Data Integration during network configuration to the whitelists of the data sources. This ensures that the exclusive resource group for Data Integration can be used to access the data sources.
- Create an account and grant permissions the account
You must create an account that can be used to access the data sources, read data from the source, and write data to the destination during the data synchronization process.
- Check whether the database version is the version supported by real-time synchronization
nodes of Data Integration in DataWorks.
The following PostgreSQL database versions are supported: PostgreSQL
12. You can execute the following statement to view the version of the PostgreSQL database:
- Data Integration supports
ADD COLUMNstatement cannot be combined with other DDL statements, such as
DROP COLUMN.Notice If you execute an
ADD COLUMNstatement together with an
ALTER COLUMNstatement, such as a
DROP COLUMN or RNAME COLUMNstatement on a data synchronization node, the node cannot normally work.
- DDL statements except
ADD COLUMNstatements cannot be identified.
ALTER TABLE and CREATE TABLEstatements are not supported.
- TEMPRORARY tables and UNLOGGED tables cannot be replicated. The PostgreSQL database does not provide a mechanism for log parsing subscriptions to these two types of tables.
- Sequences cannot be replicated (
serial, bigserial, and identity).
- TRUNCATE statements are not supported.
- Large objects cannot be replicated (BYTEA).
- Views, materialized views, and foreign tables cannot be replicated.
- Configure a whitelist for the PostgreSQL database. Add the CIDR block of the virtual private cloud (VPC) where the exclusive resource group resides to the whitelist of the PostgreSQL database.
- View and record the elastic IP address (EIP) and CIDR block of the exclusive resource group.
- Add the elastic IP address (EIP) and CIDR block of the exclusive resource group to the whitelist of the PostgreSQL database.
- Create an account and grant the required permissions to the account. You must create an account to log on to the PostgreSQL database for subsequent operations. The account must have the
REPLICATION and LOGINpermissions on the PostgreSQL database.Note
Real-time synchronization supports only the logical replication mechanism. Logical replication uses a publish and subscribe model in which one or more subscribers subscribe to one or more publications on the node of the publisher. The subscribers extract data from the publications to which they subscribe.
Logical replication of a table usually starts with creating a snapshot for the data in the database of the publisher and replicating the snapshot to the subscriber. After logical replication is complete, modifications on the publisher are sent to the subscriber in real time.
- Create an account. For more information, see Create a database and an account on an ApsaraDB RDS for PostgreSQL instance.
- Grant the required permissions to the account. Execute the following statement to check whether the account has the
REPLICATIONpermission on the PostgreSQL database:
select userepl from pg_user where usename='xxx'
The expected return result is True. If False is returned, the account does not have the REPLICATION permission on the PostgreSQL database. You can execute the following statement to grant the REPLICATION permission to the account:
ALTER USER <user> REPLICATION;
- Create an account.
- Execute the following statement to check whether real-time synchronization supports
Real-time synchronization supports only primary databases. The expected return result is False. If True is returned, it indicates that the data source is a secondary database. Real-time synchronization does not support secondary databases. You must modify the configuration of the data source to the primary database. For more information, see Configure a PostgreSQL connection.
- Check whether the value of the
wal_levelparameter specifies the
wal_loglevel. The expected return result is logical. If logical is not returned, the logical replication mechanism is not supported.
- Check whether the
wal_senderprocess can be started.
-- Query the value of the max_wal_senders parameter. show max_wal_senders; -- Query the number of pg_stat_replication. select count(*) from pg_stat_replication
max_wal_sendersparameter is not empty and the value of the
max_wal_sendersparameter is greater than the number of
wal_senderprocess can be used. The PostgreSQL database starts the
wal_senderprocess for the data synchronization program to send logs to subscribers.