how to build a primary/secondary PostgreSQL architecture - Elastic Compute Service

PostgreSQL is one of the most advanced open source database systems and supports NoSQL data types such as JSON, XML, and hstore. This topic describes how to build a primary/secondary PostgreSQL architecture on Elastic Compute Service (ECS) instances that run CentOS 7.

Prerequisites

An Alibaba Cloud account is created. To create an Alibaba Cloud account, go to the Sign up to Alibaba Cloud page.
An inbound rule that allows traffic on port 5432 is added to the security group of the ECS instances on which you want to build a primary/secondary PostgreSQL architecture. For more information, see Add a security group rule.

Background information

The procedure described in this topic is applicable to Alibaba Cloud users who are familiar with Alibaba Cloud ECS, Linux operating systems, and PostgreSQL databases.

In this topic, the following instance type and software versions are used. Operations may vary based on your instance type and software version.

Instance type: ecs.g6.large
Operating system: CentOS 7.2
PostgreSQL: 11

To install PostgreSQL and build a primary/secondary PostgreSQL architecture by using YUM, perform the following steps:

Step 1: Create two ECS instances

To build a primary/secondary PostgreSQL architecture, create two ECS instances in a virtual private cloud (VPC). One instance works as the primary node, and the other instance works as the secondary node. For information about how to create an instance, see Create an instance on the Custom Launch tab.

Note

We recommend that you do not automatically assign public IP addresses to the ECS instances but associate elastic IP addresses (EIPs) with the instances. This allows you to upgrade the configurations or optimize the architecture based on your business requirements. For more information, see Apply for an EIP.

Step 2: Configure the primary PostgreSQL node

Connect to the primary PostgreSQL node.
For more information, see Connection method overview.

Run the following commands in sequence to install PostgreSQL:

yum update -y

wget --no-check-certificate https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

rpm -ivh pgdg-redhat-repo-latest.noarch.rpm

yum install postgresql11-server postgresql11-contrib -y

/usr/pgsql-11/bin/postgresql-11-setup initdb

Note

In this topic, PostgreSQL 11 is used. We recommend that you use the latest version of PostgreSQL.

Run the following commands in sequence to start PostgreSQL and configure PostgreSQL to start on system startup:

systemctl start postgresql-11.service # Start PostgreSQL.

systemctl enable postgresql-11.service # Configure PostgreSQL to start on system startup.

Create a database account named replica that you can use to replicate data between the primary and secondary nodes. Then, specify a password and configure logon and backup permissions for the account.
1. Run the following command to log on to PostgreSQL by using the postgres account:
```
su - postgres
```
2. When -bash-4.2$ appears, you are logged on to PostgreSQL. Then, run the following command to access the PostgreSQL interactive terminal:
```
psql
```
3. When postgres=# appears, you are accessing the PostgreSQL interactive terminal. Execute the following statement to set a password for the postgres account to enhance security:
```
ALTER USER postgres WITH PASSWORD 'YourPassWord';
```
4. Execute the following SQL statement to create a database account named replica, specify a password, and configure the logon and backup permissions for the account.
  In this example, the password is set to replica.
```
CREATE ROLE replica login replication encrypted password 'replica';
```
5. Execute the following statement to check whether the database account is created:
```
SELECT usename from pg_user;
```
  The following results indicate that the account named replica is created:
```
usename  
----------
postgres
replica
(2 rows)
```
6. Execute the following statement to check whether the permissions are configured:
```
SELECT rolname from pg_roles;
```
  The following results indicate that the permissions are configured:
```
 rolname 
---------------------------
 pg_monitor
 pg_read_all_settings
 pg_read_all_stats
 pg_stat_scan_tables
 pg_read_server_files
 pg_write_server_files
 pg_execute_server_program
 pg_signal_backend
 postgres
 replica
(10 rows)
```
7. Run the following command and press the Enter key to exit the PostgreSQL interactive terminal:
```
\q
```
8. Run the following command and press the Enter key to exit PostgreSQL:
```
exit
```

Run the following command to open the pg_hba.conf file and then configure a whitelist for the replica account:

vim /var/lib/pgsql/11/data/pg_hba.conf

Add the following lines to the IPv4 local connections section:

host    all             all             <VPC IPv4 CIDR block of the secondary node>          md5     # Allow connections from the VPC IPv4 CIDR block of the secondary node by using MD5 password authentication.
host    replication     replica         <VPC IPv4 CIDR block of the secondary node>          md5     # Allow users in the VPC IPv4 CIDR block of the secondary node to perform data synchronization.

After the lines are added, press the Esc key, enter :wq, and then press the Enter key to save the file and exit.

Run the following command to open the postgresql.conf file:

vim /var/lib/pgsql/11/data/postgresql.conf

Find and modify the following parameters:

listen_addresses = '*'   # Specify the IP addresses on which the server listens for connections from client applications.
wal_level = hot_standby  # Enable the hot standby mode.
synchronous_commit = on  # Enable synchronization.
max_wal_senders = 32     # Specify the maximum number of synchronization processes.
wal_sender_timeout = 60s # Specify the maximum amount of time to wait for write-ahead logging (WAL) replication.
max_connections = 100    # Specify the maximum number of connections. The value of max_connections for the secondary node must be greater than that for the primary node.

After the parameters are modified, press the Esc key, enter :wq, and then press the Enter key to save the file and exit.

Run the following command to restart PostgreSQL:
```
systemctl restart postgresql-11.service
```

Step 3: Configure the secondary PostgreSQL node

Connect to the secondary PostgreSQL node.
For more information, see Connection method overview.

Run the following commands in sequence to install PostgreSQL:

yum update -y

wget --no-check-certificate https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

rpm -ivh pgdg-redhat-repo-latest.noarch.rpm

yum install postgresql11-server postgresql11-contrib -y

Run the following command to create a backup directory by using the pg_basebackup utility:

pg_basebackup -D /var/lib/pgsql/11/data -h <IP address of the primary node> -p 5432 -U replica -X stream -P

In this example, the Password parameter is set to replica.

Password: 
30075/30075 kB (100%), 1/1 tablespace

Run the following commands in sequence to create and modify the recovery.conf file:

cp /usr/pgsql-11/share/recovery.conf.sample /var/lib/pgsql/11/data/recovery.conf

vim /var/lib/pgsql/11/data/recovery.conf

Find and modify the following parameters:

standby_mode = on     # Declare the node to be the secondary database.
primary_conninfo = 'host=<IP address of the primary node> port=5432 user=replica password=replica' # Specify the connection information of the primary node.
recovery_target_timeline = 'latest' # Synchronize the latest data by using streaming replication.

After the parameters are modified, press the Esc key, enter :wq, and then press the Enter key to save the file and exit.

Run the following command to open the postgresql.conf file:

vim /var/lib/pgsql/11/data/postgresql.conf

Find and modify the following parameters:

max_connections = 1000             # Specify the maximum number of connections. The value for the secondary node must be greater than that for the primary node.
hot_standby = on                   # Enable the hot standby mode.
max_standby_streaming_delay = 30s  # Specify the maximum delay for streaming replication.
wal_receiver_status_interval = 1s  # Specify the maximum interval at which the secondary node reports its running status to the primary node.
hot_standby_feedback = on          # Enable the secondary node to send feedback to the primary node if errors occur during replication.

After the parameters are modified, press the Esc key, enter :wq, and then press the Enter key to save the file and exit.

Run the following command to modify the group and owner of the data directory:
```
chown -R postgres.postgres /var/lib/pgsql/11/data
```

Run the following commands in sequence to start PostgreSQL and configure PostgreSQL to start on system startup:

systemctl start postgresql-11.service # Start PostgreSQL.

systemctl enable postgresql-11.service # Configure PostgreSQL to start on system startup.

Step 4: Test the primary/secondary PostgreSQL architecture

To test the primary/secondary PostgreSQL architecture, make sure that data are exchanged between the primary and secondary nodes. For example, the following content shows the expected test result when you create a backup directory for the secondary node:

# pg_basebackup -D /var/lib/pgsql/9.6/data -h <IP address of the primary node> -p 5432 -U replica -X stream -P

Run the following command to check the sender process on the primary node:

ps aux |grep sender

The following command output indicates that the sender process is available:

postgres  2916  0.0  0.3 340388  3220 ?        Ss   15:38   0:00 postgres: walsender  replica 192.168.**.**(49640) streaming 0/F01C1A8

Run the following command to check the receiver process on the secondary node:
```
ps aux |grep receiver
```
The following command output indicates that the receiver process is available:
```
postgres 23284  0.0  0.3 387100  3444 ?        Ss   16:04   0:00 postgres: walreceiver   streaming 0/F01C1A8
```

On the primary node, access the PostgreSQL interactive terminal and execute the following SQL statement to check the status of the secondary node.

Run the following command to log on to PostgreSQL by using the postgres account:
```
su - postgres
```
When -bash-4.2$ appears, you are logged on to PostgreSQL. Then, run the following command to access the PostgreSQL interactive terminal:
```
psql
```

Run the following command to check the status of the secondary node:

select * from pg_stat_replication;

The following results indicate that the status of the secondary node is displayed:

pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_locati
on | flush_location | replay_location | sync_priority | sync_state 
------+----------+---------+------------------+---------------+-----------------+------------- +-------------------------------+--------------+-----------+---------------+-------------
---+----------------+-----------------+---------------+------------
2916 | 16393 | replica | walreceiver | 192.168.**.** | | 49640 | 2017-05-02 15:38:06.188988+08 | 1836 | streaming | 0/F01C0C8 | 0/F01C0C8 
| 0/F01C0C8 | 0/F01C0C8 | 0 | async
(1 rows)

Run the following command and press the Enter key to exit the PostgreSQL interactive terminal:
```
\q
```
Run the following command and press the Enter key to exit PostgreSQL:
```
exit
```