By Hitesh Jethva, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.
High availability describes websites and applications that are durable and likely to operate continuously, without failure, for long periods. A highly available system provides several failsafes and aims for at least 99% uptime. Highly available systems are built from several components and can be scaled horizontally when needed, which improves their ability to serve content.
Pacemaker is an advanced, scalable high-availability cluster resource manager. It keeps cluster resources available by failing them over between cluster nodes. Pacemaker relies on Corosync for the heartbeat and for internal communication among cluster components. Using the messaging and membership capabilities that Corosync provides, Pacemaker detects node-level and resource-level failures and recovers from them.
In this tutorial, we will explain how to install and configure a two-node Nginx web server cluster using Pacemaker. The cluster runs on Alibaba Cloud Elastic Compute Service (ECS) instances with Ubuntu 16.04.
First, log in to your Alibaba Cloud ECS Console (https://ecs.console.aliyun.com/?spm=a3c0i.o25424en.a3.13.388d499ep38szx). Create two new ECS instances, one for each cluster node. Choose Ubuntu 16.04 as the operating system and give each instance at least 2GB of RAM. Connect to each instance and log in as the root user.
Once you are logged in to your Ubuntu 16.04 instances, run the following commands on each of them. They refresh the package index and upgrade your base system to the latest available packages:
apt-get update -y
apt-get upgrade -y
Before starting, configure the hosts file on each server so that the servers can reach each other by hostname.
You can do this by editing the /etc/hosts file on both servers:
nano /etc/hosts
Add the following lines (replace the variable Node1_IP_Address and Node2_IP_Address with the actual IP address of your ECS instances):
Node1_IP_Address node1
Node2_IP_Address node2
Save and close the file when you are finished.
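If you would rather not edit /etc/hosts by hand, the snippet below is a minimal POSIX shell sketch of the same step. The function name add_host_entry is hypothetical and is not part of any package. It appends an "IP hostname" line only when the hostname is not already mapped, so running it twice does not create duplicate entries.

```shell
# Hypothetical helper (not part of any package): append "IP NAME" to a
# hosts file unless NAME is already mapped on a non-comment line.
# Usage: add_host_entry IP NAME [FILE]   (FILE defaults to /etc/hosts)
add_host_entry() {
    entry_ip="$1"
    entry_name="$2"
    hosts_file="${3:-/etc/hosts}"
    # awk exits 0 if NAME appears as a hostname field, non-zero otherwise
    # (a missing file also counts as "not found").
    if awk -v n="$entry_name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file" 2>/dev/null; then
        echo "$entry_name is already present in $hosts_file"
    else
        printf '%s %s\n' "$entry_ip" "$entry_name" >> "$hosts_file"
    fi
}
```

For example, run add_host_entry Node1_IP_Address node1 and then add_host_entry Node2_IP_Address node2 as root on both servers. Substitute your actual IP addresses.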
Next, test hostname resolution by pinging each server by its hostname. The -c 3 option sends three packets and then stops:
ping -c 3 node1
ping -c 3 node2
Before setting up the High Availability web server, you will need to install and configure Nginx on each of the nodes. You can install Nginx by running the following command:
apt-get install nginx -y
Once Nginx is installed, start the Nginx service and enable it to start at boot by running the following commands on each node:
systemctl start nginx
systemctl enable nginx
Next, create a default index.html page for Nginx on each node.
On Node1, open the index.html file:
nano /var/www/html/index.html
Remove any existing lines and add the following lines:
<h1>
Nginx Cluster ::: Node1
</h1>
Save and close the file when you are finished.
On Node2, open the index.html file:
nano /var/www/html/index.html
Remove any existing lines and add the following lines:
<h1>
Nginx Cluster ::: Node2
</h1>
Save and close the file when you are finished.
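The two index.html pages differ only in the node label, so you can also script this step. The sketch below uses a hypothetical helper named write_index_page. It overwrites index.html in the given directory with the same three lines shown above for each node. The directory defaults to /var/www/html, the path used above.

```shell
# Hypothetical helper: write the three-line test page for one node.
# Usage: write_index_page LABEL [DIR]
#   LABEL is Node1 or Node2; DIR defaults to /var/www/html (the path above).
write_index_page() {
    page_label="$1"
    page_dir="${2:-/var/www/html}"
    # Overwrites any existing index.html, like removing all lines in nano.
    printf '<h1>\nNginx Cluster ::: %s\n</h1>\n' "$page_label" > "$page_dir/index.html"
}
```

Run write_index_page Node1 on Node1 and write_index_page Node2 on Node2.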
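Now, stop the Nginx service on each node and disable it at boot. From this point on, Pacemaker starts and stops Nginx, so systemd should not start another copy outside the cluster's control:
systemctl stop nginx
systemctl disable nginx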
Next, install Pacemaker, Corosync, and crmsh on each node. All three packages are available in the default Ubuntu 16.04 repositories, so you can install them with the following command:
apt-get install pacemaker corosync crmsh -y
Once the installation is complete, stop the Pacemaker and Corosync services with the following commands:
systemctl stop corosync
systemctl stop pacemaker
Next, configure Corosync on Node1 and generate the Corosync key used for cluster authentication.
corosync-keygen gathers its key from /dev/random, which can take a long time on a virtual machine with little entropy. Install haveged, a daemon that feeds the kernel entropy pool, so that key generation completes quickly:
apt-get install haveged -y
Next, generate the Corosync key by running the following command:
corosync-keygen
You should see the following output:
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 920).
Writing corosync key to /etc/corosync/authkey.
You can confirm that the key was written to /etc/corosync/ with the following command:
ls -l /etc/corosync/
Output:
-r-------- 1 root root 128 Feb 28 20:39 authkey
-rw-r--r-- 1 root root 3929 Oct 21 2015 corosync.conf
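Every node must share this key, so it is worth checking the key before copying it to Node2. The hypothetical check_authkey function below compares the key file with the listing above. It expects 128 bytes, which matches the 1024 bits reported by corosync-keygen, and mode 400 (-r--------). It relies on GNU stat -c. The expected size comes from this Ubuntu 16.04 example, and other Corosync versions may generate keys of a different length.

```shell
# Hypothetical sanity check for the Corosync key.
# Usage: check_authkey [FILE]   (FILE defaults to /etc/corosync/authkey)
# Expects 128 bytes and mode 400, as in the listing above.
check_authkey() {
    key_file="${1:-/etc/corosync/authkey}"
    if [ ! -f "$key_file" ]; then
        echo "missing: $key_file"
        return 1
    fi
    key_size=$(wc -c < "$key_file" | tr -d ' ')
    key_mode=$(stat -c '%a' "$key_file")   # GNU stat: octal permission bits
    if [ "$key_size" -ne 128 ]; then
        echo "unexpected size: $key_size bytes"
        return 1
    fi
    if [ "$key_mode" != "400" ]; then
        echo "unexpected mode: $key_mode"
        return 1
    fi
    echo "ok"
}
```

You can run the same check on Node2 after the key has been copied there.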
Next, change to the /etc/corosync directory and remove the default configuration file:
cd /etc/corosync/
rm -f corosync.conf
Next, create a new corosync.conf file:
nano corosync.conf
Add the following lines (replace the variable Node1_IP_Address and Node2_IP_Address with the actual IP addresses of your ECS instances):
totem {
    version: 2
    cluster_name: lbcluster
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: Node1_IP_Address
        broadcast: yes
        mcastport: 5405
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
nodelist {
    node {
        ring0_addr: Node1_IP_Address
        name: primary
        nodeid: 1
    }
    node {
        ring0_addr: Node2_IP_Address
        name: secondary
        nodeid: 2
    }
}
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
}
service {
    name: pacemaker
    ver: 1
}
Save and close the file when you are finished.
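Typing the node addresses into corosync.conf by hand in three places is error-prone. The sketch below shows one way to render the same file from the two addresses with a shell here-document. render_corosync_conf is a hypothetical helper name. Its output is meant to match the configuration above in content, with indentation added.

```shell
# Hypothetical helper: print the two-node corosync.conf shown above.
# Usage: render_corosync_conf NODE1_IP NODE2_IP > /etc/corosync/corosync.conf
render_corosync_conf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: render_corosync_conf NODE1_IP NODE2_IP" >&2
        return 1
    fi
    node1_ip="$1"
    node2_ip="$2"
    # Unquoted EOF: ${node1_ip} and ${node2_ip} are expanded.
    cat <<EOF
totem {
    version: 2
    cluster_name: lbcluster
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: ${node1_ip}
        broadcast: yes
        mcastport: 5405
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
nodelist {
    node {
        ring0_addr: ${node1_ip}
        name: primary
        nodeid: 1
    }
    node {
        ring0_addr: ${node2_ip}
        name: secondary
        nodeid: 2
    }
}
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
}
service {
    name: pacemaker
    ver: 1
}
EOF
}
```

For example, run render_corosync_conf Node1_IP_Address Node2_IP_Address > /etc/corosync/corosync.conf on Node1, substituting your actual addresses.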
Next, copy the Corosync authentication key and the configuration file from Node1 to Node2 with the following command:
scp /etc/corosync/* root@Node2_IP_Address:/etc/corosync/
Now, start the Corosync and Pacemaker services on each node and enable them to start at boot with the following commands:
systemctl start corosync
systemctl enable corosync
systemctl start pacemaker
systemctl enable pacemaker
Once both services have been started, check the cluster status on both nodes with the following command:
crm status
If everything is fine, you should see the following output:
Last updated: Wed Feb 28 21:13:27 2018 Last change: Wed Feb 28 21:12:44 2018 by hacluster via crmd on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured
Online: [ primary secondary ]
Full list of resources:
You can also check the Corosync members with the following command:
corosync-cmapctl | grep members
You should see the IP address of both nodes in the following output:
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.102)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.103)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
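If you check membership often, or from a script, it can help to reduce this output to a member count and a list of addresses. The two hypothetical shell functions below parse corosync-cmapctl output from standard input. They assume the runtime.totem.pg.mrp.srp.members.* key names shown above. Newer Corosync releases may name these keys differently.

```shell
# Hypothetical helpers that read `corosync-cmapctl` output on stdin.
# They assume the runtime.totem.pg.mrp.srp.members.* keys shown above.

# Print how many members report status "joined".
count_joined_members() {
    grep -c '^runtime\.totem\.pg\.mrp\.srp\.members\.[0-9]*\.status (str) = joined$'
}

# Print each member's ring0 IP address, one per line.
member_ips() {
    sed -n 's/^runtime\.totem\.pg\.mrp\.srp\.members\.[0-9]*\.ip (str) = r(0) ip(\([^)]*\))$/\1/p'
}
```

Once both nodes have joined, corosync-cmapctl | count_joined_members should print 2.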
Now you are ready to configure Pacemaker. Run all of the following Pacemaker commands on the primary node (Node1). Pacemaker automatically synchronizes cluster configuration changes to all member nodes.
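Next, disable STONITH and set the quorum policy. STONITH ("Shoot The Other Node In The Head") is Pacemaker's fencing mechanism: it forcibly powers off or isolates a node that has failed or stopped responding. This tutorial does not configure a fencing device, and Pacemaker will not start resources while STONITH is enabled with no fencing resource defined, so we disable it. Without fencing, the cluster cannot protect itself against split-brain, which is a particular risk in two-node clusters, so use this setting only for testing. Setting no-quorum-policy=ignore tells Pacemaker to keep running resources even when its partition has no quorum. This lets a single surviving node continue to serve.
You can apply both settings with the following commands: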
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
Now, verify your STONITH status and the quorum policy with the following command:
crm configure show
You should see the following output:
node 1: primary
node 2: secondary
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore
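To check a single property in a script rather than by eye, the hypothetical cib_property function below scans crm configure show output for a whitespace-separated key=value token and prints the value. It assumes the output format shown above. It does not handle quoted values that contain spaces.

```shell
# Hypothetical helper: print the value of KEY from `crm configure show`
# output on stdin by looking for a whitespace-separated "KEY=value" token.
# Usage: crm configure show | cib_property stonith-enabled
cib_property() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, key "=") == 1)
                print substr($i, length(key) + 2)
    }'
}
```

For example, crm configure show | cib_property stonith-enabled should print false after the commands above.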
Pacemaker is now running and configured. Next, create two cluster resources. The virtual_ip resource manages the floating IP address, and the webserver resource manages the Nginx service.
Create the virtual_ip resource with the crm command shown below (replace the variable Floating_IP_Address with the actual IP address):
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 params ip="Floating_IP_Address" cidr_netmask="32" op monitor interval="10s" meta migration-threshold="10"
Next, create the webserver resource with the following command:
crm configure primitive webserver ocf:heartbeat:nginx params configfile=/etc/nginx/nginx.conf op start timeout="40s" interval="0" op stop timeout="60s" interval="0" op monitor interval="10s" timeout="60s" meta migration-threshold="10"
Next, check the status of the new resources with the following command:
crm resource status
You should see the following output:
virtual_ip (ocf::heartbeat:IPaddr2): Started
webserver (ocf::heartbeat:nginx): Started
Next, add the virtual_ip and webserver resources to a group. A Pacemaker group keeps its members on the same node and starts them in the listed order, stopping them in reverse order. As a result, the floating IP and Nginx always fail over together. Create a group named hakase_balancing with the following command:
crm configure group hakase_balancing virtual_ip webserver
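You do not have to type the three crm configure commands above one at a time. Instead, you can keep the resource definitions in a file and load them in one step. The sketch below prints the same primitives and group in crm shell syntax. Both render_crm_resources and the file path /root/ha-resources.crm are hypothetical. Loading the file relies on crmsh's configure load update command, so confirm its behaviour in the crmsh documentation for your version before relying on it.

```shell
# Hypothetical helper: print the resources created above in crm syntax.
# Usage (as an alternative to the three crm configure commands above):
#   render_crm_resources Floating_IP_Address > /root/ha-resources.crm
#   crm configure load update /root/ha-resources.crm
render_crm_resources() {
    vip="$1"
    # In an unquoted here-document, "\\" produces a single backslash,
    # which crm reads as a line continuation.
    cat <<EOF
primitive virtual_ip ocf:heartbeat:IPaddr2 \\
    params ip="${vip}" cidr_netmask="32" \\
    op monitor interval="10s" \\
    meta migration-threshold="10"
primitive webserver ocf:heartbeat:nginx \\
    params configfile="/etc/nginx/nginx.conf" \\
    op start timeout="40s" interval="0" \\
    op stop timeout="60s" interval="0" \\
    op monitor interval="10s" timeout="60s" \\
    meta migration-threshold="10"
group hakase_balancing virtual_ip webserver
EOF
}
```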
Next, check the status of the resource group with the following command:
crm resource show
You should see the following output:
Resource Group: hakase_balancing
virtual_ip (ocf::heartbeat:IPaddr2): Started
webserver (ocf::heartbeat:nginx): Started
The cluster configuration is now complete, so it's time to check the status of the nodes and the cluster.
You can do this with the following command:
crm status
You should see the following output:
Last updated: Wed Feb 28 21:35:21 2018 Last change: Wed Feb 28 21:34:50 2018 by root via cibadmin on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured
Online: [ primary secondary ]
Full list of resources:
Resource Group: hakase_balancing
virtual_ip (ocf::heartbeat:IPaddr2): Started primary
webserver (ocf::heartbeat:nginx): Started primary
Both nodes, primary and secondary, are now online, and both resources are running on primary.
Now, from a remote machine, open your web browser and type the URL http://Floating_IP_Address (replace the variable Floating_IP_Address with the actual IP address). You should see the Node1 page, which displays "Nginx Cluster ::: Node1".
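To check the result from the command line instead of a browser, a small filter can extract the node label from the page you created earlier. served_by is a hypothetical helper. It assumes the "Nginx Cluster ::: NodeN" line written to index.html on each node.

```shell
# Hypothetical helper: print the node label (Node1, Node2, ...) found in the
# test page read from stdin.
# Usage: curl -s http://Floating_IP_Address/ | served_by
served_by() {
    sed -n 's/^.*Nginx Cluster ::: \(Node[0-9][0-9]*\).*$/\1/p'
}
```

At this point, curl -s http://Floating_IP_Address/ | served_by should print Node1.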
Next, stop the cluster service on Node1 with the following command:
crm cluster stop
Now, check the cluster status on Node2 with the following command:
crm status
You should see that the primary node is offline, the secondary node is online, and both resources have moved to secondary:
Last updated: Wed Feb 28 22:00:59 2018 Last change: Wed Feb 28 21:46:57 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured
Online: [ secondary ]
OFFLINE: [ primary ]
Full list of resources:
Resource Group: hakase_balancing
virtual_ip (ocf::heartbeat:IPaddr2): Started secondary
webserver (ocf::heartbeat:nginx): Started secondary
Now, from a remote machine, open your web browser and type the URL http://Floating_IP_Address (replace the variable Floating_IP_Address with the actual IP address). You should now see the Node2 page, which displays "Nginx Cluster ::: Node2".
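To confirm where each resource is running without reading the whole status, use the hypothetical resource_location function below. It prints the node name from a resource line of crm status output, such as virtual_ip (ocf::heartbeat:IPaddr2): Started secondary. It assumes the single-line resource format shown above.

```shell
# Hypothetical helper: print the node running RESOURCE, based on a line of
# the form "RESOURCE (class::provider:type): Started NODE" on stdin.
# Usage: crm status | resource_location virtual_ip
resource_location() {
    awk -v rsc="$1" '$1 == rsc && $(NF-1) == "Started" { print $NF }'
}
```

After the failover, crm status | resource_location virtual_ip should print secondary.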
If your High Availability setup is not working as expected, the following troubleshooting commands can help you find the cause.
crm_mon is a useful tool for viewing the real-time status of your nodes and resources (press Ctrl+C to exit):
crm_mon
You should see the following output:
Last updated: Wed Feb 28 23:46:46 2018 Last change: Wed Feb 28 22:00:43 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition WITHOUT quorum
2 nodes and 2 resources configured
Online: [ secondary ]
OFFLINE: [ primary ]
Resource Group: hakase_balancing
virtual_ip (ocf::heartbeat:IPaddr2): Started secondary
webserver (ocf::heartbeat:nginx): Started secondary
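crm_mon refreshes continuously, so for scripting it is easier to take a one-shot snapshot with crm_mon -1 and filter it. The hypothetical helpers below report the nodes listed on the Online: line and whether the status mentions a partition with quorum. Like the output above, they assume Pacemaker 1.1's text format.

```shell
# Hypothetical helpers that read `crm_mon -1` (or `crm status`) output on stdin.

# Print the node names from the "Online: [ ... ]" line.
online_nodes() {
    sed -n 's/^[[:space:]]*Online: \[ \(.*\) ]$/\1/p'
}

# Exit 0 if the status reports "partition with quorum", non-zero otherwise
# (the "partition WITHOUT quorum" line shown above does not match).
has_quorum() {
    grep -q 'partition with quorum'
}
```

For example, crm_mon -1 | online_nodes prints secondary while Node1 is stopped.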
You can view the full cluster configuration with the following command. In the output below, Floating_IP_Address stands for your actual floating IP address:
crm configure show
Output:
node 1: primary
node 2: secondary
primitive virtual_ip IPaddr2 \
params ip=Floating_IP_Address cidr_netmask=32 \
op monitor interval=10s \
meta migration-threshold=10
primitive webserver nginx \
params configfile="/etc/nginx/nginx.conf" \
op start timeout=40s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=10s timeout=60s \
meta migration-threshold=10
group hakase_balancing virtual_ip webserver
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore
You can also troubleshoot the cluster by following the Corosync log with this command:
tail -f /var/log/corosync/corosync.log
Congratulations! You now have a basic Nginx High Availability setup using Corosync and Pacemaker on Ubuntu 16.04. For more information, refer to the official Pacemaker documentation.
Server Load Balancer is a ready-to-use service that integrates seamlessly with Elastic Compute Service (ECS) to handle varying traffic levels without manual intervention. First, add the ECS instances to a Server Load Balancer instance. Server Load Balancer then distributes incoming traffic across those ECS instances, detects unhealthy or unsafe instances, and routes traffic only to healthy and safe instances.
Alibaba Cloud Express Connect is a convenient and efficient network service. It provides fast, stable, and secure private or dedicated network communication between different cloud environments. This includes VPC intranet intercommunication and dedicated leased-line connections across regions and users.
With Express Connect you can increase the flexibility of your network topology and enhance the quality and security of inter-network communication.