High Availability (HA), Fault Tolerance (FT), and being Horizontal Scale Friendly (HSF) are as important as functionality for web applications to run and succeed today. Existing and new web applications should be designed and provisioned on an architecture that supports all three. Fortunately, in the cloud era you can deploy such an architecture easily and quickly (compared to the on-premises bare-metal era)! However, this flexibility comes with a caveat: how do you choose the right cloud provider? We are spoiled for choice, and evaluating and choosing the right one can be really challenging (and hectic!).
This post walks through deploying a web application on Alibaba Cloud from the ground up, with HA, FT, and HSF in mind. It does not cover requirement analysis or capacity planning for any specific domain (I'm a newbie to Alibaba Cloud). Throughout the post I will briefly introduce several services and tools provided by Alibaba Cloud. Yes, briefly! If you wish to learn more about a particular service or tool, please visit the Documentation Center. The post also highlights the concerns and considerations that come up when deploying such services.
WordPress is the demo web application deployed on Alibaba Cloud in this post. The same deployment principles should apply to many other web applications. This post does not discuss WordPress configuration at all, and it cannot serve as a reference for it. There are tons of good resources out there on best practices for WordPress administration.
2. High-level Architecture:
Like many other web applications, the demo web application consists of an application layer (WordPress) and a DB layer (MySQL).
Goal: Ultimately, we want an always-on, running web application (WordPress)!
To achieve this “simple” goal, the demo web application must be deployed with the following “minimum” requirements:
- One site: a single main site.
- At least two physically separate running WordPress instances on each site, for redundancy and load balancing.
- Automatic spawning of a new WordPress instance when an existing instance stops or fails.
- The DB instance (MySQL) must also run in redundancy mode, with automatic failover to an active standby instance when necessary.
- Centralized data space. Shared resources must be accessible and available to all running WordPress instances. For example, a document uploaded by a user via WordPress should be synced across all running WordPress instances.
Fortunately, Alibaba Cloud provides services and tools that let us fulfill these “minimum” requirements. In this post specifically, we will use Cloud DNS (DNS), Auto Scaling Group (ASG), Server Load Balancer (SLB), Elastic Compute Service (ECS), Relational Database Service (RDS), Object Storage Service (OSS), and the Object Storage File System (OSSFS) tool to achieve the Goal. The high-level architecture diagram for the deployed WordPress is as follows:
3. Deployment Procedures:
We'll briefly introduce the components shown in Figure 1.0 before diving into each individual configuration. As stated earlier, you will have to refer to other sources, such as the Alibaba Cloud online documentation, for detailed explanations. The following table summarizes the description and usage of these components in our deployment context:
3.1. Identify Service Region
It's important to decide on the region where an application should be deployed. The general considerations include (but are not limited to) the following:
1. The mother of all considerations: COST. Yes, the cost may vary by region.
2. Service availability in the region. It's not uncommon for some regions to provide services that are not available in others; you have to check to find out!
3. The main target users' geographical location. The user experience is definitely better if the deployed application is physically closer (lower latency!) to the customer.
4. Rules and regulations. Is it legally OK to host and run the application in the selected region?
5. Number of Availability Zones. Occasionally we need to improve application availability by deploying redundant instances in different Zones.
Since I'm based in Southeast Asia, I will be looking at the Singapore and Kuala Lumpur data centers. At the time of writing, “Asia Pacific SE 3 (Kuala Lumpur)” has only a single Zone while “Asia Pacific SE 1 (Singapore)” has two Zones.
Conclusion: After consideration, we've decided on “Asia Pacific SE 1 (Singapore)” as the main region for our demo deployment.
3.2. Plan for Network Configuration
I. VPC
We have to consider the number of nodes that might potentially run in the deployment. Each running node takes one private IP, and we don't want to end up running out of private IPs in the future! Alibaba Cloud allows three CIDR blocks for a VPC: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. According to the Alibaba Cloud documentation, the first IP and the last three IPs of a CIDR block are reserved for system use (four in total), and hence the maximum number of private IPs for each CIDR block is:
• 10.0.0.0/8 = 16777212 (16777216 - 4)
• 172.16.0.0/12 = 1048572 (1048576 - 4)
• 192.168.0.0/16 = 65532 (65536 - 4)
You may wonder: why don't we just use the biggest CIDR block allowed, to avoid running out of private IPs in the future? The following might make you reconsider:
1. A bigger CIDR block may increase complexity when dealing with IP-related configuration such as subnet creation, route configuration, security group configuration, etc.
2. If that is not a show-stopper for you, consider this: “VPC peering (interconnect)” with another VPC doesn't allow overlapping CIDR blocks. In other words, once you use 10.0.0.0/8 as your CIDR block, you cannot peer with any VPC whose CIDR block falls anywhere within the 10.x.x.x range!
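The address counts and the peering restriction above can be reproduced with a little shell arithmetic. This is only an illustrative sketch: `usable_ips`, `ip_to_int`, and `cidr_overlap` are hypothetical helper names (not Alibaba Cloud tools), and the 4 reserved addresses are taken from the figures quoted above.

```shell
# Number of usable private IPs for a given prefix length, assuming
# 4 addresses are reserved by the system (as described above).
usable_ips() {
  echo $(( (1 << (32 - $1)) - 4 ))
}

# Convert a dotted IPv4 address to an integer. The body runs in a
# subshell so the IFS change does not leak into the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds (exit status 0) when two CIDR blocks overlap, i.e. when
# peering between VPCs using them would not be allowed.
cidr_overlap() {
  a_ip=$(ip_to_int "${1%/*}"); a_len=${1#*/}
  b_ip=$(ip_to_int "${2%/*}"); b_len=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter prefix.
  if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
  shift_bits=$(( 32 - $len ))
  [ $(( $a_ip >> $shift_bits )) -eq $(( $b_ip >> $shift_bits )) ]
}

usable_ips 16   # 192.168.0.0/16, prints 65532
cidr_overlap 10.0.0.0/8 10.20.0.0/16 && echo "overlap: cannot peer"
cidr_overlap 192.168.0.0/16 10.20.0.0/16 || echo "no overlap: peering possible"
```

The second call shows why a VPC on 10.0.0.0/8 is awkward to peer: any other block in the 10.x.x.x range collides with it, while a smaller block such as 192.168.0.0/16 leaves that range free.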
Conclusion: After consideration, we'll use “192.168.0.0/16” for our demo deployment, as there will be only a few running nodes within the VPC.
II. Subnet
In Alibaba Cloud, a VSwitch is used to further segment the VPC CIDR block into subnets with smaller CIDR blocks. The general considerations for segmenting subnets include the following:
1. Logical grouping of instances by functionality, e.g. applications in one group and RDS in another, for easier maintenance. For example, a group of instances can be disabled by deleting the VSwitch attached to that group.
2. Simpler security group configuration. A security rule based on a subnet CIDR block is cleaner than rules based on individual instance IPs.
3. Auto Scaling and Server Load Balancer monitoring and actions can be scoped to a specific subnet.
4. Redundancy of resources. It's possible to fail over seamlessly to a subnet in a different Zone when the existing subnet's Zone encounters a failure.
Conclusion: After consideration, we'll logically group WordPress in one subnet (192.168.1.0/24) and the RDS instance in another subnet (192.168.2.0/24).
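A quick sanity check on this plan is to confirm that each subnet sits inside the VPC block chosen earlier. The following is a minimal sketch; `ip_to_int` and `subnet_in_vpc` are hypothetical helper names introduced only for illustration.

```shell
# Convert a dotted IPv4 address to an integer (runs in a subshell so
# the IFS change stays local).
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds when the subnet CIDR (first argument) lies entirely inside
# the VPC CIDR (second argument).
subnet_in_vpc() {
  sub_ip=$(ip_to_int "${1%/*}"); sub_len=${1#*/}
  vpc_ip=$(ip_to_int "${2%/*}"); vpc_len=${2#*/}
  # The subnet must be at least as specific as the VPC block...
  [ "$sub_len" -ge "$vpc_len" ] || return 1
  # ...and share the VPC's network prefix.
  shift_bits=$(( 32 - $vpc_len ))
  [ $(( $sub_ip >> $shift_bits )) -eq $(( $vpc_ip >> $shift_bits )) ]
}

for subnet in 192.168.1.0/24 192.168.2.0/24; do
  if subnet_in_vpc "$subnet" 192.168.0.0/16; then
    echo "$subnet fits inside 192.168.0.0/16"
  else
    echo "$subnet is outside 192.168.0.0/16"
  fi
done
```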
3.3. Configure Firewall (Security Group)
In Alibaba Cloud, network access at the instance level can be restricted with a Security Group. Security Group rules can be very granular, down to the protocol, port, and client IP. Hence, to avoid unauthorized access to instances, we should consider the following:
1. Always follow the least-privilege practice. Restrict access to the required clients only.
2. Intranet and/or internet connectivity. A security group can be used to create a “private subnet” (no internet access) by allowing inbound access from the intranet only. In addition, a NAT gateway can be used to let instances in such a private network reach outbound internet services.
Conclusion: Since we are running WordPress on Linux instances, we will at least allow inbound traffic on port 80 (HTTP) and port 22 (SSH) in the Security Group. Besides that, all outbound traffic will be allowed since there is no specific requirement on that.
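To make the least-privilege idea concrete, the rule set from the conclusion can be written down as data and queried. This is a simplified model, not how Alibaba Cloud evaluates rules internally: the `SG_RULES` format and the `sg_allows` helper are hypothetical, and the sketch assumes that anything not explicitly listed is denied.

```shell
# Hypothetical, simplified rule list: direction, port ("all" = any port),
# and the source/target CIDR block.
SG_RULES="inbound 22 0.0.0.0/0
inbound 80 0.0.0.0/0
outbound all 0.0.0.0/0"

# Succeeds when some rule permits the given direction and port.
# Traffic that matches no rule is treated as denied.
sg_allows() {
  echo "$SG_RULES" | awk -v d="$1" -v p="$2" '
    $1 == d && ($2 == p || $2 == "all") { found = 1 }
    END { exit found ? 0 : 1 }'
}

sg_allows inbound 80 && echo "HTTP allowed"
sg_allows inbound 3306 || echo "MySQL from the internet denied"
```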
3.4. Configure Application Layer
Choosing an instance type could be the trickiest and most uncertain decision we have to make when deploying web applications. As stated earlier, this post does not discuss the application's capacity requirements, and hence choosing a proper instance type is out of scope. Still, the following considerations may help in deciding on an instance type in general:
1. Always start with the “Pay-As-You-Go” model if you have no idea of the instance type's performance or the actual capacity requirement. This pricing model lets you experiment freely with different instance types without a lock-in period.
2. Understand the constraints of the application to be deployed. Is the application CPU-bound or IO-bound? You have to answer that in order to pick an instance type with good cost efficiency.
3. Deploy one-step-down instances whenever possible. If an application's capacity requirement can be satisfied by one instance of size X from instance family Y, it might be better to deploy two one-step-down instances (e.g. X/2) from the same family for the same workload. This increases the availability of the application: we can still process 50% of the workload if one X/2 instance goes down, compared with 100% downtime if the single X instance goes down. Of course, this approach depends on the design and usage of the application.
4. Decide on the other usage parameters, e.g. network type, network bandwidth, operating system image, etc., accordingly.
Conclusion: Since this is a demo deployment without any real production usage, we'll go for the lowest (cheapest) ECS instance configuration, e.g. General Type n1: 1 core, 1GB, Ubuntu 16.04, Ultra Cloud Disk 40GB, and 1Mbps network bandwidth.
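The availability argument for one-step-down instances in point 3 comes down to simple arithmetic. The sketch below assumes equally sized instances and an evenly spread workload; `remaining_capacity_pct` is a hypothetical helper name.

```shell
# Percentage of total capacity still available when `failed` out of
# `total` equally sized instances are down (integer arithmetic).
remaining_capacity_pct() {
  total=$1
  failed=$2
  echo $(( ($total - $failed) * 100 / $total ))
}

remaining_capacity_pct 1 1   # one instance of size X fails: prints 0
remaining_capacity_pct 2 1   # one of two X/2 instances fails: prints 50
```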
3.5. Configure DB Layer
Generally, we have to decide between self-managed DB instances (installing the DB ourselves on an ECS instance), as we usually do for on-premises solutions, and fully managed RDS DB services like ApsaraDB. Again, comparing or benchmarking the two variants is out of this post's scope. The guidelines below may help in choosing between them:
1. Do you have resources available for managing and operating DB instances? The management and operational tasks may include backing up data files, OS/DB patching, access control on the host machine, etc. If the answer is no, then a fully managed RDS DB is probably preferable.
2. Do you need a dedicated DB instance? If your DB is small, or the workload is minimal and can co-exist with the application (e.g. in a development environment), the self-managed variant may be preferable for cost efficiency.
3. Do you need access to the underlying host of the DB instance? For example, if you need specific OS/DB configuration for performance tuning, then the self-managed variant should be used.
4. Does the fully managed DB service provide the DB type that you require? If not, the answer is straightforward: go for the self-managed variant.
5. If you are concerned about possible cloud vendor lock-in, you might want to avoid the fully managed variant, as some RDS implementations can be vendor specific.
Conclusion: There is neither manpower to maintain the demo DB nor any specific DB configuration needed, and hence we'll deploy the demo DB with ApsaraDB RDS - MySQL. In addition, this variant lets us set up a redundant (active standby) DB easily (with just a click!).
3.6. Identify Centralized Storage
Eventually, there could be multiple WordPress applications running concurrently on physically separate ECS instances. Each instance might generate and store files, images, or media resulting from users' operations. Obviously, objects generated by any one instance have to be synced across all other running instances. One way to achieve this is through centralized storage: generated objects are synced to the centralized storage, which is in turn synced with the other running instances. Additionally, the centralized storage must always be available, and the failure of any instance shouldn't impact its availability or durability. Alibaba Cloud provides a couple of fully managed services that can serve as centralized storage:
1. Object Storage Service for objects. It's ideal as centralized object storage due to its guaranteed high availability (99.9%), scalability, and fully managed nature. For this demo deployment, each running WordPress instance syncs with one common, dedicated Object Storage Service bucket. With this syncing mechanism, all running WordPress instances have an identical set of created objects.
2. ApsaraDB Redis for application state. State (e.g. shared values or parameters) can be shared among running instances via the fully managed ApsaraDB Redis.
Conclusion: A dedicated bucket in Object Storage Service will be created and used to store the objects created by users' operations. All running WordPress instances sync with this bucket for the set of created objects.
3.7. Plan for HA, FT, and HSF
To achieve HA, FT, and HSF in Alibaba Cloud, a web application should fundamentally be designed as stateless and horizontally scalable. Any state or data the application depends on should be decoupled from the web application and moved to centralized storage, as discussed in the previous section. The services listed below can be used to deploy an HA, FT, and HSF web application:
1. Cloud DNS. It's possible to configure “A” records for instances hosted in different regions. This is really useful in a failover scenario, where the “A” record of a standby instance can be enabled (made active) with one click, diverting network traffic to the standby instance.
2. Auto Scaling. It can automatically spawn an instance in a desired Zone when an existing running instance goes down or becomes unhealthy.
3. Server Load Balancer. This service runs health checks on the configured instances and reports their status to the Auto Scaling service for further action. Besides that, it also balances the workload among the running instances.
4. ApsaraDB RDS. RDS MySQL provides a multi-zone availability feature with just a click. It really eases the effort of providing HA and FT for the DB.
Conclusion: The demo deployment will use DNS to route traffic to the WordPress instances, Auto Scaling to ensure a minimum of 2 running instances in each region, and Server Load Balancer to provide health checks and balance the workload. Last but not least, the Multi-Zone availability feature on RDS MySQL is enabled to provide HA and FT for the DB.
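The interplay between health checks and Auto Scaling can be pictured as one small decision: compare the number of healthy instances with the configured minimum and launch the difference. This is a conceptual sketch only, not the actual Auto Scaling algorithm; `instances_to_launch` is a hypothetical helper name.

```shell
# How many instances a scaling group would need to launch to get back
# to its minimum size, given how many instances pass the health check.
instances_to_launch() {
  min_size=$1
  healthy=$2
  if [ "$healthy" -lt "$min_size" ]; then
    echo $(( $min_size - $healthy ))
  else
    echo 0
  fi
}

instances_to_launch 2 2   # both instances healthy: prints 0
instances_to_launch 2 1   # one instance failed: prints 1
```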
3.8. Testing and Run
To test the HA and FT behavior, we can stop a running ECS instance manually and observe the action triggered by the Auto Scaling service. If Auto Scaling has been configured properly, a new instance will be spawned automatically. We can also manually turn off the RDS DB instance to observe the Multi-Zone failover. The best thing is that these actions are handled automatically by the respective services without any manual intervention. Shown below is our deployed WordPress:
Figure 3: Demo Deployed HA, FT, and LB WordPress
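While stopping instances, it helps to watch the site from the outside so any interruption is visible. Below is a minimal polling sketch using `curl`. `SITE_URL` is a placeholder for your load balancer address or domain, `is_healthy_status` and `watch_site` are hypothetical helper names, and treating 2xx/3xx responses as healthy is an assumption of this sketch.

```shell
# Treat 2xx and 3xx HTTP status codes as healthy; anything else
# (including 000, which curl reports when it cannot connect) is not.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the site once per second and log whether it answered.
watch_site() {
  while :; do
    # -s: silent, -o /dev/null: discard the body, -m 5: 5 second timeout,
    # -w: print only the HTTP status code.
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$SITE_URL")
    if is_healthy_status "$code"; then
      echo "$(date '+%H:%M:%S') up ($code)"
    else
      echo "$(date '+%H:%M:%S') DOWN ($code)"
    fi
    sleep 1
  done
}

# Usage (runs until interrupted with Ctrl+C):
#   SITE_URL=http://<load-balancer-ip>/ watch_site
```

Leave the loop running in one terminal, stop an ECS instance in the console, and note how many probes (if any) report DOWN before traffic settles on the remaining instance.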
4. Possible Improvements
The following suggestions might be useful to further improve the resiliency, performance, and availability of a deployed web application:
1. Auto scale out/in according to the instances' workload. For example, spawn a new instance when CPU/memory usage exceeds a certain threshold over a defined period.
2. Use a CDN to cache and distribute content, minimizing geographical latency and reducing traffic to the application instances. In addition, a CDN also acts as a defense layer against DDoS attacks on the application instances.
3. Offload database “read” workload by creating read replicas.
4. Plan a Disaster Recovery region and create a failover strategy.
5. Set up cloud monitoring, enable alerts, and turn on detailed logging, at least for critical metrics and incidents such as instance failure, full disk, triggered auto scaling, etc.
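The threshold rule in suggestion 1 can be sketched as an average over the samples collected during the defined period. This only illustrates the idea, not how Alibaba Cloud's alarm-triggered scaling is implemented; `should_scale_out` is a hypothetical helper, and the samples are assumed to be integer CPU percentages.

```shell
# Succeeds when the integer average of the CPU samples is above the
# threshold, i.e. when a scale-out should be triggered.
# Usage: should_scale_out THRESHOLD SAMPLE [SAMPLE ...]
should_scale_out() {
  threshold=$1
  shift
  sum=0
  count=0
  for sample in "$@"; do
    sum=$(( $sum + $sample ))
    count=$(( $count + 1 ))
  done
  # With no samples there is nothing to act on.
  [ "$count" -gt 0 ] && [ $(( $sum / $count )) -gt "$threshold" ]
}

should_scale_out 80 90 85 95 && echo "scale out"
should_scale_out 80 50 60 95 || echo "hold"
```

Averaging over the whole period keeps a single spike (the 95 in the second call) from triggering a scale-out on its own.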
5. Appendix (Sample Configuration)
The following sample configuration steps are based on the outcomes discussed in the “Deployment Procedures” section. You need an Alibaba Cloud account to follow them. If you don't have one yet, you may register (with USD300 free credit at the time of this writing) with this link.
i. VPC & Network Configuration (Identify Service Region & Plan for Network Configuration):
1. Log in to the Alibaba Cloud console.
2. Create a “VPC”. Go to “Product” and click on “Virtual Private Cloud” under “Networking”. Select “Asia Pacific SE 1” as the region. Once you land on the VPC overview page, click “VPC” on the side tab, followed by the “Create VPC” button.
• Name: VPC-Main
• CIDR range: 192.168.0.0/16
3. Create the subnets: one subnet for the WordPress instance and one subnet for RDS.
First subnet (continue with “Next Step” at Step 2 to “Create VSwitch”):
• VPC: the recently created VPC (e.g. VPC-Main)
• Name: Public-Subnet1
• Zone: Zone A
• CIDR: 192.168.1.0/24
Second subnet (click “Create More” to create the second VSwitch):
• VPC: the recently created VPC (e.g. VPC-Main)
• Name: Public-Subnet2
• Zone: Zone A
• CIDR: 192.168.2.0/24
ii. Security Group Configuration (Configure Firewall)
4. Create a “Security Group”. Go to “Product” and click on “Elastic Computing Service”. Once you land on the ECS overview page, click “Security Group” on the side tab, followed by the “Create Security Group” button.
• Name: any name, e.g. SG-SSH-HTTP
• Network Type: VPC
• VPC: VPC-Main
• Click on “Set the Rule Immediately”
5. Add rule.
Click on “Add Security Group Rules”.
First rule (SSH for any inbound client):
• Rule Direction: Inbound
• Authorization Policy: Allow
• Protocol Type: SSH
• Authorization Object: 0.0.0.0/0
Second rule (HTTP for any inbound client):
• Rule Direction: Inbound
• Authorization Policy: Allow
• Protocol Type: HTTP
• Authorization Object: 0.0.0.0/0
Third rule (all protocols for any outbound target):
• Rule Direction: Outbound
• Authorization Policy: Allow
• Protocol Type: All
• Authorization Object: 0.0.0.0/0
iii. ECS Configuration (Configure Application Layer - Part 1)
6. Create a “Key Pair”. Go to “Product” and click on “Elastic Computing Service”. Once you land on the ECS overview page, click “Key Pairs” on the side tab, followed by the “Create key Pair” button.
• Name: ECS-Lab
• Type: Automatically Create a Key Pair
• A key pair file named “ECS-Lab.pem” should be downloaded automatically. This file is used as the authentication key when connecting to the ECS instance.
7. Create an ECS instance for the WordPress installation. Go to “Product” and click on “Elastic Computing Service”. Once you land on the ECS overview page, click “Instances” on the side tab, followed by the “Create Instance” button.
• Pricing Model: Pay-As-You-Go
• Region and Zone: Asia Pacific SE 1 (Singapore), Asia Pacific SE 1 Zone A
• Instance Type: General Type n1 - 1 core 1GB
• Network Type: select the created VPC (VPC-Main), VSwitch (Public-Subnet1), and Security Group (SG-SSH-HTTP) accordingly
• Operating System: Ubuntu 16.04
• Security Setting: Attach Key Pair, and select the key pair generated at Step 6 (ECS-Lab)
• Instance name: ECS-Lab-WP
• Click “Buy Now” and proceed accordingly
8. SSH to the purchased ECS instance with the key pair generated at Step 6. Refer to this link on how to SSH to the ECS instance. To find the instance's internet IP address, go to “Product” and click on “Elastic Computing Service”. Once you land on the ECS overview page, click “Instances” on the side tab; the internet IP address is shown in the “IP Address” column.
SSH into the ECS instance and run the following commands to install the software and packages WordPress needs. Please make sure every command completes successfully.
# Refresh the package index, then install Apache, PHP, and the MySQL client
apt-get update
apt-get install apache2 libapache2-mod-php php php-mcrypt php-mysql mysql-client-core-5.7 -y
# Replace the default Apache page with the latest WordPress release
cd /var/www/html
mv index.html index.html.bk
wget https://wordpress.org/latest.tar.gz
tar -xzf latest.tar.gz
cp -r wordpress/* /var/www/html/
rm -rf wordpress latest.tar.gz
# Give the web server ownership of the site files, then restart Apache
chown -R www-data:www-data /var/www/html
chmod -R 755 /var/www/html/wp-content
service apache2 restart
iv. ApsaraDB RDS Configuration (Configure DB Layer)
9. Create ApsaraDB RDS - MySQL. Go to “Product” and click on “ApsaraDB for RDS”. Once you land on the RDS page, click “Create Instances”.
• Billing Method: Pay-As-You-Go
• Region and Zone: Singapore, Multiple Zone (Zone A + Zone B)
• Database Engine: MySQL
• Instance type: 1 Core 1GB (rds.mysql.t1.small)
• Network Type: “VPC”, then select the VPC (VPC-Main) and VSwitch (Public-Subnet2) accordingly
• Click “Buy Now” and proceed accordingly
10. Configure the RDS instance. Go to “Product” and click on “ApsaraDB for RDS” (it might take a while before the purchased RDS instance appears on the page). Once the purchased RDS instance is up and running, click “Manage” on it.
10.1 Create a whitelist. Click “Security” on the side tab. On the “Whitelist Setting” tab, click “+Add a Whitelist Group”.
• Group Name: rds_ecs_whitelist
• Whitelist: 192.168.1.0/24
• Click “OK”
10.2 Create the “wordpress” database. Click “Databases” on the side tab, followed by “Create Database”.
• Database Name: wordpress
• Supported Character: utf8
• Click OK
10.3 Create a user account. Click “Accounts” on the side tab, followed by “Create Account”.
• Database Account: wordpress_user
• Authorized Databases: select the created database (wordpress)
• Password & Re-enter Password: Wordpress123
10.4 Click “OK” to create the account.
v. WordPress Configuration (Configure Application Layer - Part 2):
11. Browse to the internet IP of the ECS instance (created at Step 7) using a web browser.
11.1 Fill in the MySQL connection details such as “Database Name”, “Username”, and “Password” as defined in Step 10. The “Database Host” is the “Intranet Address” of the RDS instance created at Step 9. To get the intranet address, go to “Product” in the Alibaba Cloud console and click on “ApsaraDB for RDS”. Once you land on the RDS page, click on the created RDS instance and copy the “Intranet Address” value.
12. Click on “Run the installation” and continue the WordPress configuration until completion.
Hooray, by now your first WordPress instance should be installed and running on Alibaba Cloud!
vi. Sync Dependent Data Storage (Identify Centralized Storage)
13. The folder used by WordPress to store user-uploaded objects should be synced to centralized storage.
14. Create an OSS bucket. Go to “Product” and click on “Object Storage Service” under “Storage & CDN”. Once you land on the Object Storage page, click “Create Bucket”.
• Bucket Name: lab-wp-XXX (use your own bucket name)
• Region: Asia Pacific SE 1 (Singapore)
• Storage Class: Standard
• ACL: Private
• Click OK
15. Grant access to the bucket created at Step 14. Go to “Product” and click on “Resource Access Management” under “Monitor and Management”. Once you land on the RAM page, click “User”, followed by “Create User”.
• User Name: oss-user
• Click OK
16. Authorize the created user for OSS access. Go to “Product” and click on “Resource Access Management” under “Monitor and Management”. Once you land on the RAM page, click “Authorize” on the newly created user.
• Select and add “AliyunOSSFullAccess”
• Click OK
17. Generate a “User Access Key”. Go to “Product” and click on “Resource Access Management” under “Monitor and Management”. Once you land on the RAM page, click “Manage” on the newly created user.
17.1 Go to the “User Access Key” section and click “Create Access Key”.
17.2 Click “Save Access Key Information” to save the generated Access Key and Access Key Secret.
18. Install the “ossfs” tool. This tool is used to sync WordPress' dependent folder with the OSS bucket created at Step 14.
18.1 SSH into the launched WordPress ECS instance.
18.2 Install “ossfs” according to the guideline at this link.
cd
wget https://github.com/aliyun/ossfs/releases/download/v1.80.3/ossfs_1.80.3_ubuntu16.04_amd64.deb
sudo apt-get update
sudo apt-get install gdebi-core -y
sudo gdebi ossfs_1.80.3_ubuntu16.04_amd64.deb
18.3 Create the WordPress upload directory.
mkdir -p /var/www/html/wp-content/uploads
chown -R www-data:www-data /var/www/html/wp-content/uploads
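18.4 Set up the credential file with the bucket name created at Step 14 and the access key created at Step 17. The file “/etc/passwd-ossfs” holds one “bucket:access-key-id:access-key-secret” entry; replace the two placeholders below with your own key values.
echo lab-wp-XXX:<AccessKeyId>:<AccessKeySecret> > /etc/passwd-ossfs
chmod 640 /etc/passwd-ossfs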
18.5 Mount the “lab-wp-XXX” OSS bucket on WordPress' dependent folder and enable auto-mounting at instance startup.
18.5.1 Append the following entry to “/etc/fstab” to mount “lab-wp-XXX” during system startup. Make sure to use the endpoint of the bucket's own region, e.g. “http://oss-ap-southeast-1.aliyuncs.com”.
echo "ossfs#lab-wp-XXX /var/www/html/wp-content/uploads fuse _netdev,url=http://oss-ap-southeast-1.aliyuncs.com,allow_other 0 0" >> /etc/fstab
18.5.2 Execute the mounting operation
mount -a
18.6 To keep the mounted OSS bucket from being scanned by Linux (which incurs unnecessary cost), add the following to “/etc/updatedb.conf”:
18.6.1 Add “/var/www/html/wp-content/uploads” to PRUNEPATHS.
18.6.2 Add “fuse.ossfs” to PRUNEFS.
vii. High Availability, Fault Tolerance, and Load Balance Configuration (Plan for HA, FT, and HSF):
19. Create the load balancer. On the ECS overview page, click “Load Balancer” on the side tab. Once the Load Balancer page has loaded, click “Create Server Load Balancer”.
• Region: Singapore
• Zone: Multi-zone
• Primary Zone: Zone A
• Backup Zone: Zone B
• Instance Type: Internet
• Quantity: 1
20. Configure the load balancer. On the ECS overview page, click “Load Balancer” on the side tab. Once the Load Balancer page has loaded, click “Manage” on the load balancer purchased at Step 19.
20.1 Click “Listener”, then click the “Add Listener” button.
• Front-end Protocol: HTTP, port 80
• Back-end Protocol: HTTP, port 80
• Scheduling: Weighted Round
• Click “Show Advance” and enable session persistence
• Timeout Duration: 300
20.2 Click “Next” to configure the health check.
• Domain Name: leave blank
• Health Check Port: 80
• Health Check Path: /index.php
• Normal Status Code: enable http_2xx and http_3xx
• Click “Confirm” to provision the Load Balancer
20.3 Update WordPress with the Load Balancer's internet IP address. This is important because the WordPress instance from Step 11 was automatically configured with the IP of the ECS instance it runs on. We need to change that IP to the Load Balancer's IP, as WordPress might be served by any ECS instance behind the load balancer. If you have a domain name, you might want to use the domain name instead.
• Browse to WordPress using a browser. Go to the “Setting” URL, e.g. “http ://