Hybrid Cloud Storage: Cross-Cloud Backup

In this article, we discuss the major pain points of data storage reliability and availability, and explore how enterprises can alleviate them through hybrid cloud backups.


Cloud storage has become the go-to solution for enterprise storage applications because of its reliability and security. Statistics show that AWS, Microsoft Azure, and Alibaba Cloud revenues have increased by 45.9%, 61%, and 126%, respectively. According to Gartner, IaaS continues to be the most promising growth field, with a projected growth of 28% in the next 5 years. However, major public cloud providers are not immune to accidents and disasters, which can lead to downtime in services.

In February 2017, an engineer at AWS accidentally entered an incorrect command while debugging the S3 storage system in the Virginia data center, causing four hours of downtime. This affected many enterprise platforms, including Slack, Quora, and Trello. In September, another storage incident occurred in the same region (US East).

In March 2017, Microsoft Azure public cloud storage had availability issues for more than eight hours, affecting a portion of customers in the eastern US.

In June 2018, an operation error during Alibaba Cloud maintenance caused some customers to encounter issues when they tried to access the console on the official Alibaba Cloud website and use some products.

In August 2018, Tencent Cloud lost production data stored by several start-up companies due to a silent error resulting from a hard drive hardware bug.

Almost all major cloud providers have had similar production incidents. Does this indicate that public clouds are insecure?

Problems with Traditional Backup and Disaster Tolerance Solutions

Backup and disaster tolerance products and solutions are still dominated by traditional service providers, which offer rich product lines covering a wide range of fields. Cloud providers, by comparison, have invested relatively little in this area and have delivered relatively little. In our opinion, traditional backup and disaster tolerance products have two problems:

Not Cloud Native

For public cloud users, the backup and disaster tolerance ecosystems on clouds are incomplete. Even when backup and disaster tolerance software from a traditional service provider is successfully deployed, it is hard to integrate it with existing cloud resources for seamless monitoring and maintenance. Non-cloud-native backup and disaster tolerance may also pose risks to users. Even though some of these products have been integrated into public clouds, traditional service providers may not be able to keep pace with the frequent releases and upgrades of products and features from public cloud providers, so their response and support can lag. As a result, users may not be able to take advantage of new features and performance improvements as soon as they become available. Finally, traditional service providers cannot coordinate internally across the various cloud products, whereas cloud providers can. Private cloud and hybrid cloud users face the same problems.

High Cost and Complex Deployment

Traditional backup and disaster tolerance products still target the ecosystem of traditional servers and storage. Under both the pay-by-node and pay-by-capacity models, the up-front investment needed to deploy one or more devices and design a solution is very costly for small and medium enterprises. Once device warranties or licenses expire, the maintenance cost can even exceed the initial investment.

Users' concerns and problems are our responsibility. In addition to improving the reliability of individual products and providing maintenance guarantees, each public cloud provider has an obligation to provide cost-effective, easy-to-use, and efficient disaster tolerance solutions. The more public cloud users there are, the stronger the need for disaster recovery. Alibaba Cloud's hybrid cloud storage team offers three products for this purpose: Hybrid Backup Recovery, Cloud Storage Gateway, and Hybrid Disaster Recovery. They are intended for customers who need hybrid cloud disaster recovery from local IDCs to Alibaba Cloud, or cross-cloud (multi-cloud) disaster recovery from other cloud providers to Alibaba Cloud. This article focuses on how these products address the cross-cloud disaster recovery scenario.

Cross-Cloud Backup Architecture based on Hybrid Backup Recovery

For users, the architecture of Hybrid Backup Recovery is very simple. It consists of clients and cloud backup warehouses. Clients are installed on the hosts that need to be backed up, and the unlimited space in a cloud backup warehouse is used to store the backup data. Clients and cloud backup warehouses have a many-to-one relationship: multiple clients connect to one cloud backup warehouse over a public network or leased lines.

Diagram - Hybrid cloud backup cross-cloud backup architecture

Cross-Cloud Backup Implementation based on Hybrid Backup Recovery

This section uses two backups and one recovery example to show how Hybrid Backup Recovery backs up files on users' hosts, backs up incremental data, and recovers user data. The section will give you an intuitive understanding of Hybrid Backup Recovery.

To give an end-to-end demonstration, we provision a virtual machine from another well-known domestic cloud provider, referred to here as T. The virtual machine simulates a user server. The cloud host is configured as follows: dual-core CPU, 4 GB RAM, a 50 GB system disk, a 100 GB data disk, 1.5 Gbit/s intranet bandwidth, 50 Mbit/s public network bandwidth, and 64-bit CentOS 7.4. The cloud host is located in Shanghai.

Server configuration

The 100 GB data disk contains 33 GB of data files and 13 GB of server logs.

Content in the server data disks

Data files in databases

Log files

Log on to Alibaba Cloud Console, go to the Hybrid Backup Recovery page, enable the service, and create backups. Note: We recommend that you select a Hybrid Backup Recovery region that is the same as or close to the region of the backup source. So, here we also choose the "China (Shanghai)" region.

Select a region and enable Hybrid Backup Recovery

Select a region and create a backup

After creating the backup and the backup warehouse, we need to download clients and certificates. Clients will be uploaded to and installed on the backup source end, which is the cloud host that we created previously.

Complete the creation and download the client and certificate

Upload the downloaded client software to the backup source (the cloud host), then decompress and install it.

Upload and install the backup client

After the installation, open http://<host public network IP>:8011 in a browser. Port 8011 may not be open in the cloud host's security group; if it is not, edit the security group policy to allow TCP port 8011. The backup client registration page appears. Enter the downloaded certificate (the key used to register and connect the backup source to the backup warehouse), the AccessKey (AK) information of your Alibaba Cloud account, and a logon password of your choice for the client. Because the backup client and the cloud backup warehouse are connected over a public network, we select Classic as the network type.

Backup client registration page
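Before opening the registration page, it can help to confirm that TCP port 8011 is actually reachable from your workstation. The following is a minimal sketch using only the Python standard library. The function name is our own and is not part of Hybrid Backup Recovery.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, is_port_open("<host public network IP>", 8011) returning False suggests that the security group rule (or a host firewall) still blocks the port.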

After registration, users can see the backup client page. This page is the portal where users create backups and recover data. We start by creating an immediate backup. An immediate backup is a single backup that runs only once, whereas a planned backup follows a user-defined schedule on which backups are made regularly. Here we select Back Up Now to back up the "/server_dir" directory once.

Create backups

When required information is submitted, the backup starts immediately. In the backup client page, users can see backup progress and other related information.

Backup progress

While a backup is in progress, many users may wonder whether the backup affects the services running on the backup source. We can check CPU, memory, and network usage in the resource monitoring information of the backup source. As we can see, when the backup starts, CPU load does not increase and memory usage increases by about 400 MB, so overall server resource usage has not grown much. Network bandwidth, however, quickly rises to its upper limit, which shows that Hybrid Backup Recovery can saturate the available link. (Note: The cloud host has only one network adapter, so intranet traffic and public network traffic appear identical. We take this to be the cloud provider's design.)

Cloud host resource usage at backup

Next, add a new directory containing 13 GB of files to the "server_dir" directory, and set a traffic limit for the backup task on the "Traffic Control" page: a 24-hour traffic limit with a maximum speed of 2 MB/s. Remember to click Add so that the traffic policy takes effect.

Add 13 GB files

Backup traffic limit

After the settings are submitted, the backup task starts. We can see that the total data to be backed up is 57 GB, and very soon the progress bar displays 79%, with the speed exceeding 1.5 GB/s. This is because the backup source directory includes 45 GB of files that have already been backed up. Hybrid Backup Recovery quickly identifies the differences between the two backups by using efficient comparison algorithms, and uploads only added and modified files to the cloud, which improves backup efficiency.

Incremental backup
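The article does not describe the comparison algorithm that Hybrid Backup Recovery uses. As an illustration of the general idea only, the sketch below detects added and modified files by comparing file size and modification time between two scans of a directory. The names build_manifest and changed_files are hypothetical, and a real product may use checksums or other techniques instead.

```python
import os
from typing import Dict, List, Tuple

# Relative path -> (size in bytes, modification time in nanoseconds)
Manifest = Dict[str, Tuple[int, int]]


def build_manifest(root: str) -> Manifest:
    """Walk root and record size and mtime for every file found."""
    manifest: Manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changed_files(previous: Manifest, current: Manifest) -> List[str]:
    """Return paths that are new, or whose size or mtime changed, since the previous scan."""
    return sorted(
        path for path, meta in current.items() if previous.get(path) != meta
    )
```

With a manifest saved after the first backup, only the paths returned by changed_files would need to be uploaded in the second run, which is why a mostly unchanged directory appears to progress so quickly.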

You might wonder whether the 1.56 GB/s backup speed is really subject to the traffic limit. Let's check whether the limit works. The resource monitoring information on the cloud host and the nload output on the host both show outbound traffic of about 16 Mbit/s, which equals the 2 MB/s limit. The displayed backup speed is much higher because files that have not changed are skipped rather than transferred. CPU usage shows a short period of high load, caused by the computing resources consumed during incremental file comparison.

Resource monitoring on the cloud host

nload output of the host
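The unit conversion behind this check is simple arithmetic (1 byte = 8 bits). The short sketch below also gives a rough upper bound on how long the 13 GB of new files would take at the 2 MB/s limit. It ignores compression and deduplication, which would reduce the bytes actually sent, and it assumes 1 GB = 1024 MB.

```python
def mbytes_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mb_per_s * 8


def transfer_hours(size_gb: float, mb_per_s: float) -> float:
    """Estimate hours needed to move size_gb at a sustained rate, with 1 GB = 1024 MB."""
    return size_gb * 1024 / mb_per_s / 3600


print(mbytes_to_mbits(2))               # 16
print(round(transfer_hours(13, 2), 2))  # 1.85
```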

After the two backups are completed, the Hybrid Backup Recovery page on Alibaba Cloud Console shows a clear backup overview: two successful backups, the total source data size, and the space actually used in the backup warehouse. The ratio of the original data size to the actual usage is the combined compression and deduplication ratio. Alibaba Cloud Hybrid Backup Recovery implements highly efficient compression and deduplication algorithms with a maximum ratio of 30:1, which considerably reduces both bandwidth consumption during backup and space consumption in the backup warehouse.

Backup warehouse info
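To make the ratio concrete, here is a toy sketch of fixed-size chunk deduplication together with the ratio calculation. It is not the algorithm used by Hybrid Backup Recovery, which the article does not describe. The function names, the 4 KB chunk size, and the sample data are all assumptions chosen for illustration.

```python
import hashlib


def reduction_ratio(source_bytes: int, stored_bytes: int) -> float:
    """Ratio of original data size to the space actually stored."""
    return source_bytes / stored_bytes


def dedup_stored_size(data: bytes, chunk_size: int = 4096) -> int:
    """Bytes kept if each distinct fixed-size chunk is stored only once."""
    seen = set()
    stored = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:
            seen.add(digest)
            stored += len(chunk)
    return stored


# Ten 4 KB chunks, of which only two are distinct.
sample = b"A" * 4096 * 9 + b"B" * 4096
stored = dedup_stored_size(sample)
print(stored)                                # 8192
print(reduction_ratio(len(sample), stored))  # 5.0
```

Highly repetitive data such as logs tends to yield larger ratios, while already compressed files yield ratios close to 1.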

Finally, we show how Hybrid Backup Recovery recovers files across clouds, that is, from the backup warehouse to a host at another cloud provider. We delete both the "db_file" directory and the "server_log" directory to simulate a user data loss scenario.

Delete files

After going back to the Recover page of Hybrid Backup Recovery, we can see the two successful backup records and related information.

Hybrid Backup Recovery Recover page

Click the Recover button next to the latest backup record to recover data. In the Data Recovery pop-up page, we can specify which backup files to recover and the destination directory to recover them to. Note: Many users enter a destination path that is the same as the backup folder and check "All files". In this case, another "server_dir" directory is created under "/server_dir/" during recovery. This does not affect the recovery operation, but the directories need to be moved after recovery is completed.

Simple and flexible recovery policy
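The nesting described in the note follows from the backed-up directory being recreated beneath the chosen destination. The sketch below models that behavior as described in this article. The function restored_path is hypothetical, and the second call is an inference from the described behavior, so verify it against a test path before relying on it.

```python
import posixpath


def restored_path(destination: str, source_dir: str) -> str:
    """Path where source_dir ends up if it is recreated beneath destination."""
    name = posixpath.basename(source_dir.rstrip("/"))
    return posixpath.join(destination, name)


print(restored_path("/server_dir", "/server_dir"))  # /server_dir/server_dir
print(restored_path("/", "/server_dir"))            # /server_dir
```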

After clicking Submit, we can see the Data Recovery page, which shows recovery performance, data size, and the number of files. Recovery performance is very good, and better than backup performance. The reason may be that cloud provider T has a larger upper limit on write bandwidth. At this speed, the write performance of the 100 GB cloud disk may become the bottleneck.

Data Recovery

Similarly, users can verify the file recovery speed by checking network traffic on the cloud host.

Recovery performance

After recovery is completed, users will see the success status in the recovery page of the client.

Recovery completed

Log on to the cloud host, and we can see that the two deleted directories have been recovered. The metadata is also completely recovered.

File directories after recovery
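One way to check a recovery independently of the client is to record a checksum manifest before the data is lost and compare it with the recovered tree. The sketch below is our own verification aid, not a feature of Hybrid Backup Recovery. It records the SHA-256 digest and permission bits of each file, and the function name snapshot is hypothetical.

```python
import hashlib
import os
import stat
from typing import Dict, Tuple


def snapshot(root: str) -> Dict[str, Tuple[str, int]]:
    """Map each relative file path to (SHA-256 of contents, permission bits)."""
    result: Dict[str, Tuple[str, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB blocks so large files do not fill memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            mode = stat.S_IMODE(os.stat(full).st_mode)
            result[os.path.relpath(full, root)] = (digest.hexdigest(), mode)
    return result
```

Taking a snapshot of "/server_dir" before the deletion and another after recovery, then comparing the two dictionaries for equality, confirms that file contents and permissions match.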

Summary

The three preceding examples show how Hybrid Backup Recovery efficiently backs up and recovers files, on demand and on schedule, in hybrid cloud and multi-cloud scenarios. One-click client installation, registration, and backup (with batch processing) support file backup across multiple cloud hosts, which makes it convenient for enterprise users to protect many hosts at once.

The Hybrid Backup Recovery client currently supports all versions of Windows (32-bit and 64-bit) and popular Linux distributions (32-bit and 64-bit). Support for macOS is coming soon.

To learn more about Hybrid Backup Recovery, visit www.alibabacloud.com/product/hbr
