How Do Alibaba Engineers Recover Data That's Lost

By Fanjun

In recent decades, data security has received an unprecedented level of attention, and data protection along with it. The bottom line is that service interruption time needs to shrink, as it is a constant plague on today's users.

In this article, Fanjun, a technical expert at Alibaba, covers what you need to know about Continuous Data Protection (CDP) and presents a solution, specifically the elastic assured CDP solution. He sheds light on CDP's scenarios and development and on today's data security challenges by defining the problem and then looking at traditional solutions and the solutions adopted by today's cloud vendors.

So, what is there to know? Well, first, traditional CDP solutions obtain data-change logs at the Guest OS layer or private storage layer during the write operation. This has a great impact on the storage performance of production machines and can cause customers to bear increasing computing and storage costs after cloud migration. Next, data protection, in a hybrid architecture, seriously lags behind public cloud services in terms of network bandwidth and implementation complexity and fails to satisfy customers who are trying to reduce their recovery point objective (RPO) and recovery time objective (RTO). In addition to snapshot implementation and data migration, CDP also has a clear focus on the protection and recovery of data as well as providing efficient service continuity, which is something you don't see with snapshots and replication plans.

So, what's Alibaba Cloud's solution? Alibaba Cloud's Apsara Distributed File System version 2.0 provides a new block storage architecture that boosts CDP implementation in the cloud. Its core component, Log Structure Block Device, supports a new data write method, a new log storage method, and snapshots. As enterprises move forward with cloud migration, Alibaba's Apsara Distributed File System version 2.0 will not only ensure storage performance but will also satisfy traditional and advanced enterprise users who need data protection with low RTO and RPO. In terms of data backup and operability, the effectiveness of data protection depends mainly on how much of the data can actually be recovered.

The Challenges of Data Protection

In recent years, data security has received an unprecedented level of attention in the industry, and data protection has likewise gained new prominence as service interruptions continue to hurt cloud customers and users of web services. It is clear that cloud customers need a more effective data security and protection solution to defend against viruses, ransomware, frequent accidental deletions on databases, and direct attacks against backup software.

Data plays an increasingly crucial role in enterprises, and protecting it means protecting online assets and resources. The GitLab data deletion incident in January 2017 led the industry to pay much more attention to information security risks. During GitLab's data recovery, only the db1.staging database could be used for recovery operations, whereas the other five backup mechanisms were completely useless. The db1.staging copy had been taken about 6 hours earlier, and the recovery proceeded slowly because of the limited transmission rate. GitLab eventually lost nearly 6 hours of data.

So, as you can see from the above incident, many users are in urgent need of a solution that mitigates the risk of data loss, narrows the data protection window, significantly lowers losses, and provides an efficient recovery mechanism. A low recovery time objective (RTO) and assured recovery contribute a lot to data protection. In today's ever-changing and highly technical world, data recoverability is critical and takes precedence over storage costs.

What is Continuous Data Protection

The Storage Networking Industry Association (SNIA) defines continuous data protection (CDP) as a set of methods used to capture or track data changes by storing them separately from production data. This ensures that data is restored at any point in time. CDP is implemented based on blocks, files, or applications and can provide a recovery granularity that supports an unlimited number of recovery time points.

Gartner, an IT research and consulting firm, defines CDP as a recovery method used to capture or track changes in data files or data blocks in a continuous or near-continuous manner and store these changes in logs.

This method provides fine-grained, real-time recovery points to reduce data loss and make data recovery possible at any point in time. Some CDP solutions are configured to capture data changes continuously, which is referred to as real CDP, and others are configured to capture data changes at specific times, which is referred to as near CDP.

In terms of metrics, only recovery point objective (RPO) and recovery time objective (RTO) really indicate the actual status of CDP. To be more precise:

Recovery point objective (RPO) indicates the length of time over which data may be lost when a disaster occurs, that is, the backup interval.
Recovery time objective (RTO) indicates the length of time required to restore service operation when a disaster occurs, that is, the recovery time.
In real CDP, the RPO is equal to 0 and the RTO approaches 0. A CDP implementation whose RPO is not equal to 0 is referred to as "near CDP".
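The two metrics can be made concrete with a little arithmetic. The following is a minimal sketch, not part of any CDP product: the function name `data_loss_window`, the dates, and the 5-minute capture interval are all assumptions chosen for illustration. It measures how much data would be lost for a given set of recovery points when a failure occurs.

```python
from datetime import datetime, timedelta

def data_loss_window(recovery_points, failure_time):
    """Time between the last usable recovery point and the failure.

    This is the data loss actually incurred; the RPO is the upper
    bound that a protection scheme promises for this value.
    """
    usable = [p for p in recovery_points if p <= failure_time]
    if not usable:
        raise ValueError("no recovery point precedes the failure")
    return failure_time - max(usable)

# Hypothetical failure at 14:32.
failure = datetime(2024, 1, 1, 14, 32)

# Traditional backup: one recovery point per day, taken at midnight.
nightly = [datetime(2024, 1, 1, 0, 0)]

# Near CDP (assumed interval): one recovery point every 5 minutes.
near_cdp = [datetime(2024, 1, 1, 0, 0) + timedelta(minutes=5 * i)
            for i in range(200)]

print(data_loss_window(nightly, failure))   # 14:32:00
print(data_loss_window(near_cdp, failure))  # 0:02:00
```

In real CDP every write is a recovery point, so the same calculation yields a window of zero.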

The Main Characteristics of Continuous Data Protection

Traditional data protection solutions focus on periodic data backup, which is accompanied by issues such as backup windows, data consistency, and impact on production systems. CDP provides a new data protection method that allows system administrators to quickly recover data simply by selecting the desired recovery time point, without having to pay attention to the data backup process. The CDP system continuously monitors changes to important data and automatically protects it.

CDP has the following advantages over traditional disaster recovery technology:

CDP greatly improves RPO. Traditional backup technology backs up data every 24 hours, which exposes users to the risk of losing up to 24 hours of data. Snapshot technology reduces the window of data loss to several hours, whereas CDP reduces it to several seconds. The exact length varies between CDP products and solutions. Traditional data protection technology replicates data at a single point in time, whereas CDP replicates data at any point in time.
CDP eliminates the typical weakness of replication technology, which fails to prevent data loss due to human-factor logic errors or virus attacks, despite the ability to obtain the latest data status by synchronizing with production data. When production data is deleted by accident or corrupted due to the preceding causes, its status is synchronized to the backup data storage system through replication. This causes corruption in the backup data. CDP helps to restore the data status to any point in time before data is corrupted.
CDP provides a more flexible data recovery with finer-grained RTO and RPO. Currently, some CDP products and solutions allow system administrators and end-users to directly recover data.

Implementation

CDP enables point-in-time data recovery by recording and storing data changes in the following three modes:

Reference Data Benchmarking Mode

In this mode, CDP creates reference data replicas, logs data differences based on changes in production data, and recovers data based on log differences. This mode is easy to implement. However, the recovery time is long as data recovery starts from the earliest reference data. The closer the recovery time point is to the current time, the longer the recovery time.
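As a rough illustration of this mode, the sketch below keeps a frozen reference replica plus a time-ordered log of changes and recovers by replaying the log forward. The class name `BenchmarkingCDP`, the dictionary-based block store, and the integer timestamps are assumptions made for the example, not the design of any particular product.

```python
class BenchmarkingCDP:
    """Frozen reference replica plus a forward (redo) log of changes."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # reference replica, never updated
        self.redo_log = []             # (timestamp, block_id, new_data), time-ordered

    def record_write(self, ts, block_id, data):
        # Log the difference; the reference replica stays untouched.
        self.redo_log.append((ts, block_id, data))

    def recover(self, target_ts):
        # Start from the earliest reference data and replay forward.
        image = dict(self.base)
        replayed = 0
        for ts, block_id, data in self.redo_log:
            if ts > target_ts:
                break
            image[block_id] = data
            replayed += 1
        return image, replayed


cdp = BenchmarkingCDP({0: b"a", 1: b"b"})
cdp.record_write(1, 0, b"x")
cdp.record_write(2, 1, b"y")
cdp.record_write(3, 0, b"z")
image, replayed = cdp.recover(3)  # the most recent point replays the whole log
```

The `replayed` count grows with the target time, which is why recovery to a recent point is the slowest case in this mode.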

Reference Data Replication Mode

In this mode, CDP synchronizes production data and reference data replicas in real time, records undo logs or events during synchronization, and recovers data based on the undo logs. The reference data replication mode is the opposite of the reference data benchmarking mode: in replication mode, the closer the recovery time point is to the current time, the shorter the recovery time. However, data and logs are both written on every store operation, which consumes considerable system resources.
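A minimal sketch of this mode, under the same kind of assumptions (the class name `ReplicationCDP`, a dictionary-based block store, and integer timestamps are illustrative only): every write updates the replica immediately and saves the previous block content to an undo log, and recovery rolls the replica backward.

```python
class ReplicationCDP:
    """Replica kept in sync with production, plus an undo log."""

    def __init__(self, initial_blocks):
        self.replica = dict(initial_blocks)  # always mirrors production
        self.undo_log = []                   # (timestamp, block_id, previous_data)

    def record_write(self, ts, block_id, data):
        # Two operations per write: save the old content, then update the replica.
        self.undo_log.append((ts, block_id, self.replica.get(block_id)))
        self.replica[block_id] = data

    def recover(self, target_ts):
        # Start from the current state and undo newer writes, newest first.
        image = dict(self.replica)
        undone = 0
        for ts, block_id, previous in reversed(self.undo_log):
            if ts <= target_ts:
                break
            if previous is None:
                image.pop(block_id, None)  # block did not exist before this write
            else:
                image[block_id] = previous
            undone += 1
        return image, undone


cdp = ReplicationCDP({0: b"a", 1: b"b"})
cdp.record_write(1, 0, b"x")
cdp.record_write(2, 1, b"y")
cdp.record_write(3, 0, b"z")
image, undone = cdp.recover(2)  # only the write at ts=3 has to be undone
```

Here the `undone` count shrinks as the target approaches the current time, at the price of extra work on every write.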

Reference Data Merging Mode

The reference data merging mode is a compromise between the preceding two modes and achieves a better balance between resource consumption and RTO. However, it is difficult to implement since it requires complex software management and data processing. CDP technology or related solutions are implemented in multiple modes.
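The article does not spell out how the merging is done, so the following is only one possible reading: a forward log is kept as in benchmarking mode, but older log entries are periodically folded into the reference replica so that recovery never has to replay the full history. The class `MergingCDP` and its `merge` method are hypothetical names for this sketch.

```python
class MergingCDP:
    """Reference replica that is periodically advanced by merging old log entries."""

    def __init__(self, base_blocks, base_ts=0):
        self.base = dict(base_blocks)
        self.base_ts = base_ts  # point in time the reference replica represents
        self.redo_log = []      # (timestamp, block_id, new_data), time-ordered

    def record_write(self, ts, block_id, data):
        self.redo_log.append((ts, block_id, data))

    def merge(self, up_to_ts):
        # Fold every logged change up to up_to_ts into the reference replica.
        remaining = []
        for ts, block_id, data in self.redo_log:
            if ts <= up_to_ts:
                self.base[block_id] = data
            else:
                remaining.append((ts, block_id, data))
        self.redo_log = remaining
        self.base_ts = max(self.base_ts, up_to_ts)

    def recover(self, target_ts):
        if target_ts < self.base_ts:
            raise ValueError("target predates the merged reference data")
        image = dict(self.base)
        replayed = 0
        for ts, block_id, data in self.redo_log:
            if ts > target_ts:
                break
            image[block_id] = data
            replayed += 1
        return image, replayed
```

In this reading the trade-off is explicit: merging shortens the log that recovery must replay, but points in time older than the merged reference can no longer be restored, and the merge itself is background work the system has to schedule and manage.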

The CDP model varies according to different traditional vendors. Based on the storage sharing model of SNIA, CDP products or solutions are classified by application, file, and data block. This article describes CDP implementation at the data block level. The block-based CDP function may run on physical storage devices or logical volume managers, or even at the data transport layer.

When data blocks are written to the storage device that holds production data, the CDP system captures replicas of those blocks and stores them on another storage device. The block-based CDP function may be implemented at the host layer, transport layer, or storage layer.
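The capture step can be pictured as a write splitter sitting in the I/O path. The sketch below is an assumption-laden illustration: `WriteSplitter` is a hypothetical name, and an in-memory dictionary and list stand in for the production device and the separate journal device.

```python
import time

class WriteSplitter:
    """Intercepts block writes and copies each one to a separate journal."""

    def __init__(self, production, journal):
        self.production = production  # stand-in for the production device: block_id -> bytes
        self.journal = journal        # stand-in for the separate CDP storage device
        self.seq = 0                  # monotonically increasing write order

    def write(self, block_id, data):
        self.seq += 1
        payload = bytes(data)  # immutable copy, so later changes by the caller cannot alter the journal
        # Capture a replica of the block before applying the write.
        self.journal.append((self.seq, time.time(), block_id, payload))
        self.production[block_id] = payload
        return self.seq


production, journal = {}, []
splitter = WriteSplitter(production, journal)
splitter.write(7, b"hello")
splitter.write(7, b"world")
# production holds only the latest content; the journal holds every version.
```

Whether such a splitter runs in the host, in the transport layer, or inside the storage array is exactly the placement choice described above.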

Traditional Continuous Data Protection Products

The following table analyzes three vendors: FalconStor, Veeam Software, and EMC RecoverPoint. Of these, FalconStor is a representative vendor of CDP products.

The three vendors have different backgrounds. The traditional storage vendor EMC acquired RecoverPoint to develop a CDP kit based on its own storage to protect the data on physical machines and virtual machines (VMs). Veeam Software is a rising star in the field of VM protection. It protects the data of VMware and Hyper-V VMs and is expanding its business to the cloud. Its current solution depends on VMware VAIO, which is a virtualized data acquisition framework.

EMC RecoverPoint/SE provides a complete solution for the EMC CLARiiON array, and EMC RecoverPoint provides a complete solution for data centers. The two products support local replication and synchronization, point-in-time data recovery, and asynchronous continuous remote replication (CRR).

Running CDP and CRR together on an EMC RecoverPoint appliance provides concurrent local and remote (CLR) data protection, which protects the same set of data locally and remotely with a single solution. FalconStor CDP integrates multiple functions, such as data backup, system recovery, disaster recovery, local disaster recovery, and geo-disaster recovery.

FalconStor CDP is a disk-based backup and disaster recovery solution that supports real-time backup and instant recovery of files, databases, and operating systems by integrating local disaster recovery and geo-disaster recovery functions for verification and drills.

The Data Protection Methods of Major Cloud Vendors

Amazon Web Services (AWS) provides native snapshot functions and cloud migration methods. Its features such as data backup depend on traditional data protection vendors. Microsoft Azure provides basic VM-based backup and recovery methods but does not provide CDP and other advanced functions.

Elastic Assured Continuous Data Protection

Gartner's description of an elastic cloud backup engine specifies the following features of a successful elastic backup:

The elastic cloud backup engine requires fast RTO, which requires the co-location of the backup engine and data recovery engine in the same data center.
The elastic cloud backup engine requires full backup, minimized data transmission through a wide area network (WAN), and separation of backup and production machines.
Data recoverability assurance is a must.

CDP is an advanced data protection solution implemented by cloud vendors. It provides an elasticity that is absent from traditional backup. Traditional vendors must transmit data to the cloud through a WAN during cloud migration. This consumes CPU and I/O resources.

Traditional vendors may run scheduled tasks to avoid resource overconsumption. However, this hampers the effect of elastic backup and CDP. The CDP solution features assured reliability and operability. Inter-volume consistency groups and application consistency must be established to guarantee that an application's data is consistent across all of its volumes.
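One way to picture an inter-volume consistency group is a single journal with one shared sequence number for all member volumes, so that every volume can be restored to the same cut. This is a sketch under that assumption only; `ConsistencyGroup` and the volume names are hypothetical and do not describe how any specific product implements the feature.

```python
class ConsistencyGroup:
    """Journals writes to several volumes under one shared sequence number."""

    def __init__(self, volume_names):
        self.volumes = list(volume_names)
        self.journal = []  # (seq, volume, block_id, data), ordered by seq
        self.seq = 0

    def write(self, volume, block_id, data):
        if volume not in self.volumes:
            raise KeyError(volume)
        self.seq += 1
        self.journal.append((self.seq, volume, block_id, data))
        return self.seq

    def recover(self, target_seq):
        # Rebuild every volume up to the same sequence number, so no volume
        # reflects a write that another volume has not seen yet.
        images = {name: {} for name in self.volumes}
        for seq, volume, block_id, data in self.journal:
            if seq > target_seq:
                break
            images[volume][block_id] = data
        return images


group = ConsistencyGroup(["data", "log"])
s1 = group.write("log", 0, b"begin")
s2 = group.write("data", 0, b"row")
s3 = group.write("log", 1, b"commit")
images = group.recover(s2)  # the log volume does not yet contain the commit record
```

Recovering each volume independently to its own latest point could instead pair a committed log with missing data, which is the inconsistency a consistency group is meant to rule out.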

Summary

Data protection is a preventive measure. Traditional enterprises impose higher requirements on cloud data protection as they move forward with cloud migration. Users attach more importance to data and are more sensitive to data loss than ever before. This widens the gap between what cloud data protection delivers and what users require. Traditional block storage-based CDP depends on specific storage devices and is not elastic enough for off-premises implementation. Moreover, it does not adapt to complex, off-premises, distributed environments.

CDP is an important supplement to traditional or hybrid cloud data protection solutions and will become a new solution highly valuable for enterprise users. Apsara Distributed File System 2.0 provides a new block storage architecture that boosts off-premises CDP implementation.

As enterprises push forward with cloud migration, Apsara Distributed File System 2.0 will not only ensure storage performance but will also satisfy traditional advanced enterprise users who require data protection with low RTO and RPO.
