MaxCompute zone-disaster recovery helps protect your business against unexpected failures such as carrier network failures, data center power outages, data center facility failures, and cluster failures. You can enable multi-zone storage disaster recovery and multi-zone high-availability (HA) computing to significantly reduce business downtime and meet business continuity and industry compliance requirements.
Feature introduction
MaxCompute zone-disaster recovery extends the availability of data storage and computing services from a single zone to three zones in the same region. It uses the physical isolation and low-latency network connectivity among the three zones to provide real-time data synchronization and fault isolation across data centers. This prevents a failure in a single data center from interrupting your business systems and makes your business more resilient to risk.
MaxCompute zone-disaster recovery consists of two capabilities: multi-zone storage disaster recovery and multi-zone HA computing.
Multi-zone storage: stores the existing data of a project redundantly across three zones instead of locally in a single zone, and writes incremental data to the storage services in all three zones simultaneously. If a zone-level failure occurs, multi-zone storage disaster recovery keeps data read and write services running without data loss, which achieves a recovery point objective (RPO) of zero. Multi-zone storage covers all user data in a project, including metadata, user permissions, all table types, materialized views, user-defined functions (UDFs), and resources.
Multi-zone HA computing: binds multi-zone HA computing resources to a project that has multi-zone storage enabled, so that both data storage and computing are protected by zone-disaster recovery. You can reserve sufficient multi-zone HA computing resources in multiple zones. If a zone-level failure occurs, computing resources are automatically switched from the faulty zone to a zone that is operating normally. Multi-zone HA computing resources support all job types, including SQL, MaxFrame, Cupid, and MapReduce jobs.
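The zero-RPO property above depends on each change being stored in more than one zone before it is acknowledged. The following Python toy model illustrates only that idea; it is not MaxCompute's replication protocol. The zone names, the `ZoneStore` class, and the rule that a write is acknowledged once at least two of the three zones have stored it (a majority quorum) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical zone names; real zone IDs depend on the region.
ZONES = ("zone-a", "zone-b", "zone-c")


@dataclass
class ZoneStore:
    """In-memory stand-in for the storage service in one zone."""
    name: str
    available: bool = True
    records: dict = field(default_factory=dict)

    def write(self, key, value):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.records[key] = value


def replicated_write(stores, key, value, quorum=2):
    """Write to every zone and acknowledge only if at least `quorum` zones stored it."""
    acks = 0
    for store in stores:
        try:
            store.write(key, value)
            acks += 1
        except ConnectionError:
            pass  # skip a failed zone; the other zones still accept the write
    if acks < quorum:
        # Toy model: partial copies are not rolled back here.
        raise RuntimeError(f"{key!r} stored in only {acks} zone(s)")
    return acks


def read(stores, key):
    """Read from the first available zone that holds the key."""
    for store in stores:
        if store.available and key in store.records:
            return store.records[key]
    raise KeyError(key)


if __name__ == "__main__":
    stores = [ZoneStore(z) for z in ZONES]
    replicated_write(stores, "orders/ds=20240115", "row-1")
    stores[0].available = False  # simulate a zone-level failure
    replicated_write(stores, "orders/ds=20240116", "row-2")  # writes continue
    print(read(stores, "orders/ds=20240115"))  # data written before the failure survives
```

Because every acknowledged write already exists in at least two zones in this model, losing any single zone leaves a complete copy of all acknowledged data, which is what an RPO of zero means here.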
Disaster recovery guidelines
After zone-disaster recovery is enabled, the following recovery operations are performed when a zone-level failure occurs:
MaxCompute notifies you of the failure.
MaxCompute immediately allocates computing resources in a zone that is operating normally and checks the integrity and availability of project data such as tables, partitions, and permissions.
If a job submitted from a client fails, resubmit it. You do not need to change any configuration for accessing MaxCompute, such as the endpoint, authentication information, project name, or quota name.
After the job runs again, continue to monitor your upper-layer business to make sure that it runs properly.
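Because the client configuration does not change after a failover, resubmitting a failed job can be automated with a bounded retry loop. The following Python sketch assumes a hypothetical `submit_job` callable that submits the same job with the unchanged endpoint, credentials, project name, and quota name, and raises an exception if the job fails. The retry count and backoff values are arbitrary examples, not MaxCompute recommendations.

```python
import time


def resubmit_until_success(submit_job, max_attempts=3, backoff_seconds=30.0, sleep=time.sleep):
    """Call submit_job() until it succeeds or max_attempts is reached.

    submit_job is a hypothetical callable that resubmits the same job with the
    unchanged endpoint, credentials, project name, and quota name, and raises
    an exception if the job fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            sleep(backoff_seconds * attempt)  # linear backoff before resubmitting
```

A retry loop only resubmits the job; it does not verify downstream results, so continue to watch your upper-layer business after the job succeeds.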
Scenarios
Finance
Financial services provided by banks require constant analysis and processing of business transaction data and need to prevent business interruptions caused by data center failures.
Critical infrastructure
Data analysis systems in industries such as power supply, water utilities, and transportation must keep key information services, on which people's livelihoods depend, running when data centers fail.
Benefits
Redundant data backup
Reduced business downtime
Compliance with industry standards
Better upper-layer business customer experience
Limits
Zone-disaster recovery is supported in the China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China East 2 Finance, China (Hong Kong), Singapore, and Indonesia (Jakarta) regions.
Billing
After multi-zone storage is enabled, MaxCompute is billed in multi-zone storage mode. For more information, see Billing for multi-zone storage.
To implement multi-zone HA computing, you must purchase multi-zone HA computing resources. For more information about the billing of multi-zone HA computing resources, see Computing fees (subscription).
Instructions
To implement zone-disaster recovery for storage and computing, you must enable multi-zone storage disaster recovery and multi-zone HA computing.
Enable multi-zone storage disaster recovery
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, open the Zone-disaster Recovery page.
Click Enable Zone-disaster Recovery.
In the dialog box that appears, select the MaxCompute project for which you want to implement storage disaster recovery and select the check box.
Click OK.
After you enable multi-zone storage disaster recovery for a project, the system starts to prepare the project data for storage disaster recovery. During preparation, project data that is stored in a single zone is copied to the other two zones. Preparation takes about two days. When it is complete, the project has storage disaster recovery capability.
Note: During the preparation for storage disaster recovery, running jobs are not affected and the business is not interrupted.
If a task is writing data to historical table partitions in streaming mode during the preparation, the preparation task does not start until the streaming write operations are complete and committed. We recommend that you write data to a new partition each day or each week so that all tables and partitions can be converted to multi-zone storage.
The local backup data and time travel query data that are generated before storage disaster recovery is enabled are retained in the original zone for local storage. The local backup data and time travel query data that are generated after storage disaster recovery is enabled are distributed to three zones for redundant storage.
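To follow the recommendation to write to a new partition each day or week, a streaming writer can derive its target partition from the current date. This Python sketch assumes a partition column named `ds` with `YYYYMMDD` values; both the column name and the format are hypothetical and should match your own table definition.

```python
from datetime import date, timedelta


def target_partition(day, granularity="daily", column="ds"):
    """Return a partition spec such as 'ds=20240115' for the given day.

    The column name and YYYYMMDD format are hypothetical; match your table.
    """
    if granularity == "daily":
        start = day
    elif granularity == "weekly":
        start = day - timedelta(days=day.weekday())  # Monday of the same week
    else:
        raise ValueError(f"unsupported granularity: {granularity}")
    return f"{column}={start.strftime('%Y%m%d')}"


if __name__ == "__main__":
    today = date.today()
    print(target_partition(today))            # a new partition every day
    print(target_partition(today, "weekly"))  # a new partition every Monday
```

Rolling to a fresh partition keeps streaming writes away from historical partitions, so their preparation is not held back by open write sessions.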
Enable multi-zone HA computing
To enable multi-zone HA computing, you must purchase multi-zone HA computing resources and configure the multi-zone HA computing resources as the default computing quota of the desired project.
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, open the Quotas page.
Click New Quota.
On the resource purchase page, configure the following key parameters:
Specifications Type: Select Multi-zone HA Computing Resource.
Multi-zone HA CU: Select the number of compute units (CUs) that you want to purchase.
Note: You must purchase at least 50 CUs. For an incremental purchase, you can add any whole number of CUs.
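The purchase rules in the note above can be checked before you place an order, for example in an internal capacity-planning script. This Python sketch is hypothetical: the function name is illustrative, and it assumes that the 50-CU minimum applies to the first purchase while an incremental purchase adds at least 1 whole CU to an existing quota.

```python
MIN_FIRST_PURCHASE_CU = 50  # minimum stated in the note above


def cu_to_purchase(target_cu, existing_cu=0):
    """Return how many CUs to buy to reach target_cu, enforcing the stated rules.

    Assumption: the 50-CU minimum applies to the first purchase, and an
    incremental purchase adds a whole number of CUs (at least 1).
    """
    for value in (target_cu, existing_cu):
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("CU counts must be whole numbers")
    if existing_cu == 0:
        if target_cu < MIN_FIRST_PURCHASE_CU:
            raise ValueError(f"the first purchase must be at least {MIN_FIRST_PURCHASE_CU} CUs")
        return target_cu
    increment = target_cu - existing_cu
    if increment < 1:
        raise ValueError("an incremental purchase must add at least 1 CU")
    return increment
```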
Click Buy Now. Read the terms of service and then complete the payment.
After you complete the payment, you can view the purchased multi-zone HA computing resources on the Quotas page.
Configure the multi-zone HA computing resources as the default computing quota of the desired project.
In the left-side navigation pane, open the Projects page.
Find the desired project and click Manage in the Actions column.
On the Parameter Configuration tab, click Edit in the Basic Information section.
Set the Default Quota parameter to the multi-zone HA computing resources, and click Submit.
Observe disaster recovery resources
On the disaster recovery resource observation page for a project, you can view the overall status, zone monitoring information, and table data details of the project.
Log on to the MaxCompute console, and select a region in the top navigation bar.
In the left-side navigation pane, open the Zone-disaster Recovery page.
Click the name of a project for which disaster recovery is enabled. The disaster recovery resource observation page for the project appears.
You can observe the following information:
Basic Information
This section displays the overall disaster recovery status of the current project. You can view information about zones, disaster recovery of control information, current status, disaster recovery creation time, and last failover time.
Note: If the value of Current Status is Preparing, the system is converting the data storage mode to multi-zone storage.
If the value of Current Status is Normal, the data is stored in multi-zone storage mode and the project is equipped with zone-level storage disaster recovery capabilities.
Zone Monitoring
This section displays monitoring information for multi-zone HA computing, including the zones where the multi-zone HA computing resources bound to the project reside. Each zone is in one of the following states:
In Use: Your jobs are running in this zone.
Reserved: If the zone where your jobs are running encounters a failure, the computing resources will be switched to this zone.
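The In Use and Reserved states can be modeled as a small role table. The following Python sketch shows one way to pick the zone that takes over when the in-use zone fails. MaxCompute performs this switchover automatically; the selection rule here (the first healthy reserved zone in name order) and the zone names are assumptions for illustration only.

```python
from enum import Enum


class ZoneRole(Enum):
    IN_USE = "In Use"      # jobs are running in this zone
    RESERVED = "Reserved"  # standby zone for failover


def select_failover_zone(roles, failed_zone, unavailable=()):
    """Return the zone that should run jobs after failed_zone goes down.

    roles maps a (hypothetical) zone name to its ZoneRole. The rule used here,
    the first healthy reserved zone in name order, is an assumption.
    """
    in_use = [zone for zone, role in roles.items() if role is ZoneRole.IN_USE]
    if not in_use:
        raise ValueError("no zone is marked In Use")
    if failed_zone not in in_use:
        return in_use[0]  # a reserved zone failed; running jobs stay where they are
    down = set(unavailable) | {failed_zone}
    candidates = sorted(
        zone for zone, role in roles.items()
        if role is ZoneRole.RESERVED and zone not in down
    )
    if not candidates:
        raise RuntimeError("no healthy reserved zone is available")
    return candidates[0]
```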
Table Data Details
You can search for a specific table by its exact schema name and table name. If you do not specify a table, information about all tables in the project is displayed.
The table list contains the following columns:
Schema Name: A schema in the project.
Table Name: The name of the table.
Partitioned Table: Whether the table is partitioned.
Last Data Update Time: The time when the data in the table was last modified.
Data Volume: The data size of the table.
Data Distribution: The zones to which data in the table is distributed.
If the zones are in the Preparing state, the system is converting the data storage mode to multi-zone storage.
If the zones are in the In Use state, data is redundantly stored in multiple zones.
Actions: If the table is a partitioned table, click View Partition Details to view the Last Data Update Time, Data Volume, and Data Distribution information of each partition in the table.
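If you track conversion progress outside the console, for example by copying the Data Distribution column into a script, you can list the tables and partitions that are still being prepared. This Python sketch assumes a hypothetical `StorageEntry` record with one state string per zone (`Preparing` or `In Use`); the console does not necessarily export data in this shape.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StorageEntry:
    """Hypothetical record mirroring columns on the Table Data Details page."""
    schema: str
    table: str
    partition: Optional[str]      # None for a non-partitioned table
    zone_states: Dict[str, str]   # zone name -> "Preparing" or "In Use"


def pending_entries(entries: List[StorageEntry]) -> List[str]:
    """Return schema.table[/partition] for entries not yet fully in multi-zone storage."""
    pending = []
    for entry in entries:
        if any(state == "Preparing" for state in entry.zone_states.values()):
            name = f"{entry.schema}.{entry.table}"
            if entry.partition:
                name += f"/{entry.partition}"
            pending.append(name)
    return pending
```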
More operations
Perform disaster recovery drills
MaxCompute supports project-level disaster simulation and failover drills. The process is as follows:
Submit a ticket to Alibaba Cloud to apply for a disaster recovery drill. You must provide the following information: region, project name, quota name, and drill time period. We recommend that you perform a disaster recovery drill in off-peak hours.
After Alibaba Cloud approves the ticket, a failover button becomes available on the project's disaster recovery resource observation page. Follow the prompts to switch the zone in which the computing resources run. After the switchover, newly submitted jobs run immediately, but any jobs that fail because of the switchover must be resubmitted manually.
The preceding operations are only used in drill scenarios. When an actual zone-level disaster occurs, the system automatically completes the failover of computing resources.
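The drill request asks for a time period, and off-peak hours are recommended. If you have hourly job counts for your project, for example from your own job history, a sliding-window scan finds the quietest contiguous period. This Python sketch assumes a list of 24 hypothetical hourly counts and lets the window wrap past midnight.

```python
def quietest_window(hourly_jobs, length):
    """Return (start_hour, total_jobs) for the contiguous window with the fewest jobs.

    hourly_jobs holds one (hypothetical) job count per hour of the day, indexed
    by hour. Windows may wrap past midnight.
    """
    n = len(hourly_jobs)
    if not 0 < length <= n:
        raise ValueError("length must be between 1 and len(hourly_jobs)")
    current = sum(hourly_jobs[:length])  # window starting at hour 0
    best_start, best_total = 0, current
    for start in range(1, n):
        # Slide one hour: drop the hour that leaves, add the hour that enters.
        current += hourly_jobs[(start + length - 1) % n] - hourly_jobs[start - 1]
        if current < best_total:
            best_start, best_total = start, current
    return best_start, best_total
```

The returned start hour and the window length can then be stated as the drill time period in the ticket.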
Disable disaster recovery
To disable disaster recovery for a project, you can find the desired project on the Zone-disaster Recovery page, and click Disable Disaster Recovery in the Actions column. Then, enter the project name as prompted, and click OK.
After disaster recovery is disabled, project data is redistributed to a single zone for local storage.
This is a high-risk operation. Once disaster recovery is disabled, the project will immediately lose its disaster recovery capabilities. Conduct a thorough evaluation before proceeding with this operation.