MaxCompute: Zone-disaster recovery

Last Updated: Apr 14, 2025

MaxCompute zone-disaster recovery protects your business against unexpected failures such as carrier network failures, data center power outages, data center facility failures, and cluster failures. You can enable multi-zone storage disaster recovery and multi-zone high-availability (HA) computing to significantly reduce business downtime and to meet business assurance and industry compliance requirements.

Feature introduction

MaxCompute zone-disaster recovery extends the availability of data storage and computing services from a single zone to three zones in the same region. It uses the physical isolation of the three zones and the low-latency network connectivity between them to provide real-time data synchronization and fault isolation across data centers. This prevents a failure in a single data center from interrupting your business systems and improves the resilience of your business.

MaxCompute zone-disaster recovery consists of multi-zone storage disaster recovery and multi-zone HA computing:

  • Multi-zone storage: stores the existing data in a project redundantly across three zones instead of locally in a single zone. Incremental data is written to the storage services in all three zones simultaneously. If a zone-level failure occurs, data read and write services are not interrupted and no data is lost, which helps you achieve a recovery point objective (RPO) of zero. Multi-zone storage covers all user data in a project, including metadata, user permissions, all table types, materialized views, user-defined functions (UDFs), and resources.

  • Multi-zone HA computing: binds multi-zone HA computing resources to a project for which multi-zone storage is enabled, so that zone-disaster recovery covers both data storage and computing. You can reserve sufficient multi-zone HA computing resources in multiple zones. If a zone-level failure occurs, computing resources are automatically switched from the faulty zone to a zone that is operating normally. Multi-zone HA computing resources support all job types, including SQL, MaxFrame, Cupid, and MapReduce tasks.

Disaster recovery guidelines

After zone-disaster recovery is enabled, the following recovery operations are performed when a zone-level failure occurs:

  1. Alibaba Cloud MaxCompute notifies you of the failure.

  2. The server immediately allocates computing resources from a zone that is operating normally. The system checks the integrity and availability of data such as tables, partitions, and permissions in the project.

  3. If a job that is submitted by a client fails to run, you must submit the job again. You do not need to modify the configurations for accessing MaxCompute, such as the endpoint, authentication information, project name, and quota name.

  4. After the job resumes running, continue to monitor the upper-layer business operations to confirm that the business is running properly.
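
Step 3 can be handled on the client side with a bounded retry loop. The following is a minimal sketch, not an official MaxCompute API: `submit_job` is a hypothetical callable that stands in for whatever your client uses to submit a job, and the attempt count and delay are illustrative assumptions. The connection settings stay the same between attempts, as described in step 3.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def resubmit_until_success(
    submit_job: Callable[[], T],
    max_attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call submit_job until it succeeds or max_attempts is reached.

    submit_job is a hypothetical placeholder: it should submit the job with
    the same endpoint, credentials, project name, and quota name as before,
    and raise an exception if the job fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except Exception as error:  # narrow this to your client's error type
            last_error = error
            if attempt < max_attempts:
                sleep(delay_seconds)  # wait before resubmitting
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error
```

The `sleep` parameter is injectable only so that the loop can be exercised without real waiting; in production the default `time.sleep` is sufficient.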

Scenarios

  • Finance

    Financial services provided by banks require constant analysis and processing of business transaction data and need to prevent business interruptions caused by data center failures.

  • Critical infrastructure

    Data analysis systems in industries such as power supply, water utilities, and transportation must keep key information services that are critical to people's livelihoods running when data centers fail.

Benefits

  • Redundant data backup

  • Reduced business downtime

  • Compliance with industry standards

  • Better customer experience for upper-layer business

Limits

Zone-disaster recovery is supported in the China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China East 2 Finance, China (Hong Kong), Singapore, and Indonesia (Jakarta) regions.

Billing

  • After multi-zone storage is enabled, MaxCompute is billed in multi-zone storage mode. For more information, see Billing for multi-zone storage.

  • To implement multi-zone HA computing, you must purchase multi-zone HA computing resources. For more information about the billing of multi-zone HA computing resources, see Computing fees (subscription).

Instructions

To implement zone-disaster recovery for storage and computing, you must enable multi-zone storage disaster recovery and multi-zone HA computing.

Enable multi-zone storage disaster recovery

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Disaster Recovery > Zone-disaster Recovery.

  3. On the Zone-disaster Recovery page, click Enable Zone-disaster Recovery.

  4. In the dialog box that appears, select the MaxCompute project for which you want to implement storage disaster recovery and select the check box.

  5. Click OK.

    After you enable multi-zone storage disaster recovery for a project, the system starts to prepare for storage disaster recovery for the project data. In the preparation process, the project data that is stored in a single zone is migrated to the other two zones for multi-zone storage disaster recovery. The data preparation process takes about two days. After the data preparation is complete, the project has the storage disaster recovery capability.

    Note
    • During the preparation for storage disaster recovery, the running of jobs is not affected and the business is not interrupted.

    • During the preparation for storage disaster recovery, if a task is writing data to historical table partitions in streaming mode, the preparation task for those partitions does not start until the streaming write operations are completed and submitted. We recommend that you write data to new partitions on a daily or weekly basis to ensure that multi-zone storage is implemented for all tables and partitions.

    • The local backup data and time travel query data that are generated before storage disaster recovery is enabled are retained in the original zone for local storage. The local backup data and time travel query data that are generated after storage disaster recovery is enabled are distributed to three zones for redundant storage.
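
One way to follow the recommendation to write to new partitions is to derive the partition value from the write date, so that each day or week lands in a fresh partition. The following is a minimal sketch, assuming a hypothetical partition column named `ds` and a `yyyymmdd` value format; neither is prescribed by MaxCompute.

```python
from datetime import date, timedelta


def daily_partition(day: date, column: str = "ds") -> str:
    """Return a partition spec such as 'ds=20250416' for the given day."""
    return f"{column}={day:%Y%m%d}"


def weekly_partition(day: date, column: str = "ds") -> str:
    """Return a partition spec keyed by the Monday of the week containing day."""
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday is 0
    return f"{column}={monday:%Y%m%d}"


print(daily_partition(date(2025, 4, 16)))   # ds=20250416
print(weekly_partition(date(2025, 4, 16)))  # ds=20250414
```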

Enable multi-zone HA computing

To enable multi-zone HA computing, you must purchase multi-zone HA computing resources and configure the multi-zone HA computing resources as the default computing quota of the desired project.

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Workspace > Quotas.

  3. On the Quotas page, click New Quota.

  4. On the resource purchase page, configure the parameters. The following table describes the key parameters.

    • Specifications Type: Select Multi-zone HA Computing Resource.

    • Multi-zone HA CU: Select the number of compute units (CUs) that you want to purchase.

      Note
      You must purchase at least 50 CUs. Additional CUs can be purchased in increments of 1.

  5. Click Buy Now. Read the terms of service and then complete the payment.

    After you complete the payment, you can view the purchased multi-zone HA computing resources on the Quotas page.

  6. Configure the multi-zone HA computing resources as the default computing quota of the desired project.

    1. In the left-side navigation pane, choose Workspace > Projects.

    2. Find the desired project, and click Manage in the Actions column.

    3. On the Parameter Configuration tab, click Edit in the Basic Information section.

    4. Set the Default Quota parameter to the multi-zone HA computing resources, and click Submit.

Observe disaster recovery resources

On the disaster recovery resource observation page for a project, you can view the overall status, zone monitoring information, and table data details of the project.

  1. Log on to the MaxCompute console, and select a region in the top navigation bar.

  2. In the left-side navigation pane, choose Disaster Recovery > Zone-disaster Recovery.

  3. On the Zone-disaster Recovery page, click the name of a project for which disaster recovery is enabled. The disaster recovery resource observation page for the project appears.

    You can observe the following information:

    • Basic Information

      This section displays the overall disaster recovery status of the current project. You can view information about zones, disaster recovery of control information, current status, disaster recovery creation time, and last failover time.

      Note
      • If the value of Current Status is Preparing, the system is converting the data storage mode to multi-zone storage.

      • If the value of Current Status is Normal, the data is stored in multi-zone storage mode and the project is equipped with zone-level storage disaster recovery capabilities.

    • Zone Monitoring

      This section displays the monitoring information of multi-zone HA computing. You can view information about the zones that host the multi-zone HA computing resources purchased for and bound to the project.

      • In Use: Your jobs are running in this zone.

      • Reserved: If the zone where your jobs are running encounters a failure, the computing resources will be switched to this zone.

    • Table Data Details

      You can search for a table by its exact schema name and table name. If you do not specify a table, information about all tables in the project is displayed. The following parameters are displayed for each table:

      • Schema Name: A schema in the project.

      • Table Name: The name of a table.

      • Partitioned Table: Specifies whether the table is partitioned.

      • Last Data Update Time: The time when the data in the table was last modified.

      • Data Volume: The data size of the table.

      • Data Distribution: The zones to which data in the table is distributed.

        • If the zones are in the Preparing state, the system is converting the data storage mode to multi-zone storage.

        • If the zones are in the In Use state, data is redundantly stored in multiple zones.

      • Actions: If the table is a partitioned table, click View Partition Details to view the Last Data Update Time, Data Volume, and Data Distribution information of each partition in the table.
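
The Current Status and Data Distribution values described above can be combined into a simple readiness check. This topic does not describe an API that returns these values, so the sketch below assumes that you have transcribed them from the console into the hypothetical `TableDistribution` records; the function names are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TableDistribution:
    """Hypothetical record mirroring one row of Table Data Details."""
    schema_name: str
    table_name: str
    distribution_state: str  # "Preparing" or "In Use", as shown in the console


def tables_still_preparing(rows: Iterable[TableDistribution]) -> List[str]:
    """Return 'schema.table' names whose data is not yet stored in multiple zones."""
    return [
        f"{row.schema_name}.{row.table_name}"
        for row in rows
        if row.distribution_state != "In Use"
    ]


def storage_dr_ready(current_status: str, rows: Iterable[TableDistribution]) -> bool:
    """True when the project status is Normal and every table is In Use."""
    return current_status == "Normal" and not tables_still_preparing(rows)
```

A check like this is only a convenience for tracking the preparation phase; the console page remains the authoritative view.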

More operations

Perform disaster recovery drills

MaxCompute provides project-level disaster simulation and escape drills. The process is as follows:

  1. Submit a ticket to Alibaba Cloud to apply for a disaster recovery drill. You must provide the following information: region, project name, quota name, and drill time period. We recommend that you perform a disaster recovery drill in off-peak hours.

  2. After the ticket is approved by Alibaba Cloud, MaxCompute provides a failover button on the disaster recovery resource observation page at the project level. You can switch the zone where the computing resources are running as prompted. After the switchover, newly submitted jobs will be executed immediately. However, any jobs that fail due to the zone switching process must be retried manually.

Important

The preceding operations are only used in drill scenarios. When an actual zone-level disaster occurs, the system automatically completes the failover of computing resources.

Disable disaster recovery

To disable disaster recovery for a project, find the project on the Zone-disaster Recovery page and click Disable Disaster Recovery in the Actions column. Then enter the project name as prompted and click OK.

Important
  • After disaster recovery is disabled, project data is redistributed to a single zone for local storage.

  • This is a high-risk operation. Once disaster recovery is disabled, the project will immediately lose its disaster recovery capabilities. Conduct a thorough evaluation before proceeding with this operation.