
SAP HANA High Availability and Disaster Recovery

Last Updated: Dec 27, 2017

High Availability of Alibaba Cloud services

Global infrastructure

Region and zone

Alibaba Cloud infrastructure is distributed across regions and zones around the world. A region is a physical location where Alibaba Cloud infrastructure is deployed. In most cases, a region contains multiple zones. You can deploy your SAP system in the region closest to your users, or in the region that meets your legal or other business requirements. Regions are isolated from each other, and Alibaba Cloud does not automatically synchronize your resources across regions.

A zone is a data center within a region that has its own independent power grid and network. Zones give your production systems and databases on Alibaba Cloud higher availability, better fault tolerance, and better scalability.

Alibaba Cloud services run in 29 zones within 14 regions around the world. For details about Alibaba Cloud regions and zones, refer to Regions and Zones.

High availability through multiple zones

Drawing on Alibaba Cloud’s many years of experience in cloud computing services, we recommend that customers who care about application availability and performance deploy their applications in multiple zones within the same region for better fault tolerance and low network latency.

Within the same region, zones communicate with each other over the intranet while remaining isolated from each other's faults. This architecture enables you to deploy applications in different zones within the same region so that, when an application in one zone encounters problems, the system can fail over to another zone without human intervention.

Continuity improvement through cross-region data synchronization

Block storage (cloud disk) on Alibaba Cloud automatically replicates your data within the zone. This prevents unexpected hardware faults from making data unavailable and protects your services against component faults. In addition, you can store your data in OSS and synchronize it across regions to achieve data redundancy.

Computing

ECS is one of the core services of Alibaba Cloud. It enables you to deploy an ECS instance within minutes to meet your computing requirements in real time, along with basic components such as CPUs, memory, operating systems, and IP addresses.

In the Alibaba Cloud Management Console, you can deploy your applications on different operating systems and manage network access permissions. From the console, you can also use additional storage features, such as automatic snapshots. An automatic snapshot enables you to rapidly copy and replicate an ECS instance, which is an efficient way to test a new feature or operating system. For details, refer to ECS.

Storage

Block Storage (cloud disk) is a low-latency, persistent, and highly reliable random block-level data storage service provided by Alibaba Cloud to ECS users. You can attach multiple cloud disks to an ECS instance to store data permanently. You can also format a cloud disk that is attached to an ECS instance, create a file system on it, and store data in it. Different service scenarios have different I/O performance requirements, so Alibaba Cloud provides different types of cloud disks that can be used alone or in combination as required. Within the same zone, three copies of the data on a cloud disk are automatically stored in different locations to maximize data security. You can also use cloud disk snapshots to back up and restore your cloud disk, and configure automatic snapshot policies for it as required. For details, refer to Disk.

OSS is a simple and low-cost storage service provided by Alibaba Cloud. It can be used to back up and archive data on Alibaba Cloud for the long term. Files stored in OSS can be securely accessed from anywhere in the world. OSS guarantees data reliability of up to 99.99999999%, which makes it well suited to data storage for global teams and international projects. OSS provides a cross-region data replication feature that allows you to synchronize data between regions in real time. For the SAP solution, OSS can be used for long-term storage of database backups and SAP archive files. For details, refer to OSS.

Automatic Recovery

The Automatic Recovery feature improves the availability of ECS. If the physical machine that hosts your ECS instances shuts down because it is performing abnormally or for another reason, protective migration is initiated to move the affected ECS instances to another healthy physical machine. The instance IDs, private IP addresses, EIPs, and metadata of the ECS instances remain unchanged.

At the same time, Alibaba Cloud sends an email to users whose services are affected. To use the Automatic Recovery feature of ECS effectively to improve the reliability of the SAP HANA running environment, it is recommended that you configure SAP HANA on the ECS instance to start automatically at system startup. For details about Automatic Recovery of ECS, refer to Automatic recovery of ECS instance FAQs.

NOTE: The Automatic Recovery feature applies only to ECS instances that use cloud disks. For ECS instances that use ephemeral disks, after the email is sent, Alibaba Cloud customer service specialists will contact the instance owner immediately for further actions.

SAP HANA High Availability solutions supported by Alibaba Cloud

Auto-Restart Service

When an SAP HANA service, such as Index Server or Name Server, stops because of a program crash or intervention by an administrator, the SAP HANA monitoring program automatically detects the stopped service and restarts it. During the restart, the service loads data into memory and resumes its functions. Recovery with Auto-Restart Service therefore takes some time, because the data must be reloaded before the service is fully available again.

Auto-Restart Service of SAP HANA works the same way on Alibaba Cloud as it works on any other platform.

Host Auto-Failover

Host Auto-Failover is an N+m node recovery solution provided by SAP. One or more nodes can be configured to work in standby mode and added to a single-node or distributed SAP HANA system. The nodes in standby mode do not store any data and do not accept any requests or queries.

When a worker node fails, a standby node in the system automatically takes over its work. Because the standby node may take over from any of the worker nodes, it needs access to the data of all of them. This can be achieved through shared network storage (NFS) or through a storage connector API.

Alibaba Cloud suggests that you make full use of the Automatic Recovery feature of Alibaba Cloud ECS. With this feature, when a failure occurs on the physical machine where your ECS instance is located, the ECS instance is automatically migrated to another healthy physical machine within the same zone. This essentially provides you with a high-availability ECS instance without incurring any additional cost. The ECS instance restored on the new physical machine is identical to the original one, including storage, configurations, IP address, and instance ID. You are also advised to configure SAP HANA to start automatically at system startup so that the HANA service is restored after your SAP HANA ECS instance is automatically recovered. After the restart, it takes some time to load data into memory. The time required varies with the HANA data volume.
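The auto-start recommendation above can be sketched as a small profile edit. This is a minimal sketch, not an official procedure: it assumes that auto-start is controlled by the `Autostart` parameter in the SAP HANA instance profile (commonly found under /usr/sap/<SID>/SYS/profile/), and the helper name `enable_autostart` is hypothetical. Verify the parameter and the profile location against the SAP documentation for your release before using it.

```python
import re

# Assumption: the instance profile uses a line of the form "Autostart = 0|1".
_AUTOSTART_LINE = re.compile(r"^[ \t]*Autostart[ \t]*=.*$", re.MULTILINE)


def enable_autostart(profile_text: str) -> str:
    """Return the profile text with Autostart set to 1.

    An existing Autostart line is rewritten; if none exists, one is appended.
    The function only transforms text, so nothing is written to disk here.
    """
    if _AUTOSTART_LINE.search(profile_text):
        return _AUTOSTART_LINE.sub("Autostart = 1", profile_text)
    separator = "" if profile_text.endswith("\n") or not profile_text else "\n"
    return profile_text + separator + "Autostart = 1\n"
```

Keeping the function free of file I/O makes it easy to review the resulting profile before replacing the original file.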

HANA System Replication (HSR)

HANA System Replication (HSR) is a high-availability and disaster recovery solution provided by SAP HANA. With HSR configured, the secondary node is usually set up as an exact copy of the primary node.

The secondary node can be deployed near the primary node to provide rapid failover for planned downtime, storage corruption, or other failures on the primary node. The secondary node can also be installed at a remote site as part of a disaster recovery solution. HSR offers several replication modes, including synchronous, synchronous in-memory, and asynchronous, which you choose according to your recovery time objective (RTO) and recovery point objective (RPO). For details about HSR, refer to How to perform system replication for SAP HANA.

HSR is fully supported on Alibaba Cloud. You can use it in combination with Alibaba Cloud zones to help protect your data. Network latency is generally lower within a zone than between zones in the same region. It is therefore recommended that you use synchronous HSR within the same zone and asynchronous HSR across zones.
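The zone-based recommendation above can be expressed as a short sketch. The helper names and the zone IDs, host name, and site name in the usage note are hypothetical. The command built here follows the `hdbnsutil -sr_register` syntax as commonly documented by SAP (`--replicationMode` accepting `sync`, `syncmem`, or `async`); treat the exact flags as an assumption and check them against your SAP HANA revision.

```python
from typing import List


def recommend_replication_mode(primary_zone: str, secondary_zone: str) -> str:
    """Synchronous within one zone, asynchronous across zones."""
    return "sync" if primary_zone == secondary_zone else "async"


def build_register_command(
    site_name: str,
    primary_host: str,
    instance_number: str,
    replication_mode: str,
    operation_mode: str = "logreplay",  # assumption: a commonly used default
) -> List[str]:
    """Build (but do not run) the command that registers a secondary node."""
    if replication_mode not in ("sync", "syncmem", "async"):
        raise ValueError(f"unknown replication mode: {replication_mode}")
    return [
        "hdbnsutil",
        "-sr_register",
        f"--name={site_name}",
        f"--remoteHost={primary_host}",
        f"--remoteInstance={instance_number}",
        f"--replicationMode={replication_mode}",
        f"--operationMode={operation_mode}",
    ]
```

For example, `recommend_replication_mode("zone-a", "zone-b")` yields `"async"`, which can then be passed to `build_register_command("SITEB", "hana-primary", "00", "async")` to obtain the argument list for review.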

SAP HANA Backup and Restore

Although SAP HANA is an in-memory database, it stores all changes in the persistent storage system so that it can recover data and resume after a power outage without any loss of data. To ensure that data can also be recovered after a disaster, back up the data in the persistent storage system and the database logs to a remote location. For details about backup and restoration of the SAP HANA database, refer to Backup and recovery - SAP HANA.

You can back up and restore the SAP HANA database on Alibaba Cloud using the same operations as on any other platform. In addition, you can take advantage of secure, durable, highly scalable, and cost-effective OSS to help achieve disaster recovery, either by copying your HANA backup files to an OSS bucket or by taking snapshots of the cloud disk that stores the HANA backup files.

About Storage Replication

SAP HANA hardware partners offer a storage-level system replication solution for SAP HANA, which replicates data, logs, or file systems in the SAP HANA database to a remote networked storage system to restore the SAP HANA database with low RTO after a disaster.

However, Alibaba Cloud does not support Storage Replication.

High Availability and Disaster Recovery solutions for SAP HANA on Alibaba Cloud

You need to select a high-availability and disaster recovery solution for your SAP HANA system on Alibaba Cloud based on your business scenario and the importance of the system. The key deciding factors are as follows:

  • RPO (recovery point objective): The maximum amount of data loss that is acceptable, expressed as the time span of changes that may be lost.
  • RTO (recovery time objective): The maximum period for which the service may be unavailable.

The following figure shows the related concepts.
sap-hana-hadr-rporto-en

The following table compares the RPO, RTO, and cost of the different solutions.

Solution | Cost | RPO | RTO
HSR | $$ | Low | Medium
HSR & secondary node as development and test | $$$ | Low | Medium
HSR & secondary node with data preload | $$$ | Low | Low
ECS Automatic Recovery + SAP HANA backup and restore | $ | Medium | High
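The comparison above can be turned into a simple selection aid. This is an illustrative sketch only: the class and function names are hypothetical, the cost is encoded as the number of $ signs, and the row with low RPO and low RTO is taken to be the "HSR & secondary node with data preload" solution described later in this document.

```python
from dataclasses import dataclass
from typing import List

# Ordering of the qualitative levels used in the comparison table.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Solution:
    name: str
    cost: int  # number of $ signs in the table
    rpo: str
    rto: str


SOLUTIONS = [
    Solution("HSR", 2, "Low", "Medium"),
    Solution("HSR & secondary node as development and test", 3, "Low", "Medium"),
    Solution("HSR & secondary node with data preload", 3, "Low", "Low"),
    Solution("ECS Automatic Recovery + SAP HANA backup and restore", 1, "Medium", "High"),
]


def cheapest_solutions(max_rpo: str, max_rto: str) -> List[str]:
    """Return the lowest-cost solutions whose RPO and RTO are within the limits."""
    candidates = [
        s for s in SOLUTIONS
        if LEVELS[s.rpo] <= LEVELS[max_rpo] and LEVELS[s.rto] <= LEVELS[max_rto]
    ]
    if not candidates:
        return []
    lowest_cost = min(s.cost for s in candidates)
    return [s.name for s in candidates if s.cost == lowest_cost]
```

Cost is only one criterion; for instance, the development and test variant costs more than plain HSR but puts the secondary node to productive use, which this sketch does not model.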

ECS Automatic Recovery

Generally, you can leverage the ECS Automatic Recovery feature to restore the SAP HANA ECS instance on another physical machine within the same zone when underlying physical hardware is impaired. When a zone failure occurs, you can refer to the following cross-zone solutions to protect data in your SAP HANA database.

HSR

You can deploy a primary node of SAP HANA in zone A and a secondary node in zone B, and configure HSR between the two nodes. With HSR in place, data changes on the primary node of SAP HANA are continuously replicated to the secondary node. When the primary node in zone A becomes unavailable, you can immediately restore the entire HANA instance on the secondary node in zone B.

NOTE: In this scenario, you need to configure HSR to work in asynchronous mode so that the primary node does not have to wait for acknowledgment from the secondary node, which would otherwise affect its performance.

sap-hana-hadr-hsr-without-preload

HSR has a configuration option: Secondary node preload.

If this option is disabled, data synchronized to the secondary node is not loaded into memory on the secondary node. That means you can select a smaller ECS instance type for the secondary node to reduce the total O&M cost. During a failover, you can change the ECS instance type of the secondary node to match that of the primary node. Once the SAP HANA system is fully restored on the secondary node, you can redirect the HANA access from the client to the secondary node.
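As an illustration of the preload option, the following sketch renders the configuration fragment for the secondary node. It assumes that the option corresponds to the `preload_column_tables` parameter in the `[system_replication]` section of global.ini, as commonly documented by SAP; the function name is hypothetical, and the parameter should be verified against the documentation for your SAP HANA revision.

```python
import configparser
import io


def render_secondary_replication_ini(preload: bool) -> str:
    """Render a global.ini fragment that enables or disables preload."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names exactly as written
    parser["system_replication"] = {
        "preload_column_tables": "true" if preload else "false",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Fragment for a cost-saving secondary node without preload.
    print(render_secondary_replication_ini(preload=False))
```

The function only produces text; applying the setting to a running system is left to the usual SAP HANA administration tools.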

HSR & secondary node as development and test

Building on HSR of SAP HANA, you can take full advantage of your secondary node to further reduce the total O&M cost. In this solution, the ECS instance type of the secondary node is generally the same as that of the primary node. Besides serving as a backup for the production environment, the secondary node can also host your HANA development and test environment.

During a failover, the HANA instance on the secondary node provides services for the entire HANA database. At this point, you need to shut down the HANA development and test environment on the secondary node and release its resources to the HANA production environment. Once the SAP HANA system runs normally on the secondary node, you can redirect the HANA access from the client to the secondary node.

sap-hana-hadr-hsr-devqa

HSR & secondary node with data preload

As mentioned above, HSR has a configuration option: Secondary node preload.

If this option is enabled, data synchronized to the secondary node is immediately loaded into memory on the secondary node. The advantage is that the secondary node needs less time to bring the HANA system into normal operation. However, this solution requires that the ECS instance type of the secondary node be exactly the same as that of the primary node.

sap-hana-hadr-hsr-with-preload

ECS Automatic Recovery & SAP HANA backup and restore

You can use a custom image to rebuild an ECS instance of the same type as the existing one in another zone (for example, zone B in the following figure), and then copy the backup files of the SAP HANA database from an OSS bucket located in another region to the cloud disk that is attached to the new ECS instance. Once the backup files are copied to the cloud disk, you can use the SAP HANA restore feature to restore the SAP HANA database on the new ECS instance. After the SAP HANA database runs normally on the new ECS instance, you can switch the HANA access from the client to the new instance.

sap-hana-hadr-backuprestore

The RPO of the SAP HANA database depends on how frequently you back up the SAP HANA database and copy the backup files to OSS.
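The dependency of the RPO on the backup schedule can be made concrete with a little arithmetic. The sketch below uses a deliberately simplified model, which is an assumption rather than something stated in this document: backups start at a fixed interval, each backup is safe only once it has been fully copied to OSS, and the duration of the backup itself is ignored. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta, copy_to_oss_duration: timedelta) -> timedelta:
    """Worst-case data loss under the simplified model.

    If a failure occurs just before the newest backup finishes copying to OSS,
    everything since the previous backup is lost: one full interval plus the
    copy time.
    """
    return backup_interval + copy_to_oss_duration


def max_backup_interval(target_rpo: timedelta, copy_to_oss_duration: timedelta) -> timedelta:
    """Longest backup interval that still meets the target RPO."""
    interval = target_rpo - copy_to_oss_duration
    if interval <= timedelta(0):
        raise ValueError("copying to OSS alone uses up the whole target RPO")
    return interval
```

For example, hourly backups that take 15 minutes to copy give a worst-case RPO of 1 hour 15 minutes; to stay within a 1-hour RPO under the same copy time, backups would need to run at least every 45 minutes.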

Triggering HSR Takeover

To start the SAP HANA disaster recovery, you need to trigger the takeover procedure of SAP HANA System Replication on your secondary node. For details, refer to the “Takeover” section in How to perform system replication for SAP HANA.

In addition, SAP Note 2063657 provides SAP guidelines to help you decide whether the secondary node takeover is the optimal choice.
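The takeover step can be sketched as follows. This assumes the `hdbnsutil -sr_takeover` command as commonly documented by SAP, run as the SAP HANA administration user on the secondary node; the wrapper functions are hypothetical, and the decision to take over should still follow the SAP guidelines mentioned above.

```python
import subprocess
from typing import Callable, List


def build_takeover_command() -> List[str]:
    """Command that triggers an HSR takeover on the secondary node."""
    return ["hdbnsutil", "-sr_takeover"]


def trigger_takeover(run: Callable = subprocess.run):
    """Run the takeover command and raise if it exits with an error.

    The `run` parameter is injectable so that the call can be exercised
    without a real SAP HANA installation.
    """
    return run(build_takeover_command(), check=True)
```

Injecting `run` keeps the sketch testable on a workstation; in production the default `subprocess.run` executes the command on the node where the script runs.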

SAP HANA client redirection

At the end of an SAP HANA takeover process, you need to ensure that the client applications of SAP HANA (for example, the NetWeaver application server and JDBC and ODBC connections) can re-establish their connections with the new SAP HANA primary node after the failover. You can complete the redirection by updating either the IP address or the DNS name of the SAP HANA database that the clients use.

For details about how to redirect client access after an SAP HANA failover, refer to the “Client connection recovery” section in the SAP HANA Administration Guide.
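When redirection is done through DNS, a quick check that the database host name already resolves to the new node can help before clients are restarted. The sketch below is illustrative: the function name is hypothetical, and the resolver is injectable so that the check can be run without network access.

```python
import socket
from typing import Callable


def points_to_new_primary(
    db_hostname: str,
    new_primary_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the database host name resolves to the new node's IP."""
    try:
        return resolve(db_hostname) == new_primary_ip
    except OSError:
        # Resolution failures (socket.gaierror is an OSError) count as "not yet".
        return False
```

Because clients and operating systems may cache DNS answers, a positive result here does not guarantee that every client has already picked up the change.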

