SAP HA Test Cases on Alibaba Cloud

Last Updated: Aug 05, 2019

Release history

Version   Revision date   Changes   Release date
1.0       2019-08-05      -         -

Overview

This topic describes scenarios and operations for HA failover tests to be performed after SAP intra-zone high availability (HA) deployment is completed. HA failover tests aim to verify the integrity and validity of HA deployment for SAP NetWeaver and SAP HANA. This topic is for reference only. For more information about management and operations, see relevant administration guides released by SUSE and SAP.

Scenarios

SAP NetWeaver cluster:
- SAP NetWeaver tests
- Cluster tests
- Infrastructure tests

SAP HANA system replication cluster:
- SAP HANA tests
- Cluster tests
- Infrastructure tests

SAP NetWeaver test scenarios

SAP NetWeaver tests

1. Message server
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the message server. If the process ID is correct, kill the message server process.

  1. pgrep -fl ms.sap<SID>
  2. kill -9 <pid-ms>

(3) Result: After the message server process is killed, it automatically restarts with a new process ID. When the number of times that the process is killed and restarted reaches the value specified by Max_Program_Restart, HA failover is triggered. All resources are migrated to the secondary node, and the ABAP Central Service (ASCS) instance is started on the secondary node.
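
For example, a minimal sketch of how to observe the restart and the eventual failover (the exact resource names shown by crm_mon depend on your configuration):

  # Capture the current message server PID, kill it, and confirm a new PID appears.
  old_pid=$(pgrep -f "ms.sap<SID>")
  kill -9 "$old_pid"
  sleep 10
  pgrep -fl "ms.sap<SID>"   # a different PID indicates an automatic restart
  crm_mon -1r               # after repeated kills, check whether resources have moved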

2. Enqueue server
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the enqueue server. If the process ID is correct, kill the enqueue server process.

  1. pgrep -fl en.sap<SID>
  2. kill -9 <pid-en>

(3) Result: After the enqueue server process is killed, HA failover is triggered. All resources are migrated to the secondary node, and the ASCS instance is started on the secondary node.

3. Enqueue replication server
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the enqueue replication server. If the process ID is correct, kill the enqueue replication server process.

  1. pgrep -fl er.sap<SID>
  2. kill -9 <pid-er>

(3) Result: After the enqueue replication server process is killed, it automatically restarts with a new process ID.

4. Sapstartsrv
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of sapstartsrv. If the process ID is correct, kill the sapstartsrv process.

  1. pgrep -fl "ASCS.*sapstartsrv"
  2. kill -9 <pid-ssrv>

(3) Result: After the sapstartsrv process is killed, it automatically restarts with a new process ID.

5. Dispatcher
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal. The SAP Primary Application Server (PAS) instance is deployed on two local hosts in HA mode.
(2) Operations: Obtain and verify the process ID of the dispatcher process for the PAS instance. If the process ID is correct, kill the process.

  1. pgrep -fl dw.sap<SID>
  2. kill -9 <pid-disp>

(3) Result: After the dispatcher process for the PAS instance is killed, this process automatically restarts with a new process ID.

6. Sapcontrol command
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Run the sapcontrol command to stop the ASCS instance.

  1. sapcontrol -nr <ASCSNR> -function Stop
  2. sapcontrol -nr <ASCSNR> -function Start

(3) Result: HA failover is not triggered. Because the instance is stopped through the SAP interface, the cluster keeps the corresponding resources stopped or disabled until the instance is started again.
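
To verify the instance state around this test, sapcontrol can report the process list, for example (a sketch; run these commands as the <sid>adm user):

  # Check the ASCS process list before and after stopping the instance.
  sapcontrol -nr <ASCSNR> -function GetProcessList
  sapcontrol -nr <ASCSNR> -function Stop
  sleep 30
  sapcontrol -nr <ASCSNR> -function GetProcessList   # processes should now be stopped
  sapcontrol -nr <ASCSNR> -function Start            # restart without triggering failover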

Cluster tests

1. Software
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of corosync on a node. If the process ID is correct, kill the corosync process.

  1. pgrep -fl corosync
  2. kill -9 <pid>

(3) Result: The node is fenced. All resources are switched to the secondary node.
2. ASCS instance resources
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Migrate ASCS instance resources from the original node to the secondary node.

  1. crm resource move rsc_sap_ASCS <target_node>

(3) Result: ASCS instance resources are migrated to the secondary node.
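
Note that crm resource move works by adding a location constraint that pins the resource to the target node. After verifying the migration, remove the constraint so the cluster can place the resource freely again, for example (a sketch; depending on the crmsh version, the subcommand is unmove or clear):

  # Remove the location constraint created by the move.
  crm resource unmove rsc_sap_ASCS
  crm_mon -1r   # confirm the cluster state afterwards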
3. Maintenance mode
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Switch the cluster to the maintenance mode. After the maintenance is completed, switch the cluster back to the normal status.

  1. crm configure property maintenance-mode=on
  2. crm configure property maintenance-mode=off

(3) Result: After the cluster enters the maintenance mode, SAP instances keep running, and all resources in the cluster enter the unmanaged mode. In this case, you can perform maintenance on the cluster. After the cluster maintenance is completed, run the crm configure property maintenance-mode=off command to switch the cluster from the maintenance mode to the normal status.
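
You can confirm that the cluster has actually entered the maintenance mode before starting any maintenance work, for example (a minimal sketch):

  # The property should report true, and crm_mon should flag all resources as unmanaged.
  crm configure show | grep maintenance-mode
  crm_mon -1r | grep -i unmanaged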
4. Node fencing
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Ensure that the cluster is in the normal status. Fence a node, and then recover the node.

  1. crm node fence <target_node>
  2. sbd -d <device name> message <node name> clear
  3. systemctl start pacemaker

(3) Result: After a node is fenced, resources on the node are migrated to the secondary node. The fenced node restarts, and the shoot the other node in the head (STONITH) device is reset. To recover the fenced node, change the status of the STONITH device to clear, and then start pacemaker to recover the cluster.
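
Before clearing the STONITH device, you can inspect the SBD slots to see the pending message for the fenced node, for example (a sketch; <device name> is the SBD disk configured for the cluster):

  # List all SBD slots; the fenced node typically shows a reset message until it is cleared.
  sbd -d <device name> list
  sbd -d <device name> message <node name> clear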

Infrastructure tests

1. Stop or restart a host
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Stop the host of a node in the console. Then, recover the node to the normal status.

  1. In the ECS console, click Instances, select the host you want to stop, click More, and then click Stop.
  2. systemctl start pacemaker
  3. crm_mon -r
  4. ps -ef | grep ERS

(3) Result: HA failover is triggered. All resources on the stopped host are migrated to the secondary node, and the ASCS instance is started on the secondary node. To recover, restart the host and start the cluster, and then check the cluster status and whether the Enqueue Replication Server (ERS) process is started.

2. Disable the heartbeat NIC
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal. The ifconfig command output indicates that the NICs are normal. Communication over both the heartbeat network and the business network is normal. The corosync redundancy configuration is normal.
(2) Operations: Disable the heartbeat NIC on the primary node in the cluster, send PING messages to the IP address of the heartbeat NIC, and check the cluster status. Then, enable the heartbeat NIC.

  1. ifdown eth<nr>
  2. Send PING messages to the IP address of the heartbeat NIC.
  3. crm_mon -r
  4. ifup eth<nr>

(3) Result: After the heartbeat NIC for the primary node is disabled, the ping to the IP address of the heartbeat NIC fails, HA failover is not triggered, and a redundant business NIC is used for communication. After you check the test result, enable the heartbeat NIC.
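
To confirm that corosync has switched to the redundant ring, you can query the ring status, for example (a sketch; the ring IDs depend on your corosync configuration):

  # Show the status of each configured ring; the ring on the disabled NIC should be
  # reported as faulty while the redundant ring stays active.
  corosync-cfgtool -s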

3. Disable the business NIC
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal. The ifconfig command output indicates that the NICs are normal. Communication over both the heartbeat network and the business network is normal. The corosync redundancy configuration is normal.
(2) Operations: Disable the business NIC on the primary node in the cluster, send PING messages to the IP address of the business NIC, and check the cluster status. Then, enable the business NIC.

  1. ifdown eth<nr>
  2. Send PING messages to the IP address of the business NIC.
  3. crm_mon -r
  4. Log on to the primary node by using the IP address of the heartbeat NIC.
  5. ifup eth<nr>

(3) Result: After the business NIC for the primary node is disabled, the ping to the IP address of the business NIC fails, HA failover is triggered, and resources on the primary node are migrated to the secondary node. After you check the test result, log on to the primary node by using the IP address of the heartbeat NIC, and enable the business NIC.

SAP HANA test scenarios

SAP HANA tests

1. Index server on the primary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the index server on the primary node. If the process ID is correct, kill the index server process.

  1. pgrep -fl hdbindexserver
  2. kill -9 <pid-indexserver>
  3. HDB info

(3) Result: After the index server process is killed, it automatically restarts with a new process ID. You can run the HDB info command to view the process ID.
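
Because this test runs on the primary node, you can additionally confirm that system replication was not disturbed by the restart, for example (a sketch; run as the <sid>adm user, and note that the exact invocation of the status script may vary by SAP HANA revision):

  # The node should still report the primary role, and the secondary should
  # still be connected and in sync.
  hdbnsutil -sr_state
  HDBSettings.sh systemReplicationStatus.py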

2. Index server on the secondary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the index server on the secondary node. If the process ID is correct, kill the index server process.

  1. pgrep -fl hdbindexserver
  2. kill -9 <pid-indexserver>
  3. HDB info

(3) Result: After the index server process is killed, it automatically restarts with a new process ID. You can run the HDB info command to view the process ID.

3. XS engine server on the primary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the XS engine server on the primary node. If the process ID is correct, kill the XS engine server process.

  1. pgrep -fl xsengine
  2. kill -9 <pid-hdbxsengine>
  3. HDB info

(3) Result: After the XS engine server process is killed, it automatically restarts with a new process ID. You can run the HDB info command to view the process ID.

4. XS engine server on the secondary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the XS engine server on the secondary node. If the process ID is correct, kill the XS engine server process.

  1. pgrep -fl xsengine
  2. kill -9 <pid-hdbxsengine>

(3) Result: After the XS engine server process is killed, it automatically restarts with a new process ID. You can run the HDB info command to view the process ID.

5. Nameserver on the primary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the nameserver on the primary node. If the process ID is correct, kill the nameserver process.

  1. pgrep -fl hdbnameserver
  2. kill -9 <pid-hdbnameserver>
  3. HDB info

(3) Result: After the nameserver process is killed, it automatically restarts with a new process ID. You can run the HDB info command to view the process ID.

6. Nameserver on the secondary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of the nameserver on the secondary node. If the process ID is correct, kill the nameserver process.

  1. pgrep -fl hdbnameserver
  2. kill -9 <pid-hdbnameserver>

(3) Result: After the nameserver process is killed, it automatically restarts with a new process ID. You can run the HDB info command to view the process ID.

7. Daemons on the primary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process IDs of the daemons on the primary node. If the process IDs are correct, kill the daemons.

  1. pgrep -fl hdb.sap<SID>
  2. kill -9 <pid-hdb>
  3. crm_mon -r
  4. HDB info
  5. hdbnsutil -sr_state
  6. hdbnsutil -sr_register --remoteHost=<secondary node hostname> --remoteInstance=<instance number> --replicationMode=<mode> --name=<site name> --operationMode=<operation mode>
  7. HDB start
  8. crm resource cleanup <failed resource>

(3) Result: After the daemons are killed, if PREFER_SITE_TAKEOVER is set to true, HA failover is triggered, resources are migrated to the secondary node, and the secondary node is promoted to be the primary node. If AUTOMATED_REGISTER is set to false, you need to reconfigure SAP HANA SR, register the former primary node to be the new secondary node, and then start SAP HANA instances.
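
The behavior described above is controlled by parameters of the SAPHana resource agent. You can inspect or adjust them in the cluster configuration, for example (a sketch; the resource name rsc_SAPHana_<SID>_HDB<NR> is a common naming convention, not a fixed name):

  # Show the SAPHana resource definition, including PREFER_SITE_TAKEOVER and
  # AUTOMATED_REGISTER, which control takeover and re-registration behavior.
  crm configure show rsc_SAPHana_<SID>_HDB<NR>
  # Optionally let the cluster register the former primary automatically:
  crm resource param rsc_SAPHana_<SID>_HDB<NR> set AUTOMATED_REGISTER true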

8. Daemons on the secondary node
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process IDs of the daemons on the secondary node. If the process IDs are correct, kill the daemons.

  1. pgrep -fl hdb.sap<SID>
  2. kill -9 <pid-hdb>
  3. HDB info
  4. crm_mon -r
  5. crm resource cleanup <failed resource>

(3) Result: After the daemons are killed, HA failover is not triggered. After SAP HANA instances on the secondary node are killed, the cluster restarts the SAP HANA instances.

Cluster tests

1. Software
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Obtain and verify the process ID of corosync on a node. If the process ID is correct, kill the corosync process.

  1. pgrep -fl corosync
  2. kill -9 <pid>
  3. hdbnsutil -sr_register --remoteHost=<secondary node hostname> --remoteInstance=<instance number> --replicationMode=<mode> --name=<site name> --operationMode=<operation mode> # Required only if the corosync process on the primary node is killed: reconfigure SAP HANA SR and register the former primary node as the new secondary node before restarting the cluster.
  4. sbd -d <device name> message <node name> clear
  5. systemctl start pacemaker
  6. HDB info
  7. crm_mon -r
  8. crm resource cleanup <failed resource>

(3) Result: After the node is fenced, all resources on the node are migrated to another node. To recover the fenced node, change the status of the STONITH device to clear. If the fenced node is the primary node, you need to reconfigure SAP HANA SR, register the primary node to be the new secondary node, and then restart the cluster.

2. Node fencing
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Fence a node.

  1. crm node fence <target_node>
  2. hdbnsutil -sr_register --remoteHost=<secondary node hostname> --remoteInstance=<instance number> --replicationMode=<mode> --name=<site name> --operationMode=<operation mode> # Required only if the fenced node is the primary node: reconfigure SAP HANA SR and register the former primary node as the new secondary node before restarting the cluster.
  3. sbd -d <device name> message <node name> clear
  4. systemctl start pacemaker
  5. HDB info
  6. crm_mon -r
  7. crm resource cleanup <failed resource>

(3) Result: After the node is fenced, all resources on the node are migrated to another node. To recover the fenced node, change the status of the STONITH device to clear. If the fenced node is the primary node, you need to reconfigure SAP HANA SR, register the primary node to be the new secondary node, and then restart the cluster.
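
After the recovery steps, the replication attributes maintained by the cluster can be checked with the SAPHanaSR tooling, for example (a sketch; SAPHanaSR-showAttr is provided by the SAPHanaSR package used with the SAPHana resource agent):

  # Show the site roles and sync state tracked by the cluster; the new secondary
  # should eventually report a sync state of SOK.
  SAPHanaSR-showAttr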

3. Maintenance mode
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Switch the cluster to the maintenance mode. After the maintenance is completed, switch the cluster back to the normal status.

  1. crm configure property maintenance-mode=on
  2. crm configure property maintenance-mode=off

(3) Result: After the cluster enters the maintenance mode, SAP HANA instances keep running, and all resources in the cluster enter the unmanaged mode. In this case, you can perform maintenance on the cluster. After the cluster maintenance is completed, run the crm configure property maintenance-mode=off command to switch the cluster from the maintenance mode to the normal status.

Infrastructure tests

1. Stop or restart a host
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal.
(2) Operations: Stop or restart the host of a node in the console. Then, recover the node to the normal status.

  1. In the console, click Instances, select the host you want to stop or restart, click More, and then click Stop or Restart.
  2. hdbnsutil -sr_register --remoteHost=<secondary node hostname> --remoteInstance=<instance number> --replicationMode=<mode> --name=<site name> --operationMode=<operation mode> # Required only if the host of the primary node is stopped or restarted; skip this step for the secondary node.
  3. systemctl start pacemaker
  4. crm_mon -r
  5. HDB info

(3) Result: If the host of the primary node is stopped or restarted, HA failover is triggered and all resources are migrated to the secondary node; you then need to register the former primary node as the new secondary node. (The registration is not required if the secondary node is stopped or restarted.) Then, start the cluster, and check the cluster status and whether SAP HANA instances are started.
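
As an illustration of the registration step, with hypothetical values for all placeholders (new primary host hana02, instance number 00, site name SITE_A), the command might look like this:

  # Run on the former primary node as the <sid>adm user; all values shown are
  # examples and must match your actual landscape.
  hdbnsutil -sr_register --remoteHost=hana02 --remoteInstance=00 --replicationMode=syncmem --name=SITE_A --operationMode=logreplay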

2. Disable the heartbeat NIC
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal. The ifconfig command output indicates that the NICs are normal. Communication over both the heartbeat network and the business network is normal. The corosync redundancy configuration is normal.
(2) Operations: Disable the heartbeat NIC on the primary node in the cluster, send PING messages to the IP address of the heartbeat NIC, and check the cluster status. Then, enable the heartbeat NIC.

  1. ifdown eth<nr>
  2. Send PING messages to the IP address of the heartbeat NIC.
  3. crm_mon -r
  4. ifup eth<nr>

(3) Result: After the heartbeat NIC for the primary node is disabled, the ping to the IP address of the heartbeat NIC fails, HA failover is not triggered, and a redundant business NIC is used for communication. After you check the test result, enable the heartbeat NIC.

3. Disable the business NIC
(1) Prerequisites: The #crm status command output indicates that the statuses of the cluster and all resources are normal. The ifconfig command output indicates that the NICs are normal. Communication over both the heartbeat network and the business network is normal. The corosync redundancy configuration is normal.
(2) Operations: Disable the business NIC on the primary node or the secondary node in the cluster, send PING messages to the IP address of the business NIC, and check the cluster status. Then, enable the business NIC.

  1. ifdown eth<nr>
  2. Send PING messages to the IP address of the business NIC.
  3. Log on to the primary or secondary node by using the IP address of the heartbeat NIC.
  4. ifup eth<nr>
  5. hdbnsutil -sr_register --remoteHost=<secondary node hostname> --remoteInstance=<instance number> --replicationMode=<mode> --name=<site name> --operationMode=<operation mode> # Required only if the business NIC on the primary node is disabled: reconfigure SAP HANA SR, register the former primary node as the new secondary node, and then start the cluster.
  6. systemctl start pacemaker
  7. crm_mon -r
  8. HDB info

(3) Result: After the business NIC for the primary node is disabled, the ping to the IP address of the business NIC fails, HA failover is triggered, and resources on the primary node are migrated to the secondary node. After you check the test result, log on to the primary node by using the IP address of the heartbeat NIC, and enable the business NIC. In this case, you need to register the primary node to be the secondary node. (The registration is not required if the business NIC for the secondary node is disabled.) Then, start the cluster, and check whether SAP HANA instances are started.