Scaling groups cannot be directly associated with Tair (Redis OSS-compatible) instances, which means every scale-out or scale-in event requires a manual whitelist update — a slow, error-prone process. This tutorial shows how to use Auto Scaling lifecycle hooks with a CloudOps Orchestration Service (OOS) public template to automate whitelist management on each scaling event.
How it works
When a lifecycle hook fires, Auto Scaling holds the new ECS instance in the Pending Add state (or the terminating instance in the Pending Remove state) and triggers OOS to run the ACS-ESS-LifeCycleModifyRedisIPWhitelist template. The template retrieves the instance's private IP address and adds it to (or removes it from) the Tair instance whitelist. After the template finishes, OOS completes the lifecycle action.
Scale-out event
Auto Scaling adds ECS instance
|
v
Lifecycle hook fires -> instance enters Pending Add state
|
v
OOS runs ACS-ESS-LifeCycleModifyRedisIPWhitelist
|
v
Private IP added to Tair whitelist
|
v
Lifecycle action completed -> instance enters InService state
The Default Execution Policy of a lifecycle hook determines what happens if the OOS template fails or the timeout expires. If set to Continue, the instance goes into service even if the whitelist update fails, in which case it cannot connect to Tair. If set to Reject, the lifecycle action is rejected. Choose Reject for stricter access control, or Continue if you prefer availability over strict whitelist enforcement.
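The effect of the Default Execution Policy on a scale-out can be modeled with a small simulation. This is only a sketch of the decision described above, not the Auto Scaling implementation: `HookOutcome`, `resolve_lifecycle_action`, and the `"Rejected"` label are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class HookOutcome:
    instance_state: str   # illustrative label for where the instance ends up
    can_reach_tair: bool  # True only if its private IP was added to the whitelist


def resolve_lifecycle_action(oos_succeeded: bool, timed_out: bool,
                             default_policy: str) -> HookOutcome:
    """Model how a scale-out lifecycle action resolves (illustrative only)."""
    if default_policy not in ("Continue", "Reject"):
        raise ValueError(f"unknown policy: {default_policy}")
    if oos_succeeded and not timed_out:
        # OOS updated the whitelist and completed the lifecycle action.
        return HookOutcome("InService", True)
    if default_policy == "Continue":
        # The instance goes into service, but its IP was never whitelisted.
        return HookOutcome("InService", False)
    # Reject: the lifecycle action is rejected ("Rejected" is a placeholder label).
    return HookOutcome("Rejected", False)
```

With Continue, a failed whitelist update yields an in-service instance that cannot reach Tair, which is the trade-off to weigh when choosing the policy.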
Prerequisites
Before you begin, make sure that you have:
A scaling group in the Enabled state
A Tair (Redis OSS-compatible) instance
A Resource Access Management (RAM) role for OOS, with Alibaba Cloud Service as the trusted entity and CloudOps Orchestration Service as the trusted service, and with the permissions to perform operations on the OOS template. For details, see Use RAM to grant permissions to OOS.
This tutorial uses OOSServiceRole as the example RAM role name. You can use any role that meets the requirements above.
Step 1: Grant the OOS RAM role the required permissions
The ACS-ESS-LifeCycleModifyRedisIPWhitelist template needs to describe ECS instances, modify Tair security IPs, and complete lifecycle actions. Create a permission policy that grants exactly these three actions, then attach it to the OOS RAM role.
Log on to the RAM console.
Create a permission policy.
In the left-side navigation pane, choose Permissions > Policies.
Click Create Policy.
On the Create Policy page, click the JSON tab and enter the following policy document, then click OK.
| Parameter | Value |
|---|---|
| Name | ESSHookPolicyForRedisWhitelist |
| Policy document | See JSON below |

    {
      "Version": "1",
      "Statement": [
        {
          "Action": [
            "ecs:DescribeInstances"
          ],
          "Resource": "*",
          "Effect": "Allow"
        },
        {
          "Action": [
            "kvstore:ModifySecurityIps"
          ],
          "Resource": "*",
          "Effect": "Allow"
        },
        {
          "Action": [
            "ess:CompleteLifecycleAction"
          ],
          "Resource": "*",
          "Effect": "Allow"
        }
      ]
    }

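Before pasting a policy document into the console, it can help to confirm that it grants exactly the three actions the template needs. The following is a minimal local check using only the Python standard library; `allowed_actions` and `check_policy` are hypothetical helper names, and the check does not contact RAM.

```python
import json

# The three actions named in this step.
REQUIRED_ACTIONS = {
    "ecs:DescribeInstances",
    "kvstore:ModifySecurityIps",
    "ess:CompleteLifecycleAction",
}

POLICY_DOCUMENT = """
{
  "Version": "1",
  "Statement": [
    {"Action": ["ecs:DescribeInstances"], "Resource": "*", "Effect": "Allow"},
    {"Action": ["kvstore:ModifySecurityIps"], "Resource": "*", "Effect": "Allow"},
    {"Action": ["ess:CompleteLifecycleAction"], "Resource": "*", "Effect": "Allow"}
  ]
}
"""


def allowed_actions(policy_json):
    """Collect every action granted by an Allow statement."""
    actions = set()
    for statement in json.loads(policy_json)["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        action = statement["Action"]
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def check_policy(policy_json):
    """Return (missing, extra) actions relative to REQUIRED_ACTIONS."""
    granted = allowed_actions(policy_json)
    return REQUIRED_ACTIONS - granted, granted - REQUIRED_ACTIONS


missing, extra = check_policy(POLICY_DOCUMENT)
print("missing:", sorted(missing), "extra:", sorted(extra))
```

Two empty sets mean the document matches the least-privilege set described above.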
Attach the policy to the OOS RAM role.
In the left-side navigation pane, choose Identities > Roles.
Find the OOSServiceRole RAM role and click Grant Permission in the Actions column.
In the Grant Permission panel, configure the following settings and click Grant permissions.
| Parameter | Value |
|---|---|
| Resource scope | Account |
| Policy | ESSHookPolicyForRedisWhitelist (custom policy) |
Step 2: Create a lifecycle hook
Create a lifecycle hook that triggers OOS when a scale-out event occurs.
Log on to the Auto Scaling console.
In the left-side navigation pane, click Scaling Groups.
In the top navigation bar, select a region.
Find the scaling group and click its ID (or click Details in the Actions column) to open the details page.
On the Lifecycle Hook tab, click Create Lifecycle Hook.
Configure the following parameters and click OK.
| Parameter | Value |
|---|---|
| Name | ESSHookForAddRedisWhitelist |
| Scaling Activity | Scale-out Event |
| Timeout Period | 300 (seconds). Set this to longer than the time required for OOS to complete the whitelist update. If the timeout expires before OOS finishes, the default execution policy takes effect. |
| Default Execution Policy | Continue (or Reject; see the note in How it works) |
| Send Notification When Lifecycle Hook Takes Effect | Select OOS Template > Public Templates > ACS-ESS-LifeCycleModifyRedisIPWhitelist |
In the template configuration, also set:
| Template parameter | Value |
|---|---|
| dbInstanceId | The ID of your Tair instance |
| modifyMode | Append (adds the IP on scale-out) |
| OOSAssumeRole | OOSServiceRole |
Lifecycle hooks fire only when scaling events are triggered by scaling rules (manual execution, scheduled tasks, or event-triggered tasks). They do not fire when you manually add or remove ECS instances in the console.
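If you track hook settings outside the console, the template parameters and the timeout check from this step can be written down as data. The sketch below is illustrative only: `build_template_parameters` and `timeout_is_sufficient` are hypothetical helpers, `r-example` is a placeholder instance ID, and the 60-second OOS duration and 30-second margin are assumed numbers, not measured values.

```python
def build_template_parameters(db_instance_id, modify_mode,
                              oos_role="OOSServiceRole"):
    """Assemble the three template parameters configured on the hook."""
    if not db_instance_id:
        raise ValueError("dbInstanceId must be the ID of your Tair instance")
    return {
        "dbInstanceId": db_instance_id,
        "modifyMode": modify_mode,
        "OOSAssumeRole": oos_role,
    }


def timeout_is_sufficient(timeout_seconds, expected_oos_seconds,
                          margin_seconds=30):
    """True if the hook timeout leaves headroom for the OOS execution."""
    return timeout_seconds >= expected_oos_seconds + margin_seconds


params = build_template_parameters("r-example", "Append")  # placeholder ID
print(params)
print(timeout_is_sufficient(300, expected_oos_seconds=60))  # assumed duration
```

A timeout that fails this check risks the default execution policy taking effect before the whitelist update finishes.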
Step 3: Trigger a scale-out event and verify the result
Trigger a scale-out to confirm that the lifecycle hook and OOS template work end-to-end.
Trigger a scale-out
On the scaling group details page, click the Scaling Rules and Event-triggered Tasks tab.
On the Scaling Rules tab, click Create Scaling Rule and configure the following settings, then click OK.
| Parameter | Value |
|---|---|
| Rule Name | Add1 |
| Rule Type | Simple Scaling Rule |
| Operation | Add 1 Instances |
Find the Add1 rule and click Execute in the Actions column, then confirm.
Auto Scaling adds one ECS instance to the scaling group. The instance enters the Pending Add state because the ESSHookForAddRedisWhitelist lifecycle hook is active. During the timeout period, Auto Scaling notifies OOS to run the ACS-ESS-LifeCycleModifyRedisIPWhitelist template.
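If you script this verification instead of watching the console, the wait for the instance to leave the pending state is a bounded polling loop. The sketch below assumes a caller-supplied `get_state` function that returns the instance's current state; how that state is fetched is left out, and the state strings are taken from this tutorial's wording and may differ from the values an API returns.

```python
import time


def wait_for_state(get_state, target="InService", timeout_seconds=300,
                   poll_seconds=5, sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target or the timeout elapses.

    Returns True if the target state was observed, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)


# Example with a canned sequence of states instead of a real lookup.
states = iter(["Pending Add", "Pending Add", "InService"])
print(wait_for_state(lambda: next(states), sleep=lambda seconds: None))
```

Setting `timeout_seconds` to the hook's Timeout Period keeps the script's patience aligned with the lifecycle hook.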
Verify the whitelist update
Log on to the Tair (Redis OSS-compatible) console.
In the left-side navigation pane, click Instances.
Find the instance and click its ID in the Instance ID/Name column.
In the left-side navigation pane, click Whitelist Settings. If the private IP address of the new ECS instance appears in the whitelist, the automation worked correctly. If the IP address is missing, check the OOS execution details as described in Check the OOS execution.
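When the whitelist is long or contains CIDR blocks, checking by eye is error-prone. The following sketch checks membership with the standard `ipaddress` module, assuming you have copied the whitelist as a comma-separated string; `ip_in_whitelist` is a hypothetical helper and the addresses are made-up examples.

```python
import ipaddress


def ip_in_whitelist(private_ip, whitelist):
    """Return True if private_ip matches any address or CIDR entry."""
    address = ipaddress.ip_address(private_ip)
    for entry in whitelist.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # strict=False accepts entries such as 10.0.0.5/24 whose host bits are set.
        if address in ipaddress.ip_network(entry, strict=False):
            return True
    return False


print(ip_in_whitelist("192.168.1.10", "127.0.0.1,192.168.1.10"))
print(ip_in_whitelist("10.1.0.1", "10.0.0.0/16, 192.168.1.10"))
```

A plain address is treated as a single-host network, so exact entries and CIDR ranges are handled by the same comparison.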
Set up scale-in (remove IPs on termination)
To automatically remove private IP addresses from the whitelist when instances terminate, create a separate lifecycle hook for scale-in events.
Follow Step 2: Create a lifecycle hook, with these changes:
| Parameter | Value |
|---|---|
| Name | ESSHookForRemoveRedisWhitelist |
| Scaling Activity | Scale-in Event |
| modifyMode template parameter | Delete (removes the IP on scale-in) |
The scale-in lifecycle hook works the same way: when Auto Scaling terminates an instance, OOS retrieves the instance's private IP and removes it from the Tair whitelist before the instance is fully terminated.
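The combined effect of the two hooks on the whitelist can be pictured as a pair of list operations. This is a conceptual model only, not the template's code; `add_ip` and `remove_ip` are hypothetical names and the addresses are examples.

```python
def add_ip(whitelist, private_ip):
    """Scale-out: return a new whitelist that includes private_ip once."""
    if private_ip in whitelist:
        return list(whitelist)
    return list(whitelist) + [private_ip]


def remove_ip(whitelist, private_ip):
    """Scale-in: return a new whitelist without private_ip."""
    return [entry for entry in whitelist if entry != private_ip]


before = ["127.0.0.1", "192.168.1.10"]
during = add_ip(before, "192.168.1.11")    # instance added by scale-out
after = remove_ip(during, "192.168.1.11")  # same instance removed by scale-in
print(during)
print(after == before)
```

With both hooks in place, a scale-out followed by a scale-in of the same instance leaves the whitelist as it started, so stale entries do not accumulate.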
Check the OOS execution (optional)
If the whitelist was not updated after a scaling event, inspect the OOS execution to diagnose the issue.
Log on to the OOS console.
In the left-side navigation pane, choose Automated Task > Task Execution Management.
Find the execution by time and click Details in the Actions column.
In the Basic Information section, check the execution status. In the Execution Steps and Results section, click a task node to view its output. For details on reading execution results, see View the details of an execution.
Troubleshooting
If an OOS execution fails, identify the cause from the error message in the execution result. Common error messages are as follows:
| Error message | Cause | Fix |
|---|---|---|
| Forbidden.Unauthorized: A required authorization for the specified action is not supplied. | The OOS RAM role is missing required permissions. | Verify that OOSServiceRole has the ESSHookPolicyForRedisWhitelist policy attached with Account scope. |
| Forbidden.RAM: User not authorized to operate on the specified resource, or this API doesn't support RAM. | The RAM user or RAM role lacks permissions on the resources declared in the OOS template. | Grant OOSServiceRole the permissions for all resources used by the template (ECS, Auto Scaling, and Tair). |
| LifecycleHookIdAndLifecycleActionToken.Invalid: The specified lifecycleActionToken and lifecycleActionId you provided does not match any in process lifecycle action. | The lifecycle action expired or was aborted, typically because the lifecycle hook timed out before OOS finished. | Increase the Timeout Period of the lifecycle hook so OOS has enough time to complete the whitelist update. |
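If you collect OOS error messages in logs, the table above can be turned into a simple lookup. The sketch below matches on the error code at the start of the message; `suggest_fix` is a hypothetical helper and covers only the three errors listed here.

```python
# Error code -> suggested fix, taken from the troubleshooting table.
FIXES = {
    "Forbidden.Unauthorized":
        "Verify that OOSServiceRole has the ESSHookPolicyForRedisWhitelist "
        "policy attached with Account scope.",
    "Forbidden.RAM":
        "Grant OOSServiceRole the permissions for all resources used by the "
        "template (ECS, Auto Scaling, and Tair).",
    "LifecycleHookIdAndLifecycleActionToken.Invalid":
        "Increase the Timeout Period of the lifecycle hook.",
}


def suggest_fix(error_message):
    """Return the fix for a known error code, or None if unrecognized."""
    message = error_message.strip()
    for code, fix in FIXES.items():
        if message.startswith(code):
            return fix
    return None


print(suggest_fix("Forbidden.RAM: User not authorized to operate on the specified resource"))
```

Unrecognized messages return `None`, in which case the OOS execution details remain the place to look.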
For more OOS troubleshooting information, see FAQ.