
Cloud Config: Automatically remediate non-compliant resources across accounts in an enterprise by using a resource directory

Last Updated: Jan 12, 2024

This topic describes how to automatically remediate non-compliant resources across the accounts of an enterprise. It combines the cross-account resource management capability of a resource directory with the account group feature of Cloud Config.

Prerequisites

  • A resource directory is enabled. For more information, see Enable a resource directory.

  • Function Compute is activated. For more information, see Activate Function Compute.

    Important

    When you use Function Compute functions, you are charged for the number of function calls, resource usage, and outbound Internet traffic. For more information, see Billing overview.

Background information

Cloud Config detects non-compliant resources based on rules, and you can configure custom remediation for these resources. To remediate non-compliant resources across accounts, an enterprise can use the organizational structure of a resource directory to manage its accounts and resources. For more information about resource directories, see Resource Directory overview.

In this example, the ecs-instance-monitor-enabled rule is used to detect and automatically remediate non-compliant resources across accounts:

  • Account A (ID: 100931896542****) is the management account or a delegated administrator account of a resource directory.

  • Account B (ID: 178366182654****) belongs to the same resource directory as Account A and has non-compliant resources.

The following sections describe how to log on to Cloud Config as Account A to detect and remediate the non-compliant resources in Account B.

Step 1: Create a role for the management account and attach policies to the role

  1. Log on to the Resource Access Management (RAM) console.

  2. Create a RAM role.

    1. In the left-side navigation pane, choose Identities > Roles.

    2. Click Create Role. In the Create Role panel, set the parameters.

      1. Set the Select Trusted Entity parameter to Alibaba Cloud Account and click Next.

      2. Enter a RAM role name. In this example, ConfigCustomRemediationRole is used. Set the Select Trusted Alibaba Cloud Account parameter to Current Alibaba Cloud Account.

      3. Click OK.

      4. Click Close.

  3. Create a permissions policy.

    1. In the left-side navigation pane, choose Permissions > Policies.

    2. Click Create Policy. The Create Policy page appears.

      1. On the JSON tab, enter the following policy script. The policy allows the entity that assumes the role to install the CloudMonitor agent and to assume other RAM roles, such as the role in the member account.

        {
          "Version": "1",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "cms:InstallMonitoringAgent",
              "Resource": "*"
            },
            {
              "Action": "sts:AssumeRole",
              "Effect": "Allow",
              "Resource": "*"
            }
          ]
        }
      2. Click Next to edit policy information. On the page that appears, enter a policy name. In this example, ConfigCustomRemediationPolicy is used.

      3. Click OK.

  4. Grant permissions to the role.

    1. In the left-side navigation pane, choose Permissions > Grants.

    2. Click Grant Permission. In the Grant Permission panel, grant permissions to the role that you created.

      1. Set the Authorized Scope parameter to Alibaba Cloud Account.

      2. Enter ConfigCustomRemediationRole in the Principal field and select the role in the drop-down list.

      3. Click the Custom Policy tab in the Select Policy section. Enter ConfigCustomRemediationPolicy in the search box and click the displayed policy to select the policy.

      4. Click OK.

  5. Attach a trust policy to the role.

    1. In the left-side navigation pane, choose Identities > Roles.

    2. On the Roles page, search for the ConfigCustomRemediationRole role and click the role name to go to the role configuration page.

    3. On the Trust Policy Management tab, click Edit Trust Policy. In the Edit Trust Policy panel that appears, enter the following policy script. The policy allows Account A (ID: 100931896542****) and the Function Compute service to assume the role.

      {
        "Statement": [
          {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
              "RAM": [
                "acs:ram::100931896542****:root"
              ]
            }
          },
          {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {
              "Service": [
                "fc.aliyuncs.com"
              ]
            }
          }
        ],
        "Version": "1"
      }
    4. Click OK.
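
The two policy documents in this step can also be generated instead of edited by hand. The following Python sketch is illustrative only: the helper names build_permission_policy and build_management_trust_policy are hypothetical, and the dictionaries reproduce the policies shown in this step. Because json.dumps never writes comments, its output is plain JSON that can be pasted into the JSON tab or the Edit Trust Policy panel.

```python
import json


def build_permission_policy():
    """Hypothetical helper: the permissions policy from Substep 3 as a dict."""
    return {
        "Version": "1",
        "Statement": [
            # Install the CloudMonitor agent on ECS instances.
            {"Effect": "Allow", "Action": "cms:InstallMonitoringAgent", "Resource": "*"},
            # Assume the remediation role in member accounts.
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": "*"},
        ],
    }


def build_management_trust_policy(management_account_id):
    """Hypothetical helper: the trust policy from Substep 5 as a dict."""
    return {
        "Version": "1",
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"RAM": [f"acs:ram::{management_account_id}:root"]},
            },
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": ["fc.aliyuncs.com"]},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_permission_policy(), indent=2))
    print(json.dumps(build_management_trust_policy("100931896542****"), indent=2))
```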

Step 2: Create a role for the member account and attach policies to the role

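  1. Create a RAM role in Account B and grant permissions to the role.

    For more information, see Substeps 1 to 4 in Step 1, and perform them as Account B. Use the same role name, ConfigCustomRemediationRole, because the remediation function in Step 3 builds the ARN of the role to assume from this name.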

  2. Attach a trust policy to the role.

    For more information, see Substep 5 in Step 1. Replace the policy script with the following sample script. The policy allows Account B (ID: 178366182654****) and the ConfigCustomRemediationRole role of Account A (ID: 100931896542****) to assume the role.

    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "RAM": [
              "acs:ram::178366182654****:root"
            ]
          }
        },
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Principal": {
            "RAM": [
              "acs:ram::100931896542****:role/configcustomremediationrole"
            ]
          }
        }
      ],
      "Version": "1"
    }
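
When many member accounts must be onboarded, the member-account trust policy can be generated per account. The following Python sketch assumes the role naming used in this topic; role_arn and build_member_trust_policy are hypothetical helpers, and the lowercase role name follows the ARN format that this topic uses in the trust policy and in the remediation function.

```python
import json

ROLE_NAME = "ConfigCustomRemediationRole"


def role_arn(account_id, role_name=ROLE_NAME):
    """Build a RAM role ARN, using the lowercase role name as this topic does."""
    return f"acs:ram::{account_id}:role/{role_name.lower()}"


def build_member_trust_policy(member_account_id, management_account_id):
    """Hypothetical helper: the trust policy for the role in a member account."""
    return {
        "Statement": [
            {   # The member account itself.
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"RAM": [f"acs:ram::{member_account_id}:root"]},
            },
            {   # The remediation role in the management account.
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"RAM": [role_arn(management_account_id)]},
            },
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    policy = build_member_trust_policy("178366182654****", "100931896542****")
    print(json.dumps(policy, indent=2))
```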

Step 3: Create a custom remediation function

  1. Log on to the Function Compute console.

  2. Create a service.

    1. In the left-side navigation pane, click Services & Functions.

    2. Click Create Service. The Create Service panel appears.

      1. Enter a service name. In this example, ConfigRemediationService is used.

      2. In the advanced options, set the Service Role parameter to ConfigCustomRemediationRole.

      3. Click OK.

  3. Create a custom remediation function.

    1. In the left-side navigation pane of the service details page, click Functions.

    2. Click Create Function. The Create Function page appears.

      1. Select Use Built-in Runtime.

      2. In the Basic Settings section, enter a function name. In this example, the Function Name parameter is set to ConfigRemediationFunction. Set the Handler Type parameter to Event Handler.

      3. In the Code section, set the Runtime parameter to Python 3.9 and the Code Upload Method parameter to Use Sample Code.

      4. Click Create to go to the function details page.

      5. On the Code tab, enter the following sample code for the resource remediation function.

        #!/usr/bin/env python
        # -*- encoding: utf-8 -*-
        import json
        import logging
        
        from aliyunsdkcore.client import AcsClient
        from aliyunsdkcore.acs_exception.exceptions import ClientException
        from aliyunsdkcore.acs_exception.exceptions import ServerException
        from aliyunsdkcore.request import CommonRequest
        from aliyunsdkcore.auth.credentials import StsTokenCredential
        from aliyunsdksts.request.v20150401.AssumeRoleRequest import AssumeRoleRequest
        
        logger = logging.getLogger()
        
        # The sample code is used to remediate non-compliant resources based on the ecs-instance-monitor-enabled rule. You can modify the remediation logic based on your business requirements.
        def handler(event, context):
            get_resources_non_compliant(event, context)
        
        def get_resources_non_compliant(event, context):
            resources = parse_json(event)
            if not resources:
                # The event is empty or could not be parsed. Nothing to remediate.
                return
            for resource in resources:
                try:
                    remediation(resource, context)
                except (ClientException, ServerException) as e:
                    # Log the failure and continue with the remaining resources.
                    logger.error('Remediate resource:{} error:{}.'.format(resource.get('resourceId'), e))
        
        def parse_json(content):
            """
            Parse a JSON string or bytes object into a Python object.
            :param content: JSON content
            :return: The parsed object, or None if parsing fails
            """
            try:
                return json.loads(content)
            except Exception as e:
                logger.error('Parse content:{} to json error:{}.'.format(content, e))
                return None
        
        def remediation(resource, context):
            logger.info(resource)
            region_id = resource['regionId']
            account_id = resource['accountId']
            resource_id = resource['resourceId']
            resource_type = resource['resourceType']
            config_rule_id = resource['configRuleId']
            if resource_type == 'ACS::ECS::Instance':
                logger.info("process account_id: {}, resource_id: {}, config_rule_id: {}".format(
                    account_id, resource_id, config_rule_id))
                install_monitoring_agent(context, account_id, region_id, resource_id)
        
        def install_monitoring_agent(context, account_id, resource_region_id, resource_id):
            logger.info("start install agent {}: {}".format(resource_region_id, resource_id))
        
            token = assume_role_and_get_token(context, account_id, resource_region_id)
            # Sign the request with the temporary STS credentials of the role in the member account.
            credentials = StsTokenCredential(token['Credentials']['AccessKeyId'],
                                             token['Credentials']['AccessKeySecret'],
                                             token['Credentials']['SecurityToken'])
            client = AcsClient(region_id=resource_region_id, credential=credentials)
            request = CommonRequest()
            request.set_accept_format('json')
            request.set_domain(f'metrics.{resource_region_id}.aliyuncs.com')
            request.set_method('POST')
            request.set_protocol_type('https')  # https | http
            request.set_version('2019-01-01')
            request.set_action_name('InstallMonitoringAgent')
            request.add_query_param('InstanceIds.1', resource_id)
            request.add_query_param('Force', 'true')
        
            response = client.do_action_with_exception(request)
            logger.info(response)
        
        # Assume the role to obtain a temporary Security Token Service (STS) token. Replace the role name in the sample code with the actual role that you use.
        def assume_role_and_get_token(context, account_id, region_id):
            # These credentials come from the service role of the function (ConfigCustomRemediationRole in Account A).
            creds = context.credentials
            logger.info('assume_role_and_get_token begin.')
            credentials = StsTokenCredential(creds.access_key_id, creds.access_key_secret, creds.security_token)
            client = AcsClient(region_id=region_id, credential=credentials)
        
            request = AssumeRoleRequest()
            request.set_domain(f'sts-vpc.{region_id}.aliyuncs.com')
            request.set_accept_format('json')
        
            request.set_RoleArn(f'acs:ram::{account_id}:role/configcustomremediationrole')
            request.set_RoleSessionName("ConfigCustomRemediationRole")
            response = client.do_action_with_exception(request)
        
            token = json.loads(response)
            # Do not log the response or the credentials: they contain the AccessKey secret and the security token.
            logger.info('assume_role_and_get_token succeeded, assumed role: {}.'.format(
                token['AssumedRoleUser']['Arn']))
            return token
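
Before you attach the function to a rule, the dispatch logic can be checked locally without calling any API. The following Python sketch is a dry run under an assumption: the event is a JSON array of resources with the keys that the sample function reads (regionId, accountId, resourceId, resourceType, configRuleId). The exact payload that Cloud Config sends may differ, and plan_remediation, SAMPLE_EVENT, and the resource IDs are hypothetical.

```python
import json

# Assumed event shape, based on the keys that the sample function reads.
SAMPLE_EVENT = json.dumps([
    {"regionId": "cn-hangzhou", "accountId": "178366182654****",
     "resourceId": "i-example1", "resourceType": "ACS::ECS::Instance",
     "configRuleId": "cr-example"},
    {"regionId": "cn-hangzhou", "accountId": "178366182654****",
     "resourceId": "sg-example", "resourceType": "ACS::ECS::SecurityGroup",
     "configRuleId": "cr-example"},
]).encode("utf-8")


def plan_remediation(event):
    """Return the (account_id, region_id, instance_id) tuples that the sample
    function would pass to install_monitoring_agent, without calling any API."""
    try:
        resources = json.loads(event)
    except ValueError:
        # Mirrors parse_json: an unparseable event yields nothing to remediate.
        return []
    return [
        (r["accountId"], r["regionId"], r["resourceId"])
        for r in resources or []
        if r.get("resourceType") == "ACS::ECS::Instance"
    ]


if __name__ == "__main__":
    for account_id, region_id, instance_id in plan_remediation(SAMPLE_EVENT):
        print(f"would install agent on {instance_id} in {region_id} (account {account_id})")
```

Only the ECS instance is selected; the security group is skipped, which matches the resource_type check in the remediation function.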

Step 4: Create a rule and configure custom remediation

  1. Log on to the Cloud Config console.

  2. Create an account group and add Account A and Account B to the account group.

    For more information, see Create an account group.

  3. In the upper-left corner of the Cloud Config console, switch to the account group that you created in the previous step.

  4. Create a rule. For more information, see Create a rule based on a managed rule.

    1. In the Select Create Method step, select Based on managed rule, search for and select the ecs-instance-monitor-enabled rule, and then click Next.

    2. In the Set Basic Properties step, set the Rule Name, Risk Level, Trigger, and Description parameters and click Next.

    3. In the Set Effective Scope step, retain the default settings and click Next.

    4. In the Set Remediation step, turn on Set Remediation, set the Invoke Type parameter to Automatic Remediation, and then select the Function Compute function that you created in Step 3 in the Function ARN section. Click Submit.

      Note

      While you are testing the custom remediation function, you can set the Invoke Type parameter to Manual Remediation to observe and debug the function. After the function passes the test, set the Invoke Type parameter to Automatic Remediation.

Step 5: Implement automatic remediation and verify the remediation result

  1. On the Rules page, find the rule that you want to manage, and click Remediation Detail in the Remediation Template column.

  2. On the Remediation Detail tab, click Perform Manual Remediation next to Remediation Detail to trigger remediation immediately.

    In the Execution Result List section, you can view the remediation results. You can also view the reason why a resource fails to be remediated.

    Note

    On the Remediation Detail tab, click the function ARN next to Remediation Template to go to the Code tab of the remediation function in the Function Compute console.

FAQ

How do I check whether a custom function is successfully executed?

You can enable the task mode for a Function Compute function. In task mode, Function Compute records the execution status of each task in each stage, and provides features such as task status query, task queue status query, task deduplication, and active task termination. Perform the following steps to enable the task mode:

  1. Log on to the Function Compute console.

  2. Find the Function Compute service and function that you want to test and go to the function details page.

  3. Click the Asynchronous Configurations tab.

  4. Click Modify next to Asynchronous Policy. In the Modify the Policy for Asynchronous Mode panel, set the Task Mode parameter to Enable and click OK.

How do I view the execution logs of a custom function?

You must enable the logging feature for a Function Compute service to view function execution logs. You can view the execution logs of a custom function that belongs to the service on the function details page. Perform the following steps to enable the logging feature for a Function Compute service:

  1. Log on to the Function Compute console.

  2. In the left-side navigation pane, click Services & Functions to go to the Services page.

  3. On the Services page, find the service for which you want to enable the logging feature and click Configure in the Actions column to go to the Modify Service page.

  4. In the Log Settings section, set the Logging parameter to Enable and set other required parameters.

  5. Click Save.
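
After logging is enabled, messages that the function writes through the logging module typically appear in its execution logs. As a sketch, assuming you want one searchable entry per remediated resource, a hypothetical helper such as log_remediation_result can write each result as a single JSON line. In Function Compute you might pass context.request_id as the request ID; the status values used here are illustrative.

```python
import json
import logging

logger = logging.getLogger()


def log_remediation_result(request_id, resource_id, status, error=None):
    """Write one JSON line per resource so that entries are easy to filter
    in the execution logs. Returns the line for convenience."""
    record = {"requestId": request_id, "resourceId": resource_id, "status": status}
    if error is not None:
        record["error"] = str(error)
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_remediation_result("req-example", "i-example1", "succeeded")
```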