
Container Compute Service: Provide resource configuration suggestions for containers based on resource profiling

Last Updated: Dec 25, 2024

Container Compute Service (ACS) can profile resources for Kubernetes-native workloads and provide resource configuration suggestions for containers based on historical resource usage data. This greatly simplifies the configuration of resource requests and limits for containers. This topic describes how to use the resource profiling feature in an ACS cluster in the ACS console and by using the CLI.

Prerequisites and usage notes

  • The ack-koordinator component is installed. For more information, see ack-koordinator (FKA ack-slo-manager).

  • To ensure the accuracy of resource profiling, we recommend that you wait at least one day after you enable resource profiling so that the system can collect enough data.

Billing

No fee is charged when you install and use the ack-koordinator component. However, fees may be charged in the following scenarios:

  • After ack-koordinator is installed, it creates two ACS general-purpose pods. You can specify the amount of resources requested by each module when you install the component.

  • By default, ack-koordinator exposes the monitoring metrics of features such as resource profiling as Prometheus metrics. If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, these metrics are treated as custom metrics and are billed. The fee depends on factors such as the size of your cluster and the number of applications. Before you enable Prometheus metrics, we recommend that you read the Billing topic of Managed Service for Prometheus to learn about the free quota and billing rules for custom metrics. For more information about how to monitor and manage resource usage, see Query the amount of observable data and bills.

Limits

The following components must meet the minimum version requirements:

  • metrics-server: v0.3.9.7 or later

  • ack-koordinator: v1.5.0-ack1.14 or later

Introduction to resource profiling

Kubernetes allows you to describe the resource requests of containers to manage container resources. After you specify the resource request for a container, the scheduler matches the resource request with the allocatable resources of each node to determine the node to which the container is scheduled. You can refer to the historical resource utilization and stress test results of a container when you manually specify the resource request. You can also adjust the resource request after the container is created based on the performance of the container. However, you may encounter the following issues:

  • To ensure application stability, you need to reserve an excessive amount of resources as a buffer to handle the fluctuations of the upstream and downstream workloads. As a result, the amount of resources in the resource requests that you specify for containers may greatly exceed the actual amount of resources used by the containers. This causes low resource utilization and resource waste in the cluster.

  • If your cluster hosts a large number of pods, you can decrease the resource requests of individual containers to increase resource utilization in the cluster and deploy more containers on each node. However, applications may become unstable during traffic spikes.

To resolve these issues, ack-koordinator provides resource profiles for workloads, from which you can obtain resource configuration suggestions for individual containers. This simplifies the configuration of resource requests and limits for containers. In ACS, you can use the resource profiling feature in the console or by using the CLI. On the CLI, you manage resource profiling by creating custom resources that are defined by CustomResourceDefinitions (CRDs).

Use resource profiling in the console

Step 1: Enable resource profiling

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Cost Suite > Cost Optimization.

  3. On the Cost Optimization page, click the Resource Profiling tab, and follow the instructions in the Resource Profiling section to enable this feature.

    • Install or update the component: Follow the instructions on the page to install or update the ack-koordinator component. If this is the first time you use resource profiling, you need to install the ack-koordinator component.

    • If this is the first time you use resource profiling, after the component is installed or updated, we recommend that you select Default Settings to enable resource profiling for all workloads. You can click Profiling Configuration to modify the applicable scope of resource profiling later.

  4. Click Enable Resource Profiling to go to the Resource Profiling tab.

Step 2: Configure resource profiling

  1. On the Cost Optimization page, click the Resource Profiling tab, and then click Profiling Configuration.

    You can choose Global Configuration or Custom Configuration. The default settings that you selected when you installed the resource profiling component belong to the global configuration. To change these settings, choose Global Configuration, modify the settings, and then click OK.

    Global configuration mode (recommended)

    In global configuration mode, resource profiling is enabled by default for workloads in all namespaces except arms-prom and kube-system. The following parameters are available.

    • Excluded Namespace: The namespaces for which you want to disable resource profiling. In most cases, resource profiling is disabled for the namespaces of system components. After you modify the global configuration, resource profiling is enabled only for workloads of the specified types that do not belong to the excluded namespaces. Valid values: one or more existing namespaces in the cluster. By default, the kube-system and arms-prom namespaces are specified.

    • Workload Type: The types of workloads for which resource profiling is enabled. Valid values: one or more of the following Kubernetes workload types: Deployment, StatefulSet, and DaemonSet.

    • CPU Redundancy Rate/Memory Redundancy Rate: The redundancy rate that is specified in the resource profiling policy. For more information, see the note on resource redundancy later in this step. Valid values: 0 or a positive value. The system also provides three commonly used redundancy rates: 70%, 50%, and 30%.

    Custom configuration mode

    In custom configuration mode, resource profiling is enabled only for the workloads that you specify. If your cluster is large (more than 1,000 nodes) or you want to enable resource profiling only for some workloads, choose the custom configuration mode. The following parameters are available.

    • Namespace: The namespaces for which you want to enable resource profiling. After you modify the custom configuration, resource profiling is enabled for workloads of the specified types that belong to the selected namespaces. Valid values: one or more existing namespaces in the cluster.

    • Workload Type: The workload types for which you want to enable resource profiling. Valid values: one or more of the following Kubernetes workload types: Deployment, StatefulSet, and DaemonSet.

    • CPU Redundancy Rate/Memory Redundancy Rate: The redundancy rate that is specified in the resource profiling policy. For more information, see the following note. Valid values: 0 or a positive value. The system also provides three commonly used redundancy rates: 70%, 50%, and 30%.

    Note

    Resource redundancy: When administrators assess the workload of an application, such as its QPS, they usually assume that the workload will not use 100% of the physical resources. This is because even technologies such as hyper-threading are bound by physical resource limits, and the application also needs to reserve resources to handle traffic spikes during peak hours. If the difference between the suggested resource request and the original resource request exceeds the specified redundancy rate, the system suggests that you decrease the resource request. For more information about the resource profiling algorithm, see the Overview of application profiles section.

Step 3: View resource profiles

  1. After you configure the resource profiling policy, you can view the resource profiles of the workloads on the Resource Profiling tab.

    To ensure the accuracy of resource profiles, if this is the first time you use resource profiling, wait at least 24 hours for the system to collect data.

  2. This page displays the aggregated resource profile data and the resource profile of each workload.


    The following list describes the columns. Filter indicates whether you can filter resource profiles by the column.

    • Workload Name: The name of the workload. Filter: supported.

    • Namespace: The namespace to which the workload belongs. Filter: supported. By default, the kube-system namespace is excluded from the filter conditions.

    • Workload Type: The type of the workload. Valid values: Deployment, DaemonSet, and StatefulSet. Filter: supported. By default, all workload types are selected.

    • CPU Request: The number of CPU cores that are requested by the pods of the workload. Filter: not supported.

    • Memory Request: The amount of memory that is requested by the pods of the workload. Filter: not supported.

    • Profile Data Status: The status of the resource profile. Filter: not supported. Valid values:

      • Collecting: The resource profiling component is collecting historical data and generating the profiling result. To view the resource profile of a workload, we recommend that you wait at least one day after you enable resource profiling and make sure that the workload experiences traffic fluctuations within that day.

      • Normal: The resource profile is generated.

      • Workload Deleted: The workload is deleted. The resource profile of the workload is deleted after a period of time.

    • CPU Profile/Memory Profile: Suggestions on how to modify the original CPU request and memory request, generated from the suggested resource request, the original resource request, and the resource redundancy rate. Valid values: Upgrade, Downgrade, and Remain Unchanged, together with a percentage that indicates the degree of difference between the original resource request and the suggested resource request. Formula: Abs(Suggested request value - Original request value)/Original request value. Filter: supported. By default, Upgrade and Downgrade are selected.

    • Creation Time: The time when the resource profile was created. Filter: not supported.

    • Change Resource Configuration: After you check the resource profiles and suggestions, you can click Change Resource Configuration to modify the resource configurations. For more information, see Step 5: Modify resource configurations. Filter: not supported.

    Note

    The resource profiling feature of ACS provides a suggested resource request for each container of the workload, and compares the suggested request value (Recommend), original request value (Request), and resource redundancy rate (Buffer). The feature also provides suggestions on whether to increase or decrease the resource request of the workload. If the workload has multiple containers, ACS provides suggestions for the container whose original resource request has the highest degree of difference compared with the suggested resource request. The following content describes how ACS calculates the degree of difference between the suggested resource request and the original resource request.

    • If the suggested resource request is greater than the original resource request, the resource usage of the container is higher than the resource request of the container. In this case, ACS suggests that you increase the resource request of the container.

    • If the suggested resource request is lower than the original resource request, the resource usage of the container is lower than the resource request of the container. In this case, ACS suggests that you decrease the resource request of the container to avoid resource waste. ACS calculates the degree of difference between the suggested resource request and the original resource request in the following way:

      1. ACS calculates the target resource request based on the following formula: Target resource request = Recommend × (1 + Buffer).

      2. ACS calculates the degree of difference between the target resource request and the original resource request based on the following formula: Degree = 1 - Target/Request. The degree is positive only when the original resource request exceeds the target resource request.

      3. ACS generates suggestions on adjusting CPU and memory requests based on the degree of difference between the target resource request and the original resource request. If the degree value exceeds 0.1, ACS suggests that you decrease the resource request.

    • In other cases, Remain Unchanged is displayed in the CPU Profile or Memory Profile column, which means that you do not need to adjust the resource request.
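The rules in the preceding note can be expressed as a short Python sketch, which is handy for checking a suggestion by hand. The function names are hypothetical and this is not ACS source code. It assumes that the degree of difference is 1 - Target/Request (the share of the original request above the buffered target), so that a value above 0.1 indicates over-provisioning. The example values (8 cores requested, 4.742 cores suggested) come from the cpu-load-gen example in the CLI section of this topic.

```python
def profile_percentage(suggested: float, original: float) -> float:
    """Percentage shown in the CPU Profile/Memory Profile column:
    Abs(suggested - original) / original, expressed in percent."""
    if original <= 0:
        raise ValueError("original request must be positive")
    return abs(suggested - original) / original * 100


def suggest_action(recommend: float, request: float, buffer: float,
                   threshold: float = 0.1) -> str:
    """Classify a request as Upgrade, Downgrade, or Remain Unchanged.

    Assumption: degree = 1 - target / request, which is positive only
    when the original request exceeds the buffered target.
    """
    if request <= 0 or buffer < 0:
        raise ValueError("request must be positive and buffer non-negative")
    if recommend > request:
        return "Upgrade"  # usage exceeds the original request
    target = recommend * (1 + buffer)  # Target = Recommend x (1 + Buffer)
    degree = 1 - target / request
    return "Downgrade" if degree > threshold else "Remain Unchanged"


# Original CPU request: 8 cores. Suggested CPU request: 4.742 cores.
print(f"{profile_percentage(4.742, 8):.1f}%")  # 40.7%
print(suggest_action(4.742, 8, buffer=0.3))    # Downgrade (target ~6.16, degree ~0.23)
print(suggest_action(4.742, 8, buffer=0.7))    # Remain Unchanged (target ~8.06)
```

With a 70% redundancy rate, the buffered target slightly exceeds the 8-core request, so no decrease is suggested even though the raw suggestion is much lower.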

Step 4: View the details of a resource profile

  1. On the Resource Profiling tab, click the name of a workload to go to the Profile Details page.

    On the details page, you can view basic information about the workload, view the resource curve of each container of the workload, and modify the resource specifications of the workload. The preceding figure shows the CPU curves of a workload.

    The following curves are displayed:

    • cpu limit: The CPU limit curve of the container.

    • cpu request: The CPU request curve of the container.

    • cpu recommend: The suggested CPU request curve of the container.

    • cpu usage (average): The curve of the average CPU usage of the container.

    • cpu usage (max): The curve of the peak CPU usage of the container.

Step 5: Modify resource configurations

  1. In the Change Resource Configuration section at the bottom of the Profile Details page, you can modify the resource configuration based on the suggested values generated by resource profiling.

    The following list describes the columns.

    • Resource Request: The original resource request of the container.

    • Resource Limit: The original resource limit of the container.

    • Profile Value: The resource request that is suggested by ACS.

    • Resource Redundancy Rate: The resource redundancy rate that is specified in the resource profiling policy. You can specify the new resource request based on the redundancy rate and the suggested resource request. In the preceding figure, the new CPU request is calculated based on the following formula: 4.28 CPU cores × (1 + 30%) ≈ 5.6 CPU cores.

    • New Resource Request: The new resource request that you want to use.

    • New Resource Limit: The new resource limit that you want to use. If topology-aware CPU scheduling is enabled for the workload, the CPU limit must be an integer.

    Important

    The suggested request values provided by resource profiling are the raw values calculated by the algorithm. When you apply the resource configuration changes, ACS adjusts the resource specifications to match the compute classes of the pods. For more information, see Resource specifications.

  2. After you set the parameters, click Submit, and then click OK to perform a rolling update on the workload. The system starts to update the resource configuration of the workload, and you are redirected to the details page of the workload.

    Important

    After the resource specifications are updated, the controller performs a rolling update on the workload and recreates the pods. Proceed with caution.
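The redundancy arithmetic described for the Resource Redundancy Rate column can be reproduced in a few lines of Python, for example to plan values before you enter them on the Profile Details page. This is a hypothetical helper, not part of ACS: ACS may still adjust the submitted values to the compute class specifications, and rounding the CPU limit up to a whole core is only one possible way to meet the integer requirement of topology-aware CPU scheduling.

```python
import math


def new_request(profile_value: float, redundancy_rate: float, ndigits: int = 1) -> float:
    """New request = profile value x (1 + redundancy rate), rounded for display."""
    if redundancy_rate < 0:
        raise ValueError("redundancy rate must be 0 or positive")
    return round(profile_value * (1 + redundancy_rate), ndigits)


def integer_cpu_limit(limit_cores: float) -> int:
    """Round a CPU limit up to whole cores (rounding up is a choice, not a rule)."""
    return math.ceil(limit_cores)


request = new_request(4.28, 0.30)  # 4.28 x 1.3 = 5.564
print(request)                     # 5.6
print(integer_cpu_limit(request))  # 6
```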

Use resource profiling with the CLI

Step 1: Enable resource profiling

  1. Use the following YAML template to create a file named recommendation-profile.yaml and enable resource profiling for your workloads.

    You can use the RecommendationProfile CRD to generate resource profiles for your workloads and obtain resource configuration suggestions. You can specify the namespaces and workload types to which a RecommendationProfile CRD is applied.

    apiVersion: autoscaling.alibabacloud.com/v1alpha1
    kind: RecommendationProfile
    metadata:
      # The name of the RecommendationProfile CRD. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace. 
      name: profile-demo
    spec:
      # The workload types for which you want to enable resource profiling. 
      controllerKind:
      - Deployment
      # The namespaces for which you want to enable resource profiling. 
      enabledNamespaces:
      - default

    The following list describes the parameters in the YAML template.

    • metadata.name (String): The name of the object. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace.

    • spec.controllerKind (String array): The workload types for which you want to enable resource profiling. Valid values: Deployment, StatefulSet, and DaemonSet.

    • spec.enabledNamespaces (String array): The namespaces for which you want to enable resource profiling.

  2. Run the following command to create the resource profiling policy:

    kubectl apply -f recommendation-profile.yaml
  3. Create a file named cpu-load-gen.yaml and copy the following content to the file:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpu-load-gen
      labels:
        app: cpu-load-gen
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: cpu-load-gen-selector
      template:
        metadata:
          labels:
            app: cpu-load-gen-selector
        spec:
          containers:
          - name: cpu-load-gen
            image: registry.cn-zhangjiakou.aliyuncs.com/acs/slo-test-cpu-load-gen:v0.1
            command: ["cpu_load_gen.sh"]
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 8 # Request eight CPU cores for the application. 
                memory: "1Gi"
              limits:
                cpu: 12
                memory: "2Gi"
  4. Run the following command to deploy the cpu-load-gen application:

    kubectl apply -f cpu-load-gen.yaml
  5. Run the following command to obtain resource configuration suggestions for the application that you created:

    kubectl get recommendations -l \
      "alpha.alibabacloud.com/recommendation-workload-apiVersion=apps-v1, \
      alpha.alibabacloud.com/recommendation-workload-kind=Deployment, \
      alpha.alibabacloud.com/recommendation-workload-name=cpu-load-gen" -o yaml
    Note

    To generate accurate resource configuration suggestions, we recommend that you wait at least one day after you enable resource profiling so that the system can collect enough data.

    After you enable resource profiling for your workloads, ack-koordinator provides resource configuration suggestions for them and stores the suggestions in the Recommendation CRD. The following code block shows the resource profile generated for the cpu-load-gen Deployment:

    apiVersion: autoscaling.alibabacloud.com/v1alpha1
    kind: Recommendation
    metadata:
      labels:
        alpha.alibabacloud.com/recommendation-workload-apiVersion: apps-v1
        alpha.alibabacloud.com/recommendation-workload-kind: Deployment
        alpha.alibabacloud.com/recommendation-workload-name: cpu-load-gen
      name: f20ac0b3-dc7f-4f47-b3d9-bd91f906****
      namespace: default
    spec:
      workloadRef:
        apiVersion: apps/v1
        kind: Deployment
        name: cpu-load-gen
    status:
      recommendResources:
        containerRecommendations:
        - containerName: cpu-load-gen
          target:
            cpu: 4742m
            memory: 262144k
          originalTarget: #The intermediate result generated by the resource profiling algorithm. We recommend that you do not use the intermediate result. 
           # ...

    To facilitate data retrieval, the Recommendation CRD is generated in the same namespace as the workload. In addition, the Recommendation CRD saves the API version, type, and name of the workload in the following labels:

    • alpha.alibabacloud.com/recommendation-workload-apiVersion: The API version of the workload. The value conforms to the Kubernetes label value specifications, so forward slashes (/) are replaced by hyphens (-). Example: apps-v1 (original form: apps/v1).

    • alpha.alibabacloud.com/recommendation-workload-kind: The type of the workload, for example, Deployment or StatefulSet. Example: Deployment.

    • alpha.alibabacloud.com/recommendation-workload-name: The name of the workload. The value conforms to the Kubernetes label value specifications and cannot exceed 63 characters in length. Example: cpu-load-gen.

    The resource profiling result of each container is saved in status.recommendResources.containerRecommendations. The following list describes the parameters.

    • containerName (string): The name of the container. Example: cpu-load-gen.

    • target (map[ResourceName]resource.Quantity): The resource profiling result, including the suggested CPU request and memory request. Example: cpu: 4742m, memory: 262144k.

    • originalTarget: The intermediate result generated by the resource profiling algorithm. We recommend that you do not use the intermediate result.

    Note

    The suggested minimum amount of CPU resources is 0.025 CPU cores (25m). The suggested minimum amount of memory resources is 250 MiB (262144k).

    Compare the resource configurations requested by the cpu-load-gen application with the suggested resource configurations in this step. The requested CPU resources are greater than the suggested CPU resources, so you can reduce the CPU request of the application to save resources.

    • CPU: requested amount 8 vCPUs; suggested amount 4.742 vCPUs.
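The label and quantity conventions in this step can be sketched in Python. This is an illustrative helper, not part of ack-koordinator: the function names are hypothetical, the quantity parser covers only common Kubernetes suffixes, and flooring a target at the documented minimums (0.025 cores and 250 MiB) is an assumption about how those minimums apply.

```python
import re
from decimal import Decimal

LABEL_PREFIX = "alpha.alibabacloud.com/recommendation-workload-"

# Common Kubernetes quantity suffixes (decimal and binary).
_SUFFIXES = {
    "": Decimal(1), "m": Decimal("0.001"),
    "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9),
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30),
}
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)(Ki|Mi|Gi|m|k|M|G)?$")

# Documented minimum suggestions: 0.025 CPU cores and 250 MiB of memory.
MIN_CPU_CORES = Decimal("0.025")
MIN_MEMORY_BYTES = Decimal(250 * 2**20)


def recommendation_selector(api_version: str, kind: str, name: str) -> str:
    """Build the -l selector for kubectl get recommendations.

    Label values cannot contain '/', so apps/v1 becomes apps-v1.
    """
    if len(name) > 63:
        raise ValueError("label values cannot exceed 63 characters")
    labels = {
        "apiVersion": api_version.replace("/", "-"),
        "kind": kind,
        "name": name,
    }
    return ",".join(f"{LABEL_PREFIX}{key}={value}" for key, value in labels.items())


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as 4742m (cores) or 262144k (bytes) to a number."""
    match = _QUANTITY.match(text)
    if not match:
        raise ValueError(f"unsupported quantity: {text}")
    number, suffix = match.groups()
    return Decimal(number) * _SUFFIXES[suffix or ""]


def floored_target(cpu: str, memory: str) -> tuple:
    """Apply the documented minimums to a target; returns (cores, bytes)."""
    return (max(parse_quantity(cpu), MIN_CPU_CORES),
            max(parse_quantity(memory), MIN_MEMORY_BYTES))


print(recommendation_selector("apps/v1", "Deployment", "cpu-load-gen"))
cpu, memory = floored_target("4742m", "262144k")
print(cpu, memory / 2**20)  # prints: 4.742 250
```

The printed selector is the same one passed to kubectl get recommendations -l in step 5, without the extra spaces. The sample memory target of 262144k is exactly 250 MiB, which matches the documented minimum.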

Step 2. (Optional): Verify the profiling results in Managed Service for Prometheus

The ack-koordinator component provides a Prometheus interface for you to query resource profiling results. If you use a self-managed Prometheus monitoring system, you can use the following metrics to configure dashboards:

# Suggested CPU request (CPU resource profile).
koord_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="cpu"}
# Suggested memory request (memory resource profile).
koord_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="memory"}
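To query these series programmatically, you could build the expression and an instant-query URL for the standard Prometheus HTTP API endpoint (/api/v1/query). The helper names and the Prometheus address below are hypothetical; substitute the endpoint of your own Prometheus instance. The label names are copied from the queries above.

```python
from urllib.parse import urlencode

METRIC = "koord_manager_recommender_recommendation_workload_target"


def profile_query(namespace: str, workload: str, container: str, resource: str) -> str:
    """Build the PromQL expression shown above for one container and resource."""
    if resource not in ("cpu", "memory"):
        raise ValueError("resource must be cpu or memory")
    matchers = {
        "exported_namespace": namespace,
        "workload_name": workload,
        "container_name": container,
        "resource": resource,
    }
    # Kubernetes object names cannot contain quotes, so no escaping is done here.
    selector = ", ".join(f'{key}="{value}"' for key, value in matchers.items())
    return f"{METRIC}{{{selector}}}"


def instant_query_url(prometheus_base: str, expr: str) -> str:
    """URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


expr = profile_query("default", "cpu-load-gen", "cpu-load-gen", "cpu")
print(expr)
print(instant_query_url("http://prometheus.example:9090", expr))  # hypothetical address
```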

FAQ

How does the resource profiling algorithm work?

The resource profiling algorithm uses a multi-dimensional data model and has the following characteristics:

  • The resource profiling algorithm continuously collects the resource usage statistics of containers, aggregates the data, and then calculates the peak values, weighted averages, and percentile values of CPU and memory usage.

  • In the profiling result, the suggested CPU request is a 95th percentile value and the suggested memory request is a 99th percentile value. The resource profiling feature also sets safety margins for both values to ensure the reliability of the workload.

  • The resource profiling algorithm is optimized for recency. It aggregates data by using a half-life time window model, so newer data samples carry larger weights.

  • The resource profiling algorithm takes container events into consideration, such as out of memory (OOM) errors. This increases the accuracy of the suggestions.
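The percentile and half-life ideas above can be illustrated with a simplified, hypothetical sketch: each sample's weight halves every half_life_hours, and the suggestion is a weighted percentile plus a safety margin (95 for CPU and 99 for memory, per the description above). This is not the ack-koordinator implementation. The sample layout, the 24-hour half-life, and the 15% margin are assumptions, and container events such as OOM errors are not modeled.

```python
def decayed_percentile(samples, percentile, half_life_hours):
    """Weighted percentile in which each sample's weight halves every
    half_life_hours. samples is a list of (age_hours, usage) pairs."""
    if not samples:
        raise ValueError("no samples")
    # Sort by usage; weight = 0.5 ** (age / half-life), so newer samples count more.
    weighted = sorted((usage, 0.5 ** (age / half_life_hours)) for age, usage in samples)
    total = sum(weight for _, weight in weighted)
    threshold = total * percentile / 100
    cumulative = 0.0
    for usage, weight in weighted:
        cumulative += weight
        if cumulative >= threshold:
            return usage
    return weighted[-1][0]  # guard against floating-point shortfall


def suggested_request(samples, percentile, half_life_hours=24.0, margin=0.15):
    """Percentile of decayed usage plus a safety margin (illustrative values)."""
    return decayed_percentile(samples, percentile, half_life_hours) * (1 + margin)


# Hypothetical CPU usage in cores: 2.0 cores during the last day,
# 4.0 cores five days ago.
samples = [(h, 2.0) for h in range(24)] + [(h, 4.0) for h in range(120, 144)]
print(decayed_percentile(samples, 95, half_life_hours=24))   # 2.0
print(decayed_percentile(samples, 95, half_life_hours=1e9))  # 4.0 (almost no decay)
```

With a 24-hour half-life, the five-day-old peak carries only about 3% of the total weight, so the 95th percentile follows recent usage; without decay, the old peak dominates.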

For more information, see Technologies behind resource profiling and How resource profiling works and suggestions.

What types of applications are suitable for resource profiling?

Resource profiling is suitable for online applications.

In most cases, the resource configurations suggested by the resource profiling feature can meet the resource requirements of online applications. Offline applications use batch processing, require high throughput, and tolerate resource contention to improve resource utilization, so enabling resource profiling for them may cause resource waste. In addition, key system components are usually deployed in active/standby mode with multiple replicas. The resources allocated to standby replicas are idle, which causes the resource profiling algorithm to generate inaccurate results. In these cases, we recommend that you do not use the resource configurations suggested by resource profiling. ACK will provide updates on how to specify resource configurations based on the suggestions provided by resource profiling in these cases.

Can I directly use the resource configurations suggested by resource profiling when I specify the resource request and resource limit of a container?

Resource profiling generates resource configuration suggestions based on the current resource demand of an application. Administrators need to take business characteristics into consideration and adjust the suggested values accordingly. For example, you may need to reserve resources to handle traffic spikes or for zone-level disaster recovery. You may also need to increase the suggested values so that resource-intensive applications can run stably when the load on the host is high.

Why is scale-up or scale-down still needed after I set the suggested resource requests?

The suggested request values provided by resource profiling are the raw values calculated by the algorithm. When you apply the resource configuration changes, ACS adjusts the resource specifications to match the compute classes of the pods. For more information, see Resource specifications. As a result, the pod specifications may differ from the values that you specified.

How do I view resource profiles if I use a self-managed Prometheus monitoring system?

The Koordinator Manager module of the ack-koordinator component provides a Prometheus HTTP interface for you to query the resource profiling-related metrics. You can run the following command to query the IP address of a pod and view its metrics.

# Run the following command to query the IP address of a pod.
$ kubectl get pod -A -o wide | grep koord-manager
# Example output. Your actual output may differ.
kube-system   ack-koord-manager-5479f85d5f-7xd5k                         1/1     Running            0                  19d   192.168.12.242   cn-beijing.192.168.xx.xxx   <none>           <none>
kube-system   ack-koord-manager-5479f85d5f-ftblj                         1/1     Running            0                  19d   192.168.12.244   cn-beijing.192.168.xx.xxx   <none>           <none>

# Run the following command to view metrics. Koordinator Manager runs in active/standby mode with two replicas, and metrics data is provided only by the active pod.
# For the IP address and port, refer to the Deployment of the Koordinator Manager module.
# Make sure that the host where you run the command is connected to the container network of the cluster. 
$ curl -s http://192.168.12.244:9326/metrics | grep koord_manager_recommender_recommendation_workload_target
# Example output. Your actual output may differ.
# HELP koord_manager_recommender_recommendation_workload_target Recommendation of workload resource request.
# TYPE koord_manager_recommender_recommendation_workload_target gauge
koord_manager_recommender_recommendation_workload_target{container_name="xxx",namespace="xxx",recommendation_name="xxx",resource="cpu",workload_api_version="apps/v1",workload_kind="Deployment",workload_name="xxx"} 2.406
koord_manager_recommender_recommendation_workload_target{container_name="xxx",namespace="xxx",recommendation_name="xxx",resource="memory",workload_api_version="apps/v1",workload_kind="Deployment",workload_name="xxx"} 3.861631195e+09
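If you want to post-process the raw /metrics output, a small parser can extract the suggested values per container and resource. This sketch handles only simple text-format lines like those shown above (no timestamps, no label values containing braces); a Prometheus client library's text parser would be more robust. The workload and container names in the sample input are hypothetical.

```python
import re

METRIC = "koord_manager_recommender_recommendation_workload_target"
# Matches: name{labels} value
_SAMPLE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_targets(text: str):
    """Return (labels, value) pairs for the workload target metric."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match and match.group("name") == METRIC:
            labels = dict(_LABEL.findall(match.group("labels")))
            results.append((labels, float(match.group("value"))))
    return results


# Hypothetical output in the same shape as the curl example above.
output = """\
# TYPE koord_manager_recommender_recommendation_workload_target gauge
koord_manager_recommender_recommendation_workload_target{container_name="app",resource="cpu",workload_name="demo"} 2.406
koord_manager_recommender_recommendation_workload_target{container_name="app",resource="memory",workload_name="demo"} 3.861631195e+09
"""
for labels, value in parse_targets(output):
    print(labels["workload_name"], labels["resource"], value)
```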

After the ack-koordinator component is installed, a Service and a ServiceMonitor are automatically created and associated with the pods.

Prometheus collects metrics in various ways. If you use a self-managed Prometheus monitoring system, refer to the official documentation of Prometheus and debug the configuration as described in the preceding section. After debugging, refer to Step 2. (Optional): Verify the profiling results in Managed Service for Prometheus to configure a Grafana dashboard.

How do I delete resource profiles and resource profiling policies?

Resource profiles are stored in the Recommendation CRD, and resource profiling policies are stored in the RecommendationProfile CRD. You can run the following commands to delete all resource profiles and resource profiling policies:

# Delete all resource profiles. 
kubectl delete recommendation -A --all

# Delete all resource profiling policies. 
kubectl delete recommendationprofile -A --all

How do I authorize a RAM user to use resource profiling?

The authorization system of Container Compute Service (ACS) consists of RAM authorization for infrastructure resources and Role-Based Access Control (RBAC) authorization for ACS clusters. For more information, see Authorization best practices. If you want to authorize a RAM user to use resource profiling, we recommend that you complete the following tasks:

  1. RAM user authorization

    Log on to the RAM console with your Alibaba Cloud account and grant the AliyunAccReadOnlyAccess (read-only) permission to the RAM user. For more information, see Attach system policies.

  2. RBAC authorization

    After you complete RAM user authorization, you need to assign the RBAC developer role to the RAM user or grant the RAM user higher permissions. For more information, see Grant RBAC permissions to RAM users or RAM roles.

Note

If the RAM user is granted developer or higher permissions, the RAM user can read and write all Kubernetes resources in the cluster. To grant permissions in a more fine-grained manner, refer to Attach RBAC policies and create or modify a custom ClusterRole. To grant permissions for resource profiling, add the following rule to the ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: recommendation-clusterrole
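rules:
# Allow all operations on resources in the autoscaling.alibabacloud.com API group,
# such as RecommendationProfile and Recommendation.
- apiGroups: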
  - autoscaling.alibabacloud.com
  resources:
  - '*'
  verbs:
  - '*'