Container Service for Kubernetes (ACK) can profile resources for Kubernetes-native workloads and recommend resource specifications for containers based on historical resource usage data. This greatly simplifies the configuration of resource requests and limits for containers. This topic describes how to use resource profiling in the ACK console or by using the CLI.

Prerequisites

  • An ACK Pro cluster is created. For more information, see Create an ACK Pro cluster.
  • ack-koordinator (formerly ack-slo-manager) is installed. For more information, see ack-koordinator.
  • Resource profiling is in public preview. You can use this feature in the Cost Management Suite menu of the ACK console.

Background information

Kubernetes allows you to specify resource requests for containers in a pod. The scheduler schedules the pod to a node whose capacity meets the resource requests that you specify. When you specify the resource request for a container, you can refer to the historical resource utilization and stress test results. You can also adjust the resource request after the container is created based on the performance of the container.

However, you may encounter the following issues:
  • To ensure application stability, you need to reserve a specific amount of resources as a buffer to handle fluctuations in upstream and downstream workloads. As a result, the resource requests that you specify for containers are much greater than the actual resource usage of the containers. This causes low resource utilization and resource waste in the cluster.
  • If your cluster hosts a large number of pods, you can decrease the resource request for individual containers to increase resource utilization in the cluster. This allows you to deploy more containers on a node. However, application stability is adversely affected when traffic spikes.

To resolve these issues, ack-koordinator provides resource profiles for your workloads. You can obtain recommendations on resource specifications for individual containers in pods based on the resource profiles. This simplifies the work of configuring resource requests and limits for containers. The ACK console allows you to analyze the resource profiles of your applications, check whether the resource specifications of the applications meet your business requirements, and change the resource specifications accordingly. You can also use the CLI to create CustomResourceDefinitions (CRDs) to manage resource profiling.

Limits

| Component | Required version |
| --- | --- |
| Kubernetes | ≥ 1.18 |
| metrics-server | ≥ 0.3.8 |
| ack-koordinator (formerly ack-slo-manager) | ≥ 0.7.1 |
| Helm | ≥ 3.0 |
Important: If your cluster uses containerd as the container runtime and the cluster nodes were added before 14:00 (UTC+8) on January 19, 2022, you must remove the cluster nodes and add them to the cluster again, or update the Kubernetes version of your cluster to the latest version. For more information about how to update the Kubernetes version of an ACK cluster, see Update the Kubernetes version of an ACK cluster.

Use resource profiling in the ACK console

Install the resource profiling component

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Cost Management Suite > Cost Optimization in the left-side navigation pane.
  3. On the Cost Optimization page, click the Resource Profiling tab.
  4. On the Resource Profiling tab, click Install or Upgrade. After the installation or update is complete, the Resource Profiling tab appears.
    Note
    • If this is the first time you install the resource profiling component, you must first install ack-koordinator or update it to the latest version. The installation or update takes about two minutes.
    • If the version of ack-koordinator is earlier than 0.7.0, refer to Migrate ack-koordinator from the Marketplace page to the Add-ons page to migrate and update ack-koordinator.

Manage resource profiling policies

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.
  2. On the Clusters page, click the name of a cluster and choose Cost Management Suite > Cost Optimization in the left-side navigation pane.
  3. On the Cost Optimization page, click the Resource Profiling tab and click Policy.
  4. In the Policy dialog box, set the parameters and click OK.

    | Parameter | Description | Valid value |
    | --- | --- | --- |
    | Policy Name | The name of the policy. | The name must be 1 to 63 characters in length, and can contain letters, digits, hyphens (-), underscores (_), and periods (.). The name must start and end with a letter or a digit. |
    | Namespaces | The namespaces for which you want to enable resource profiling. Resource profiling is enabled for the specified workload types in the selected namespaces. | The namespaces in the cluster. You can specify multiple namespaces. |
    | Workload Type | The workload types for which you want to enable resource profiling. | Deployment, StatefulSet, and DaemonSet. You can specify multiple workload types. |
    | Resource Redundancy Rate | The redundancy rate that is specified in the resource profiling policy. For more information, see the following description. | The value must be 0 or positive. You can select 30%, 50%, or 70%, or specify a custom redundancy rate. |

In most cases, the resource utilization of an application remains below 100% because of factors such as physical resource limits (for example, hyper-threading) and the need for a resource buffer to handle traffic spikes during peak hours. Therefore, a resource redundancy rate is needed when the administrator configures resources for an application based on metrics such as queries per second (QPS). If the difference between the recommended resource request and the original resource request exceeds the specified redundancy rate, the system recommends that you decrease the resource request. For more information about the resource profiling algorithm, see the Overview of application profiles section.

Overview of application profiles

After you configure the resource profiling policy, you can view the details of resource profiles on the Resource Profiles page. The following table describes the columns that are displayed on the page.

| Column | Description | Filter |
| --- | --- | --- |
| Workload Name | The name of the workload. | You can filter resource profiles by workload name. |
| Namespace | The namespace to which the workload belongs. | You can filter resource profiles by namespace. By default, the kube-system namespace is not included in the filters. |
| Workload Type | The type of the workload. Valid values: Deployment, DaemonSet, and StatefulSet. | You can filter resource profiles by workload type. |
| CPU Request | The number of CPU cores requested by the pods of the workload. | You cannot filter resource profiles by CPU request. |
| Memory Request | The amount of memory requested by the pods of the workload. | You cannot filter resource profiles by memory request. |
| Profile Data Status | The status of the resource profile of the workload. Valid values: Collecting, Normal, and Workload Deleted. The values are described after this table. | You cannot filter resource profiles by profile status. |
| CPU Profile and Memory Profile | Suggestions on how to modify the original CPU request and memory request. The suggestions are generated based on the recommended resource request, the original resource request, and the resource redundancy rate. For more information, see the following section. Valid values: Upgrade, Downgrade, and Remain Unchanged. The number of plus signs (+) or minus signs (-) indicates the degree of difference between the original resource request and the recommended resource request. If the workload has multiple containers, the signs reflect the container whose resource request differs most from the recommended resource request. | You can filter resource profiles by Upgrade or Downgrade. |
| Creation Time | The time when the profile was created. | You cannot filter resource profiles by creation time. |

The values of Profile Data Status are described as follows:
  • Collecting: The resource profiling component is collecting historical data and generating the profiling result. To view the resource profile of a workload, we recommend that you wait at least one day after you enable resource profiling and make sure that the workload experiences traffic fluctuations within the day.
  • Normal: The resource profile is generated.
  • Workload Deleted: The workload is deleted. The resource profile of the workload will be deleted after a period of time.
The resource profiling feature of ACK provides a recommended resource request for each container of the workload. The feature also provides suggestions on whether to increase or decrease the resource request of the workload based on the recommended resource request, original resource request, and resource redundancy rate. If the workload has multiple containers, ACK provides suggestions for the container whose original resource request has the highest degree of difference compared with the recommended resource request. The following content describes how ACK calculates the degree of the difference between the recommended resource request and the original resource request:
  • If the recommended resource request is greater than the original resource request, it indicates that the resource usage of the container is higher than the resource request of the container. In this case, ACK recommends that you increase the resource request of the container.

    ACK calculates the degree of the difference between the recommended resource request and the original resource request based on the following formula:

    Degree = min(1.0, 1 - Request / Recommend)

    | Degree of difference | Suggestion | Description |
    | --- | --- | --- |
    | 0 < Degree ≤ 0.3 | Upgrade + | The resource usage of the container is slightly higher than the resource request. ACK recommends that you increase the resource request. |
    | 0.3 < Degree ≤ 0.6 | Upgrade ++ | The resource usage of the container is significantly higher than the resource request. ACK recommends that you increase the resource request. |
    | 0.6 < Degree ≤ 1.0 | Upgrade +++ | The resource usage of the container is much higher than the resource request. ACK recommends that you increase the resource request. |
  • If the recommended resource request is lower than the original resource request, it indicates that the resource usage of the container is lower than the resource request of the container. In this case, ACK recommends that you decrease the resource request of the container to avoid resource waste. ACK calculates the degree of the difference between the recommended resource request and the original resource request based on the following description:
    1. ACK calculates the target resource request based on the following formula: Target resource request = Recommended resource request × (1 + Resource redundancy rate).
    2. ACK calculates the degree of the difference between the target resource request and the original resource request based on the following formula:

       Degree = max(-1.0, Target / Request - 1)

    3. ACK generates suggestions on how to modify the CPU and memory requests based on the calculated degree of difference. The following table describes the details.

      | Degree of difference | Suggestion | Description |
      | --- | --- | --- |
      | -0.3 ≤ Degree < -0.1 | Downgrade - | A small amount of the resources that are allocated to the container is idle. ACK recommends that you decrease the resource request. |
      | -0.6 ≤ Degree < -0.3 | Downgrade - - | A large amount of the resources that are allocated to the container is idle. ACK recommends that you decrease the resource request. |
      | -1.0 ≤ Degree < -0.6 | Downgrade - - - | An excessive amount of the resources that are allocated to the container is idle. ACK recommends that you decrease the resource request. |
  • In other cases, Remain Unchanged is displayed in the CPU Profile column or Memory Profile column.
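The suggestion logic above can be summarized in a short sketch. The following Python function is illustrative only, not the actual ack-koordinator implementation; the downgrade-direction formula mirrors the documented upgrade-direction formula and is an assumption consistent with the degree ranges in the tables:

```python
def profile_suggestion(request, recommend, redundancy=0.0):
    """Illustrative sketch of the profiling suggestion logic; not the
    actual ack-koordinator implementation."""
    if recommend > request:
        # Usage exceeds the request: Degree = min(1.0, 1 - Request / Recommend)
        degree = min(1.0, 1 - request / recommend)
        if degree <= 0.3:
            return "Upgrade +"
        if degree <= 0.6:
            return "Upgrade ++"
        return "Upgrade +++"
    # Otherwise compare the request against the recommendation plus the
    # redundancy buffer: Target = Recommend × (1 + redundancy).
    target = recommend * (1 + redundancy)
    degree = max(-1.0, target / request - 1)
    if degree >= -0.1:
        return "Remain Unchanged"
    if degree >= -0.3:
        return "Downgrade -"
    if degree >= -0.6:
        return "Downgrade - -"
    return "Downgrade - - -"
```

For example, a container that requests 2 cores while 4 cores are recommended yields a degree of 0.5 and the suggestion "Upgrade ++".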

View the application profile

On the Resource Profiling tab, click the name of a workload to go to the profile details page. On the details page, you can view basic information about the workload, view the resource curve of each container of the workload, and modify the resource specifications of the workload.

The following table describes the CPU curves that are displayed for a workload.
| Curve | Description |
| --- | --- |
| CPU Limit | The CPU limit curve of the container. |
| CPU Request | The CPU request curve of the container. |
| CPU Recommend | The curve of the recommended CPU request for the container. |
| CPU Usage (Average) | The curve of the average CPU usage of the container. |
| CPU Usage (Max) | The curve of the peak CPU usage of the container. |

Modify the resource specifications of the workload

In the Change Resource Configuration section in the lower part of the page, you can change the resource specifications of each container based on the recommended resource specifications provided by ACK. The following table describes the parameters.

| Parameter | Description |
| --- | --- |
| Resource Request | The original resource request of the container. |
| Resource Limit | The original resource limit of the container. |
| Recommended Value | The resource request that is recommended by ACK. |
| Resource Redundancy Rate | The resource redundancy rate that is specified in the resource profiling policy. You can specify the new resource request based on the redundancy rate and the recommended resource request. For example, with a recommended request of 4.28 cores and a 30% redundancy rate, the new CPU request is calculated based on the following formula: 4.28 cores × (1 + 30%) ≈ 5.6 cores. |
| New Resource Request | The new resource request that you want to use. |
| New Resource Limit | The new resource limit that you want to use. If topology-aware CPU scheduling is enabled for the workload, the CPU limit must be an integer. |
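The new resource request is derived by scaling the recommended request by the redundancy rate. A minimal Python sketch (the helper name is illustrative):

```python
def new_request(recommended, redundancy_rate):
    """Hypothetical helper: scale a recommended request by the redundancy rate."""
    # New resource request = Recommended resource request × (1 + Resource redundancy rate)
    return recommended * (1 + redundancy_rate)

# The values from the example above: 4.28 recommended cores, 30% redundancy.
print(round(new_request(4.28, 0.30), 1))  # 5.6
```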

After you set the parameters, click Submit. The system starts to update the resource specifications of the workload. You are redirected to the details page of the workload. After the resource specifications are updated, the controller performs a rolling update on the workload and recreates the pods.

Use resource profiling at the CLI

Procedure

  1. Use the following YAML template to enable resource specification recommendation for your workloads.
    You can use the RecommendationProfile CRD to generate resource profiles for your workloads and obtain recommendations on resource specifications for containers in your workloads. You can specify the namespaces and workload types to which a RecommendationProfile CRD is applied.
    apiVersion: autoscaling.alibabacloud.com/v1alpha1
    kind: RecommendationProfile
    metadata:
      # The name of the RecommendationProfile CRD. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace. 
      name: profile-demo
    spec:
      # The workload types for which you want to enable resource profiling. 
      controllerKind:
      - Deployment
      # The namespaces for which you want to enable resource profiling. 
      enabledNamespaces:
      - recommender-demo
    The following table describes the parameters in the YAML template.
    | Parameter | Type | Description |
    | --- | --- | --- |
    | metadata.name | String | The name of the resource object. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace. |
    | spec.controllerKind | List of strings | The workload types for which you want to enable resource profiling. Valid values: Deployment, StatefulSet, and DaemonSet. |
    | spec.enabledNamespaces | List of strings | The namespaces for which you want to enable resource profiling. |
    Important: To generate accurate recommendations on resource specifications, we recommend that you wait at least one day after you enable resource profiling for your workloads. This way, you can obtain a sufficient amount of historical data.
  2. Run the kubectl get recommendations -o yaml command to obtain recommendations on resource specifications for your workloads.
    After you enable resource profiling for your workloads, ack-koordinator provides recommendations on resource specifications for each container in your workloads. The recommendations are stored in the Recommendation CRD. The following code block shows the content of a Recommendation CRD that stores the recommendation on resource specifications for a workload named cpu-load-gen:
    apiVersion: autoscaling.alibabacloud.com/v1alpha1
    kind: Recommendation
    metadata:
      labels:
        alpha.alibabacloud.com/recommendation-workload-apiVersion: apps-v1
        alpha.alibabacloud.com/recommendation-workload-kind: Deployment
        alpha.alibabacloud.com/recommendation-workload-name: cpu-load-gen
      name: f20ac0b3-dc7f-4f47-b3d9-bd91f906****
      namespace: recommender-demo
    spec:
      workloadRef:
        apiVersion: apps/v1
        kind: Deployment
        name: cpu-load-gen
    status:
      recommendResources:
        containerRecommendations:
        - containerName: cpu-load-gen
          target:
            cpu: 4742m
            memory: 262144k
          originalTarget: # The intermediate results provided by the algorithm that is used to generate the recommendation on resource specifications. We recommend that you do not use the intermediate results. 
           # ...
    To facilitate data retrieval, the Recommendation CRD is generated in the same namespace as the workload. In addition, the Recommendation CRD has labels that record the API version, type, and name of the workload. The following table describes the labels.

    | Label key | Description | Example |
    | --- | --- | --- |
    | alpha.alibabacloud.com/recommendation-workload-apiVersion | The API version of the workload. The value conforms to the Kubernetes label specifications. Forward slashes (/) are replaced by hyphens (-). | apps-v1 (original form: apps/v1) |
    | alpha.alibabacloud.com/recommendation-workload-kind | The type of the workload, for example, Deployment or StatefulSet. | Deployment |
    | alpha.alibabacloud.com/recommendation-workload-name | The name of the workload. The value conforms to the Kubernetes label specifications and cannot exceed 63 characters in length. | cpu-load-gen |
    The recommendations on resource specifications for each container are displayed in the status.recommendResources.containerRecommendations section. The following table describes the fields.

    | Field | Description | Format | Example |
    | --- | --- | --- | --- |
    | containerName | The name of the container. | string | cpu-load-gen |
    | target | The recommended CPU and memory resources. | map[ResourceName]resource.Quantity | cpu: 4742m, memory: 262144k |
    | originalTarget | The intermediate results provided by the algorithm that is used to generate the recommendation. We recommend that you do not use the intermediate results. If you want to use them, submit a ticket. | - | - |

    Note: The recommended minimum CPU resources are 0.025 vCPUs. The recommended minimum memory resources are 250 MB.
  3. Optional: Verify the results on the Prometheus Monitoring page in the ACK console.
    ack-koordinator allows you to check resource profiles on the Prometheus Monitoring page in the ACK console.
    • If this is the first time you use Prometheus dashboards, reset the dashboards and install the Resource Profile dashboard. For more information about how to reset Prometheus dashboards, see Reset dashboards.

      To view details about the collected resource profiles on the Prometheus Monitoring page in the ACK console, perform the following steps:

      1. Log on to the ACK console.
      2. In the left-side navigation pane of the ACK console, click Clusters.
      3. On the Clusters page, find the cluster that you want to manage and click its name or click Details in the Actions column.
      4. In the left-side navigation pane of the cluster details page, choose Operations > Prometheus Monitoring.
      5. On the Prometheus Monitoring page, choose Cost Analysis/Resource Optimization > Resource Profiling.

        On the Resource Profiling tab, you can view details about the collected resource profiles. The details include the resource requests, resource usage, and recommended resource specifications of containers. For more information, see Enable Prometheus Service.

    • If you use a self-managed Prometheus monitoring system, you can use the following metrics to configure dashboards:
      # Information about the containers for which you want to generate recommendations on CPU resources. 
      slo_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="cpu"}
      # Information about the containers for which you want to generate recommendations on memory resources. 
      slo_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="memory"}
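If you process Recommendation objects in scripts, note that the target values use Kubernetes quantity notation (for example, 4742m for CPU millicores and 262144k for memory). The following Python sketch converts them to plain numbers. It is illustrative and supports only the suffixes shown in this topic, not the full Kubernetes quantity format:

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as '4742m' or '8' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)

def parse_mem_bytes(quantity):
    """Convert a memory quantity with a decimal suffix (k, M, G) to bytes."""
    units = {"k": 10**3, "M": 10**6, "G": 10**9}
    if quantity[-1] in units:
        return int(quantity[:-1]) * units[quantity[-1]]
    return int(quantity)

# The 'target' field from the Recommendation object shown above.
target = {"cpu": "4742m", "memory": "262144k"}
print(parse_cpu(target["cpu"]))           # 4.742
print(parse_mem_bytes(target["memory"]))  # 262144000
```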

Examples

  1. Create a file named cpu-load-gen.yaml and copy the following content to the file:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpu-load-gen
      labels:
        app: cpu-load-gen
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: cpu-load-gen-selector
      template:
        metadata:
          labels:
            app: cpu-load-gen-selector
        spec:
          containers:
          - name: cpu-load-gen
            image: registry.cn-zhangjiakou.aliyuncs.com/acs/slo-test-cpu-load-gen:v0.1
            command: ["cpu_load_gen.sh"]
            imagePullPolicy: Always
            resources:
              requests:
                cpu: 8 # Request eight vCPUs for the application. 
                memory: "1G"
              limits:
                cpu: 12
                memory: "2G"
  2. Run the following command to deploy the cpu-load-gen application:
    kubectl apply -f cpu-load-gen.yaml
  3. Create a file named recommender-profile.yaml with the following YAML template:
    apiVersion: autoscaling.alibabacloud.com/v1alpha1
    kind: RecommendationProfile
    metadata:
      name: profile-demo
    spec:
      controllerKind:
      - Deployment
      enabledNamespaces: # Enable recommendations on resource specifications for all Deployments in the default namespace. 
      - default
  4. Run the following command to enable resource profiling for the application that you created:
    kubectl apply -f recommender-profile.yaml
  5. Run the following command to obtain recommendations on resource specifications for the application that you created.
    Note To generate accurate recommendations on resource specifications, we recommend that you wait at least one day after you enable resource profiling for your workloads. This way, you can obtain a sufficient amount of historical data. For more information about the labels, see Labels.
    kubectl get recommendations \
      -l "alpha.alibabacloud.com/recommendation-workload-apiVersion=apps-v1,alpha.alibabacloud.com/recommendation-workload-kind=Deployment,alpha.alibabacloud.com/recommendation-workload-name=cpu-load-gen" \
      -o yaml
    Expected output:
    apiVersion: autoscaling.alibabacloud.com/v1alpha1
    kind: Recommendation
    metadata:
      creationTimestamp: "2022-02-09T08:56:51Z"
      labels:
        alpha.alibabacloud.com/recommendation-workload-apiVersion: apps-v1
        alpha.alibabacloud.com/recommendation-workload-kind: Deployment
        alpha.alibabacloud.com/recommendation-workload-name: cpu-load-gen
      name: f20ac0b3-dc7f-4f47-b3d9-bd91f906****
      namespace: recommender-demo
    spec:
      workloadRef:
        apiVersion: apps/v1
        kind: Deployment
        name: cpu-load-gen
    status:
      conditions:
      - lastTransitionTime: "2022-02-09T08:56:52Z"
        status: "True"
        type: RecommendationProvided
      recommendResources:
        containerRecommendations:
        - containerName: cpu-load-gen
          target:
            cpu: 4742m # The recommended CPU resources are 4.742 vCPUs. 
            memory: 262144k
          originalTarget: # The intermediate results provided by the algorithm that is used to generate the recommendation on resource specifications. We recommend that you do not use the intermediate results. 
            #...

Analyze the results

Compare the resource specifications in Step 1 and Step 5. The requested amount of CPU resources is greater than the recommended amount. You can reduce the CPU request of the application to save resources in the cluster.

| Item | Requested amount | Recommended amount |
| --- | --- | --- |
| CPU | 8 vCPUs | 4.742 vCPUs |
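Plugging these numbers into the degree calculation shows why such a workload would be flagged for a downgrade. The sketch below assumes a 30% redundancy rate (an example value) and a downgrade formula mirrored from the documented upgrade formula, which is an assumption:

```python
# Values from the table above; the 30% redundancy rate is an assumed example.
request, recommend, redundancy = 8.0, 4.742, 0.30

# Assumed formula: Target = Recommend × (1 + redundancy);
# Degree = max(-1.0, Target / Request - 1)
target = recommend * (1 + redundancy)      # ≈ 6.16 vCPUs
degree = max(-1.0, target / request - 1)

print(round(degree, 2))  # -0.23, which falls in -0.3 ≤ Degree < -0.1: "Downgrade -"
```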

FAQ

How does the resource profiling algorithm work?

The resource profiling algorithm is based on a multi-dimensional data model and has the following characteristics:
  • The resource profiling algorithm continuously collects resource metrics of containers and generates recommendations based on the aggregate values of CPU metrics and memory metrics.
  • When the resource profiling algorithm calculates aggregate values, the most recently collected metrics have the highest weights.
  • The resource profiling algorithm takes into consideration container events such as out of memory (OOM) errors. This helps generate more accurate recommendations.
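The "recent metrics weigh more" behavior can be pictured as an exponentially decayed weighted percentile over historical samples. The following Python sketch is purely illustrative: the real aggregation algorithm, its half-life, and its percentile are not documented here:

```python
def weighted_percentile(samples, half_life_days, pct=0.95):
    """Illustrative only: aggregate (age_in_days, value) samples into a
    percentile in which newer samples carry exponentially larger weights."""
    # Each sample's weight is halved for every half-life of age.
    weighted = sorted((v, 0.5 ** (age / half_life_days)) for age, v in samples)
    total = sum(w for _, w in weighted)
    acc = 0.0
    for value, weight in weighted:
        acc += weight
        if acc >= pct * total:
            return value
    return weighted[-1][0]

# A week-old spike (4.0 cores) is outweighed by fresher, lower samples.
print(weighted_percentile([(0, 2.0), (1, 1.0), (7, 4.0)], half_life_days=1))  # 2.0
```

With a one-day half-life, a week-old sample contributes less than 1% of the weight of a fresh sample, so old spikes fade from the recommendation over time.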

What types of applications is resource profiling suitable for?

Resource profiling is suitable for online applications. In most cases, the resource specifications that are recommended by resource profiling can meet the resource requirements of online applications.

Resource profiling is less suitable for the following workload types:
  • Offline applications: Offline applications use batch processing and require high throughput. They tolerate resource contention in exchange for higher resource utilization. If you enable resource profiling for offline applications, resource waste may occur.
  • Key system components: In most cases, key system components are deployed in active/standby mode with multiple replicas, and the resources that are allocated to standby replicas are idle. As a result, the results generated by the resource profiling algorithm are not accurate.

In the preceding cases, we recommend that you do not directly use the resource specifications recommended by resource profiling. ACK will provide updates on how to specify resource specifications based on resource profiling recommendations in these cases.

Can I directly use the resource specifications recommended by resource profiling when I specify the resource request and limit of a container?

The recommendations provided by resource profiling are based on the current resource conditions of the workload. You must specify resource specifications based on the recommendations and the characteristics of the workload. If the workload needs a resource buffer to handle traffic spikes or perform failovers in an active zone-redundancy architecture, or the workload is resource-sensitive, we recommend that you specify resource specifications that are higher than the recommended resource specifications.

How do I view resource profiles if I use a self-managed Prometheus monitoring system?

Follow Step 3 in the Use resource profiling at the CLI section to configure Grafana dashboards.

How do I delete resource profiles and resource profiling rules?

Resource profiles are stored in the Recommendation CRD. Resource profiling rules are stored in the RecommendationProfile CRD. You can run the following command to delete all resource profiles and resource profiling rules:

# Delete all resource profiles. 
kubectl delete recommendation -A --all

# Delete all resource profiling rules. 
kubectl delete recommendationprofile -A --all