Container Compute Service (ACS) can profile resources for Kubernetes-native workloads and provide resource configuration suggestions for containers based on the historical data of resource usage. This greatly simplifies the configuration of resource requests and limits for containers. This topic describes how to use the resource profiling feature in an ACS cluster in the console or by using the CLI.
Prerequisites and usage notes
The ack-koordinator component is installed. For more information, see ack-koordinator (FKA ack-slo-manager).
To ensure the accuracy of resource profiling, we recommend that you wait more than one day after you enable resource profiling for the system to collect data.
Billing
No fee is charged when you install and use the ack-koordinator component. However, fees may be charged in the following scenarios:
After ack-koordinator is installed, ack-koordinator applies for two ACS general-purpose pods. You can specify the amount of resources requested by each module when you install the component.
By default, ack-koordinator exposes the monitoring metrics of features such as resource profiling as Prometheus metrics. If you enable Prometheus metrics for ack-koordinator and use Managed Service for Prometheus, these metrics are considered custom metrics and fees are charged for them. The fee depends on factors such as the size of your cluster and the number of applications. Before you enable Prometheus metrics, we recommend that you read the Billing topic of Managed Service for Prometheus to learn about the free quota and billing rules of custom metrics. For more information about how to monitor and manage resource usage, see Query the amount of observable data and bills.
Limits
Component | Required version |
≥ v0.3.9.7 | |
≥ v1.5.0-ack1.14 |
Introduction to resource profiling
Kubernetes allows you to describe the resource requests of containers to manage container resources. After you specify the resource request for a container, the scheduler matches the resource request with the allocatable resources of each node to determine the node to which the container is scheduled. You can refer to the historical resource utilization and stress test results of a container when you manually specify the resource request. You can also adjust the resource request after the container is created based on the performance of the container. However, you may encounter the following issues:
To ensure application stability, you need to reserve an excessive amount of resources as a buffer to handle the fluctuations of the upstream and downstream workloads. As a result, the amount of resources in the resource requests that you specify for containers may greatly exceed the actual amount of resources used by the containers. This causes low resource utilization and resource waste in the cluster.
If your cluster hosts a large number of pods, you can decrease the resource request for individual containers to increase resource utilization in the cluster. This allows you to deploy more containers on a node. However, application stability is adversely affected when traffic spikes.
To resolve this issue, ack-koordinator provides resource profiles for workloads. You can obtain resource configuration suggestions for individual containers based on resource profiles. This simplifies the work of configuring resource requests and limits for containers. ACS allows you to use the resource profiling feature in the console or from the CLI. When you use the CLI, you manage resource profiles by creating custom resources that are defined by CustomResourceDefinitions (CRDs).
Use resource profiling in the console
Step 1: Enable resource profiling
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose .
On the Cost Optimization page, click the Resource Profiling tab, and follow the instructions in the Resource Profiling section to enable this feature.
Install or update the component: Follow the instructions on the page to install or update the ack-koordinator component. If this is the first time you use resource profiling, you need to install the ack-koordinator component.
If this is the first time you use resource profiling, after the component is installed or updated, we recommend that you select Default Settings to enable resource profiling for all workloads. You can click Profiling Configuration to modify the applicable scope of resource profiling later.
Click Enable Resource Profiling to go to the Resource Profiling tab.
Step 2: Configure resource profiling
On the Cost Optimization page, click the Resource Profiling tab, and then click Profiling Configuration.
You can choose Global Configuration or Custom Configuration. The default settings that you selected when you installed the resource profiling component belong to the global configuration. You can choose Global Configuration, modify the settings, and then click OK to apply the modifications.
Global configuration mode (recommended)
In global configuration mode, resource profiling is enabled for workloads other than those in the arms-prom and kube-system namespaces by default.
Parameter | Description | Valid value
Excluded Namespace | The namespaces for which you want to disable resource profiling. In most cases, resource profiling is disabled for the namespaces of system components. After you modify the global configuration, resource profiling is enabled only for workloads of the specified types that do not belong to the excluded namespaces. | You can specify one or more existing namespaces in the cluster. By default, the kube-system and arms-prom namespaces are specified.
Workload Type | The types of workloads for which resource profiling is enabled. After you modify the global configuration, resource profiling is enabled only for workloads of the specified types that do not belong to the excluded namespaces. | The following Kubernetes workload types are supported: Deployment, StatefulSet, and DaemonSet. You can select one or more workload types.
CPU Redundancy Rate/Memory Redundancy Rate | The redundancy rate that is specified in the resource profiling policy. For more information, see the following section. | The redundancy rate must be 0 or a positive value. The system also provides three commonly used redundancy rates: 70%, 50%, and 30%.
Custom configuration mode
In custom configuration mode, resource profiling is enabled only for partial workloads. If your cluster is large (with more than 1,000 nodes) or you want to enable resource profiling for partial workloads, choose the custom configuration mode.

Parameter | Description | Valid value
Namespace | The namespaces for which you want to enable resource profiling. After you modify the custom configuration, resource profiling is enabled for workloads of the specified types that belong to the selected namespaces. | You can select one or more existing namespaces in the cluster.
Workload Type | The workload types for which you want to enable resource profiling. After you modify the custom configuration, resource profiling is enabled for workloads of the specified types that belong to the selected namespaces. | The following Kubernetes workload types are supported: Deployment, StatefulSet, and DaemonSet. You can select one or more workload types.
CPU Redundancy Rate/Memory Redundancy Rate | The redundancy rate that is specified in the resource profiling policy. For more information, see the following section. | The redundancy rate must be 0 or a positive value. The system also provides three commonly used redundancy rates: 70%, 50%, and 30%.
Note: Resource redundancy: When an administrator assesses the workloads of an application, such as the QPS of the application, the administrator usually assumes that the workloads will not occupy 100% of physical resources. This is because even technologies such as hyper-threading have limits on physical resources and the application also needs to reserve resources to handle traffic spikes during peak hours. If the difference between the suggested resource request and the original resource request exceeds the specified redundancy rate, the system suggests that you decrease the resource request. For more information about the resource profiling algorithm, see the Overview of application profiles section.
Step 3: View resource profiles
After you configure the resource profiling policy, you can view the resource profiles of the workloads on the Resource Profiling page.
To ensure the accuracy of resource profiles, if this is the first time you use resource profiling, you need to wait at least 24 hours for the system to collect data.
This page displays the aggregated resource profile data and the resource profile of each workload.
Note: In the following table, a hyphen (-) indicates N/A.
Column | Description | Valid value | Filter
Workload Name | The name of the workload. | - | Supported. You can filter resource profiles by workload name.
Namespace | The namespace to which the workload belongs. | - | Supported. You can filter resource profiles by namespace. By default, the kube-system namespace is excluded from filter conditions.
Workload Type | The type of workload. | Valid values: Deployment, DaemonSet, and StatefulSet. | Supported. You can filter resource profiles by workload type. By default, all workload types are selected as filter conditions.
CPU Request | The number of CPU cores that are requested by the pod of the workload. | - | Not supported.
Memory Request | The memory size that is requested by the pod of the workload. | - | Not supported.
Profile Data Status | The status of the resource profile. | Collecting: The resource profiling component is collecting historical data and generating the profiling result. To view the resource profile of a workload, we recommend that you wait at least one day after you enable resource profiling and make sure that the workload experiences traffic fluctuations within the day. Normal: The resource profile is generated. Workload Deleted: The workload is deleted. The resource profile of the workload will be deleted after a period of time. | Not supported.
CPU Profile/Memory Profile | The CPU profile and memory profile provide suggestions on how to modify the original CPU request and memory request. The values are generated based on the suggested resource request, the original resource request, and the resource redundancy rate. | Valid values: Increase, Decrease, and Maintain. A percentage value is also displayed to indicate the degree of difference between the original resource request and the suggested resource request. Formula: Abs(Suggested request value - Original request value)/Original request value. | Supported. By default, Increase and Decrease are selected as filter conditions.
Creation Time | The time when the resource profile was created. | - | Not supported.
Change Resource Configuration | After you check the resource profiles and suggestions, you can click Change Resource Configuration to modify the resource configurations. For more information, see Step 5: Modify resource configurations. | - | Not supported.
Note: The resource profiling feature of ACS provides a suggested resource request for each container of the workload, and compares the suggested request value (Recommend), original request value (Request), and resource redundancy rate (Buffer). The feature also provides suggestions on whether to increase or decrease the resource request of the workload. If the workload has multiple containers, ACS provides suggestions for the container whose original resource request has the highest degree of difference compared with the suggested resource request. The following content describes how ACS calculates the degree of difference between the suggested resource request and the original resource request.
If the suggested resource request is greater than the original resource request, the resource usage of the container is higher than the resource request of the container. In this case, ACS suggests that you increase the resource request of the container.
If the suggested resource request is lower than the original resource request, the resource usage of the container is lower than the resource request of the container. In this case, ACS suggests that you decrease the resource request of the container to avoid resource waste. ACS calculates the degree of difference between the suggested resource request and the original resource request in the following way:
ACS calculates the target resource request based on the following formula:
Target resource request = Recommend × (1 + Buffer)
ACS calculates the degree of the difference between the target resource request and the original resource request based on the following formula:
Degree = 1 - Request/Target
ACS generates suggestions on adjusting CPU and memory requests based on the degree of difference between the target resource request and the original resource request. If the degree value exceeds 0.1, ACS suggests that you decrease the resource request.
In other cases, Maintain is displayed in the CPU Profile or Memory Profile column, which means that you do not need to adjust the resource request.
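The calculation in the preceding note can be sketched in a few lines of Python. This is an illustration only, not the implementation that ACS uses. The function name profile_suggestion is hypothetical. The note does not state how the sign of Degree is handled, so the sketch assumes that a decrease is suggested only when the original request is above the target and the magnitude of Degree exceeds 0.1.

```python
def profile_suggestion(request, recommend, buffer, threshold=0.1):
    """Return (target, degree, suggestion) for one container resource.

    request   -- original resource request (for example, CPU cores)
    recommend -- suggested resource request from the resource profile
    buffer    -- resource redundancy rate, for example 0.30 for 30%
    threshold -- 0.1, as stated in the note above
    """
    # Target resource request = Recommend x (1 + Buffer)
    target = recommend * (1 + buffer)
    # Degree = 1 - Request/Target
    degree = 1 - request / target

    if recommend > request:
        suggestion = "Increase"
    elif request > target and abs(degree) > threshold:
        # Assumption: compare the magnitude of Degree with the threshold.
        suggestion = "Decrease"
    else:
        suggestion = "Maintain"
    return target, degree, suggestion


# Values taken from the cpu-load-gen example in this topic:
# 8 requested CPU cores, 4.742 suggested CPU cores, 30% redundancy rate.
target, degree, suggestion = profile_suggestion(8.0, 4.742, 0.30)
print(f"target={target:.3f} degree={degree:.3f} suggestion={suggestion}")
```

With these inputs, the original request is well above the target, so the sketch reports a decrease.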
Step 4: View the details of a resource profile
On the Resource Profiling page, click the name of a workload to go to the profile details page.
On the details page, you can view basic information about the workload, view the resource curve of each container of the workload, and modify the resource specifications of the workload. The preceding figure shows the CPU curves of a workload.
Curve | Description
cpu limit | The CPU limit curve of the container.
cpu request | The CPU request curve of the container.
cpu recommend | The suggested CPU request curve of the container.
cpu usage (average) | The curve of the average CPU usage of the container.
cpu usage (max) | The curve of the peak CPU usage of the container.
Step 5: Modify resource configurations
In the Change Resource Configuration section at the bottom of the Profile Details page, you can modify the resource configuration based on the suggested values generated by resource profiling.
The following table describes the columns.
Column | Description
Resource Request | The original resource request of the container.
Resource Limit | The original resource limit of the container.
Profile Value | The resource request that is suggested by ACS.
Resource Redundancy Rate | The resource redundancy rate that is specified in the resource profiling policy. You can specify the new resource request based on the redundancy rate and the suggested resource request. In the preceding figure, the new CPU request is calculated based on the following formula: 4.28 CPU cores × (1 + 30%) = 5.6 CPU cores (rounded).
New Resource Request | The new resource request that you want to use.
New Resource Limit | The new resource limit that you want to use. If topology-aware CPU scheduling is enabled for the workload, the CPU limit must be an integer.
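The arithmetic behind the Resource Redundancy Rate row can be reproduced with a short Python sketch. The helper names are hypothetical, and working in millicores with integer arithmetic is a choice made here to avoid floating-point rounding. It is not a description of how the console computes the value.

```python
def new_request_millicores(profile_millicores, redundancy_percent):
    """Profile value x (1 + redundancy rate), rounded up to a whole millicore."""
    numerator = profile_millicores * (100 + redundancy_percent)
    return (numerator + 99) // 100


def whole_cores(millicores):
    """Round up to whole CPU cores, for limits that must be integers."""
    return (millicores + 999) // 1000


# 4.28 CPU cores with a 30% redundancy rate, as in the table above.
request = new_request_millicores(4280, 30)
print(f"new CPU request: {request}m")             # 5564m, about 5.6 CPU cores
print(f"integer CPU limit: {whole_cores(request)}")
```

The second helper is relevant only when topology-aware CPU scheduling is enabled, because the CPU limit must then be an integer.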
Important: The suggested request values provided by resource profiling are the actual values calculated by the algorithm. If you click to apply the resource configuration changes, ACS adjusts the resource specifications based on the compute classes of the pods. For more information, see Resource specifications.
After you set the parameters, click Submit and then OK to perform a rolling update on the workload. The system starts to update the resource configuration of the workload. You are redirected to the details page of the workload.
Important: After the resource specifications are updated, the controller performs a rolling update on the workload and recreates the pods. Proceed with caution.
Use resource profiling with the CLI
Step 1: Enable resource profiling
Use the following YAML template to create a file named recommendation-profile.yaml and enable resource profiling for your workloads.
You can use the RecommendationProfile CRD to generate resource profiles for your workloads and obtain resource configuration suggestions. You can specify the namespaces and workload types to which a RecommendationProfile CRD is applied.
apiVersion: autoscaling.alibabacloud.com/v1alpha1
kind: RecommendationProfile
metadata:
  # The name of the RecommendationProfile CRD. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace.
  name: profile-demo
spec:
  # The workload types for which you want to enable resource profiling.
  controllerKind:
  - Deployment
  # The namespaces for which you want to enable resource profiling.
  enabledNamespaces:
  - default
The following table describes the parameters in the YAML template.
Parameter | Type | Description
metadata.name | String | The name of the object. If you want to create a non-namespaced RecommendationProfile CRD, do not specify a namespace.
spec.controllerKind | String | The workload types for which you want to enable resource profiling. Valid values: Deployment, StatefulSet, and DaemonSet.
spec.enabledNamespaces | String | The namespaces for which you want to enable resource profiling.
Run the following command to enable resource profiling for the application that you created:
kubectl apply -f recommendation-profile.yaml
Create a file named cpu-load-gen.yaml and copy the following content to the file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-load-gen
  labels:
    app: cpu-load-gen
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cpu-load-gen-selector
  template:
    metadata:
      labels:
        app: cpu-load-gen-selector
    spec:
      containers:
      - name: cpu-load-gen
        image: registry.cn-zhangjiakou.aliyuncs.com/acs/slo-test-cpu-load-gen:v0.1
        command: ["cpu_load_gen.sh"]
        imagePullPolicy: Always
        resources:
          requests:
            cpu: 8 # Request eight CPU cores for the application.
            memory: "1Gi"
          limits:
            cpu: 12
            memory: "2Gi"
Run the following command to deploy the cpu-load-gen application:
kubectl apply -f cpu-load-gen.yaml
Run the following command to obtain resource configuration suggestions for the application that you created:
kubectl get recommendations -l \
  "alpha.alibabacloud.com/recommendation-workload-apiVersion=apps-v1, \
  alpha.alibabacloud.com/recommendation-workload-kind=Deployment, \
  alpha.alibabacloud.com/recommendation-workload-name=cpu-load-gen" -o yaml
Note: To generate accurate resource configuration suggestions, we recommend that you wait more than one day after you enable resource profiling for the system to collect data.
After you enable resource profiling for your workloads, ack-koordinator provides resource configuration suggestions for your workloads. The suggestions are stored in the Recommendation CRD. The following code block shows a resource profile named cpu-load-gen:
apiVersion: autoscaling.alibabacloud.com/v1alpha1
kind: Recommendation
metadata:
  labels:
    alpha.alibabacloud.com/recommendation-workload-apiVersion: apps-v1
    alpha.alibabacloud.com/recommendation-workload-kind: Deployment
    alpha.alibabacloud.com/recommendation-workload-name: cpu-load-gen
  name: f20ac0b3-dc7f-4f47-b3d9-bd91f906****
  namespace: recommender-demo
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-load-gen
status:
  recommendResources:
    containerRecommendations:
    - containerName: cpu-load-gen
      target:
        cpu: 4742m
        memory: 262144k
      originalTarget: # The intermediate result generated by the resource profiling algorithm. We recommend that you do not use the intermediate result.
      # ...
To facilitate data retrieval, the Recommendation CRD is generated in the same namespace as the workload. In addition, the Recommendation CRD saves the API version, type, and name of the workload in the labels described in the following table.
Label Key | Description | Example
alpha.alibabacloud.com/recommendation-workload-apiVersion | The API version of the workload. The value of the label conforms to the Kubernetes specifications. Forward slashes (/) are replaced by hyphens (-). | apps-v1 (Original form: apps/v1)
alpha.alibabacloud.com/recommendation-workload-kind | The type of the workload, for example, Deployment or StatefulSet. | Deployment
alpha.alibabacloud.com/recommendation-workload-name | The name of the workload. The value of the label conforms to the Kubernetes specifications and cannot exceed 63 characters in length. | cpu-load-gen
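The label conventions above are enough to build the selector that is passed to kubectl get recommendations -l. The following Python sketch shows one way to do so. The function name is hypothetical, and raising an error for names longer than 63 characters is a choice made for this sketch, because the table does not describe how longer names are handled.

```python
LABEL_PREFIX = "alpha.alibabacloud.com/recommendation-workload-"


def recommendation_selector(api_version, kind, name):
    """Build a label selector that matches the Recommendation of a workload."""
    if len(name) > 63:
        # Label values cannot exceed 63 characters.
        raise ValueError("workload name exceeds 63 characters")
    labels = {
        # Forward slashes in the API version are replaced by hyphens.
        "apiVersion": api_version.replace("/", "-"),
        "kind": kind,
        "name": name,
    }
    return ",".join(f"{LABEL_PREFIX}{key}={value}" for key, value in labels.items())


selector = recommendation_selector("apps/v1", "Deployment", "cpu-load-gen")
print(f'kubectl get recommendations -l "{selector}" -o yaml')
```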
The resource profiling result of each container is saved in status.recommendResources.containerRecommendations. The following table describes the parameters.
Parameter | Description | Format | Example
containerName | The name of the container. | string | cpu-load-gen
target | The resource profiling result, including the suggested CPU request and memory request. | map[ResourceName]resource.Quantity | cpu: 4742m, memory: 262144k
originalTarget | The intermediate result generated by the resource profiling algorithm. We recommend that you do not use the intermediate result. | - | -
Note: The suggested minimum amount of CPU resources is 0.025 CPU cores. The suggested minimum amount of memory resources is 250 MB.
Compare the resource configurations requested by the cpu-load-gen application and the suggested resource configurations in this step. The requested CPU resources are greater than the suggested CPU resources. You can reduce the CPU request of the application to save resources.
Resource | Requested amount | Suggested amount
CPU | 8 vCPUs | 4.742 vCPUs
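The values in the target field use Kubernetes quantity notation, such as 4742m and 262144k. To compare them with the requests of a workload in a script, the quantities must first be converted to plain numbers. The following Python sketch handles only the suffixes that appear in this topic plus a few common ones. It is not a complete parser for Kubernetes quantities, and the function name is hypothetical.

```python
from decimal import Decimal

# A subset of Kubernetes quantity suffixes (assumption: enough for this topic).
SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
}


def parse_quantity(text):
    """Convert a quantity such as '4742m' or '1Gi' to a Decimal."""
    text = str(text).strip()
    # Check two-character suffixes first so that 'Mi' is not read as 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * SUFFIXES[suffix]
    return Decimal(text)


requested_cpu = parse_quantity("8")
suggested_cpu = parse_quantity("4742m")
print(f"requested {requested_cpu} vCPUs, suggested {suggested_cpu} vCPUs")
print(f"difference: {requested_cpu - suggested_cpu} vCPUs")
```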
Step 2. (Optional): Verify the profiling results in Managed Service for Prometheus
The ack-koordinator component provides a Prometheus interface for you to query resource profiling results. If you use a self-managed Prometheus monitoring system, you can use the following metrics to configure dashboards:
# Specify a CPU resource profile.
koord_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="cpu"}
# Specify a memory resource profile.
koord_manager_recommender_recommendation_workload_target{exported_namespace="$namespace", workload_name="$workload", container_name="$container", resource="memory"}
FAQ
How does the resource profiling algorithm work?
The resource profiling algorithm uses a multi-dimensional data model and has the following characteristics:
The resource profiling algorithm continuously collects the resource usage statistics of containers, aggregates the data, and then calculates the sample peak value, weighted average, and percentile values of CPU and memory usage.
In the profiling result, the suggested CPU request is a 95th percentile value and the suggested memory request is a 99th percentile value. The resource profiling feature also sets safety margins for both values to ensure the reliability of the workload.
The resource profiling algorithm is optimized for the time factor. It uses a half-life time window model to aggregate data, so newer data samples carry larger weights.
The resource profiling algorithm takes container events into consideration, such as out of memory (OOM) errors. This increases the accuracy of the suggestions.
For more information, see Technologies behind resource profiling and How resource profiling works and suggestions.
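The idea of combining percentiles with half-life weighting can be illustrated with a small Python sketch. This is not the algorithm that ack-koordinator implements. The function name, the shape of the samples, the 24-hour half-life, and the way the weighted percentile is picked are all assumptions made for illustration, and the safety margin mentioned above is left out.

```python
def decayed_percentile(samples, percentile, half_life_hours=24.0):
    """Weighted percentile in which the weight of a sample halves every half-life.

    samples    -- list of (age_hours, usage) pairs; age 0 is the newest sample
    percentile -- fraction between 0 and 1, for example 0.95
    """
    if not samples:
        raise ValueError("at least one sample is required")
    # Sort by usage and attach an exponentially decaying weight to each sample.
    weighted = sorted(
        (usage, 0.5 ** (age / half_life_hours)) for age, usage in samples
    )
    total = sum(weight for _, weight in weighted)
    running = 0.0
    for usage, weight in weighted:
        running += weight
        if running >= percentile * total:
            return usage
    return weighted[-1][0]


# A usage spike from ten days ago barely influences the result,
# while the same spike from one day ago still does.
print(decayed_percentile([(0, 1.0), (240, 4.0)], 0.95))
print(decayed_percentile([(0, 1.0), (24, 4.0)], 0.95))
```

Under this model, old peaks fade from the suggestion over time instead of dropping out abruptly when they leave a fixed window.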
What types of applications are suitable for resource profiling?
Resource profiling is suitable for online applications.
In most cases, the resource configurations suggested by the resource profiling feature can meet the resource requirements of your applications. Offline applications use batch processing and require high throughput. Offline applications allow resource contention so as to improve resource utilization. If you enable resource profiling for offline applications, resource waste may occur. In most cases, key system components are deployed in active/standby mode and have multiple replicas. The resources that are allocated to standby replicas are idle. As a result, the resource profiling algorithm generates inaccurate results. In the preceding cases, we recommend that you do not use the resource configurations suggested by resource profiling. ACS will provide updates on how to specify resource configurations based on the suggestions provided by resource profiling in these cases.
Can I directly use the resource configurations suggested by resource profiling when I specify the resource request and resource limit of a container?
Resource profiling generates resource configuration suggestions based on the current resource demand of an application. Administrators need to take business characteristics into consideration and modify the suggested values accordingly. For example, you may need to reserve resources to handle traffic spikes or for zone-level disaster recovery. You may also need to increase the suggested values to ensure that resource-intensive applications can run stably when the loads of the host are high.
Why is scale-up or scale-down still needed after I set the suggested resource requests?
The suggested request values provided by resource profiling are the actual values calculated by the algorithm. If you click to apply the resource configuration changes, ACS adjusts the resource specifications based on the compute classes of the pods. For more information, see Resource specifications. After the adjustment, the pod specifications may differ from the specified specifications.
How do I view resource profiles if I use a self-managed Prometheus monitoring system?
The Koordinator Manager module of the ack-koordinator component provides a Prometheus HTTP interface for you to query the resource profiling-related metrics. You can run the following command to query the IP address of a pod and view its metrics.
# Run the following command to query the IP address of a pod.
$ kubectl get pod -A -o wide | grep koord-manager
# Sample output. The actual output may differ.
kube-system ack-koord-manager-5479f85d5f-7xd5k 1/1 Running 0 19d 192.168.12.242 cn-beijing.192.168.xx.xxx <none> <none>
kube-system ack-koord-manager-5479f85d5f-ftblj 1/1 Running 0 19d 192.168.12.244 cn-beijing.192.168.xx.xxx <none> <none>
# Run the following command to view metrics. Koordinator Manager runs in active/standby mode with two replicas. Data is stored only in the active pod.
# For the IP address and port, refer to the Deployment of the Koordinator Manager module.
# Make sure that the host where you run the command is connected to the container network of the cluster.
$ curl -s http://192.168.12.244:9326/metrics | grep koord_manager_recommender_recommendation_workload_target
# Sample output. The actual output may differ.
# HELP koord_manager_recommender_recommendation_workload_target Recommendation of workload resource request.
# TYPE koord_manager_recommender_recommendation_workload_target gauge
koord_manager_recommender_recommendation_workload_target{container_name="xxx",namespace="xxx",recommendation_name="xxx",resource="cpu",workload_api_version="apps/v1",workload_kind="Deployment",workload_name="xxx"} 2.406
koord_manager_recommender_recommendation_workload_target{container_name="xxx",namespace="xxx",recommendation_name="xxx",resource="memory",workload_api_version="apps/v1",workload_kind="Deployment",workload_name="xxx"} 3.861631195e+09
After the ack-koordinator component is installed, a Service and a Service Monitor are automatically created and associated with pods.
Prometheus collects metrics in various ways. If you use a self-managed Prometheus monitoring system, refer to the official documentation of Prometheus and debug the configuration as described in the preceding section. After debugging, refer to Step 2. (Optional): Verify the profiling results in Managed Service for Prometheus to configure a Grafana dashboard.
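When you debug a self-managed setup, it can help to turn the text returned by the metrics endpoint into structured data before you build a dashboard. The following Python sketch parses lines in the format shown in the sample output above. The function name is hypothetical, and the parser is deliberately simplified: it assumes that label values contain no escaped quotation marks and it ignores every other metric.

```python
import re

METRIC = "koord_manager_recommender_recommendation_workload_target"
LINE = re.compile(r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$")
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_targets(metrics_text, metric=METRIC):
    """Return a list of (labels, value) pairs for the given metric."""
    rows = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP and TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        rows.append((labels, float(match.group("value"))))
    return rows


sample = (
    METRIC + '{container_name="xxx",namespace="xxx",resource="cpu",workload_name="xxx"} 2.406\n'
    + METRIC + '{container_name="xxx",namespace="xxx",resource="memory",workload_name="xxx"} 3.861631195e+09\n'
)
for labels, value in parse_targets(sample):
    print(labels["workload_name"], labels["container_name"], labels["resource"], value)
```

The CPU value is expressed in cores and the memory value in bytes, as in the sample output.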
How do I delete resource profiles and resource profiling policies?
Resource profiles are stored in the Recommendation CRD. Resource profiling policies are stored in the RecommendationProfile CRD. You can run the following commands to delete all resource profiles and resource profiling policies:
# Delete all resource profiles.
kubectl delete recommendation -A --all
# Delete all resource profiling policies.
kubectl delete recommendationprofile -A --all
How do I authorize a RAM user to use resource profiling?
The authorization system of Container Compute Service (ACS) consists of RAM authorization for infrastructure resources and Role-Based Access Control (RBAC) authorization for ACS clusters. For more information, see Authorization best practices. If you want to authorize a RAM user to use resource profiling, we recommend that you complete the following tasks:
RAM user authorization
Log on to the RAM console with your Alibaba Cloud account and grant the AliyunAccReadOnlyAccess (read-only) permission to the RAM user. For more information, see Attach system policies.
RBAC authorization
After you complete RAM user authorization, you need to assign the RBAC developer role to the RAM user or grant the RAM user higher permissions. For more information, see Grant RBAC permissions to RAM users or RAM roles.
If the RAM user is granted developer or higher permissions, the RAM user can read and write all Kubernetes resources in the cluster. To grant permissions in a more fine-grained manner, refer to Attach RBAC policies and create or modify a custom ClusterRole. The resource profiling feature adds the following content to the ClusterRole:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: recommendation-clusterrole
rules:
- apiGroups:
  - autoscaling.alibabacloud.com
  resources:
  - '*'
  verbs:
  - '*'