How to optimize ECS resource costs

This topic describes the cost components and benefits of Elastic Compute Service (ECS) and provides cost management solutions that maximize cost-effectiveness and drive rapid business development.

Cost components

ECS costs consist of the following components:

Ownership cost: the costs of resources and resource plans.
O&M cost: the labor costs when you use ECS.

Cost benefits of cloud migration

To build a data center, you must consider the direct costs of hardware, networking, electricity, machine rooms, and O&M. You must also consider the scaling costs of upgrades and capacity expansions and the risk costs of data backup and high-availability implementations. As you scale up a data center to meet growing business requirements, its unit resource cost and complexity increase while its fault tolerance decreases. An unsuitable business model adds further costs.

Compared with self-managed data centers, cloud resources eliminate the need to invest upfront in hardware, physical environments, and labor. The unit cost of cloud resources is relatively linear. You can create or release cloud resources based on your business requirements. Cloud resources also support multiple billing methods to further optimize costs.

Cost optimization suggestions

Optimize resources

If some resources incur high costs, monitor them from multiple aspects to identify the cause, and then take targeted optimization measures.

Monitor resource usage.
1. Monitor the usage of resources such as CPU, memory, disks, and bandwidth. Assess whether the current configuration is higher than required.
2. Monitor idle resources to avoid waste. Idle resources include instances that are upgraded but not restarted, reserved instances that are not matched to pay-as-you-go instances, disks that are not attached to instances, and elastic IP addresses (EIPs) that are not associated with instances.
3. Monitor resource usage cycles. If you want to use resources such as instances and disks for an extended period of time, we recommend that you purchase resources that use the subscription billing method or purchase resource plans to reduce costs.
4. Monitor the lifecycle of resources. Pay attention to the expiration dates of subscription resources, such as subscription instances, reserved instances, and storage capacity units. Renew the resources at the earliest opportunity.
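The idle-resource and over-provisioning checks described above can be automated once monitoring data is exported. The following Python sketch is illustrative only: the Resource record, its field names, and the 20% utilization threshold are assumptions made for this example and are not part of any ECS API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical inventory record; all field names are assumptions."""
    resource_id: str
    kind: str                                 # "instance", "disk", or "eip"
    attached_to: Optional[str] = None         # instance ID for disks and EIPs
    avg_cpu_percent: Optional[float] = None   # only meaningful for instances
    avg_memory_percent: Optional[float] = None


def find_idle(resources: List[Resource]) -> List[str]:
    """Return IDs of disks and EIPs that are not bound to any instance."""
    return [
        r.resource_id
        for r in resources
        if r.kind in ("disk", "eip") and r.attached_to is None
    ]


def find_oversized(resources: List[Resource], threshold: float = 20.0) -> List[str]:
    """Return IDs of instances whose average CPU and memory usage are both
    below the threshold (an assumed cutoff; tune it for your workloads)."""
    oversized = []
    for r in resources:
        if r.kind != "instance":
            continue
        if r.avg_cpu_percent is None or r.avg_memory_percent is None:
            continue  # no monitoring data, so make no judgment
        if r.avg_cpu_percent < threshold and r.avg_memory_percent < threshold:
            oversized.append(r.resource_id)
    return oversized


# Sample data invented for the example.
inventory = [
    Resource("i-001", "instance", avg_cpu_percent=8.0, avg_memory_percent=12.0),
    Resource("i-002", "instance", avg_cpu_percent=65.0, avg_memory_percent=70.0),
    Resource("d-001", "disk", attached_to="i-002"),
    Resource("d-002", "disk"),
    Resource("eip-001", "eip"),
]

print("Idle:", find_idle(inventory))
print("Possibly oversized:", find_oversized(inventory))
```

In practice, the inventory would be filled from your own monitoring export, and flagged resources would be reviewed by their owners before being downgraded or released.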
Select appropriate instance types.
Instance types have significant impacts on ECS costs. Select the most cost-effective instance type and adjust the instance quantity based on business scenarios. This way, you can maximize resource utilization and minimize costs while meeting business requirements.
For example, you use 10 d1ne.14xlarge instances for short video scenarios. The monitoring results indicate that the memory usage of the instances is reasonable but the CPU utilization is low. Perform the following operations to resolve the issue:
Switch to an instance type with a lower CPU-to-memory ratio to increase CPU utilization without affecting your business. The CPU-to-memory ratio of d1ne.14xlarge instances is 1:4, and that of d2s instances is 1:4.4. Replacing the 10 d1ne.14xlarge instances with 13 d2s.10xlarge instances reduces costs by approximately 18%.
For more information about how to select instance types, see Best practices for selecting instance types.
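The capacity arithmetic behind the replacement described above can be checked with a short Python sketch. The per-instance vCPU counts (56 for d1ne.14xlarge and 40 for d2s.10xlarge) are assumptions made for illustration. Memory is derived from the CPU-to-memory ratios stated in the text. Prices are not included, so the sketch does not reproduce the 18% figure.

```python
def fleet_capacity(count, vcpus_per_instance, gib_per_vcpu):
    """Return (total vCPUs, total memory in GiB) for a homogeneous fleet."""
    total_vcpus = count * vcpus_per_instance
    total_memory_gib = total_vcpus * gib_per_vcpu
    return total_vcpus, total_memory_gib


# vCPU counts are assumed; the ratios 1:4 and 1:4.4 come from the text.
old_vcpus, old_mem = fleet_capacity(10, 56, 4.0)
new_vcpus, new_mem = fleet_capacity(13, 40, 4.4)

print(f"Before: {old_vcpus} vCPUs, {old_mem:.0f} GiB")
print(f"After:  {new_vcpus} vCPUs, {new_mem:.0f} GiB")
print(f"vCPU change:   {(new_vcpus - old_vcpus) / old_vcpus:+.1%}")
print(f"Memory change: {(new_mem - old_mem) / old_mem:+.1%}")
```

Under these assumptions, the new fleet keeps total memory roughly constant (about 2,288 GiB compared with 2,240 GiB) while trimming the underused vCPUs from 560 to 520. This is the intent of moving to a lower CPU-to-memory ratio.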
Combine multiple billing methods.
Different types of business have different requirements for resource usage cycles. Select an appropriate billing method for each type of business and combine billing methods to optimize costs.
- Use subscription instances and reserved instances for stable business workloads.
- Use pay-as-you-go instances for stateful and dynamic business workloads.
- Use preemptible instances for stateless and fault-tolerant business workloads.
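The mapping in the list above can be expressed as a small decision function. This Python sketch is a simplification: reducing a workload to three boolean traits and checking stability first are assumptions of the example, not rules stated by ECS.

```python
def suggest_billing_method(stable: bool, stateless: bool, fault_tolerant: bool) -> str:
    """Map workload traits to a billing method, following the list above."""
    if stable:
        # Long-running, predictable workloads benefit from committed pricing.
        return "subscription or reserved instance"
    if stateless and fault_tolerant:
        # The workload can tolerate an instance being reclaimed.
        return "preemptible"
    # Stateful or dynamic workloads that cannot tolerate interruption.
    return "pay-as-you-go"


# Hypothetical workloads used only to exercise the function.
workloads = {
    "core database": dict(stable=True, stateless=False, fault_tolerant=False),
    "batch rendering": dict(stable=False, stateless=True, fault_tolerant=True),
    "promotion event backend": dict(stable=False, stateless=False, fault_tolerant=False),
}

for name, traits in workloads.items():
    print(f"{name}: {suggest_billing_method(**traits)}")
```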
Use dedicated hosts to allow the reuse of ECS instance resources.
For scenarios that have minimal requirements for CPU stability, such as development and test environments, you can use CPU-overprovisioned dedicated hosts to deploy additional similar-sized ECS instances to reduce the unit deployment cost.
Stopped ECS instances that are deployed on dedicated hosts do not occupy resources. During off-peak hours, you can stop specific ECS instances in the production environment and use idle resources to run test tasks that have predictable cycles, such as offline computing and automated tests.

Upgrade instance types

ECS and hardware, such as processors, are continuously upgraded to improve performance and reduce costs. In most cases, later instance types are more cost-effective than earlier instance types.

The following table describes how the g6.2xlarge instance type compares with the earlier g5.2xlarge instance type in terms of performance and price.

Performance	Price
The integer computation performance is improved by 40%. The floating-point computation performance is improved by 30%. The memory bandwidth is increased by 15%. The memory idle latency is decreased by 40%. The internal bandwidth is increased by 220%.	The annual subscription price is reduced by 6%. The pay-as-you-go price is reduced by 43%.

To ensure that you can use the next-generation instance types in time, we recommend that you perform the following operations:

Design robust applications that can run on different instance types.
Stay updated on the new instance types that are released on the official Alibaba Cloud website and assess whether to upgrade instance types.

Examples of instance type upgrade

The following upgrade schemes keep the CPU and memory specifications unchanged while improving business performance and reducing costs by at least 15%.

Current instance family	Recommended compatible instance family	Recommended alternative instance family
sn1 and sn2	c6; g6; r6	c5 and sn1ne; g5 and sn2ne; r5 and se1ne
c4	hfc6 and c6	hfc5 and c5
ce4	r6	r5 and se1ne
cm4	hfc6	hfc5 and g5
n1, n2, and e3	c6; g6; r6	c5 and sn1ne; g5 and sn2ne; r5 and se1ne
t1; s1, s2, and s3; m1 and m2; c1 and c2	c6; g6; r6	c5 and sn1ne; g5 and sn2ne; r5 and se1ne
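If you maintain an inventory of instance types, the table above can be encoded as a lookup to flag upgrade candidates. The following Python sketch copies the family names from the table. The function name and the handling of an optional ecs. prefix are assumptions of the example.

```python
# Each entry maps a current family to
# (recommended compatible families, recommended alternative families).
_GENERAL = (["c6", "g6", "r6"], ["c5", "sn1ne", "g5", "sn2ne", "r5", "se1ne"])

UPGRADE_TABLE = {}
for family in ("sn1", "sn2", "n1", "n2", "e3",
               "t1", "s1", "s2", "s3", "m1", "m2", "c1", "c2"):
    UPGRADE_TABLE[family] = _GENERAL
UPGRADE_TABLE["c4"] = (["hfc6", "c6"], ["hfc5", "c5"])
UPGRADE_TABLE["ce4"] = (["r6"], ["r5", "se1ne"])
UPGRADE_TABLE["cm4"] = (["hfc6"], ["hfc5", "g5"])


def recommend_upgrade(instance_type):
    """Return (compatible, alternative) family lists for an instance type,
    or None if its family is not listed in the table."""
    name = instance_type.lower()
    if name.startswith("ecs."):
        name = name[len("ecs."):]
    family = name.split(".")[0]
    return UPGRADE_TABLE.get(family)


for instance_type in ("ecs.c4.xlarge", "sn2.large", "g6.2xlarge"):
    print(instance_type, "->", recommend_upgrade(instance_type))
```

The lookup only narrows the choice to candidate families. Whether c6, g6, or r6 fits a given instance still depends on its CPU-to-memory ratio.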

Regular cost saving measures

Cloud resources let you pay for what your business requires and avoid the investment and operating costs of self-managed data centers. However, you must still optimize costs continuously in your daily work to improve cost-effectiveness. You can adapt the following common practices into a scheme that fits your organization:

Hold regular cost meetings. Review budget implementation with cost-related parties, such as finance and R&D teams, evaluate optimization results, and improve optimization strategies on a regular basis.
Enforce the use of tags. Tag resources by business, environment, and owner to track daily costs.
Classify resources and select appropriate usage methods. For example, pay-as-you-go instances are recommended for deploying development and testing environments for short-term projects and can be promptly released when the projects are complete.
Avoid idle resources. Check resource usage on a regular basis and determine the notification and disposal workflows of idle resources.
Renew resources at the earliest opportunity. Apply for a budget for subscription resources in advance to avoid the additional cost of purchasing and deploying new resources after existing resources are released upon expiration.
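Tag-based cost tracking, as suggested in the list above, comes down to grouping billing records by a tag value. The following Python sketch assumes a hypothetical record format (a dictionary with cost and tags keys). Real billing exports differ, so treat the field names as placeholders.

```python
from collections import defaultdict


def cost_by_tag(records, tag_key):
    """Sum the cost of each record under the value of the given tag.
    Records that lack the tag are grouped under "untagged"."""
    totals = defaultdict(float)
    for record in records:
        value = record.get("tags", {}).get(tag_key, "untagged")
        totals[value] += record["cost"]
    return dict(totals)


# Sample daily billing records invented for the example.
records = [
    {"resource": "i-001", "cost": 12.5, "tags": {"env": "prod", "owner": "team-a"}},
    {"resource": "i-002", "cost": 7.5, "tags": {"env": "prod", "owner": "team-b"}},
    {"resource": "i-003", "cost": 3.0, "tags": {"env": "test", "owner": "team-a"}},
    {"resource": "d-009", "cost": 1.25, "tags": {}},
]

print(cost_by_tag(records, "env"))
print(cost_by_tag(records, "owner"))
```

A non-empty "untagged" bucket is itself a useful signal: it shows how much spending is not yet covered by the tagging policy.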

Implement automated O&M

Alibaba Cloud provides a variety of O&M services that help you improve O&M efficiency and reduce O&M labor costs. Examples:

Auto Scaling: maintains instance clusters across different billing methods, instance types, and zones. This service is suitable for scenarios in which business workloads fluctuate from time to time.
Auto Provisioning: quickly deploys instance clusters across different billing methods, instance types, and zones. This service is suitable for scenarios in which consistent compute capacity must be provisioned quickly and preemptible instances are used to reduce costs.
CloudOps Orchestration Service (OOS): defines a series of O&M operations in a template to perform O&M tasks efficiently. This service is suitable for scenarios in which event-driven O&M, scheduled O&M, batch O&M, or cross-region O&M is required.
Resource Orchestration Service: deploys and maintains stacks that contain multiple cloud resources and dependencies among the resources. This service is suitable for scenarios in which delivery of an integrated system or environment clone is required.