This topic describes how to optimize storage costs in terms of data partitions, table lifecycles, and the periodic deletion of deprecated tables.

You can perform the following operations to optimize storage costs:
  • Properly configure data partitions.
  • Configure reasonable lifecycles for tables.
  • Periodically delete deprecated tables.

Properly configure data partitions

In MaxCompute, each distinct value of a partition key column (or, for multi-level partitions, each combination of values) corresponds to a partition. You can specify multiple fields of a table as partition key columns to create multi-level partitions, which are similar to multi-level directories. If you specify the partition that you want to access, the system reads data only from that partition instead of scanning the entire table. This reduces costs and improves efficiency.
  • If the minimum period for data collection is one day, we recommend that you use the date field as the partition key column. Each day's data is written to its own partition, and subsequent operations read data only from the partitions they need.
  • If the minimum period for data collection is one hour, we recommend that you use the date and hour fields together as partition key columns. Each hour's data is written to its own partition, and subsequent operations read data only from the partitions they need. If hourly data is partitioned only by date, each daily partition is appended to every hour, so a query that needs only one hour of data must still read the whole day's partition. As a result, the system reads large amounts of unnecessary data, which increases computing costs.
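To illustrate why partition granularity matters, the following Python sketch models partition pruning. It is a toy model, not MaxCompute code: a table is a dictionary that maps partition key values to row counts, and the names scan_rows, daily, and hourly and the row counts are hypothetical. It counts how many rows must be read to get the most recent hour of data under daily partitions and under date-and-hour partitions.

```python
from typing import Dict, Optional, Tuple

# Toy model only (not MaxCompute code): partition key values -> rows stored there.
Table = Dict[Tuple[str, ...], int]


def scan_rows(table: Table, prefix: Optional[Tuple[str, ...]] = None) -> int:
    """Return the number of rows read.

    With no prefix, every partition is scanned (full-table scan). With a prefix,
    only partitions whose leading key values match it are read, which mimics
    partition pruning.
    """
    if prefix is None:
        return sum(table.values())
    return sum(rows for key, rows in table.items() if key[: len(prefix)] == prefix)


ROWS_PER_HOUR = 1_000  # hypothetical ingestion rate
DAYS = ["20240101", "20240102", "20240103"]
HOURS = [f"{h:02d}" for h in range(24)]

# Hourly data stored in daily partitions: each day's partition holds 24 hours of rows.
daily: Table = {(ds,): ROWS_PER_HOUR * len(HOURS) for ds in DAYS}

# Hourly data stored in two-level (date, hour) partitions.
hourly: Table = {(ds, hh): ROWS_PER_HOUR for ds in DAYS for hh in HOURS}

print(scan_rows(daily))                       # full scan: 72000 rows
print(scan_rows(daily, ("20240103",)))        # latest hour, daily partitions: 24000 rows
print(scan_rows(hourly, ("20240103", "23")))  # latest hour, hourly partitions: 1000 rows
```

In MaxCompute SQL, passing a prefix corresponds to filtering on the partition key columns in the WHERE clause, for example WHERE ds = '20240103' AND hh = '23' for a table partitioned by the hypothetical columns ds and hh.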

Choose partition fields based on your business needs. In addition to date and time fields, you can use fields that have a relatively small, fixed set of enumerated values, such as channel, country, or province, or you can combine a time field with such a field. We recommend that you use two levels of partitions in a table. Each table supports a maximum of 60,000 partitions.
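Because each table is limited to 60,000 partitions, it can help to estimate how many partitions a scheme will produce before you create the table. The following Python sketch multiplies the number of distinct values expected at each partition level. The scheme names, field names, and value counts are illustrative assumptions, not recommendations.

```python
from math import prod
from typing import Dict

MAX_PARTITIONS_PER_TABLE = 60_000  # limit stated in this topic


def partition_count(levels: Dict[str, int]) -> int:
    """Upper bound on partitions: product of distinct values at each level."""
    return prod(levels.values())


# Hypothetical schemes: partition field -> expected number of distinct values.
schemes = {
    "date, 3 years": {"ds": 3 * 365},
    "date + hour, 3 years": {"ds": 3 * 365, "hh": 24},
    "date + province, 1 year": {"ds": 365, "province": 34},
    "date + hour + channel, 1 year": {"ds": 365, "hh": 24, "channel": 10},
}

for name, levels in schemes.items():
    count = partition_count(levels)
    status = "OK" if count <= MAX_PARTITIONS_PER_TABLE else "exceeds limit"
    print(f"{name}: {count} partitions ({status})")
```

The result is an upper bound, because a partition exists only for value combinations that actually receive data. In this example, the three-level scheme (87,600 possible partitions) is the one that would exceed the limit.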

Configure reasonable lifecycles for tables

When you create a table, you can configure its lifecycle based on how long the data is needed. MaxCompute automatically deletes data whose lifecycle has expired, which frees storage space and reduces storage costs.

For example, the following statement creates a partitioned table with a lifecycle of 100 days. If a partition of this table was last modified more than 100 days ago, MaxCompute deletes that partition. For a non-partitioned table, the same rule applies to the table as a whole.
CREATE TABLE test3 (key boolean) PARTITIONED BY (pt string, ds string) LIFECYCLE 100;

The lifecycle is evaluated for each partition individually. When some partitions of a partitioned table exceed the lifecycle, only those partitions are deleted. Partitions that have not exceeded the lifecycle are not affected.
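The per-partition rule can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not MaxCompute's implementation: a partition counts as expired when its last modification time is more than lifecycle_days days before a given point in time, and the sketch ignores when MaxCompute actually runs its reclamation. The function name expired_partitions and the sample partitions are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired_partitions(
    last_modified: Dict[str, datetime], lifecycle_days: int, now: datetime
) -> List[str]:
    """Return (sorted) partitions last modified more than lifecycle_days before now.

    Simplified model: each partition is judged only by its own last modification time.
    """
    threshold = now - timedelta(days=lifecycle_days)
    return sorted(p for p, ts in last_modified.items() if ts < threshold)


now = datetime(2024, 6, 30)
partitions = {
    "pt=a/ds=20240101": datetime(2024, 1, 1),  # modified 181 days before now
    "pt=a/ds=20240301": datetime(2024, 3, 1),  # modified 121 days before now
    "pt=a/ds=20240601": datetime(2024, 6, 1),  # modified 29 days before now
}

# With LIFECYCLE 100, only the two older partitions are eligible for deletion;
# the recently modified partition is unaffected.
print(expired_partitions(partitions, lifecycle_days=100, now=now))
```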

To change the lifecycle of an existing table, execute the following statement, where table_name is the name of the table and days is the new lifecycle in days. For more information, see Lifecycle management operations.
ALTER TABLE table_name SET LIFECYCLE days;
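Before shortening a lifecycle, you might want to estimate how much storage the change would make eligible for deletion. The following Python sketch is a dry-run calculation only; it does not call MaxCompute. It assumes you already know, for each partition, the number of days since its last modification and its size. The PartitionInfo type, the reclaimable_gb function, and all sample values are hypothetical.

```python
from typing import Dict, NamedTuple


class PartitionInfo(NamedTuple):
    days_since_modified: int  # days since the partition was last modified
    size_gb: float            # storage the partition occupies


def reclaimable_gb(partitions: Dict[str, PartitionInfo], lifecycle_days: int) -> float:
    """Total size of partitions last modified more than lifecycle_days ago."""
    return sum(
        info.size_gb
        for info in partitions.values()
        if info.days_since_modified > lifecycle_days
    )


# Hypothetical ages as of 2024-06-30, assuming each partition was last written on its ds date.
partitions = {
    "ds=20240201": PartitionInfo(days_since_modified=150, size_gb=40.0),
    "ds=20240401": PartitionInfo(days_since_modified=90, size_gb=35.0),
    "ds=20240516": PartitionInfo(days_since_modified=45, size_gb=30.0),
    "ds=20240620": PartitionInfo(days_since_modified=10, size_gb=25.0),
}

for days in (100, 60, 30):
    print(f"LIFECYCLE {days}: {reclaimable_gb(partitions, days)} GB eligible for deletion")
```

If the estimate looks acceptable, the change itself is made with the ALTER TABLE ... SET LIFECYCLE statement.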

Periodically delete deprecated tables

We recommend that you periodically delete deprecated tables, that is, tables that have not been accessed for a long time. The following tables are considered deprecated:
  • Tables that have not been accessed in the last three months
  • Non-partitioned tables that have not been accessed in the last month
  • Tables that do not consume storage resources
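The three criteria above can be applied mechanically to a list of table metadata. The following Python sketch only flags candidates and deletes nothing. It assumes that "three months" and "one month" can be approximated as 90 and 30 days, and that the last access time, size, and partitioning of each table are already known; obtaining that metadata from a project is not shown. The TableInfo type, the deprecation_reasons function, and the table names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TableInfo:
    name: str
    partitioned: bool
    last_access: datetime
    size_bytes: int


# Assumption: "three months" and "one month" approximated as 90 and 30 days.
THREE_MONTHS = timedelta(days=90)
ONE_MONTH = timedelta(days=30)


def deprecation_reasons(t: TableInfo, now: datetime) -> List[str]:
    """Return the deprecation criteria that a table meets (empty list if none)."""
    idle = now - t.last_access
    reasons = []
    if idle > THREE_MONTHS:
        reasons.append("not accessed in the last three months")
    if not t.partitioned and idle > ONE_MONTH:
        reasons.append("non-partitioned and not accessed in the last month")
    if t.size_bytes == 0:
        reasons.append("consumes no storage")
    return reasons


now = datetime(2024, 6, 30)
tables = [
    TableInfo("ods_log", partitioned=True, last_access=datetime(2024, 6, 29), size_bytes=10**9),
    TableInfo("tmp_report", partitioned=False, last_access=datetime(2024, 5, 1), size_bytes=10**6),
    TableInfo("old_dim", partitioned=True, last_access=datetime(2024, 1, 15), size_bytes=10**8),
    TableInfo("empty_stage", partitioned=True, last_access=datetime(2024, 6, 28), size_bytes=0),
]

for t in tables:
    reasons = deprecation_reasons(t, now)
    if reasons:
        print(f"{t.name}: candidate for deletion ({'; '.join(reasons)})")
```

After reviewing the flagged tables, you can remove each one with a statement such as DROP TABLE IF EXISTS table_name;.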