This topic describes how to select appropriate table partition keys in ApsaraDB for ClickHouse to optimize performance and improve data management efficiency during table design.
Partition keys
The partition feature organizes data into logical segments based on specified keys. Data is divided into separate fragments (parts) according to the partition key.
In ApsaraDB for ClickHouse Enterprise Edition, when you execute an INSERT statement to insert multiple rows into a table that does not use a partition key, all data is written to a new data part. However, when a table uses a partition key, the system performs the following operations:
1. Checks the partition key values of the inserted rows.
2. Creates a data part in storage for each unique partition key value.
3. Assigns the rows to the corresponding data parts based on their partition key values.
Figure: data part layout of a table without a partition key compared with a table with a partition key.
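The following is a minimal sketch of this behavior; the events table and its columns are hypothetical. Inserting rows that span two days into a table partitioned by day produces two data parts, which you can confirm in the system.parts system table.
-- Hypothetical table partitioned by day; each distinct toDate(event_time)
-- value in an INSERT produces its own data part.
CREATE TABLE events
(
    event_time DateTime,
    message    String
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY event_time;

-- Rows spanning two days are split into two data parts, one per partition.
INSERT INTO events VALUES
    ('2024-01-01 10:00:00', 'a'),
    ('2024-01-01 11:00:00', 'b'),
    ('2024-01-02 09:00:00', 'c');

-- Inspect the resulting parts and the partitions they belong to.
SELECT partition, name, rows
FROM system.parts
WHERE table = 'events' AND active;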
Core principles
To reduce the number of write requests sent to the Object Storage Service buckets that store data for ApsaraDB for ClickHouse Enterprise Edition, choose partition keys with low cardinality (a small number of distinct partition values) and fields that are easy to manage, such as time. Primary keys should cover commonly filtered fields and order them logically. Avoid high cardinality fields, overly fine partitioning, and primary keys unrelated to query patterns, so that you can fully leverage the high performance and easy manageability of ClickHouse.
Partitioning is a data management tool
Partitioning is primarily used for efficient data expiration, tiered storage, batch deletion, and other management tasks, rather than as a primary query optimization tool. For more information, see Choosing a Partitioning Key.
Choose a low cardinality field as the partition key
We recommend keeping the number of partitions between 100 and 1,000. Avoid using high cardinality fields (fields with many distinct values, such as user_id or device numbers) as partition keys. Otherwise, the number of parts can grow uncontrollably, degrading performance or even causing "too many parts" errors.
The most common partitioning method is time-based partitioning
Partition by time dimensions such as month or day, using functions like toYYYYMM(date), toStartOfMonth(date), or toDate(date). This facilitates data lifecycle management and tiered storage of hot and cold data. For more information, see Custom Partitioning Key.
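The following is a minimal sketch of a monthly partitioned table; the metrics table and its columns are illustrative, not a prescribed schema.
-- Hypothetical table partitioned by month; swap in toDate(ts) for daily
-- partitions if the data volume justifies finer granularity.
CREATE TABLE metrics
(
    ts    DateTime,
    name  String,
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)   -- one partition per month, e.g. 202401
ORDER BY (name, ts);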
Partition keys should be closely aligned with data lifecycle, archiving, cleanup, and other management requirements
Prioritize dimensions that are easy to manage in batches from a business perspective. For more information, see Applications of partitioning.
Table design recommendations
Prioritize time-based partitioning
For log, time series, and monitoring scenarios, we recommend partitioning by month or day. For example, partitioning a log table by month, with each month's data in a separate partition, offers the following advantages:
Efficient data management: You can delete, archive, or move data in batches by partition. For example, in common scenarios where ALTER TABLE ... DELETE is used to remove expired data, monthly or daily partitioning lets you instead run DROP PARTITION to delete the corresponding partition without scanning the entire table, greatly improving efficiency (see the sketch after this list).
Easy implementation of data lifecycle management (TTL): Combined with TTL policies, expired partitions can be automatically cleaned up, simplifying operations and maintenance.
Partition pruning improves query efficiency: When filtering by time in queries, ClickHouse only needs to scan relevant partitions, skipping irrelevant ones, significantly reducing I/O and accelerating query speed.
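A hedged sketch of both operations, assuming the monthly partitioned logs table defined later in this topic; the retention interval is an example value.
-- Drop an entire month of expired data in a single metadata operation;
-- for toYYYYMM partitioning, the partition ID is the YYYYMM value.
ALTER TABLE logs DROP PARTITION 202401;

-- Alternatively, let TTL expire rows automatically after 12 months.
ALTER TABLE logs MODIFY TTL event_time + INTERVAL 12 MONTH;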
Avoid high cardinality fields for partitioning
Examples include user IDs, order numbers, and device numbers. If a table uses user_id as the partition key, a high cardinality field (each user ID is unique, so the number of distinct values equals the number of users), the table ends up with a very large number of partitions, which has the following drawbacks (see the sketch after this list):
Partition explosion: Each unique user ID generates a partition, resulting in an extremely large number of partitions far exceeding the recommended range of 100-1,000, causing enormous pressure on metadata management and the file system.
Background merge failure: ClickHouse only merges parts within the same partition. Too many partitions prevent merge operations, resulting in many small parts that affect query and write performance.
Query performance degradation: Too many partitions require scanning large amounts of partition metadata during queries, reducing query efficiency.
Instance resource exhaustion: Too many partition parts consume large amounts of memory and file handles, potentially causing ClickHouse to start slowly or fail.
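A hedged before-and-after sketch with a hypothetical user_events table: instead of partitioning by user_id, keep it in the primary key and partition by time, so point queries on user_id still prune data through the primary key index.
-- Not recommended: PARTITION BY user_id creates one partition per user,
-- causing partition explosion.

-- Recommended: partition by month and lead the primary key with user_id.
CREATE TABLE user_events
(
    event_time DateTime,
    user_id    UInt64,
    action     String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);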
Avoid overly fine partitioning
Avoid partitioning by hour, minute, or second unless data volume is extremely large and there are specific requirements. For example, if a table uses toStartOfMinute(event_time) to partition by minute, it generates 1,440 partitions per day and over 500,000 partitions per year, with the following drawbacks (see the diagnostic sketch after this list):
Too many partitions: Overly fine partitioning results in a number of partitions far exceeding the recommended range of 100-1,000, greatly increasing the burden on metadata and file system management.
Typical error:
DB::Exception: Too many parts (N). Merges are processing significantly slower than inserts.
Background merge failure: ClickHouse only merges parts within the same partition. Too many partitions prevent merge operations, resulting in many small parts that affect query and write performance.
Query and write performance degradation: Too many partitions require scanning large amounts of partition metadata during queries, reducing query efficiency, while also slowing down writes due to the excessive number of parts.
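To check whether a table is approaching this state, a hedged diagnostic query over the standard system.parts table can count active parts and partitions per table:
-- Count active parts and distinct partitions per table; counts far beyond
-- the recommended 100-1,000 partitions indicate overly fine partitioning.
SELECT
    table,
    count() AS active_parts,
    uniqExact(partition) AS partitions
FROM system.parts
WHERE active
GROUP BY table
ORDER BY partitions DESC;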
Partition keys should be original fields or simple expressions
Avoid complex functions so that ClickHouse can apply partition pruning. Pruning on range filters is most effective when the partition expression is a simple, monotonic function of the filtered column.
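A hedged illustration with a hypothetical pruning_demo table: a simple time function keeps pruning usable, while a hash-based expression defeats it.
-- Prunable: WHERE event_time >= '2024-01-01' skips whole partitions,
-- because toYYYYMM is a monotonic function of event_time.
CREATE TABLE pruning_demo
(
    event_time DateTime,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
-- Not prunable (do not use): e.g. PARTITION BY sipHash64(event_time) % 16
-- breaks the relationship between time ranges and partition values.
ORDER BY event_time;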
Design partition keys in conjunction with primary keys
Primary keys should cover commonly queried filter fields, while partition keys serve data management purposes.
For example, for a log table where the typical business requirement is to frequently query by time range and service name while periodically cleaning up expired data, the design would be:
Partition key: toYYYYMM(event_time), one partition per month, facilitating monthly batch deletion, archiving, tiered storage of hot and cold data, and other data management operations.
Primary key: (service_name, event_time), for common queries such as WHERE service_name = 'A' AND event_time BETWEEN ..., which can fully utilize the primary key index for data pruning to accelerate queries.
Table design examples
CREATE TABLE logs
(
event_time DateTime,
service_name String,
log_level String,
message String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time) -- Partition by month for easy data management
ORDER BY (service_name, event_time) -- Primary key covers commonly filtered fields
Not recommended partition key choices
Partition key as user_id (high cardinality): One partition per user, resulting in too many partitions, merge failure, and extremely poor performance.
Partition key as device_id (high cardinality): As above, causing "too many parts" errors and unmanageable partitions.
Partition key as order_id (high cardinality): One partition per order, extreme fragmentation.
Partition key as name (high cardinality string): Uncontrollable number of partitions, difficult to manage.
Partition key as toStartOfHour(event_time) (too fine): 24 partitions per day, an extremely large number of partitions over time, and merge failure.
Partition key as toStartOfMinute(event_time) (extremely fine): partition explosion, severely affecting performance.
Primary key as high cardinality field with illogical order: For example, ORDER BY (user_id, event_time), but queries are commonly by event_time, resulting in low utilization of the primary key index.
Primary key with too many fields: For example, ORDER BY (a, b, c, d, e, f, g, h, i, j), resulting in large primary key index size and high memory consumption.
Primary key as low cardinality field: For example, ORDER BY (status), with only a few status values, resulting in extremely poor primary key index pruning capability.
Partition key and primary key completely unrelated, and neither covers common query conditions: For example, partition key is region, primary key is type, but queries are commonly by event_time, resulting in neither partition nor primary key accelerating queries.

