Common terms and concepts of ClickHouse - ApsaraDB for ClickHouse

This topic explains the key terms and concepts used in ApsaraDB for ClickHouse.

Common terms

Region

A region is the physical location where the servers for ApsaraDB for ClickHouse are hosted. You must select a region when you purchase the ApsaraDB for ClickHouse service. The region cannot be changed after it is selected.

Zone

A zone is a physical area within a region that has an independent power supply and network. Zones in the same region are connected through a low-latency internal network.

Database

A database is the highest-level object in an ApsaraDB for ClickHouse cluster. It consists of objects such as tables, columns, views, functions, and data types.

Community-compatible Edition

ApsaraDB for ClickHouse cluster (Cluster)

Physically, an ApsaraDB for ClickHouse cluster is a distributed database that consists of multiple ClickHouse Server instances. Depending on the specifications you purchase, the cluster may contain one or more shards, and each shard may contain one or more replicas.

Logically, an ApsaraDB for ClickHouse cluster can contain multiple database objects.

Edition

ApsaraDB for ClickHouse clusters are available in the following editions, which differ in the number of replicas per shard.

Double-replica Edition:
- Each shard contains two replicas. If one replica becomes unavailable, the other replica in the same shard continues to provide service.
- In a Double-replica Edition cluster, data is replicated across two replicas to ensure data consistency.
  Important
  When you create tables in a Double-replica Edition cluster, you must use Replicated table engines from the MergeTree family. If you use non-replicated table engines, data cannot be replicated between replicas, which may cause data inconsistency.
Single-replica Edition: Each shard has only one replica. If the replica becomes unavailable, the entire cluster becomes unavailable and remains so until the replica is fully restored.

Note

A Double-replica Edition cluster uses twice the resources and costs twice as much as a Single-replica Edition cluster.
Because the underlying disks provide high reliability, the Single-replica Edition also prevents data loss.
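The availability rules for the two editions can be sketched in a few lines of Python. This is only an illustrative model and not part of the service: the function names and the list-of-booleans representation of replica health are assumptions made for this example.

```python
def shard_available(replicas_up):
    """A shard can serve requests if at least one of its replicas is up."""
    return any(replicas_up)


def cluster_available(shards):
    """The cluster is fully available only if every shard is available."""
    return all(shard_available(replicas_up) for replicas_up in shards)


# Hypothetical health snapshots: one inner list per shard, one flag per replica.
double_replica = [[True, False], [True, True]]  # one replica down in shard 1
single_replica = [[True], [False]]              # the only replica of shard 2 is down

print(cluster_available(double_replica))  # True
print(cluster_available(single_replica))  # False
```

In the Double-replica case the surviving replica keeps its shard in service, while in the Single-replica case the loss of one replica takes its shard, and therefore the cluster, out of service.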

Shard

When processing large amounts of data, the storage and computing resources of a single server can become a bottleneck. To improve efficiency, ApsaraDB for ClickHouse distributes data across multiple servers. Each server, which stores and processes a portion of the total data, is called a shard.

Replica

To ensure data security and high availability, ApsaraDB for ClickHouse uses a replication mechanism to protect against failures. This mechanism stores redundant copies of data from a single server on two or more servers.

Table

A table is the basic structure used to store data. It consists of rows and columns. Each column represents a field, and each row represents a record.

From the perspective of data distribution, ApsaraDB for ClickHouse tables can be classified into two types: local tables and distributed tables.

Table type	Description	Differences
Local table	Data is stored only on the node to which it is written. It is not distributed across multiple servers.	Writes to and queries from local tables are limited by the storage and computing resources of a single server. They do not support horizontal scaling. Writes to and queries from distributed tables can use the storage and computing resources of multiple servers. They have better horizontal scaling capabilities.
Distributed table	A collection of local tables. It abstracts multiple local tables into a single, unified table that provides write and query capabilities. When you write data, it is automatically distributed to the local tables in the collection. When you query data, each local table in the collection is queried separately, and the final results are aggregated and returned.
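The write and query behavior of a distributed table can be illustrated with a toy Python model. This is a sketch under stated assumptions, not how ApsaraDB for ClickHouse is implemented: the class name, the `user_id` and `amount` columns, and the modulo-based sharding rule are all hypothetical.

```python
class DistributedTable:
    """Toy model of a distributed table layered over several local tables."""

    def __init__(self, num_shards):
        # One local table (a plain list of rows) per shard.
        self.local_tables = [[] for _ in range(num_shards)]

    def insert(self, row):
        # Assumed sharding rule: user_id modulo the number of shards.
        shard = row["user_id"] % len(self.local_tables)
        self.local_tables[shard].append(row)

    def total(self, column):
        # Query each local table separately, then aggregate the partial results.
        partials = [sum(r[column] for r in table) for table in self.local_tables]
        return sum(partials)


table = DistributedTable(num_shards=3)
for user_id, amount in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    table.insert({"user_id": user_id, "amount": amount})

print(table.total("amount"))                 # 100
print([len(t) for t in table.local_tables])  # [1, 2, 1]
```

The caller works with a single object, while each row lands in exactly one local table and the query result is assembled from per-shard partial sums.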

From the perspective of storage engines, ApsaraDB for ClickHouse tables can also be classified into two types: non-replicated tables and replicated tables.

Table type	Description	Differences
Non-replicated table	Data is stored only on the current server and is not replicated to other servers. It has only one replica.	Non-replicated tables cannot guarantee high availability in case of an exception. Replicated tables can provide services as long as at least one replica is normal.
Replicated table	Data is automatically replicated to multiple servers, forming multiple replicas.

Data part

A data part is a fragment of data stored on a disk and is the basic unit of data storage for a ClickHouse table. A new data part is generated each time data is written to a ClickHouse table. Each data part is self-contained, includes all columns and indexes for its portion of the data, and maintains data order. This design supports efficient merge and compression operations, which is crucial for high-performance query processing in ClickHouse.
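The relationship between writes, data parts, and merges can be sketched as follows. This is a simplified model for illustration only: real data parts hold columns and indexes on disk, whereas here each part is just a sorted Python list, and the `ToyTable` class and its methods are hypothetical names.

```python
import heapq


class ToyTable:
    """Toy model in which every write creates a new, sorted data part."""

    def __init__(self):
        self.parts = []

    def insert(self, rows):
        # Each write produces its own self-contained part, kept in sorted order.
        self.parts.append(sorted(rows))

    def merge(self):
        # Because every part is already sorted, parts can be combined
        # with a streaming merge instead of a full re-sort.
        if len(self.parts) > 1:
            self.parts = [list(heapq.merge(*self.parts))]


t = ToyTable()
t.insert([5, 1, 9])
t.insert([8, 2, 3])
print(t.parts)  # [[1, 5, 9], [2, 3, 8]]

t.merge()
print(t.parts)  # [[1, 2, 3, 5, 8, 9]]
```

Keeping each part ordered is what makes the merge step cheap in this model: it reads the parts sequentially and never needs to re-sort them.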

Enterprise Edition

ApsaraDB for ClickHouse cluster

An ApsaraDB for ClickHouse cluster is composed of computing and storage resource units. It provides a Platform as a Service (PaaS) for data storage and analysis that is based on the ClickHouse engine.

Worker node

A worker node is a replica node within an ApsaraDB for ClickHouse cluster. It is the physical resource that participates in engine computations.

CCU

A ClickHouse Compute Unit (CCU) is the unit for measuring and billing computing resources in an ApsaraDB for ClickHouse cluster. One CCU is equivalent to 1 vCPU and 4 GiB of memory. The standard billing unit is CCU per minute.
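As a worked example of the definition above, the following Python sketch converts CCUs into vCPUs and memory and totals CCU-minutes from per-minute readings. The function names and the sample usage values are hypothetical, and no price is assumed because pricing is not stated here.

```python
VCPU_PER_CCU = 1     # from the definition above
MEM_GIB_PER_CCU = 4  # from the definition above


def ccu_resources(ccus):
    """Return (vCPUs, memory in GiB) represented by a number of CCUs."""
    return ccus * VCPU_PER_CCU, ccus * MEM_GIB_PER_CCU


def ccu_minutes(ccus_per_minute):
    """Sum per-minute CCU readings into total CCU-minutes."""
    return sum(ccus_per_minute)


# Hypothetical readings: CCUs in use during five consecutive minutes.
usage = [8, 8, 16, 16, 12]

print(ccu_resources(16))   # (16, 64)
print(ccu_minutes(usage))  # 60
```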

Auto scaling of computing resources

Auto scaling automatically adjusts the number of CCUs allocated to a cluster based on its CPU and memory usage.

Auto scaling range

The auto scaling range is the minimum and maximum number of CCUs that you can set for a cluster. The auto scaling feature adjusts the number of CCUs within this defined range.
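The effect of the auto scaling range can be sketched as a simple clamp. This is a minimal illustration, not the service's scaling algorithm: how the desired CCU count is derived from CPU and memory usage is not described here, so `desired` is treated as a given input and the function name is hypothetical.

```python
def clamp_ccus(desired, min_ccus, max_ccus):
    """Keep a desired CCU count inside the configured auto scaling range."""
    if min_ccus > max_ccus:
        raise ValueError("min_ccus must not exceed max_ccus")
    return max(min_ccus, min(desired, max_ccus))


# Hypothetical range of 8 to 32 CCUs.
print(clamp_ccus(4, 8, 32))   # 8  (raised to the minimum)
print(clamp_ccus(20, 8, 32))  # 20 (already within the range)
print(clamp_ccus(64, 8, 32))  # 32 (capped at the maximum)
```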

Storage resources

Storage resources refer to the shared storage solution used by the Enterprise Edition. These resources are billed on a pay-as-you-go basis.