ApsaraDB for ClickHouse supports four families of table engines: MergeTree, Log, Integrations, and Special. Each family is optimized for a different set of workloads. This topic describes what each engine does and walks through examples for the most commonly used engines.
Engine families at a glance
| Family | Best for | Engines |
|---|---|---|
| MergeTree | High-throughput inserts with background processing, partitioning, and replication | MergeTree, ReplacingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, SummingMergeTree, AggregatingMergeTree, GraphiteMergeTree (the family also supports approximate nearest neighbor search indexes and full-text search using inverted indexes) |
| Log | Fast writes to small tables (~1 million rows) with full-table reads | TinyLog, StripeLog, Log |
| Integrations | Importing or querying external data sources | Kafka, MySQL, JDBC, ODBC, HDFS |
| Special | Specific architectural needs (distributed queries, materialized views, in-memory tables) | Distributed, MaterializedView, Dictionary, Merge, File, NULL, Set, Join, URL, View, Memory, Buffer |
For a complete reference of all supported table engines, see Table Engines.
MergeTree family
Engines in the MergeTree family are the primary choice for production workloads. They support high-speed inserts, data partitioning, primary key indexes, sparse indexes, data sampling, data replication, and time to live (TTL) for data.
All MergeTree engines share the same storage structure: data is written in parts and merged in the background, similar to a log-structured merge-tree (LSM tree). Storage-layer processing — including deduplication, collapsing, and aggregation — happens during compaction, not at insert time.
MergeTree in ApsaraDB for ClickHouse supports all ClickHouse SQL syntax, but primary keys work differently than in standard SQL. The primary key accelerates queries rather than enforcing uniqueness. Duplicate primary keys can exist even after compaction.
MergeTree
MergeTree is the base engine for high-load insert workloads. Data is inserted in parts and merged in the background based on the sort order defined by ORDER BY.
Example: MergeTree with partitioning
Create a table partitioned by `create_time` and sorted on `(id, create_time)`.

```sql
CREATE TABLE test_tbl ON CLUSTER default
(
    id UInt16,
    create_time Date,
    comment Nullable(String)
)
ENGINE = MergeTree()
PARTITION BY create_time
ORDER BY (id, create_time)
PRIMARY KEY (id, create_time)
SETTINGS index_granularity = 8192;
```

Insert rows with duplicate primary keys.

```sql
INSERT INTO test_tbl VALUES (1, '2019-12-13', null);
INSERT INTO test_tbl VALUES (1, '2019-12-13', null);
INSERT INTO test_tbl VALUES (2, '2019-12-14', null);
INSERT INTO test_tbl VALUES (3, '2019-12-15', null);
INSERT INTO test_tbl VALUES (3, '2019-12-15', null);
```

Query the table.

```sql
SELECT * FROM test_tbl;
```

Result — all five rows are returned, including duplicates:

```
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ NULL    │
│  1 │  2019-12-13 │ NULL    │
│  2 │  2019-12-14 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
└────┴─────────────┴─────────┘
```

Force compaction.

```sql
OPTIMIZE TABLE test_tbl FINAL;
```

Query again.

```sql
SELECT * FROM test_tbl;
```

Result — the duplicates are still present. MergeTree does not remove them.

```
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ NULL    │
│  1 │  2019-12-13 │ NULL    │
│  2 │  2019-12-14 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
└────┴─────────────┴─────────┘
```
For more information, see MergeTree.
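The family-level features listed earlier include time to live (TTL). As a minimal sketch — the table name, column names, and intervals below are hypothetical, not part of the examples in this topic — a MergeTree table can expire individual columns or whole rows:

```sql
-- Hypothetical table: names and intervals are illustrative only.
CREATE TABLE events_with_ttl
(
    id UInt64,
    event_time DateTime,
    payload String TTL event_time + INTERVAL 7 DAY   -- column-level TTL: payload is cleared after 7 days
)
ENGINE = MergeTree()
ORDER BY (id, event_time)
TTL event_time + INTERVAL 30 DAY;                    -- table-level TTL: rows expire after 30 days
```

Expired data is removed during background merges, so deletion is eventual, not immediate.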
ReplacingMergeTree
ReplacingMergeTree removes duplicate rows with the same primary key during background compaction. Use it when eventual deduplication is acceptable — it does not guarantee uniqueness at query time.
Limitations
- In a distributed cluster, rows with the same primary key may land on different shards. Deduplication happens only within a single shard, not across shards.
- Before compaction completes, some duplicates may still be present. Compaction runs in the background on an unpredictable schedule.
- On large datasets, manually running `OPTIMIZE ... FINAL` can take a long time, which makes real-time deduplication impractical.
For more information, see ReplacingMergeTree.
Example: ReplacingMergeTree deduplication
Create a table.

```sql
CREATE TABLE test_tbl_replacing
(
    id UInt16,
    create_time Date,
    comment Nullable(String)
)
ENGINE = ReplacingMergeTree()
PARTITION BY create_time
ORDER BY (id, create_time)
PRIMARY KEY (id, create_time)
SETTINGS index_granularity = 8192;
```

Insert rows with duplicate primary keys.

```sql
INSERT INTO test_tbl_replacing VALUES (1, '2019-12-13', null);
INSERT INTO test_tbl_replacing VALUES (1, '2019-12-13', null);
INSERT INTO test_tbl_replacing VALUES (2, '2019-12-14', null);
INSERT INTO test_tbl_replacing VALUES (3, '2019-12-15', null);
INSERT INTO test_tbl_replacing VALUES (3, '2019-12-15', null);
```

Query before compaction — duplicates are still present.

```sql
SELECT * FROM test_tbl_replacing;
```

```
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ NULL    │
│  1 │  2019-12-13 │ NULL    │
│  2 │  2019-12-14 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
└────┴─────────────┴─────────┘
```

Force compaction.

```sql
OPTIMIZE TABLE test_tbl_replacing FINAL;
```

Query again — duplicates removed.

```sql
SELECT * FROM test_tbl_replacing;
```

```
┌─id─┬─create_time─┬─comment─┐
│  1 │  2019-12-13 │ NULL    │
│  2 │  2019-12-14 │ NULL    │
│  3 │  2019-12-15 │ NULL    │
└────┴─────────────┴─────────┘
```
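If you need deduplicated results before background compaction runs, one workaround is the `FINAL` query modifier, which merges data parts at query time. A sketch against the table created above — note that `FINAL` trades query performance for correctness:

```sql
-- Merges parts at query time, so duplicates are collapsed in the result
-- even before background compaction has run.
-- Slower than a plain SELECT; avoid it on latency-sensitive paths.
SELECT * FROM test_tbl_replacing FINAL;
```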
CollapsingMergeTree
CollapsingMergeTree tracks row state changes using a Sign column instead of updating rows in place — updates are expensive in ClickHouse's append-only storage model. Each row is marked as either a state row (Sign = 1) or a cancel row (Sign = -1). During compaction, pairs of state and cancel rows with the same primary key are collapsed and removed.
How collapsing works
To update a row's state:

- Insert a cancel row with `Sign = -1` and the same primary key values as the row you want to replace.
- Insert a new state row with `Sign = 1` and the updated values.

To delete a row's state, insert a cancel row that matches the original state row in all columns except `Sign`.
Usage notes
If state and cancel rows are inserted out of order — for example, if a cancel row is inserted and then its matching state row arrives separately — they may not be collapsed correctly. Consider using VersionedCollapsingMergeTree to handle out-of-order inserts.
Collapsing only happens within a single shard. In a distributed cluster, rows with the same primary key on different nodes are not collapsed.
Before compaction completes, state and cancel rows coexist. When running aggregate queries before compaction, adjust your SQL:

- Replace `COUNT()` with `SUM(Sign)`.
- Replace `SUM(col)` with `SUM(col * Sign)`.
For more information, see CollapsingMergeTree.
Example: CollapsingMergeTree state updates
Create a table with a `Sign` column.

```sql
CREATE TABLE test_tbl_collapsing
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;
```

Insert the initial state row.

```sql
INSERT INTO test_tbl_collapsing VALUES (4324182021466249494, 5, 146, 1);
```

Update the state: insert a cancel row for the old state and a new state row.

Insert the cancel row and the new state row in the same batch. If you insert the cancel row and then the state row in separate operations, they may arrive out of order. In that case, even if compaction is forced, rows with the same primary key cannot be collapsed or deleted.

```sql
INSERT INTO test_tbl_collapsing VALUES
    (4324182021466249494, 5, 146, -1), -- cancel old state
    (4324182021466249494, 6, 185, 1);  -- new state
```

Query before compaction — all three rows are present.

```sql
SELECT * FROM test_tbl_collapsing;
```

```
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         5 │      146 │    1 │
│ 4324182021466249494 │         5 │      146 │   -1 │
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
```

To get correct aggregate results before compaction, use `Sign`-aware expressions:

```sql
SELECT
    UserID,
    SUM(PageViews * Sign) AS PageViews,
    SUM(Duration * Sign) AS Duration
FROM test_tbl_collapsing
GROUP BY UserID
HAVING SUM(Sign) > 0;
```

```
┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘
```

Force compaction.

```sql
OPTIMIZE TABLE test_tbl_collapsing FINAL;
```

Query again — only the latest state row remains.

```sql
SELECT * FROM test_tbl_collapsing;
```

```
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┐
│ 4324182021466249494 │         6 │      185 │    1 │
└─────────────────────┴───────────┴──────────┴──────┘
```
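The deletion pattern described under "How collapsing works" can be sketched against the same table: to remove the remaining state entirely rather than replace it, insert a cancel row that mirrors the current state row with `Sign = -1`.

```sql
-- Cancels the current state row (4324182021466249494, 6, 185, 1);
-- after compaction, the state/cancel pair is removed entirely.
INSERT INTO test_tbl_collapsing VALUES (4324182021466249494, 6, 185, -1);
OPTIMIZE TABLE test_tbl_collapsing FINAL;
```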
VersionedCollapsingMergeTree
VersionedCollapsingMergeTree extends CollapsingMergeTree with a Version column that lets the engine correctly match state and cancel rows regardless of insertion order. During compaction, rows with the same primary key, the same Version value, and opposite Sign values are collapsed and removed.
Use this engine when rows may arrive out of order — for example, in event-driven pipelines where cancel events can be processed before the state events they reference.
As with CollapsingMergeTree, adjust aggregate queries to use SUM(Sign) instead of COUNT() and SUM(col * Sign) instead of SUM(col).
For more information, see VersionedCollapsingMergeTree.
Example: VersionedCollapsingMergeTree with out-of-order inserts
Create a table with `Sign` and `Version` columns.

```sql
CREATE TABLE test_tbl_Versioned
(
    UserID UInt64,
    PageViews UInt8,
    Duration UInt8,
    Sign Int8,
    Version UInt8
)
ENGINE = VersionedCollapsingMergeTree(Sign, Version)
ORDER BY UserID;
```

Insert a cancel row first (out of order).

```sql
INSERT INTO test_tbl_Versioned VALUES (4324182021466249494, 5, 146, -1, 1);
```

Insert the matching state row and a new state row.

```sql
INSERT INTO test_tbl_Versioned VALUES
    (4324182021466249494, 5, 146, 1, 1), -- state row matching the cancel above
    (4324182021466249494, 6, 185, 1, 2); -- new state
```

Query before compaction.

```sql
SELECT * FROM test_tbl_Versioned;
```

```
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┬─Version─┐
│ 4324182021466249494 │         5 │      146 │   -1 │       1 │
│ 4324182021466249494 │         5 │      146 │    1 │       1 │
│ 4324182021466249494 │         6 │      185 │    1 │       2 │
└─────────────────────┴───────────┴──────────┴──────┴─────────┘
```

Use `Sign`-aware aggregation to get correct results before compaction:

```sql
SELECT
    UserID,
    SUM(PageViews * Sign) AS PageViews,
    SUM(Duration * Sign) AS Duration
FROM test_tbl_Versioned
GROUP BY UserID
HAVING SUM(Sign) > 0;
```

```
┌──────────────UserID─┬─PageViews─┬─Duration─┐
│ 4324182021466249494 │         6 │      185 │
└─────────────────────┴───────────┴──────────┘
```

Force compaction.

```sql
OPTIMIZE TABLE test_tbl_Versioned FINAL;
```

Query again — the `Version` column correctly matched the cancel and state rows despite out-of-order insertion.

```sql
SELECT * FROM test_tbl_Versioned;
```

```
┌──────────────UserID─┬─PageViews─┬─Duration─┬─Sign─┬─Version─┐
│ 4324182021466249494 │         6 │      185 │    1 │       2 │
└─────────────────────┴───────────┴──────────┴──────┴─────────┘
```
SummingMergeTree
We recommend using SummingMergeTree together with MergeTree: store the raw detail data in a MergeTree table and keep pre-aggregated results in a SummingMergeTree table. This way, the detail data is preserved even if the aggregation key turns out to be composed incorrectly.
SummingMergeTree pre-aggregates rows with the same primary key into a single row by summing numeric columns. This reduces storage and speeds up aggregate queries.
Usage notes
- Pre-aggregation happens during background compaction, not at insert time. Some data may not yet be aggregated, so always include a `GROUP BY` clause with `SUM()` in your queries.
- Only numeric columns are summed. String and other non-numeric columns that are not part of the primary key get an arbitrary value after aggregation.
For more information, see SummingMergeTree.
Example: SummingMergeTree pre-aggregation
Create a table.

```sql
CREATE TABLE test_tbl_summing
(
    key UInt32,
    value UInt32
)
ENGINE = SummingMergeTree()
ORDER BY key;
```

Insert rows — some share the same key.

```sql
INSERT INTO test_tbl_summing VALUES (1, 1), (1, 2), (2, 1);
```

Query before compaction — the rows are not yet aggregated.

```sql
SELECT * FROM test_tbl_summing;
```

```
┌─key─┬─value─┐
│   1 │     1 │
│   1 │     2 │
│   2 │     1 │
└─────┴───────┘
```

Force compaction.

```sql
OPTIMIZE TABLE test_tbl_summing FINAL;
```

Query with `GROUP BY` and `SUM()` to get correct results.

```sql
SELECT key, SUM(value) FROM test_tbl_summing GROUP BY key;
```

```
┌─key─┬─value─┐
│   1 │     3 │
│   2 │     1 │
└─────┴───────┘
```
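The recommended pattern from the usage notes — raw detail in MergeTree, pre-aggregated results in SummingMergeTree — can be sketched with a materialized view. The table and view names below are hypothetical, not part of the example above:

```sql
-- Raw detail data is preserved in a MergeTree table.
CREATE TABLE detail_raw
(
    key UInt32,
    value UInt32
)
ENGINE = MergeTree()
ORDER BY key;

-- The materialized view maintains pre-aggregated sums per key;
-- inserts into detail_raw flow into it automatically.
CREATE MATERIALIZED VIEW summing_mv
ENGINE = SummingMergeTree()
ORDER BY key
AS SELECT key, value FROM detail_raw;
```

If the aggregation key later proves wrong, the view can be dropped and rebuilt from `detail_raw` without losing data.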
AggregatingMergeTree
AggregatingMergeTree is a more flexible pre-aggregation engine that supports any aggregate function, not just SUM. It uses the special AggregateFunction data type and a two-phase query pattern:
- Write phase: use `-State` suffix functions (for example, `sumState`, `uniqState`) to store intermediate aggregate states.
- Query phase: use `-Merge` suffix functions (for example, `sumMerge`, `uniqMerge`) to finalize the aggregation.
AggregatingMergeTree is used in two ways: with a materialized view (which automates the write phase), or directly with AggregateFunction columns.
For more information, see AggregatingMergeTree.
Example 1: AggregatingMergeTree with a materialized view
Create a detail table.

```sql
CREATE TABLE visits
(
    UserID UInt64,
    CounterID UInt8,
    StartDate Date,
    Sign Int8
)
ENGINE = CollapsingMergeTree(Sign)
ORDER BY UserID;
```

Create a materialized view that pre-aggregates the detail table using `-State` functions.

```sql
CREATE MATERIALIZED VIEW visits_agg_view
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate)
AS SELECT
    CounterID,
    StartDate,
    sumState(Sign) AS Visits,
    uniqState(UserID) AS Users
FROM visits
GROUP BY CounterID, StartDate;
```

Insert data into the detail table. The materialized view is updated automatically.

```sql
INSERT INTO visits VALUES (0, 0, '2019-11-11', 1);
INSERT INTO visits VALUES (1, 1, '2019-11-12', 1);
```

Query the materialized view using `-Merge` functions.

Use `sumMerge` and `uniqMerge` — the plain `sum` and `uniq` functions cannot process `AggregateFunction` columns and will return an error.

```sql
SELECT
    StartDate,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM visits_agg_view
GROUP BY StartDate
ORDER BY StartDate;
```

```
┌──StartDate─┬─Visits─┬─Users─┐
│ 2019-11-11 │      1 │     1 │
│ 2019-11-12 │      1 │     1 │
└────────────┴────────┴───────┘
```
Example 2: AggregatingMergeTree with AggregateFunction columns
Create a detail table.

```sql
CREATE TABLE detail_table
(
    CounterID UInt8,
    StartDate Date,
    UserID UInt64
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate);
```

Insert data.

```sql
INSERT INTO detail_table VALUES (0, '2019-11-11', 1);
INSERT INTO detail_table VALUES (1, '2019-11-12', 1);
```

Create an aggregate table with an `AggregateFunction` column.

```sql
CREATE TABLE agg_table
(
    CounterID UInt8,
    StartDate Date,
    UserID AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(StartDate)
ORDER BY (CounterID, StartDate);
```

Insert aggregated data using `uniqState`.

Insert using a `SELECT` with `-State` functions. A direct `INSERT INTO ... VALUES` with raw values fails with a type error.

```sql
INSERT INTO agg_table
SELECT CounterID, StartDate, uniqState(UserID)
FROM detail_table
GROUP BY CounterID, StartDate;
```

Query the aggregate table using `uniqMerge`.

```sql
SELECT uniqMerge(UserID) AS state
FROM agg_table
GROUP BY CounterID, StartDate;
```

```
┌─state─┐
│     1 │
│     1 │
└───────┘
```
Log family
Engines in the Log family are designed for fast writes to small tables (around one million rows) where the entire table is read at once.
Common characteristics of all Log engines:
- Data is appended to disk sequentially.
- `DELETE` and `UPDATE` are not supported.
- Indexes are not supported.
- Data is not written atomically.
- `INSERT` blocks `SELECT` on the same table.
| Engine | Concurrent reads | Storage layout | When to use |
|---|---|---|---|
| TinyLog | No | One file per column | Temporary intermediate data where performance is not critical |
| StripeLog | Yes | All columns in a single file | Higher query performance than TinyLog with fewer file handles |
| Log | Yes | One file per column | Higher query performance than TinyLog with per-column access |
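A minimal sketch of a Log-family table — the table and column names below are hypothetical:

```sql
-- Small staging table: sequential append-only writes, no indexes,
-- intended to be read in full.
CREATE TABLE log_example
(
    id UInt16,
    msg String
)
ENGINE = Log;

INSERT INTO log_example VALUES (1, 'first'), (2, 'second');
SELECT * FROM log_example;  -- reads the whole table
```

The same DDL works with `ENGINE = TinyLog` or `ENGINE = StripeLog`; only the storage layout and concurrency behavior differ, as summarized in the table above.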
Integrations family
Engines in the Integrations family connect ApsaraDB for ClickHouse to external data sources, either by importing data or by querying it in place.
| Engine | Description |
|---|---|
| Kafka | Reads data from Kafka topics into ApsaraDB for ClickHouse |
| MySQL | Queries MySQL tables directly from ApsaraDB for ClickHouse using SELECT and other operations |
| JDBC | Connects to data sources via Java Database Connectivity (JDBC) connection strings |
| ODBC | Connects to data sources via Open Database Connectivity (ODBC) connection strings |
| HDFS | Reads files in a specified format from the Hadoop Distributed File System (HDFS) |
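As a sketch of the MySQL integration engine — the host, database, table name, and credentials below are placeholders, not real endpoints:

```sql
-- Maps a remote MySQL table; SELECT queries are forwarded to the MySQL server.
-- Signature: MySQL('host:port', 'database', 'table', 'user', 'password').
CREATE TABLE mysql_orders
(
    id UInt32,
    amount Float64
)
ENGINE = MySQL('mysql-host:3306', 'shop_db', 'orders', 'mysql_user', 'mysql_password');

SELECT * FROM mysql_orders WHERE amount > 100;
```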
Special family
Engines in the Special family serve specific architectural needs.
| Engine | Description |
|---|---|
| Distributed | Does not store data. Routes queries to multiple shards and aggregates results. |
| MaterializedView | Stores pre-computed query results. Used to create materialized views. |
| Dictionary | Exposes dictionary data as a queryable table. |
| Merge | Does not store data. Reads from multiple tables simultaneously as a virtual union. |
| File | Uses a local file as the data store. |
| NULL | Discards all written data. Reads return empty results. Useful as a target for materialized views during testing. |
| Set | Always stores data in random access memory (RAM). |
| Join | Stores join data in RAM for fast lookup. |
| URL | Reads from and writes to remote HTTP or HTTPS endpoints. |
| View | Stores a SELECT query definition but no data. Acts as a non-materialized view. |
| Memory | Stores data in RAM. Data is lost when the server restarts. Suitable for temporary tables with fewer than 100 million rows and no data persistence requirements. Commonly used for temporary tables in ApsaraDB for ClickHouse. |
| Buffer | Buffers writes in RAM and flushes to a destination table when configurable thresholds are met. |
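The Distributed engine from the table above can be sketched as follows. `default` is the cluster name used elsewhere in this topic; the table names are hypothetical:

```sql
-- Local table created on every node of the cluster; this is where data lives.
CREATE TABLE hits_local ON CLUSTER default
(
    id UInt16,
    create_time Date
)
ENGINE = MergeTree()
ORDER BY id;

-- Distributed table: stores no data itself. Queries fan out to hits_local
-- on each shard; inserts are routed by the sharding key rand().
CREATE TABLE hits_dist ON CLUSTER default AS hits_local
ENGINE = Distributed(default, currentDatabase(), hits_local, rand());
```

Queries against `hits_dist` aggregate results from all shards transparently.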