The Online Analytical Processing (OLAP) resource group in the Lindorm compute engine supports three types of data sources: lake mode column-oriented data, AP mode column-oriented data, and LindormTable data. This topic describes how to create AP mode column-oriented tables.
AP mode column-oriented tables come in two types: detail tables and primary key tables. The following table summarizes the key differences to help you choose the right type for your workload.
| | Detail table | Primary key table |
|---|---|---|
| Duplicate primary keys | Allowed | Not allowed |
| Data retention | All imported records | Latest record per primary key |
| Bucketing policy | Random or hash | Hash only |
| Best for | Logs, audit trails, raw event data | Real-time analytics, current-state queries |
Key concepts
Sort key — One or more columns that determine physical storage order. A sort key dramatically improves range query performance by allowing the engine to skip irrelevant data blocks.
Partition — A logical division of a table based on a partition key. Partitions let the engine skip entire data ranges during queries (partition pruning) and simplify data lifecycle management.
Bucket — A subdivision within a partition. Data in each bucket is called a tablet. Buckets distribute data across nodes to maximize query concurrency and prevent hotspots.
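To see how these three concepts fit together, here is a sketch of a single CREATE TABLE statement that uses a sort key, a partition key, and a hash bucketing key at once. The table and column names (events, event_time, region, device_id) are illustrative, not from this topic, and the exact clause order accepted may depend on your engine version:

```sql
-- Illustrative sketch: a detail table combining all three concepts.
CREATE TABLE events (
    event_time datetime NOT NULL,
    region varchar(20) NOT NULL,
    device_id bigint,
    payload varchar(256)
)
PARTITION BY region                -- partition key: enables partition pruning
DISTRIBUTED BY HASH(device_id)     -- bucketing key: spreads tablets across nodes
ORDER BY (event_time, device_id);  -- sort key: physical storage order
```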
Create a detail table
A detail table retains every imported record, including duplicates. It has no primary key uniqueness constraint, so the same row can appear multiple times. Use a detail table when you need a complete historical record—for example, user access logs, clickstream events, or transaction audit trails.
Define columns and a sort key in the CREATE TABLE statement:
```sql
CREATE TABLE user_access (
    uid int,
    name varchar(64),
    age int,
    phone varchar(16),
    last_access datetime,
    credits double
)
ORDER BY (uid, name);
```

The ORDER BY (uid, name) clause sets uid and name as sort columns. Data is physically sorted and stored by this key, which accelerates range scans and index lookups on those columns.
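Because a detail table has no uniqueness constraint, importing the same row twice keeps both copies. A minimal sketch, assuming standard INSERT and SELECT syntax is available for this table type:

```sql
-- Both inserts succeed; the table keeps two identical rows for uid = 1.
INSERT INTO user_access VALUES (1, 'alice', 30, '13800000000', '2024-01-01 08:00:00', 95.5);
INSERT INTO user_access VALUES (1, 'alice', 30, '13800000000', '2024-01-01 08:00:00', 95.5);

-- Returns 2: every imported record is retained.
SELECT count(*) FROM user_access WHERE uid = 1;
```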
Create a primary key table
A primary key table enforces uniqueness. When new data arrives with a key that already exists, it overwrites the existing record, keeping only the latest version. Use a primary key table for real-time analytics where you need current-state data—for example, order tracking, inventory counts, or user profile dashboards. This eliminates T+1 data latency for decision-making.
Define the primary key with the PRIMARY KEY clause:
```sql
CREATE TABLE orders1 (
    order_id bigint NOT NULL,
    dt date NOT NULL,
    user_id int NOT NULL,
    good_id int NOT NULL,
    cnt int NOT NULL,
    revenue int NOT NULL
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id);
```

Primary key tables support hash bucketing only. Specify the bucketing key with DISTRIBUTED BY HASH() when you create the table.
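The overwrite semantics can be sketched with two imports of the same key. This assumes INSERT performs an upsert on primary key tables, consistent with the latest-version behavior described above:

```sql
-- First import: order 1001 with cnt = 1.
INSERT INTO orders1 VALUES (1001, '2024-01-01', 42, 7, 1, 100);
-- Second import with the same primary key overwrites the first record.
INSERT INTO orders1 VALUES (1001, '2024-01-01', 42, 7, 3, 300);

-- Only the latest version is kept: cnt = 3, revenue = 300.
SELECT cnt, revenue FROM orders1 WHERE order_id = 1001;
```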
Primary key tables decouple the primary key from the sort key. This lets you optimize the sort key independently for your most frequent query patterns. For example, if most queries filter by order date and merchant ID, set:
```sql
ORDER BY (dt, merchant_id)
```

This makes range queries on those columns significantly faster, regardless of how the primary key is defined.
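Put together, a primary key table whose sort key differs from its primary key might look like the following sketch. The table name orders_by_merchant and the merchant_id column are illustrative, and the clause order shown is an assumption:

```sql
CREATE TABLE orders_by_merchant (
    order_id bigint NOT NULL,
    dt date NOT NULL,
    merchant_id bigint NOT NULL,
    revenue int NOT NULL
)
PRIMARY KEY (order_id)
DISTRIBUTED BY HASH (order_id)  -- uniqueness and bucketing on order_id
ORDER BY (dt, merchant_id);     -- physical sort order tuned for range queries
```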
Distribute data with partitions and buckets
Setting partitions and buckets distributes data evenly across nodes. Even distribution reduces the data scanned during queries and maximizes query concurrency across the cluster.
Partitions
A partition divides a table into separate management units based on a partition key. The engine uses partition pruning to skip irrelevant partitions at query time, reducing I/O and improving query speed—especially on large datasets.
Specify partitions with the PARTITION BY clause. Partition by one or more columns, or by the result of an expression.
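As a sketch of expression-based partitioning: the date_trunc function below is an assumption, so check which partition expressions your engine version actually supports before using it:

```sql
-- Sketch: one partition per month of dt instead of one per distinct value.
-- Assumes a date_trunc-style function is available for partition expressions.
CREATE TABLE recharge_by_month (
    id bigint,
    recharge_money decimal(32,2),
    dt date NOT NULL
)
PARTITION BY date_trunc('month', dt)
DISTRIBUTED BY HASH(id);
```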
Buckets
Each partition is divided into multiple buckets. Two bucketing methods are available:
Random bucketing — No bucketing key required. Data is randomly distributed within each partition. Use this when you have no clear distribution column.
Hash bucketing — Requires a bucketing key. All rows with the same bucketing key value go to the same bucket, which benefits queries that join or aggregate on that key.
The system sets the number of buckets automatically. No manual configuration is needed.
Specify the bucketing method with the DISTRIBUTED BY clause. The following example partitions data by date and city, then applies hash bucketing on the id column:
```sql
CREATE TABLE t_recharge_detail1 (
    id bigint,
    user_id bigint,
    recharge_money decimal(32,2),
    city varchar(20) NOT NULL,
    dt varchar(20) NOT NULL
)
DUPLICATE KEY(id)
PARTITION BY dt, city
DISTRIBUTED BY HASH(id);
```

Queries that filter by dt or city skip irrelevant partitions. Queries that join on id benefit from hash bucketing on that key.
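The effect of this layout can be sketched with two queries against the table above. The dt and city values are illustrative, and the actual execution plans depend on the engine:

```sql
-- Partition pruning: only partitions matching dt = '2024-01-01' and
-- city = 'hangzhou' are scanned; all other partitions are skipped.
SELECT sum(recharge_money)
FROM t_recharge_detail1
WHERE dt = '2024-01-01' AND city = 'hangzhou';

-- Bucketed join: rows with equal id values land in the same bucket on both
-- sides, so the join work can be distributed per tablet.
SELECT a.id, a.recharge_money
FROM t_recharge_detail1 a
JOIN t_recharge_detail1 b ON a.id = b.id
WHERE a.dt = '2024-01-02';
```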