New Features of PolarDB-X HTAP: Clustered Columnar Index (CCI)

By Xuandi

Introduction

The traditional OLTP and OLAP solutions are based on simple read-write splitting or ETL models, extracting data from online databases into the data warehouse in the way of T+1 for calculation. With the explosive growth of data, these solutions show their weaknesses: they have high costs in storage, links, and maintenance, and have poor real-time performance. To address the challenges of explosive data growth, PolarDB-X provides the new feature Clustered Columnar Index (CCI) based on object storage. CCI supports real-time synchronization of row-oriented data to column-oriented storage and offers the following features:

The integrated capabilities of online transaction processing and real-time data analysis meet the requirements of scenarios with mixed OLTP and OLAP.
Combined with the distributed architecture of PolarDB, CCI supports intelligent routing and MPP (Massively Parallel Processing) query acceleration. The computing layer accurately identifies TP (Transaction Processing) and AP (Analytical Processing) traffic and intelligently routes them to different storage media. The layer also enables MPP by default on the AP path to scan CCI, thus improving query and analysis capabilities.
The Delta + Main model is used to support real-time updates in seconds. The combined MVCC multi-version technology ensures that consistent snapshot data can be read at any time.

PolarDB-X provides a cost-effective, real-time, and transparent HTAP (Hybrid Transaction/Analytical Processing) solution that is fully compatible with MySQL.

Prerequisites

Only Enterprise Edition instances support CCI.
The instance version must meet the requirement >= polarx-kernel_5.4.19-16989811_xcluster-20231019

Architecture

Hybrid Row-Column Storage

Key Components

Compute node (CN): As the entry to the system, CN adopts a stateless design, including SQL parser, optimizer, and executor. CN supports distributed routing, computing, dynamic scheduling, distributed transaction coordination by using 2PC, and global secondary index maintenance. It also provides enterprise-level features such as SQL throttling and three-role mode.
Data node (DN): It permanently stores row-oriented data and also ensures high reliability and strong consistency based on the Paxos protocol. DN maintains the visibility of distributed transactions through MVCC. It provides computing push-down to meet the push-down requirements of distributed computing, such as Project, Filter, Join, and Agg.
Global Meta Service (GMS): GMS maintains the global strong consistency of meta information, including tables, schemas, and statistics. GMS also maintains security information such as accounts and permissions, and provides a global timestamp distributor named Timestamp Oracle (TSO).
Change Data Capture (CDC): CDC provides incremental subscription that supports MySQL binary log formats and protocols, and primary/secondary replication that supports MySQL replication protocols.
Columnar engine Columnar engine provides persistent CCI and consumes binary logs of distributed transactions. Columnar builds CCI based on the OSS media to meet the requirements of real-time updates. Combined with CN, it provides snapshot-consistent query capabilities of column-oriented storage.

Column-Oriented Storage Architecture

Architecture Concepts

With the popularization of cloud-native technology, new generations of cloud-native data warehouses represented by Snowflake and the database HTAP architecture are emerging. It can be expected that the HTAP of hybrid row-column storage will become a standard capability of databases in the future. The design of column-oriented storage in the current databases needs more consideration about the low cost, ease of use, and high performance in the future.

PolarDB-X provides CCI. By default, row-oriented tables have primary/secondary key indexes. CCI is an additional column-based secondary index covering all columns of row-oriented storage. A table can have both row- and column-oriented storage data.

Architecture Characteristics

1. Cloud-Native Architecture (Storage-Compute Separation & Low Cost)

PolarDB-X CCI uses cloud-native OSS as the main data storage (the cost is only 1/6 to 1/10 of the local disks). CCI is competitively low in cost combined with the high compression of column-oriented data (which can be compressed 3 to 5 times). In the scenario of building HTAP hybrid row-column storage, the additional cost of column-oriented storage can be controlled within 5 to 10 percent of the row-oriented storage cost.

The storage of PolarDB-X CCI uses the two-layer model of Delta + Main (an LSM-like structure) and marked deletion technology. Therefore, CCI ensures that OSS can also update data with high concurrency. At the same time, the path for column-oriented storage to read OSS uses a multi-layer data local cache and multi-level statistics mechanism to minimize unnecessary remote OSS access.

2. Distributed Technology (Linear Scalability)

Traditional distributed databases usually build column-oriented storage based on the multi-replica mechanism of Paxos/Raft. However, the query scenarios of OLTP and OLAP have different demands and dependencies on resources. The linear scalabilities of TP and AP restrict each other and affect the performance because of the strong consistency partitioning policy and scaling mechanism between different replicas.

PolarDB-X CCI synchronizes binary logs of distributed transactions in real time and converts rows to columns (M:N). It also defines distributed partition keys and sort keys that are unique to CCI. Combined with the distributed parallel technology, CCI provides linear scalability of column-oriented storage queries. At the same time, row- and column-oriented storage are isolated from each other, and storage and computing are more elastic. Therefore, under the distributed condition, column-oriented storage can better pursue excellent linear scalability.

3. Read-Write Splitting (Serverless & Pay by Read)

The components of PolarDB-X CCI use the read-write splitting architecture. Therefore, PolarDB-X CCI has column-oriented write and read. The write nodes of column-oriented storage (columnar nodes) are stateful. They do not send write requests to the external. Instead, Columnar batches CCI data by using group commit. The read nodes of column-oriented storage (CNs) are stateless. They obtain the metadata of the column-oriented storage by using GMS nodes and directly access the CCI data on OSS.

For PolarDB-X instance creation, the system provides column-oriented write components (columnar nodes which keep running and synchronizing CCI) by default. You only need to create CCI by using the DDL syntax, and then the CCI data will be automatically built and updated in real time. Row-oriented index and CCI can be accessed by using the primary instance or purchasing a new read-only instance (CN). Stateless CN is suitable for serverless mode. You only need to pay for the use of CCI reads.

4. Hybrid Row-Column Storage (Ease of Use & Integrated and Vectorized SQL Engine)

PolarDB-X reuses the SQL engine developed by CN to provide the complete read capability of column-oriented storage. A cost optimizer is built for hybrid row-column scenarios. It intelligently identifies routes based on costs, forwarding OLTP queries to row-oriented queries, and forwarding OLAP queries to column-oriented queries. It also supports access to different row- and column-oriented queries at the SQL operator level. The hybrid row-column capability of HTAP is fully implemented to support unified access to a set of SQL engines.

PolarDB-X is fully vectorized. TableScan reads data from column-oriented storage and uses the data structure of columnar chunks. Subsequent operator computing also uses the in-memory columnar chunk structure to improve query performance based on end-to-end vectorization. In addition, TableScan for row-oriented storage is also dynamically converted into columnar chunks to implement hybrid row-column queries based on a unified data structure.

5. An All-in-One Warehouse (Zero-ETL)

Traditional data warehouses synchronize data through ETL. For complex OLAP queries, the warehouses use parallel computing architectures such as MPP/BSP. However, high-concurrency online queries such as serving scenarios have obvious resource concurrency bottlenecks. Therefore, online queries flow back with data to OLTP databases.

PolarDB-X combines AnalyticDB to provide an all-in-one warehouse. Based on the "Zero-ETL" design concept, it shares the same CCI data. Based on the data warehouse capabilities of ADB, it meets the requirements of data aggregation and data association queries of multiple parties to provide traditional warehouse and lake analysis. At the same time, for concurrent queries of online data, the HTAP hybrid row-column architecture of PolarDB-X can be used to avoid traditional data ETL in the whole process.

How It Works

Syntax

PolarDB-X extends the MySQL DDL syntax to define the CCI syntax. The CCI syntax is used in the same way as creating indexes in MySQL. For more information, see CCI syntax documentation.

CLUSTERED COLUMNAR: the keyword to specify the type of the added index as CCI
Index Name: the name of the index to specify the index in the SQL statement
Sort Key: the sort key of the index (data is stored in order by this column in the index file)
Index Partitioning Clause: the partitioning algorithm of the index (the same as the syntax of the partitioning clause in CREATE TABLE.)

For example:

# Create a table
CREATE TABLE t_order (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `order_id` varchar(20) DEFAULT NULL,
  `buyer_id` varchar(20) DEFAULT NULL,
  `seller_id` varchar(20) DEFAULT NULL,
  `order_snapshot` longtext DEFAULT NULL,
  `order_detail` longtext DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `l_i_order` (`order_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 partition by hash(`order_id`) partitions 16;


# Create CCI
CREATE CLUSTERED COLUMNAR INDEX `cc_i_seller` ON t_order (`seller_id`) partition by hash(`order_id`) partitions 16;

Base Table: "t_order" is a partitioned table that is hashed by the "order_id" column.
CCI: "cc_i_seller" is sorted by the "seller_id" column and hashed by the "order_id" column.
Index Definition Clause: CLUSTERED COLUMNAR INDEX cc_i_seller ON t_order (seller_id) partition by hash(order_id) partitions 16.

CCI Building

CCI is built by columnar engine nodes. The data built is eventually stored in a shared object in the CSV + ORC data format. CSV stores real-time incremental data. Excessive incremental data are quickly compacted and dumped into the ORC format. The distributed PolarDB improves both the CSV and ORC storage formats. The PolarDB-X keeps the open source characteristic of the native formats while ensuring that the two formats fully express the data protocols of MySQL.

From the perspective of data synchronization, the two parallel synchronizing paths of full snapshot reading and incremental synchronization complete the data synchronization together. CCI is built before data are imported. In this scenario, only incremental data is synchronized. Columnar nodes consume binary log data and build indexes. Part of the data are imported before CCI is built, and then the rest of the data continues to be imported. In this scenario, except for the path of incremental data synchronization, columnar nodes consume the existing full data at the same time. The parallel consumption of incremental and full data improves the efficiency of building column-oriented storage.

From the perspective of the hierarchy, the columnar engine nodes use the two-layer model of Delta + Main (an LSM-like structure). By using the marked deletion technology, the nodes ensure low latency in data synchronization between row- and column-oriented storage, and guarantee real-time updates in seconds. Data are written to MemTable in real time. In a cycle of group commit, the data are stored in a local CSV file and appended to the tail of the corresponding CSV file on OSS. This file is called the delta file. The CSV file on the OSS does not exist for a long time. It is converted into the ORC file by the compaction thread from time to time.

Query Acceleration

CN bears the traffic of PolarDB-X for query analysis.

As the preceding figure shows, the entire query acceleration includes three levels: the optimizer, the executor, and the storage engine. From the perspective of the optimizer, the PolarDB-X provides a cost-based optimizer for hybrid row-column scenarios. It identifies routes based on the cost, forwarding TP queries to row-oriented queries, and forwarding AP queries to column-oriented queries.

From the perspective of the executor, the PolarDB-X provides an integrated executor for hybrid row-column scenarios and can support HTAP scenarios. At the same time, the operator layer is vectorized to support MPP acceleration. In complex query scenarios, the layer makes full use of multi-node resources for parallel computing to meet the requirement of high throughput.

The executor layer also introduces the local cache technology to offset the network latency caused by the storage-compute separation architecture. The technology loads hot data to local disks in real time to ensure low-latency queries. From the perspective of the storage engine, the building of CCI ensures the atomicity of transaction commit and the transactional consistency of the queried data.

Services

After the introduction of the columnar engine, the PolarDB-X adds the column-oriented read-only instance specifications to the original primary instance and read-only instance specifications.

Primary Instance: By default, only row-oriented data can be queried. However, together with read-only instances, it provides the cluster endpoint of the primary instance for transparent and consistent read-write splitting. The primary instance retains the ability to directly query column-oriented data. In the future, the intelligent routing and hybrid row-column query will be gradually released.
Read-only Instance: It supports queries of row-oriented read-only data and CCI data. An independent read-only connection address is provided. Therefore, an application can connect the data separately and complete the read-write splitting independently.
Column-Oriented Read-Only Instance: It only supports queries of the CCI data. An independent read-only connection address is provided and an application can connect the data separately. Column-oriented read-only instances only include CN, which is lower in price. For more information, see Pricing of column-oriented read-only instances.

Use Scenarios

PolarDB-X CCI provides an all-in-one HTAP experience and can be adapted to various business scenarios:

Real-time data analysis requirements for online data within seconds, such as real-time report
Dedicated data warehouse: Based on the massive data storage capability of PolarDB-X, CCI aggregates multiple upstream data sources and uses them as dedicated data warehouses
ETL computing scenarios: The powerful and flexible computing capabilities provided by PolarDB-X CCI

PolarDB-X and its CCI feature tackle the hybrid TP and AP scenarios with one database product. At the same time, users achieve a transparent and cost-effective HTAP solution based on cloud-native OSS and intelligent routing technologies.

Performance Data

Test Sets:

TPC-H 100 GB Hardware Environment

CPU: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz

Software

OS: Linux version 5.10.134-15.al8.x86_64

For more operations about adding CCI online in TPC-H tables, see CCI TPC-H performance white paper.


create clustered columnar index `nation_col_index` on nation(`n_nationkey`) partition by hash(`n_nationkey`) partitions 1;

create clustered columnar index `region_col_index` on region(`r_regionkey`) partition by hash(`r_regionkey`) partitions 1;

create clustered columnar index `customer_col_index` on customer(`c_custkey`) partition by hash(`c_custkey`) partitions 24;

create clustered columnar index `part_col_index` on part(`p_size`) partition by hash(`p_partkey`) partitions 24;

create clustered columnar index `partsupp_col_index` on partsupp(`ps_partkey`) partition by hash(`ps_partkey`) partitions 24;

create clustered columnar index `supplier_col_index` on supplier(`s_suppkey`) partition by hash(`s_suppkey`) partitions 24;

create clustered columnar index `orders_col_index` on orders(`o_orderdate`,`o_orderkey`) partition by hash(`o_orderkey`) partitions 24;

create clustered columnar index `lineitem_col_index` on lineitem(`l_shipdate`,`l_orderkey`) partition by hash(`l_orderkey`) partitions 24;

The following are the running results of TPC-H based on CCI. Unit: second.

The preceding table shows the performance improvement ratio of TPC-H 100 GB based on CCI with the proportional changes of specifications, reflecting the distributed linear scalability.

Community

New Features of PolarDB-X HTAP: Clustered Columnar Index (CCI)

Introduction

Prerequisites

Architecture

Hybrid Row-Column Storage

Key Components

Column-Oriented Storage Architecture

Architecture Concepts

Architecture Characteristics

How It Works

Syntax

CCI Building

Query Acceleration

Services

Use Scenarios

Performance Data

Test Sets:

Software

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for Xscale

PolarDB for PostgreSQL

PolarDB for MySQL

Database for FinTech Solution