Community Blog Column-Store Meta Scans and Provisioning IoT Clusters

Column-Store Meta Scans and Provisioning IoT Clusters

Learn how AnalyticDB for PostgreSQL provides the meta scan function, and ApsaraDB for PostgreSQL meets the requirements of IoT applications.

Learn How the Column-Store Meta Scan Accelerates Query Performance in AnalyticDB for PostgreSQL

Alibaba Cloud AnalyticDB for PostgreSQL is a massively parallel processing based data warehouse service on Alibaba Cloud. Its kernel was evolved from the Greenplum open-source database based on the PostgreSQL engine, with improved analysis performance. The column-store meta scan function is key to improving database query performance.

AnalyticDB for PostgreSQL supports column stores and features high data compression performance and superior query performance. However, given that the database system reads data from the entire column or creates a B-Tree index when processing query criteria with a high filter rate, two constraints affect indexing. The first is that indexes are not compressed, which can lead to data bloat. The other constraint is that large result sets lead to index failure and a higher index cost than that of tablescan.

To combat these constraints, AnalyticDB for PostgreSQL provides the meta scan function, which features excellent filter performance and extremely small storage usage.

As a further enhancement of column store tables, the meta scan function collects the maximum and minimum values from the columns of a column store table to implement a filtering function similar to indexing based on the block-offset mechanism. The metadata stores the maximum value, minimum value, and offset, all of which are collected from each block of columns. The maximum and minimum values are used for conditional filtering. The block offset is used to skip the blocks that do not meet filter conditions. This minimizes the I/O of reading column stores and the CPU consumption of block decompression, ultimately improving query performance. In a TPC-H 1 TB test, query performance increased by 29% and can be increased by up to 500% in queries based on conditions with a high filter rate, such as Q06, Q12, Q14, and Q15 of TPC-H.

In AnalyticDB for PostgreSQL, the column-store meta scan function is used to filter out non-compliant blocks based on the maximum and minimum values of row group and batch group metadata so that only compliant blocks need to be read. This reduces the scan I/O and the CPU consumption during block decompression. Therefore, the meta scan function can significantly improve the performance of queries with strong filter conditions.

AnalyticDB for PostgreSQL is built on the open-source databases PostgreSQL and Greenplum, and is maintained by the ApsaraDB OLAP database team. In the near future, Alibaba Cloud will make all of the functions connected with AnalyticDB for PostgreSQL will be made open source as a contribution to the open-source community.

Provisioning IoT Clusters with ApsaraDB for PostgreSQL

Cloud Platforms – Accelerating IoT Applications and Investments

Cloud-based data services are playing a vital role in providing a large and powerful platform to store, manage, and analyze lots of unstructured data for IoT applications. These services are designed to facilitate massive volumes of unprocessed, non-relational data and to perform data analytics, control, and monitoring tasks in an efficient and cost-effective way, which would not be possible otherwise. Moreover, the complexity of data keeps varying, such as the conclusion extracted from the unprocessed data, which was received as input to these IoT devices is mainly in the form of trends, patterns, and statistical visuals to help businesses behave proactively to facilitate their users. Think of this from an automobile's perspective, who might have to keep a track of their systems that are installed in thousands (if not millions) of vehicles on the road.

The technology stack for IoT specific applications is vast, from automating the coffee machine as per the user preference, all the way to the development of smart devices for home automation and automobile industries. Therefore, these cloud-driven data services are able to meet a diversified range of ever growing data requirements.

So, this article focuses on how PostgreSQL by ApsaraDB designed and supported by Alibaba Cloud meets the requirements of IoT applications while maintaining the needs of varying data structures and datasets. Furthermore, we will discuss how IoT clusters are formed and designed to cater to these essential data requirements received from different connected devices.

Here is a default purchase page that you see when you are going to buy the product:

PostgreSQL by ApsaraDB

As this screenshot demonstrates, the product purchase page only asks for the basic information, most of the operational headache is taken care of by the infrastructure. Take the "Edition" as an example, the "High-availability" configures the underlying topology for the resources—master/slave nodes, and backups. Same applies to other settings, such as the networks to deploy this database in.

Provisioning the IoT clusters with Alibaba Cloud ApsaraDB

So far, we have discussed the significance of using cloud service providers to meet the database challenges for IoT applications. In this section, we will highlight why Alibaba Cloud PostgreSQL is the most suitable and reliable database service to meet the super dynamic requirements associated with IoT devices and applications—not only as compared to other cloud-based database system but also when it comes to the relational databases offered by Alibaba Cloud ApsaraDB too. And then why Alibaba Cloud is the most viable cloud service to fulfill these intensified capabilities of Postgres by leveraging the streamline features of a cloud platform. However, if you want to have an overview of the basic functionalities of PostgreSQL supported by Alibaba Cloud ApsaraDB before moving further to explore Postgres capabilities for IoT applications, refer this handy and explanatory documentation - ApsaraDB RDS for PostgreSQL—or read my previous post that explore the deployment options made available on Alibaba Cloud for PostgreSQL.

Related Products

AnalyticDB for PostgreSQL

ApsaraDB AnalyticDB for PostgreSQL is an online MPP (Massively Parallel Processing) data warehousing service based on the open source Greenplum Database.
ApsaraDB AnalyticDB for PostgreSQL provides online expansion and performance monitoring service to free your team from complicated MPP cluster operations and management (O&M). This enables database administrators, developers and data analysis to focus on upgrading enterprise productivity through SQL development.

ApsaraDB RDS for PostgreSQL

An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities

Related Documentation

Scenarios and benefits of AnalyticDB for PostgreSQL


  1. Data warehousing service
  2. Big data analytics platform
  3. Data lake analytics


  1. ETL for offline data processing
  2. Online high-performance query
  3. Multi-model data analysis

Why ApsaraDB for RDS

ApsaraDB for RDS allows you to quickly build a stable and reliable database system. It has the following advantages compared with user-created databases:

  1. Cost-effective and easy to use. You can choose flexible billing methods, change database configurations on demand, and obtain an out-of-the-box database service.
  2. High performance, including suggestions on parameter and SQL query optimization.
  3. High-availability architecture and multiple disaster recovery solutions.
  4. High security. Various preventive measures are used to protect data.

ApsaraDB for RDS has significant advantages in cost-effectiveness, availability, reliability, ease of use, and performance. Its cost is one third of the expense to build databases on ECS instances and one tenth of the expense to build databases on physical servers.

0 0 0
Share on

Alibaba Clouder

2,630 posts | 680 followers

You may also like