What is Hologres - Hologres - Alibaba Cloud Documentation Center

This topic describes what Hologres is and its features.

Hologres is a one-stop real-time data warehouse engine developed by Alibaba. It supports real-time writing, updating, transformation, and analysis of massive amounts of data. Hologres supports standard SQL: it is compatible with the PostgreSQL protocol and syntax and supports most PostgreSQL functions. It also supports multidimensional (OLAP) and ad hoc analysis of petabyte-scale data, high-concurrency, low-latency online data services (serving), and fine-grained workload isolation with enterprise-grade security. Hologres is deeply integrated with MaxCompute, Flink, and DataWorks to provide an enterprise-grade, all-in-one data warehouse solution for both offline and online data.

Hologres is a high-performance, reliable, cost-effective, and scalable real-time data warehouse engine. It delivers sub-second interactive queries on massive amounts of data and supports real-time data warehouse solutions. Hologres is widely used in scenarios such as building real-time data middle platforms, fine-grained analysis, self-service analytics, marketing user profiling, audience segmentation, and real-time risk control.

Features

Query and analysis for multiple scenarios
Hologres supports multiple storage formats (row-oriented, column-oriented, and hybrid row-column storage) and multiple index types. This lets it meet diverse analytical needs, including simple, complex, and ad hoc queries. Hologres uses a massively parallel processing architecture to execute SQL queries in a distributed manner, which improves resource utilization and enables high-speed analysis of massive amounts of data.
- Sub-second interactive analysis
  Hologres uses a scalable Massively Parallel Processing (MPP) architecture for fully parallel computing, and vectorized operators that make full use of CPU computing power. Combined with AliORC storage compression and SSD-optimized I/O throughput, this gives Hologres a sub-second interactive analysis experience on petabyte-scale data.
- High-performance online point queries on primary keys
  Using primary key indexes on row-oriented tables and short-path optimizations in the query engine, Hologres supports high-performance online point queries and prefix scans at hundreds of thousands of queries per second (QPS). It also supports high-throughput real-time updates, with performance more than 10 times that of open source systems. These capabilities suit scenarios such as dimension table joins and ID mapping in real-time data transformation pipelines.
- Federated query and data lake acceleration
  Hologres is seamlessly integrated with MaxCompute. You can use foreign tables to accelerate queries on MaxCompute data, and metadata can be imported automatically. Accelerated queries can be 5 to 10 times faster than querying MaxCompute data directly. Hologres also supports joint analysis of hot and cold data. It can synchronize data from MaxCompute tables to Hologres tables at millions of rows per second, and it can read data from and write data to Object Storage Service (OSS). This simplifies loading data into data lakes and data warehouses.
- Semi-structured data analysis
  Hologres natively supports semi-structured JSON data. It supports columnar storage compression for JSONB and provides a rich set of JSON operators. As a result, JSON data can be stored and analyzed nearly as efficiently as data in native columns.
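The distributed execution described above can be pictured with a small Python model of two-phase (partial, then final) aggregation, a pattern MPP engines commonly use for GROUP BY queries. This is a conceptual sketch, not Hologres code: the row layout, the choice of `order_id` as the distribution key, and all function names are hypothetical, and threads only stand in for compute nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical order rows: (order_id, region, amount).
Row = tuple[int, str, int]


def distribute(rows: list[Row], num_shards: int) -> list[list[Row]]:
    """Hash-partition rows across shards by the distribution key (order_id)."""
    shards: list[list[Row]] = [[] for _ in range(num_shards)]
    for row in rows:
        # crc32 gives a stable hash; Python's built-in hash() is salted per process for str.
        shard_id = crc32(str(row[0]).encode()) % num_shards
        shards[shard_id].append(row)
    return shards


def partial_sum_by_region(shard: list[Row]) -> dict[str, int]:
    """Phase 1: a worker aggregates only the rows stored on its own shard."""
    partial: dict[str, int] = defaultdict(int)
    for _order_id, region, amount in shard:
        partial[region] += amount
    return dict(partial)


def merge_partials(partials: list[dict[str, int]]) -> dict[str, int]:
    """Phase 2: the coordinator combines per-shard partial sums into the final result."""
    total: dict[str, int] = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            total[region] += amount
    return dict(total)


def mpp_sum_by_region(rows: list[Row], num_shards: int = 4) -> dict[str, int]:
    """Models SELECT region, SUM(amount) ... GROUP BY region as a two-phase plan."""
    shards = distribute(rows, num_shards)
    # Threads stand in for worker nodes; this models the data flow, not real CPU parallelism.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(partial_sum_by_region, shards))
    return merge_partials(partials)


if __name__ == "__main__":
    orders = [(1, "east", 10), (2, "west", 5), (3, "east", 7), (4, "north", 3), (5, "west", 1)]
    print(sorted(mpp_sum_by_region(orders).items()))
    # [('east', 17), ('north', 3), ('west', 6)]
```

Because each shard sends only a small partial result to the coordinator, the data moved between nodes grows with the number of groups rather than the number of rows.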
Native real-time data warehouse
Real-time data warehouses are characterized by frequent data updates, simple data models, and agile analysis scenarios. To address these characteristics, Hologres supports high-concurrency real-time writes and updates. It also supports transaction isolation and atomicity, so data is queryable as soon as it is written.
- High-throughput real-time writes and updates
  Hologres is natively integrated with compute frameworks such as Flink and Spark, and its built-in connectors support high-throughput real-time writes and updates. Hologres tables can serve as source tables, sink tables, or dimension tables, and complex operations such as merging multiple streams are also supported.
- What You See Is What You Get (WYSIWYG) development
  Data is queryable as soon as it is written. Hologres supports a three-level object hierarchy of database, schema, and table, as well as views. It natively supports UPDATE, DELETE, and upsert operations, and provides rich expressive capabilities such as joins, nested queries, and window functions. It also natively supports semi-structured JSON data analysis and one-click, real-time synchronization of entire databases from sources such as MySQL.
- End-to-end event-driven architecture
  Hologres can expose table change events through binary logs (Binlog). By consuming Hologres Binlog with Flink, you can build real-time pipelines that span every layer of the data warehouse. This shortens end-to-end data transformation latency while still meeting the governance requirements of a layered warehouse.
- Real-time materialized views
  Hologres lets you define real-time materialized views, which simplifies development of data transformation and aggregation tasks. As data is written in real time, the aggregated results are also updated in real time, providing comprehensive support for real-time transformation scenarios.
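Primary-key upserts, Binlog change events, and real-time materialized views fit together as a change-data-capture loop. The Python sketch below models that loop in memory to show why an aggregate can stay current without rescanning the table. It is a conceptual sketch only: `PrimaryKeyTable`, `ChangeEvent`, `SumByGroupView`, and the event labels are hypothetical and are not the Hologres, Flink, or Binlog API.

```python
from dataclasses import dataclass
from typing import Any

Row = dict[str, Any]


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical labels loosely modeled on binlog-style change streams.
    kind: str  # "INSERT", "DELETE", "UPDATE_BEFORE", or "UPDATE_AFTER"
    row: Row


class PrimaryKeyTable:
    """Toy table keyed by a primary key that appends a change event on every write."""

    def __init__(self, pk: str) -> None:
        self.pk = pk
        self.rows: dict[Any, Row] = {}
        self.binlog: list[ChangeEvent] = []

    def upsert(self, row: Row) -> None:
        # Upsert: insert if the key is new, otherwise replace the existing row.
        new = dict(row)
        old = self.rows.get(new[self.pk])
        if old is None:
            self.binlog.append(ChangeEvent("INSERT", new))
        else:
            self.binlog.append(ChangeEvent("UPDATE_BEFORE", old))
            self.binlog.append(ChangeEvent("UPDATE_AFTER", new))
        self.rows[new[self.pk]] = new

    def delete(self, key: Any) -> None:
        old = self.rows.pop(key, None)
        if old is not None:
            self.binlog.append(ChangeEvent("DELETE", old))


class SumByGroupView:
    """Incrementally maintained SUM(value_col) GROUP BY group_col, fed by change events."""

    def __init__(self, group_col: str, value_col: str) -> None:
        self.group_col = group_col
        self.value_col = value_col
        self.state: dict[Any, tuple[int, int]] = {}  # group -> (row count, running sum)
        self.offset = 0  # number of log entries already consumed

    def _apply(self, row: Row, sign: int) -> None:
        group = row[self.group_col]
        count, total = self.state.get(group, (0, 0))
        count, total = count + sign, total + sign * row[self.value_col]
        if count == 0:
            self.state.pop(group, None)  # no rows left in this group
        else:
            self.state[group] = (count, total)

    def consume(self, log: list[ChangeEvent]) -> None:
        # Only new events are processed; the base table is never rescanned.
        for event in log[self.offset:]:
            retract = event.kind in ("DELETE", "UPDATE_BEFORE")
            self._apply(event.row, -1 if retract else 1)
        self.offset = len(log)

    def result(self) -> dict[Any, int]:
        return {group: total for group, (_count, total) in self.state.items()}


if __name__ == "__main__":
    users = PrimaryKeyTable(pk="user_id")
    view = SumByGroupView(group_col="city", value_col="spend")
    users.upsert({"user_id": 1, "city": "Hangzhou", "spend": 100})
    users.upsert({"user_id": 2, "city": "Beijing", "spend": 50})
    users.upsert({"user_id": 1, "city": "Beijing", "spend": 30})  # user 1 moves to Beijing
    view.consume(users.binlog)
    print(view.result())  # {'Beijing': 80}
```

Emitting an update as a before image followed by an after image lets the consumer retract the old contribution before adding the new one, which keeps the aggregate correct when an update moves a row from one group to another, as user 1 does in the example.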
Enterprise-grade O&M capabilities
Hologres supports fine-grained control over compute workloads and access permissions. It provides a rich set of monitoring and alerting metrics, scalable compute resources, and hot upgrades, meeting enterprise-grade security and reliability requirements for O&M.
- Data security
  Hologres supports fine-grained access control policies, data storage encryption with Bring Your Own Key (BYOK), and data masking. It also supports Data Security Guard, IP address whitelists, and multiple authentication methods, including RAM, STS, and independent accounts. Hologres is PCI DSS certified and supports data backup and recovery.
- Workload isolation
  Multiple compute instances can form a primary/replica architecture. The instances share a single copy of the data in storage but use isolated compute resources, which isolates writes from reads and analytical queries from online services. The architecture also supports fault management and fast, automatic recovery of failed nodes. No local disks are required, because storage is provided by Pangu, which keeps three redundant replicas of the data for high reliability.
- Self-service O&M capabilities
  Hologres provides built-in O&M diagnostic information, such as query history and metadata warehouse tables. You can use the query history and table metadata to quickly identify system bottlenecks and potential risks and troubleshoot issues on your own.
Ecosystem and scalability
Hologres is compatible with the PostgreSQL ecosystem and integrates seamlessly with big data compute engines and with DataWorks, the intelligent big data development platform. You can start developing without learning additional skills.
- Compatibility with the PostgreSQL ecosystem
  Hologres is compatible with the PostgreSQL ecosystem. It provides JDBC and ODBC interfaces for easy integration with ETL and BI tools such as Quick BI, DataV, Tableau, and FanRuan. It also supports GIS spatial data analysis and the Oracle function extension package.
- DataWorks development integration
  Hologres is deeply integrated with DataWorks. It provides graphical, intelligent, and one-stop tools for building data warehouses and performing interactive analysis. It supports enterprise-grade capabilities such as data assets, data lineage, real-time data synchronization, and data services.
- Hadoop ecosystem integration
  Hologres provides Hive and Spark connectors. Data processed on the Hadoop platform can be imported into Hologres with high throughput and then served externally. Hologres supports accelerated reads of foreign tables stored in OSS-HDFS and supports table formats such as Hudi and Delta Lake.
- Vector retrieval with DAMO Academy Proxima
  Hologres is tightly integrated with Platform for AI (PAI). It has a built-in vector retrieval plugin, DAMO Academy Proxima, which supports online real-time feature storage, real-time retrieval, and vector retrieval.
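Vector retrieval answers the question "which stored vectors are most similar to this query vector?". The Python sketch below shows that computation as an exact, brute-force scan using cosine similarity. It is illustrative only: the function names and sample vectors are hypothetical, cosine similarity is chosen just for the example, and an engine such as Proxima relies on index structures for approximate search at scale instead of scoring every vector. This is the baseline that such indexes approximate, not how Proxima is implemented.

```python
import heapq
import math

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, items: dict[str, Vector], k: int) -> list[tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every stored vector, keep the k best."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in items.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical feature vectors, for example item embeddings.
    items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
    print([item_id for item_id, _score in top_k([1.0, 0.1], items, k=2)])  # ['a', 'c']
```

A full scan costs O(n·d) per query for n vectors of dimension d, which is why index-based approximate search becomes necessary as the data grows.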