This document describes Hologres and its key capabilities.
Hologres is Alibaba Cloud's proprietary, one-stop real-time data warehouse engine designed for massive datasets. It ingests, updates, processes, and analyzes data in real time through PostgreSQL-compatible SQL. Hologres supports multidimensional OLAP and ad hoc analysis on petabyte-scale data with sub-second interactive query performance, high concurrency, and low latency.
Built for enterprise needs, it offers fine-grained workload isolation and robust security. Hologres also integrates with MaxCompute, Flink, and DataWorks to provide a full-stack solution for both batch and stream data processing. Typical scenarios include real-time data warehousing, fine-grained analytics, marketing profiling, and risk control.
Capabilities
Query and analysis in different scenarios
Hologres supports a range of analytical scenarios through flexible storage formats and indexing. It offers row store, columnar storage, and a hybrid row-column format, so you can match the table layout to the workload, whether that is simple queries, complex reads, or ad hoc exploration. Its Massively Parallel Processing (MPP) architecture distributes SQL query execution across compute resources, which maximizes utilization and enables high-speed analysis on petabyte-scale datasets.
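The difference between row-oriented and column-oriented layouts can be sketched in a few lines of Python. This is a toy illustration only and says nothing about how Hologres stores data internally; the sample table and the function names are made up for the example.

```python
# Toy illustration of row vs. column layouts; not Hologres internals.
# The sample data and all names below are hypothetical.
rows = [
    {"order_id": 1, "region": "east", "amount": 30},
    {"order_id": 2, "region": "west", "amount": 45},
    {"order_id": 3, "region": "east", "amount": 25},
]

# Row-oriented layout: one complete record per key.
row_store = {r["order_id"]: r for r in rows}

# Column-oriented layout: one list of values per column.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def lookup_order(order_id):
    """Point read: fetches one whole record from the row layout."""
    return row_store[order_id]


def total_amount_by_region():
    """Aggregate: reads only the two columns it needs from the column layout."""
    totals = {}
    for region, amount in zip(column_store["region"], column_store["amount"]):
        totals[region] = totals.get(region, 0) + amount
    return totals
```

The row layout favors fetching a full record by key, while the column layout lets an aggregate skip the columns it does not use. A hybrid format keeps both access paths available at the cost of extra storage.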
Sub-second interactive analysis on petabyte-scale data
Hologres' architecture combines a scalable MPP engine, vectorized operators, AliORC compression, and optimized SSD storage. Together, these deliver sub-second interactive query performance on petabyte-scale data.
High-performance real-time key-value lookups
With primary key indexes on row-store tables and optimized query paths, Hologres achieves hundreds of thousands of QPS for point lookups and prefix scans. Its real-time update throughput is more than 10x that of open-source options, which makes it well suited for dimension table joins and ID mapping in real-time pipelines.
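To make the two access patterns concrete, the following minimal sketch runs a point lookup and a prefix scan over a sorted list of composite keys. It assumes a hypothetical primary key of (user_id, event_id) and uses binary search from the Python standard library; it illustrates the general idea of an ordered key index, not the Hologres implementation.

```python
from bisect import bisect_left

# Hypothetical composite primary keys (user_id, event_id), kept in sorted order.
keys = sorted([
    ("u1", "e1"), ("u1", "e2"), ("u2", "e1"), ("u2", "e3"), ("u3", "e1"),
])
# Hypothetical row payloads stored under each key.
values = {k: f"payload-{k[0]}-{k[1]}" for k in keys}


def point_lookup(key):
    """Return the payload for an exact key, or None if the key is absent."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[key]
    return None


def prefix_scan(user_id):
    """Return all keys whose first component equals user_id."""
    # The 1-tuple (user_id,) sorts before every (user_id, x) pair,
    # so bisect_left lands on the first matching key.
    start = bisect_left(keys, (user_id,))
    matches = []
    for k in keys[start:]:
        if k[0] != user_id:
            break
        matches.append(k)
    return matches
```

A point lookup needs the full key, while a prefix scan needs only the leading component and then reads neighbors in key order. This is why the column order of a composite primary key matters for prefix scans.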
Federated query and data lake acceleration
Hologres integrates seamlessly with MaxCompute for transparent query acceleration (5-10x faster than native reads) and unified analysis of hot/cold data. It also facilitates high-speed data replication (millions of rows/sec) and efficient handling of common OSS data lake formats, simplifying data pipelines.
Semi-structured data analysis
With native JSON support, columnar JSONB, and rich JSON operators, Hologres makes storing and querying JSON almost as efficient as native columnar data.
Real-time data warehousing
Designed for the dynamic needs of real-time data warehousing, Hologres supports high-concurrency real-time writes and updates. It ensures transactional isolation and atomicity, making data immediately queryable after ingestion. This guarantees the freshest data for your simple data models.
High-throughput real-time writes and updates
Hologres integrates natively with computing frameworks like Flink and Spark. Its built-in connectors support high-throughput real-time data ingestion and updates for source tables, sink tables, and dimension tables. It also supports complex operations like multi-stream joins.
WYSIWYG development
Data is queryable immediately after writing. Hologres supports a three-tier structure (DB, schema, table), views, and native UPDATE, DELETE, and UPSERT operations. It offers a rich feature set including joins, nesting, and window functions. Hologres also natively supports semi-structured JSON analysis and enables real-time database replication from sources like MySQL.
End-to-end, event-driven pipeline
Hologres can publish table update events through a binlog. By consuming the Hologres binlog with Flink, you can build a real-time data warehouse end to end. This reduces data processing latency while maintaining proper data governance for each data warehouse layer.
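The idea of driving a downstream layer from change events can be sketched as follows. The event shape here (op, key, row) is an assumption made for illustration and is not the Hologres binlog format. In a real pipeline a Flink job would play the role of the consumer.

```python
# Hypothetical change events; the field names are assumptions for illustration
# and do not reflect the actual Hologres binlog schema.
events = [
    {"op": "insert", "key": 1, "row": {"city": "Hangzhou", "amount": 10}},
    {"op": "insert", "key": 2, "row": {"city": "Beijing", "amount": 7}},
    {"op": "update", "key": 1, "row": {"city": "Hangzhou", "amount": 15}},
    {"op": "delete", "key": 2, "row": None},
]


def apply_events(change_events, target):
    """Replay change events in order onto a downstream table (a dict here)."""
    for event in change_events:
        if event["op"] in ("insert", "update"):
            target[event["key"]] = event["row"]
        elif event["op"] == "delete":
            target.pop(event["key"], None)
    return target


# The downstream table ends up reflecting the latest state of the source.
downstream = apply_events(events, {})
```

Because each layer reacts to events from the layer above it, new data flows through the warehouse as it arrives instead of waiting for a scheduled batch.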
Real-time materialized view
Real-time materialized views simplify development tasks. As data is written, the aggregations are updated in real time, which supports your real-time processing needs.
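The behavior described above, where aggregates are maintained as rows arrive, can be sketched with a small class. This is a conceptual illustration of incremental aggregation under assumed names (RunningSumView, on_insert, query); it is not how Hologres defines or maintains materialized views.

```python
from collections import defaultdict


class RunningSumView:
    """Keeps SUM(amount) and COUNT(*) per group, updated on every write.

    Hypothetical class for illustration; not a Hologres API.
    """

    def __init__(self):
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)

    def on_insert(self, group, amount):
        # Update the aggregates incrementally instead of rescanning all rows.
        self.sums[group] += amount
        self.counts[group] += 1

    def query(self, group):
        # Reads return precomputed results; .get avoids creating empty groups.
        return self.sums.get(group, 0), self.counts.get(group, 0)


view = RunningSumView()
for group, amount in [("east", 30), ("west", 45), ("east", 25)]:
    view.on_insert(group, amount)
```

Each write costs a constant amount of extra work, and a query reads the stored totals directly rather than scanning the base data.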
Enterprise-grade operations and maintenance
Hologres supports fine-grained controls for computing workloads and access permissions. It offers a rich set of monitoring metrics, supports elastic scaling of computing resources, and enables hot upgrades to meet enterprise security and reliability requirements.
Data security
Hologres provides fine-grained access control and supports Bring Your Own Key (BYOK) data-at-rest encryption and data masking. It also supports IP whitelists and multiple authentication systems like RAM, STS, and independent accounts. Hologres is PCI-DSS compliant and supports backup and recovery.
Workload isolation
You can configure multiple Hologres instances in a primary-secondary architecture, where instances share a single copy of storage but have isolated computing resources. This isolates write operations from read operations and separates queries from serving workloads. This model also simplifies fault management and supports fast, automatic recovery of failed nodes. Data is stored reliably with three-replica redundancy on Pangu, which eliminates the need for local disks.
Self-service O&M
Hologres has built-in diagnostic information, including query history and metastore tables. You can use this data to quickly identify system bottlenecks and potential risks, which simplifies self-service operations and maintenance.
Ecosystem and extensibility
Hologres is compatible with the PostgreSQL ecosystem and seamlessly integrates with big data compute engines and DataWorks, enabling intuitive development.
PostgreSQL ecosystem compatibility
Hologres is compatible with the PostgreSQL ecosystem and provides JDBC/ODBC interfaces to easily connect to third-party ETL and BI tools, including Quick BI, DataV, Tableau, and FineReport. It also supports GIS spatial data analysis and an Oracle function extension pack.
DataWorks integration
Hologres is deeply integrated with DataWorks, which provides a graphical, intelligent, one-stop tool for data warehousing and interactive analysis. This integration supports enterprise-grade capabilities such as asset management, data lineage, real-time data replication, and data services.
Hadoop integration
Hologres offers high-throughput Hive and Spark connectors for ingesting Hadoop-processed data into Hologres for serving. It also provides accelerated reads from foreign tables in OSS-HDFS format and is compatible with formats like Hudi and Delta.
Vector search
Hologres is tightly integrated with the Platform for AI (PAI). It includes Proxima, a built-in vector search plugin from DAMO Academy, enabling real-time feature storage, recall, and vector search capabilities.
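As a rough picture of what vector recall does, the sketch below ranks stored vectors by cosine similarity to a query vector using a brute-force scan. It is an assumption-laden illustration: the item IDs and vectors are made up, and it does not use Proxima or reflect its indexing, which this document does not describe.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, items, k):
    """Brute-force recall: score every stored vector and keep the k best."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in items.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical feature vectors keyed by item ID.
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
nearest = top_k([1.0, 0.0], items, 2)
```

A brute-force scan like this grows linearly with the number of stored vectors, which is the cost that dedicated vector search engines are designed to avoid.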