What is Hologres?

This topic provides an overview of Hologres and describes its features.

Hologres is a unified real-time data warehousing service developed by Alibaba Cloud. You can use Hologres to write, update, process, and analyze large amounts of data in real time. Hologres supports standard SQL syntax, is compatible with PostgreSQL, and supports most PostgreSQL functions. Hologres supports online analytical processing (OLAP) and ad hoc analysis for up to petabytes of data, and provides high-concurrency and low-latency online data services. Hologres supports fine-grained isolation of multiple workloads and enterprise-level security capabilities. Hologres is deeply integrated with MaxCompute, Realtime Compute for Apache Flink, and DataWorks, and provides full-stack online and offline data warehousing solutions for enterprises.

Hologres is designed as a real-time data warehouse engine that delivers high performance, high reliability, cost efficiency, and high scalability. It provides real-time data warehousing solutions for managing large amounts of data, and interactive query services with sub-second response times. Typical scenarios include the construction of real-time data mid-ends, fine-grained analysis, self-service analysis, marketing profiling, audience grouping, and real-time risk control.

Features

Queries and analysis in multiple scenarios
Hologres supports multiple index types and storage models, including row-oriented storage, column-oriented storage, and hybrid row-column storage. It also supports a range of query types, such as simple queries, complex queries, and ad hoc queries. Hologres uses a massively parallel processing (MPP) architecture to process SQL statements in distributed mode, which improves resource utilization and accelerates the analysis of large amounts of data.
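
The difference between row-oriented and column-oriented storage can be illustrated with a minimal Python sketch. It uses plain in-memory dictionaries and lists rather than Hologres itself, and the names `row_store`, `column_store`, `point_query`, and `total_amount` are hypothetical. The sketch only shows why one layout favors lookups of whole records and the other favors aggregates over a single column.

```python
# Conceptual sketch only: plain Python structures stand in for storage layouts.
rows = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 25},
    {"id": 3, "region": "east", "amount": 5},
]

# Row-oriented layout: each record is kept together and keyed by primary key,
# so fetching one complete record is a single lookup.
row_store = {r["id"]: r for r in rows}

# Column-oriented layout: the values of each column are kept together,
# so an aggregate reads only the column it needs.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def point_query(pk):
    """Return the full record for one primary key from the row layout."""
    return row_store[pk]


def total_amount():
    """Sum a single column from the column layout."""
    return sum(column_store["amount"])
```

In this simplified picture, a hybrid row-column table would keep both layouts for the same data, trading extra storage for fast point queries and fast scans.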
- Sub-second interactive analysis
  Hologres performs parallel computing based on a scalable MPP architecture and uses vectorized operators to maximize the computing power of CPUs. Hologres also improves I/O throughput on SSD storage by using the AliORC storage format. As a result, Hologres supports interactive analysis of up to petabytes of data with sub-second response times.
- Online high-performance point queries by using primary keys
  Hologres uses primary key indexes on row-oriented tables and an optimized short execution path for queries to serve hundreds of thousands of point queries and prefix scans per second, and to support high-throughput real-time data updates. Compared with open source systems, Hologres delivers more than 10 times the performance. This makes Hologres suitable for scenarios such as ID mapping and dimension table joins in real-time data processing.
- Federated queries and data lake acceleration
  Hologres is seamlessly integrated with MaxCompute. You can use external tables to accelerate queries on MaxCompute data, and metadata can be imported automatically. Accelerated queries can be 5 to 10 times faster than direct queries on MaxCompute data. Hologres supports joint analysis of hot data and cold data. Hologres can synchronize millions of rows per second from MaxCompute tables to Hologres tables, and allows you to read data from and write data to Object Storage Service (OSS). This simplifies data import to data lakes or warehouses.
- Semi-structured data analysis
  Hologres natively supports the JSON data type, column-oriented storage for data of the JSONB type, and various JSON-related operators and expressions. As a result, JSON-formatted data can be stored and analyzed almost as efficiently as data in native column-oriented storage.
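
One way to picture column-oriented storage for JSON data is to split each document into per-key columns. The following Python sketch illustrates that idea only. It does not reflect how Hologres implements JSONB storage, and the function `columnarize` and the sample events are hypothetical.

```python
import json


def columnarize(json_lines):
    """Split JSON documents into one list per top-level key.

    Documents that lack a key contribute None to that key's column.
    """
    docs = [json.loads(line) for line in json_lines]
    keys = sorted({key for doc in docs for key in doc})
    return {key: [doc.get(key) for doc in docs] for key in keys}


events = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 5, "country": "SG"}',
    '{"user": "c", "clicks": 2}',
]

columns = columnarize(events)

# An aggregate over one key now touches a single list
# instead of parsing every document again.
total_clicks = sum(columns["clicks"])
```

Once the documents are laid out this way, an aggregate over `clicks` reads one column, which is the property that makes JSON analysis comparable to analysis of ordinary columns.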
Native real-time data warehouse
Real-time data warehouses are characterized by frequent data updates, simple data models, and the need for fast analysis. To meet these requirements, Hologres supports real-time, high-concurrency data writes and updates, as well as isolation and atomicity among transactions. This ensures that data can be queried as soon as it is written.
- Real-time high-throughput data writes and updates
  Hologres is integrated with computing frameworks such as Flink and Spark, and provides built-in connectors that you can use to write and update large amounts of data in real time. You can use Hologres tables as source tables, result tables, and dimension tables, and perform complex operations such as merging multiple data streams.
- What-you-see-is-what-you-get development
  Hologres allows you to query data immediately after the data is written. You can query data from a specific table, all tables in a schema, or a database. Hologres allows you to update, delete, or upsert a view for one or more tables. You can join tables, perform nested queries, and use window functions to query data in Hologres. Hologres natively supports the analysis of semi-structured JSON data. With a few clicks, you can synchronize full data from sources such as MySQL databases to Hologres and then synchronize incremental data in real time.
- Event-driven from end to end
  Hologres allows you to parse the binary logs of table update events. You can use Flink to consume Hologres binary logs and implement end-to-end real-time development across data warehouse layers. This reduces the end-to-end latency of data processing while meeting the requirements for tiered data governance.
- Real-time materialized view
  Hologres allows you to define real-time materialized views to simplify data development tasks such as data processing and aggregation. An aggregate view is refreshed as soon as data is written to its source table. This feature is suitable for real-time data processing.
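
The idea behind a real-time materialized view, in which an aggregate is updated as part of each write instead of being recomputed at query time, can be sketched in Python as follows. This is a generic illustration of incremental view maintenance and not the Hologres implementation, and it makes no claim about which write patterns Hologres views accept. The class `OrdersWithView` and its methods are hypothetical.

```python
class OrdersWithView:
    """A base table keyed by primary key, plus a per-region aggregate
    that is kept in sync on every write."""

    def __init__(self):
        self.table = {}  # order_id -> (region, amount)
        self.view = {}   # region -> [order_count, total_amount]

    def upsert(self, order_id, region, amount):
        """Insert a new order or replace an existing one by primary key."""
        old = self.table.get(order_id)
        if old is not None:
            # Retract the previous version of the row from the aggregate.
            old_region, old_amount = old
            self.view[old_region][0] -= 1
            self.view[old_region][1] -= old_amount
        self.table[order_id] = (region, amount)
        entry = self.view.setdefault(region, [0, 0])
        entry[0] += 1
        entry[1] += amount

    def totals(self, region):
        """Read the aggregate directly, without scanning the base table."""
        count, total = self.view.get(region, (0, 0))
        return count, total


orders = OrdersWithView()
orders.upsert(1, "east", 10)
orders.upsert(2, "east", 5)
orders.upsert(3, "west", 7)
```

Because each write adjusts the aggregate in place, a read such as `orders.totals("east")` reflects the latest data immediately and costs the same regardless of the size of the base table.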
Enterprise-level O&M capabilities
Hologres supports fine-grained management of computing workloads and access permissions. It provides a variety of monitoring and alerting metrics, and supports elastic scaling of computing resources as well as hot upgrades. These secure and reliable capabilities meet enterprise-level O&M requirements.
- Data security
  Hologres provides fine-grained access control policies and data security features, including Bring Your Own Key (BYOK) encryption, data masking, Data Security Guard, and IP address whitelists. Hologres supports multiple authentication systems such as Resource Access Management (RAM), Security Token Service (STS), and independent account systems. Hologres has passed the Payment Card Industry Data Security Standard (PCI DSS) assessment. Data backup and restoration are supported.
- Load isolation
  Hologres allows you to configure multiple compute instances in primary/secondary mode. In this mode, the compute instances share the same data but use isolated computing resources. This isolates data writes from data reads, and queries from other services, which facilitates failure management. Faulty nodes can be recovered quickly and automatically. Hologres stores data in Apsara Distributed File System with three redundant copies for high reliability, so you do not need to use local disks.
- Self-O&M capabilities
  Hologres provides O&M diagnostic information, such as query history and metadata warehouse tables. You can use this built-in information to quickly identify system bottlenecks and risks on your own.
Ecosystem and scalability
Hologres is compatible with the PostgreSQL ecosystem and seamlessly integrated with DataWorks, the big data development platform of Alibaba Cloud. If you are already familiar with these tools, you can get started with Hologres with little additional learning.
- Compatibility with PostgreSQL
  Hologres is compatible with PostgreSQL and provides Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) interfaces for connecting to third-party extract, transform, load (ETL) tools and business intelligence (BI) tools, such as Quick BI, DataV, Tableau, and FanRuan. Hologres also supports spatial data analysis based on geographic information systems (GIS), as well as Oracle extension functions.
- DataWorks development and integration
  Hologres is seamlessly integrated with DataWorks. Together, Hologres and DataWorks provide visualized, intelligent, all-in-one tools for data warehouse construction and interactive analysis. This gives enterprises solutions for data asset management, data lineage management, real-time data synchronization, and data services.
- Integration with Hadoop
  Hologres supports the Hive and Spark connectors. You can import data from Hadoop clusters to Hologres at high throughput and then use Hologres to serve the data to external services. Hologres accelerates access to OSS-HDFS external tables and supports data in the Apache Hudi and Delta Lake formats.
- Vector search engine: Proxima
  Hologres is also integrated with Alibaba Cloud Machine Learning Platform for AI (PAI) and has a built-in vector search engine named Proxima. Proxima supports online real-time feature storage, real-time retrieval, and vector search.
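
To make the notion of vector search concrete, the following Python sketch ranks stored feature vectors by cosine similarity to a query vector. It is an exact, brute-force stand-in written for illustration and is not how Proxima works internally. Production vector search engines typically rely on index structures to avoid scanning every vector. The functions `cosine_similarity` and `top_k` and the sample features are hypothetical, and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the ids of the k stored vectors most similar to the query,
    most similar first. Every stored vector is scanned."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]


features = {
    "item_a": [1.0, 0.0],
    "item_b": [0.8, 0.6],
    "item_c": [0.0, 1.0],
}

nearest = top_k([1.0, 0.1], features, k=2)
```

Here `nearest` holds the two item IDs whose feature vectors point in the direction closest to the query, which is the basic operation behind feature retrieval and similarity search.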