ApsaraDB HybridDB for PostgreSQL is an online MPP (Massively Parallel Processing) data warehousing service based on the open-source Greenplum Database.
ApsaraDB HybridDB provides online expansion and performance monitoring services, freeing your team from complicated MPP cluster operations and maintenance (O&M). This enables database administrators, developers, and data analysts to focus on improving enterprise productivity through SQL development.
- Superior Performance
ApsaraDB HybridDB enables mixed use of row and column stores. For OLAP analytics, column stores can be up to 100 times faster than row stores.
Supports high-performance parallel data imports from OSS, eliminating the bottleneck of single-channel imports.
Supports rich OLAP SQL syntax and functions, as well as numerous Oracle functions. Popular BI software can connect to ApsaraDB HybridDB directly.
Link ApsaraDB HybridDB with ApsaraDB for RDS to build hybrid OLTP+OLAP (HTAP) transaction and analytics solutions.
- Stable and Reliable
Supports distributed ACID transactions, with all data synchronized into two copies on two nodes.
Features distributed deployment with three levels of protection (segment, server, and cabinet), safeguarding important data infrastructure.
- Flexible Scalability
Scale the computing units (CPU, memory, and storage space) to increase OLAP performance and handle hundreds of terabytes of data.
Supports transparent OSS data operations. Cold data that is not needed for online analytics can be stored in OSS, with storage capacity scaled as needed. Because OSS External Tables support data compression, this approach greatly reduces production costs.
Distributed database with ACID transactions.
Based on a distributed MPP (Massively Parallel Processing) architecture.
Storage and computing capacity expand linearly as segments are added.
Realizes full potential of OLAP computing efficiency.
Distributed SQL OLAP statistics and window functions.
PL/pgSQL and PL/Java stored procedures.
Machine Learning and Analysis
MADlib machine learning based on SQL.
Hybrid analysis of geographic data that complies with international OpenGIS standards.
JSON data type analysis.
HyperLogLog algorithm analysis.
Supported by popular ETL tools based on the PostgreSQL/Greenplum JDBC driver.
MySQL users can synchronize data incrementally with 'rds_dbsync'.
Data queries in standard SQL syntax based on OSS External Tables.
OSS External Table supports data compression to reduce production costs.
A maximum of 1,000 server IP addresses are allowed in IP whitelist configuration.
Real-time monitoring at the network entry point to detect and defend against DDoS attacks.
Supports real-time analytics on GIS data in SQL syntax to assist in LBS statistics for IoT devices and other Internet-based statistics.
Supports real-time analytics on JSON, XML, and fuzzy string data in SQL syntax to help financial organizations, government organizations, and enterprises with message data processing and fuzzy text matching.
How it works
- One-time Development
- IoT Analytics (JSON+GIS)
- Internet Approximating (HyperLogLog)
- OLTP & OLAP
One-time Development
ISVs (Independent Software Vendors) can switch MPP system applications between on-premise and cloud environments. Businesses with on-premise infrastructure can use Greenplum Database directly, while businesses on the cloud can adopt HybridDB directly.
Developers only need to program the application once, and it will run on both traditional and cloud platforms. Both on-premise and cloud deployments can be reached through generic PostgreSQL drivers, which makes it easier to connect the business to other platforms of the same architecture. You can build an integrated “hybrid cloud” data warehouse development platform without worrying about the differences between on-premise and cloud platforms.
IoT Analytics (JSON+GIS)
ApsaraDB HybridDB and PostgreSQL both embed PostGIS, an OpenGIS-compliant spatial database engine, for real-time positioning and route planning. PostGIS is supported by ArcGIS, Intergraph, and QGIS. You can combine simple SQL statements with GIS functions in your application to handle complicated spatial and geographic data models (2D and 3D processing are supported).
Thanks to HybridDB’s comprehensive OLAP capability, massive data analytics based on geographic information can be performed to provide decision-making support for IoT, mobile Internet, logistics delivery, smart cities, LBS, O2O business systems, and more.
Internet Approximating (HyperLogLog)
Cardinality estimation is one of the most common tasks in Big Data scenarios. Memory demand, merging of partial results, and data processing cost are the major problems that arise during cardinality estimation. Page view (PV) and unique visitor (UV) calculations both fall into this category.
In SQL, this calculation is usually done with COUNT(DISTINCT ...), but performance is low. HyperLogLog improves the query performance of cardinality estimation by 20 to 100 times, with an error rate of approximately 2%. HyperLogLog can be adopted in business scenarios that do not demand exact results, which greatly reduces server computation load and costs.
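To make the trade-off concrete, here is a minimal Python sketch of the HyperLogLog idea. It is not the extension that HybridDB ships: the class name `HyperLogLog`, the precision parameter `p`, and the choice of SHA-1 as the hash are assumptions made for illustration. With 2^12 = 4,096 registers, the theoretical standard error is about 1.04/sqrt(4096), roughly 1.6%, and the memory used stays fixed no matter how many values are added. Merging two sketches is a register-wise maximum, which is why per-day or per-segment results can be combined cheaply.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog sketch with 2**p registers (hypothetical class)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                     # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Derive a 64-bit hash from the value (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                     # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Two days of visitor IDs with an overlap of 20,000 visitors.
day1 = HyperLogLog()
day2 = HyperLogLog()
for visitor_id in range(0, 60_000):
    day1.add(visitor_id)
for visitor_id in range(40_000, 100_000):
    day2.add(visitor_id)

day1.merge(day2)                 # unique visitors across both days
print(round(day1.count()))       # an estimate near 100,000; the exact value depends on the hash
```

An exact COUNT(DISTINCT ...) has to keep every distinct value it has seen, whereas the sketch above only updates one of 4,096 small integers per input row.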
OLTP & OLAP
A wide range of options is available for importing your Greenplum-based data warehouses into ApsaraDB HybridDB, and you don’t need to worry about the complicated O&M of the MPP cluster. Alibaba Cloud also provides a complete set of scaling and availability solutions, so that database administrators, developers, and data analysts can focus on improving enterprise productivity through SQL and on creating core value.
With Alibaba Cloud ApsaraDB for RDS, you can achieve high performance for OLTP applications. RDS supports MySQL, SQL Server, and PostgreSQL. In combination with ApsaraDB HybridDB, you can integrate OLTP and OLAP databases on the cloud to establish a database architecture that covers both high-concurrency production transactions and decision-making analysis.
1. How do I choose between RDS, HybridDB for PostgreSQL, and E-MapReduce?
|HybridDB for PostgreSQL||Based on Greenplum Database||OLAP (Online Analytical Processing) data warehouse||Scale-out storage as needed. With the MPP distributed architecture, analytics and storage performance increase linearly. Complicated SQL queries can be resolved within seconds or even milliseconds, with concurrency kept within 500.|
|RDS||MySQL / PostgreSQL / SQL Server||OLTP (Online Transaction Processing) database||Supports different database engines and targets transaction-based, real-time business processing: CRUD (create, retrieve, update, and delete). Supports less than 2 TB of online data.|
|E-MapReduce||Hadoop, Apache Spark, HBase, Presto, and Storm||Big Data processing solution to quickly process a huge amount of data||Lets you launch Hadoop clusters within minutes for massive data processing, simplifying complex big data processing by performing data-intensive tasks for the applications involved.|
2. Which ETL tools support ApsaraDB HybridDB for PostgreSQL?
HybridDB is based on the open-source Greenplum Database program and adopts universal JDBC and ODBC interfaces. Therefore, almost all ETL tools that support Greenplum and PostgreSQL also support HybridDB.
3. What is the difference between HybridDB for PostgreSQL and Greenplum Database on which HybridDB is based?
• HybridDB extensions support JSON, HyperLogLog and oss_ext external tables, while the open-source Greenplum Database doesn’t.
• HybridDB is a cloud computing service that can be configured with a few clicks in the Alibaba Cloud Console. Users don’t have to manage data warehouse deployment, expansion, or other complicated configurations.
• HybridDB is structured on the unified management platform of Alibaba Cloud ApsaraDB and imposes limits on the superuser permissions.
4. Is the instance space purchased for HybridDB fully available for use?
Yes. The instance space you purchase is fully usable. HybridDB reserves additional temporary file space separately, which does not occupy the resources you have purchased.
5. What is the relationship between the HybridDB node type and the Greenplum Database segment?
• A node is composed of one or more segments. The cores, memory, and disk space in the node type indicate the resources that are actually usable. For example, a node of type 4 Cores/32GB Mem/2TB HDD contains 4 segments, each with 1 Core/8GB Mem/0.5TB HDD.
• For the same node type of 4 Cores/32GB Mem/2TB HDD, the corresponding native Greenplum Database contains primary segments that total 4 Cores/32GB Mem/2TB HDD and mirror segments that total 4 Cores/32GB Mem/2TB HDD. In other words, to build a Greenplum Database cluster of the same configuration yourself, you need to prepare physical resources of 8 Cores/64GB Mem/4TB HDD or more (plus the additional temporary file space for the cluster).
• All the segments in a node are allocated on the same server. A high-configuration node helps reduce network traffic between servers and improves performance. We recommend choosing a high-configuration node if you require more computing resources. To purchase a super-high configuration node, please contact your customer manager or apply by submitting a “ticket”.
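The arithmetic in the two examples above can be checked with a short Python sketch. The names `NodeSpec`, `per_segment`, and `self_hosted_equivalent` are hypothetical and exist only for this illustration. The segment count of 4 and the primary-plus-mirror doubling are taken from the examples above; the temporary file space is not modeled.

```python
from typing import NamedTuple


class NodeSpec(NamedTuple):
    """Resources of a node type or segment (hypothetical helper type)."""
    cores: float
    mem_gb: float
    disk_tb: float


def per_segment(node: NodeSpec, segments_per_node: int) -> NodeSpec:
    # Resources are split evenly across the segments of a node.
    return NodeSpec(
        node.cores / segments_per_node,
        node.mem_gb / segments_per_node,
        node.disk_tb / segments_per_node,
    )


def self_hosted_equivalent(node: NodeSpec) -> NodeSpec:
    # A self-built Greenplum cluster needs resources for the primary
    # segments plus an equally sized set of mirror segments.
    return NodeSpec(node.cores * 2, node.mem_gb * 2, node.disk_tb * 2)


node = NodeSpec(cores=4, mem_gb=32, disk_tb=2)
print(per_segment(node, segments_per_node=4))   # 1 core, 8 GB, 0.5 TB per segment
print(self_hosted_equivalent(node))             # 8 cores, 64 GB, 4 TB in total
```

The doubling is why a purchased node type compares to roughly twice the raw hardware in a self-managed deployment.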
6. How large can the storage capacity of HybridDB be scaled to?
We offer computing and storage resources of 2048 Cores/16TB Mem/1024TB HDD or higher, depending on your requirements. To purchase a super-high configuration node, please contact your customer manager or apply by submitting a “Ticket” in the Console.