
E-MapReduce: New capability in data lakehouse scenarios: EMR supports Hologres and MaxCompute data sources

Last Updated: Apr 27, 2023

Alibaba Cloud E-MapReduce (EMR) allows you to access Hologres and MaxCompute tables by using the Spark and Trino compute engines. With this capability, EMR provides a more comprehensive data lakehouse solution and a more efficient and stable data analysis experience.

Background information

Hologres is an all-in-one real-time data warehouse service developed by Alibaba Cloud. It allows you to write, update, process, and analyze large amounts of data in real time. Hologres is compatible with PostgreSQL and therefore supports standard SQL syntax. It supports online analytical processing (OLAP) and ad hoc analysis on petabytes of data, and provides high-concurrency, low-latency online data services. Hologres provides enterprises with full-stack online and offline data warehouse solutions.

MaxCompute is an enterprise-level cloud data warehouse that is delivered in the software as a service (SaaS) model and designed for data analysis workloads. It provides a fast, fully managed online data warehousing service with a serverless architecture. MaxCompute removes the resource scalability and elasticity limits of traditional data platforms, minimizes O&M costs, and allows you to efficiently analyze and process large amounts of data at low cost.

Data lakehouse solution

EMR supports Hologres and MaxCompute data sources, which provides the following benefits:

  • Efficient access to online data: You can use online data directly for big data analysis, without exporting data from Hologres or MaxCompute tables to a centralized storage service such as Object Storage Service (OSS). This avoids the data loss and security risks that an export step can introduce, and speeds up data processing and analysis. You can also use data more flexibly and respond quickly to business requirements.

  • Lower data processing costs: No extract, transform, and load (ETL) operations are required, and you do not need to store and manage an additional copy of the data. This reduces overall data analysis costs.

Limits

This topic applies to EMR V3.45.1 or a later minor version, or EMR V5.11.1 or a later minor version.
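The version requirement above can be expressed as a simple check, for example in a cluster provisioning script. The following is a minimal sketch with a hypothetical helper, `supports_holo_mc_sources`, which is not part of any EMR tool. It assumes that "a later minor version" means any later release in the same major series (for example, V5.12.0 qualifies), and it treats major series that this topic does not name as unsupported.

```python
import re

# Minimum release in each EMR major series, taken from the Limits section:
# V3.45.1 for EMR 3.x and V5.11.1 for EMR 5.x.
MIN_SUPPORTED = {3: (45, 1), 5: (11, 1)}


def supports_holo_mc_sources(version: str) -> bool:
    """Return True if a version string such as "EMR-5.11.1" or "V3.45.1"
    meets the minimum version stated in the Limits section.

    Assumption (hypothetical helper): any later release in the same major
    series qualifies; major series not listed in MIN_SUPPORTED do not.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized EMR version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    minimum = MIN_SUPPORTED.get(major)
    if minimum is None:
        # Major series not named in this topic: treat as unsupported.
        return False
    # Compare (minor, patch) tuples lexicographically against the minimum.
    return (minor, patch) >= minimum
```

For example, `supports_holo_mc_sources("EMR-5.12.0")` returns `True`, and `supports_holo_mc_sources("EMR-3.45.0")` returns `False`.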

Use Spark to configure a Hologres data source

For more information, see Use Spark to access Hologres.
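The linked topic describes the EMR-supported way to read Hologres data from Spark. Because Hologres is compatible with PostgreSQL, a generic alternative is Spark's built-in JDBC data source together with the PostgreSQL JDBC driver. The following minimal sketch only assembles those reader options. The helper name `hologres_jdbc_options` is hypothetical, all argument values are placeholders that you supply, and this approach is not the connector described in Use Spark to access Hologres.

```python
def hologres_jdbc_options(endpoint, port, database, table, user, password):
    """Build option values for Spark's generic JDBC data source that point
    at a Hologres table over the PostgreSQL protocol.

    Hypothetical helper: the endpoint, port, database, table, and
    credentials are placeholders; nothing here is specific to the
    EMR-native Hologres connector.
    """
    return {
        # Standard PostgreSQL JDBC URL: jdbc:postgresql://host:port/database
        "url": f"jdbc:postgresql://{endpoint}:{port}/{database}",
        # Driver class of the PostgreSQL JDBC driver (pgJDBC).
        "driver": "org.postgresql.Driver",
        # Table to read, optionally schema-qualified, such as "public.orders".
        "dbtable": table,
        "user": user,
        "password": password,
    }
```

In a PySpark session whose classpath includes the PostgreSQL JDBC driver, the options can then be passed to the reader, for example `spark.read.format("jdbc").options(**hologres_jdbc_options(...)).load()`. Keeping option assembly in a separate function makes it easy to load credentials from a secure source instead of hard-coding them.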