E-MapReduce: Paimon overview

Last Updated: Jun 26, 2023

Apache Paimon is a data lake storage format that supports both streaming and batch processing, with high-throughput data writes and low-latency data queries. Apache Paimon is compatible with the common compute engines of Alibaba Cloud E-MapReduce (EMR), such as Flink, Spark, Hive, and Trino. You can use Apache Paimon to build a data lake storage service on Hadoop Distributed File System (HDFS) or Alibaba Cloud Object Storage Service (OSS), and then connect that service to these compute engines to perform data lake analytics.
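As a minimal sketch of what connecting a compute engine to a Paimon warehouse can look like, the following Python helper builds a Flink SQL `CREATE CATALOG` statement for a warehouse on HDFS or OSS. The helper name `paimon_catalog_ddl`, the catalog name, and the bucket path are hypothetical and used only for illustration. The sketch assumes that the Paimon connector is available to Flink, and it omits any additional settings that a real cluster may need, such as OSS credentials.

```python
def paimon_catalog_ddl(catalog_name: str, warehouse: str) -> str:
    """Build a Flink SQL statement that registers a Paimon catalog.

    The warehouse path is expected to point at HDFS (hdfs://...) or
    OSS (oss://...), the two storage systems named in this topic.
    """
    if not warehouse.startswith(("hdfs://", "oss://")):
        raise ValueError("warehouse must be an hdfs:// or oss:// path")
    return (
        f"CREATE CATALOG {catalog_name} WITH (\n"
        f"  'type' = 'paimon',\n"
        f"  'warehouse' = '{warehouse}'\n"
        f")"
    )


# Hypothetical catalog name and bucket path, for illustration only.
ddl = paimon_catalog_ddl("paimon_catalog", "oss://my-bucket/paimon/warehouse")
print(ddl)
```

The generated statement can be pasted into a Flink SQL client session. Other engines such as Spark, Hive, and Trino use their own configuration syntax, which this sketch does not cover.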

Apache Paimon provides the following features:

  • Lets you build a low-cost, lightweight data lake storage service on HDFS or OSS.

  • Supports read and write operations on large-scale datasets in streaming and batch modes.

  • Supports batch queries and online analytical processing (OLAP) queries that return results within minutes or even seconds.

  • Supports consumption and generation of incremental data. Apache Paimon can therefore serve as the storage for each layer of a traditional data warehouse or a streaming data warehouse.

  • Supports data pre-aggregation to reduce storage costs and downstream computing workloads.

  • Supports queries on historical versions of data.

  • Supports efficient data filtering.

  • Supports table schema changes.

For more information, see Apache Paimon.