JindoData - E-MapReduce - Alibaba Cloud Documentation Center

JindoData is a suite that the Alibaba Cloud big data team developed to accelerate storage for data lake systems. It provides end-to-end solutions for big data and AI scenarios on data lake systems from Alibaba Cloud and other vendors.

JindoData is built on a unified architecture and kernel. It consists of the following components: JindoFS (an upgrade of the original JindoFS in block storage mode), JindoFSx (an upgrade of the original JindoFS in cache mode), and JindoSDK. JindoData also provides fully compatible tools, such as JindoFuse and Jindo DistCp, as well as plug-ins.

Precautions

JindoData applies only to clusters of EMR V5.14.0 or an earlier version and clusters of EMR V3.48.0 or an earlier version.

JindoData is unavailable for clusters of EMR V5.15.0 or a later minor version and clusters of EMR V3.49.0 or a later minor version. For clusters of these versions, you can use JindoCache for data caching and DLF-Auth for authentication.
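The following Python sketch encodes the version cutoff described above as a hypothetical helper, `before_jindodata_cutoff`. It is not an Alibaba Cloud tool. The sketch makes the following assumptions:

- Version strings contain a `major.minor.patch` number, such as `EMR-5.14.1`.
- Only the V5.15.0 and V3.49.0 cutoffs need to be checked. The helper does not confirm that every earlier version includes JindoData.

```python
import re

# Cutoffs from the precautions: JindoData is unavailable from these
# (minor, patch) versions onward within each major release line.
JINDODATA_CUTOFF = {
    5: (15, 0),  # EMR V5.15.0 and later
    3: (49, 0),  # EMR V3.49.0 and later
}


def parse_emr_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as 'EMR-5.14.0' or 'V3.48.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized EMR version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def before_jindodata_cutoff(version: str) -> bool:
    """Return True if the version is earlier than the JindoData cutoff for its line.

    Only the upper cutoff is checked; this does not assert that every
    earlier version ships JindoData.
    """
    major, minor, patch = parse_emr_version(version)
    if major not in JINDODATA_CUTOFF:
        raise ValueError(f"no cutoff documented for EMR V{major}.x")
    return (minor, patch) < JINDODATA_CUTOFF[major]


print(before_jindodata_cutoff("EMR-5.14.1"))  # True
print(before_jindodata_cutoff("EMR-5.15.0"))  # False
print(before_jindodata_cutoff("V3.48.0"))     # True
```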

JindoFS

JindoFS is a cloud-native storage system built on Alibaba Cloud Object Storage Service (OSS). It is binary compatible with Apache Hadoop Distributed File System (HDFS), which improves the experience of using HDFS and migrating data. JindoFS is an upgrade of the original JindoFS in block storage mode.

In Alibaba Cloud, JindoFS is provided as a service called OSS-HDFS, which is deeply integrated with OSS. You can use OSS-HDFS directly without deploying or managing JindoFS in self-managed clusters.

For more information about OSS-HDFS, see What is OSS-HDFS?

JindoFSx

JindoFSx is an upgrade of the original JindoFS in cache mode. It is a cloud-native data lake storage system for big data and AI scenarios. JindoFSx accelerates access from big data and AI applications to various cloud storage services. It provides capabilities such as data caching, metadata caching, and P2P acceleration.

JindoFSx lets you manage multiple backend storage systems in a unified namespace while remaining compatible with the native access protocols of each backend. It can also provide unified permission management across these systems. Supported services include Alibaba Cloud OSS, Alibaba Cloud OSS-HDFS, Amazon Simple Storage Service (S3), Apache HDFS, and File Storage NAS.
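To illustrate the idea of a unified namespace, the following Python sketch maps paths in a single namespace to backend storage locations by longest-prefix matching. The `Mount` and `UnifiedNamespace` classes, the prefixes, and the bucket and host names are all hypothetical. This is a general mount-table technique, not the JindoFSx implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical mount entry: a namespace prefix backed by a storage URI."""
    prefix: str   # path in the unified namespace, such as "/lake/logs"
    backend: str  # backend location, such as "oss://example-bucket/logs"


class UnifiedNamespace:
    """Toy mount table that resolves paths by longest-prefix matching."""

    def __init__(self, mounts: list[Mount]) -> None:
        # Check longer prefixes first so that nested mounts win over parents.
        self._mounts = sorted(mounts, key=lambda m: len(m.prefix), reverse=True)

    def resolve(self, path: str) -> str:
        """Translate a namespace path into a backend URI."""
        for mount in self._mounts:
            prefix = mount.prefix.rstrip("/")
            # Match whole path components only, so "/lake/logs2" does not
            # fall under the "/lake/logs" mount.
            if path == prefix or path.startswith(prefix + "/"):
                return mount.backend.rstrip("/") + path[len(prefix):]
        raise FileNotFoundError(f"no mount covers {path!r}")


ns = UnifiedNamespace([
    Mount("/lake/logs", "oss://example-bucket/logs"),
    Mount("/lake/warehouse", "hdfs://example-namenode:8020/warehouse"),
    Mount("/lake/warehouse/archive", "s3://example-archive/warehouse"),
])
print(ns.resolve("/lake/logs/2024/app.log"))
# oss://example-bucket/logs/2024/app.log
print(ns.resolve("/lake/warehouse/archive/t1"))
# s3://example-archive/warehouse/t1
```

Because longer prefixes are checked first, a nested mount such as `/lake/warehouse/archive` can point to a different backend than its parent `/lake/warehouse`.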

Ecosystem support and plug-ins

Support for JindoSDK
JindoSDK is an OSS client that provides Hadoop SDK and HDFS APIs. It delivers higher performance than open source Hadoop when accessing data stored in OSS. JindoSDK also supports JindoFS, JindoFSx, and various cloud object storage services.
Support for JindoShell CLI commands
JindoData supports both HDFS Shell commands and JindoShell CLI commands. JindoShell CLI commands provide extended features and high performance, and they optimize data access operations.
Support for Portable Operating System Interface (POSIX) by using JindoFuse
JindoFuse allows you to access OSS, JindoFS, and JindoFSx through POSIX interfaces.
Support for data migration by using Jindo DistCp
Jindo DistCp is a tool that helps you migrate data from HDFS in on-premises data centers to the cloud or migrate data across clouds. You can use Jindo DistCp to migrate data from various storage systems to OSS and JindoFS. Its usage is similar to that of Hadoop DistCp.
Support for JindoTable
JindoTable is a solution built on compute engines such as Spark, Hive, and Presto. It allows you to manage table data.
Plug-ins
Plug-ins such as Flink connectors are supported.
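After JindoFuse mounts a storage system, applications can use ordinary file operations instead of a storage-specific SDK. The following Python sketch illustrates this with the standard library only. The helper `write_and_list` and the example mount point `/mnt/jindo` are hypothetical. The demo runs against a temporary local directory so that it works without a mount.

```python
import os
import tempfile
from pathlib import Path


def write_and_list(mount_point: str) -> list[str]:
    """Write a file under mount_point with standard POSIX file I/O, then list the directory."""
    target_dir = Path(mount_point) / "demo"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Plain open()/write(): no storage-specific client library is involved.
    sample = target_dir / "hello.txt"
    with open(sample, "w", encoding="utf-8") as f:
        f.write("hello from a POSIX client\n")

    # Metadata calls use the same standard interfaces.
    print(f"{sample}: {os.stat(sample).st_size} bytes")
    return sorted(os.listdir(target_dir))


# Demo against a temporary directory. On a real system, you would pass the
# JindoFuse mount point instead, such as the hypothetical "/mnt/jindo".
with tempfile.TemporaryDirectory() as tmp:
    print(write_and_list(tmp))  # ['hello.txt']
```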
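DistCp-style tools typically build a list of files to copy and then divide that list among parallel copy tasks. The following Python sketch illustrates both steps in simplified form:

1. It selects source files that are missing at the destination or that differ in size. This is similar in spirit to the `-update` option of Hadoop DistCp, which can also compare checksums.
2. It spreads the selected files across workers by byte count.

The `FileEntry` type and both functions are hypothetical. They do not reflect how Jindo DistCp is implemented.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    path: str   # path relative to the source or destination root
    size: int   # size in bytes


def plan_copy(source: list[FileEntry], dest: list[FileEntry]) -> list[FileEntry]:
    """Select source files that are missing at the destination or differ in size.

    Real tools can also compare checksums; this sketch compares size only.
    """
    dest_sizes = {entry.path: entry.size for entry in dest}
    return [e for e in source if dest_sizes.get(e.path) != e.size]


def split_into_tasks(files: list[FileEntry], workers: int) -> list[list[FileEntry]]:
    """Greedily assign files to workers so that each gets a similar byte count."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    tasks: list[list[FileEntry]] = [[] for _ in range(workers)]
    # Min-heap of (bytes assigned so far, worker index).
    heap = [(0, i) for i in range(workers)]
    # Placing the largest files first usually gives a better balance.
    for entry in sorted(files, key=lambda e: e.size, reverse=True):
        assigned, idx = heapq.heappop(heap)
        tasks[idx].append(entry)
        heapq.heappush(heap, (assigned + entry.size, idx))
    return tasks


src = [FileEntry("a.parquet", 300), FileEntry("b.parquet", 200),
       FileEntry("c.parquet", 100), FileEntry("d.parquet", 50)]
dst = [FileEntry("a.parquet", 300), FileEntry("c.parquet", 90)]

todo = plan_copy(src, dst)
print([e.path for e in todo])  # ['b.parquet', 'c.parquet', 'd.parquet']
tasks = split_into_tasks(todo, 2)
print([[e.path for e in t] for t in tasks])
# [['b.parquet'], ['c.parquet', 'd.parquet']]
```

In this example, `a.parquet` is skipped because it already exists at the destination with the same size. `c.parquet` is copied again because its sizes differ.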