E-MapReduce: SmartData (available only for existing users)

Last Updated: Mar 03, 2025

SmartData is a core component of E-MapReduce (EMR) that is developed in-house. It centrally optimizes storage, caching, and computing for the various EMR compute engines and extends storage features. SmartData is used in data access, data governance, and data security scenarios.

The following figure shows the position of SmartData in EMR.

[Figure: SmartData]

Composition of SmartData:

  • JindoFS core subsystem: provides caching and cache-based acceleration features for various remote storage systems. For more information, see Overview and usage of JindoFS.

  • JindoTable core subsystem: provides table- and partition-level optimization and governance for data sources, such as a Hive warehouse. For more information, see Use JindoTable.

  • JindoManager: provides a web UI to manage JindoFS and JindoTable services and features. For example, you can view the metrics of the cached files and tables.

  • JindoSDK: provides a unified SDK for the various open source compute engines of EMR. It supports the Java, C, C++, and Python programming languages and provides a variety of access interfaces and APIs, such as Hadoop Compatible File System (HCFS) interfaces, Portable Operating System Interface (POSIX) interfaces, and table-related interfaces.

  • Toolset: includes Jindo tools and the data copy tool Jindo DistCp.

  • Various connectors: include the Hadoop connector, Flink connector, and TensorFlow connector. Kite SDK, Apache Beam, Flume, Sqoop, and Kafka are also supported.

JindoFS and JindoTable support the following data sources: Alibaba Cloud Object Storage Service (OSS), Apache Hadoop HDFS, Hive, and Alibaba Cloud MaxCompute.

SmartData is independently developed and released. For more information about SmartData versions, see Release version.