SmartData is a core component of E-MapReduce (EMR) that is developed in-house. It provides centralized storage, cache optimization, and cache-based compute acceleration for EMR computing engines, and extends storage features. SmartData is used in data access, data governance, and data security scenarios.

SmartData consists of the following components:
  • JindoFS core subsystem: provides caching acceleration and optimization for various remote storage systems.
  • JindoTable core subsystem: provides table- and partition-level optimization and governance for data sources, such as a Hive warehouse. For more information, see Use JindoTable.
  • JindoManager: provides a web UI to manage JindoFS and JindoTable services and features. For example, you can view the metrics of the cached data of files and tables.
  • JindoSDK: provides a unified SDK for the open source computing engines of EMR. It supports the Java, C, C++, and Python programming languages and provides a variety of access interfaces and API operations. The interfaces include Hadoop Compatible File System (HCFS) interfaces, POSIX interfaces, and table-related interfaces.
  • Toolsets: include Jindo tools and the data copy tool Jindo DistCp.
  • Various connectors: include the Hadoop connector, Flink connector, and TensorFlow connector. Kite SDK, Apache Beam, Flume, Sqoop, and Kafka are also supported.
  • Bigboot infrastructure: provides millisecond-scale process monitoring and log cleaning features for the services that are supported by SmartData.

JindoFS and JindoTable support the following data sources: Alibaba Cloud Object Storage Service (OSS), Apache Hadoop HDFS, Hive warehouse, and Alibaba Cloud MaxCompute.

SmartData is independently developed and released. For more information about SmartData versions, see Overview.