SmartData is the storage service of the E-MapReduce (EMR) Jindo engine. It provides centralized storage, caching optimization, and computing optimization for EMR compute engines, and extends their storage features. SmartData consists of JindoFS, JindoTable, and related tools. This topic describes the updates in SmartData 3.4.X.

OSS storage scalability on JindoFS

  • The Object Storage Service (OSS) Recoverable OutputStream feature is added. It is built on the Flush and Recover APIs and suits write scenarios that require high reliability, such as Flume-based writes.
  • The efficiency of rename operations on the OSS server is improved.
  • The performance of list operations on directories in OSS with bucket versioning enabled is optimized. This prevents list operations on directories from being affected by a large number of temporary files.
  • The performance of Jindo OSS Magic Committer for OSS with bucket versioning enabled is optimized. Jindo OSS Direct Committer is added.
  • The credential provider framework is enhanced. JindoCommonCredentialsProvider is added.
  • The file creation performance is optimized. A redundancy check is no longer performed when you write data to OSS.
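
The flush-and-recover idea behind a recoverable output stream can be illustrated with a small local-file model. This is only a conceptual sketch and not the JindoFS API: the class name RecoverableWriter, its methods, and the use of a local file in place of OSS are all assumptions made for illustration. Data counts as durable only after flush(), and recover() resumes from the last committed offset after a failure.

```python
import os
from pathlib import Path


class RecoverableWriter:
    """Hypothetical append-only writer: data is durable only after flush()."""

    def __init__(self, path: Path, offset: int = 0):
        self.path = path
        self.committed = offset      # number of bytes known to be durable
        self._buffer = bytearray()   # bytes written but not yet flushed

    def write(self, data: bytes) -> None:
        # Buffered in memory only; lost if the process dies before flush().
        self._buffer += data

    def flush(self) -> int:
        """Persist buffered bytes and return the new committed offset."""
        with self.path.open("ab") as handle:
            handle.write(self._buffer)
            handle.flush()
            os.fsync(handle.fileno())
        self.committed += len(self._buffer)
        self._buffer.clear()
        return self.committed

    @classmethod
    def recover(cls, path: Path, committed: int) -> "RecoverableWriter":
        """Reopen after a failure, dropping any bytes past the committed offset."""
        with path.open("r+b") as handle:
            handle.truncate(committed)
        return cls(path, committed)
```

A producer such as a log collector would record the offset returned by flush() and, after a restart, call recover() with that offset and resend everything it produced after it.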

JindoFS-based storage optimization

Data encryption is supported in JindoFS in block storage mode. You can use Alibaba Cloud Key Management Service (KMS) and the Advanced Encryption Standard (AES) algorithm to encrypt data.

JindoTable-based computing optimization

The native Optimized Row Columnar (ORC) reader of JindoTable is enhanced.

Other JindoFS tools

Jindo DistCp is enhanced, and incremental migration is optimized. For example, when you migrate HDFS data to OSS, checksum verification can be performed on the source and destination paths.
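
The general idea of checksum-based incremental migration can be sketched on local directories. This is not how Jindo DistCp is implemented or invoked: the function names, the choice of SHA-256, and the use of local paths in place of HDFS and OSS are assumptions made for illustration. A file is copied only when it is missing at the destination or its checksum differs from the source.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_copy(src_root: Path, dst_root: Path) -> List[str]:
    """Copy files that are missing or differ at the destination.

    Returns the relative paths that were copied, in sorted order.
    """
    copied = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if dst.is_file() and file_checksum(dst) == file_checksum(src):
            continue  # identical content is already at the destination
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Running incremental_copy twice in a row copies nothing the second time, which is the property that makes repeated migration runs cheap.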

Ecosystem support for JindoFS

Jindo OSS SDK for Python is added. The SDK supports basic OSS operations and is compatible with the oss2 Python library.