E-MapReduce:JindoData release notes

Last Updated: Mar 26, 2026

JindoData is a data lake storage acceleration suite developed by the Alibaba Cloud open source big data team. It provides comprehensive access acceleration for Alibaba Cloud and industry-standard data lake storage systems across big data and AI ecosystems. This topic describes the features introduced in each JindoData version.

Background

JindoData is an upgraded version of the original Alibaba Cloud EMR SmartData component. For more information, see JindoData (available only to existing users).

JindoData 4.6.x versions

Overview

JindoData 4.6.x introduces smooth migration from Hadoop Distributed File System (HDFS) to OSS-HDFS, significantly simplifying the data migration process. JindoFS adds a file inventory feature to help you understand data distribution and ownership. For performance, JindoFS improves du and count operation throughput through full and incremental server-side optimizations. JindoSDK 4.6.x adds file-level and block-level write verification to improve write stability, and supports a multi-path access protocol so you can reach the same backend path through different protocol modes.

JindoData 4.6.11

Bug fixes

  • JindoSDK: Fixed an issue that occurred when JindoCommitter wrote data through the legacy mapred API in an Alibaba Cloud EMR Hadoop 2.8.5 environment.

  • JindoTable: Optimized the feature for restoring archived tables or partitions in Object Storage Service (OSS). You can now specify the number of days for which restored data remains available. For more information, see Use JindoTable to archive and restore tables or partitions in OSS.

JindoData 4.6.10

Bug fixes and improvements

  • JindoFS: Optimized the pread prefetch logic.

  • JindoSDK: Added support for concurrent commit tasks to improve job commit performance.

  • JindoSDK: Optimized the path rewrite logic.

  • JindoFuse: Fixed an issue that occurred when appending objects.
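Committing tasks concurrently means that the per-task commit steps of a job run in parallel rather than one after another. The following Python sketch shows the general pattern with a thread pool. The commit_task function and the task IDs are hypothetical placeholders, not JindoSDK APIs, and the committer's actual concurrency settings are not covered here.

```python
from concurrent.futures import ThreadPoolExecutor


def commit_task(task_id: str) -> str:
    # Hypothetical placeholder for committing one task's output
    # (for example, completing its pending uploads). Not a JindoSDK API.
    return f"committed {task_id}"


def commit_job(task_ids: list[str], max_workers: int = 8) -> list[str]:
    # Run all task commits in parallel. executor.map preserves input
    # order, so results line up with task_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(commit_task, task_ids))
```

With many tasks, total commit time approaches the duration of the slowest batch of max_workers commits instead of the sum of all commits.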

JindoData 4.6.8

New features

  • JindoFS: Clients can now set the retention period for the recycle bin.

  • JindoSDK: Added support for MALLOC_CONF to optimize memory usage.

  • JindoFuse: Added graceful shutdown support when mounting OSS-HDFS.

  • JindoFSx: Added support for wildcard characters to filter the file list for cache prefetching.
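A wildcard filter narrows the file list that is prefetched into the cache. The sketch below uses shell-style matching from Python's fnmatch module. It assumes glob semantics similar to fnmatch, which may differ from the wildcard rules JindoFSx actually applies; for example, in fnmatch a * also matches the / separator. The function name and paths are hypothetical.

```python
from fnmatch import fnmatchcase


def select_for_prefetch(paths: list[str], pattern: str) -> list[str]:
    # Keep only the paths whose full string matches the wildcard pattern.
    # fnmatchcase is case-sensitive on every platform.
    return [p for p in paths if fnmatchcase(p, pattern)]
```

For example, the pattern oss://b/data/*.parquet selects oss://b/data/a.parquet but not oss://b/data/b.csv.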

Bug fixes

  • JindoFSx: Fixed an issue where clearing the cache did not take effect.

JindoData 4.6.7

New features and improvements

  • JindoFuse: Added a graceful shutdown mechanism.

  • JindoFuse: Optimized log output.

Bug fixes

  • JindoFuse: Fixed an issue where the O_APPEND and O_TRUNC flags were not supported when mounting OSS.

JindoData 4.6.6

Improvements

  • Optimized the degree of parallelism for distjob and distcp tasks. The maximum degree of parallelism is now capped at the number of tasks.
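The cap follows a simple rule: never start more parallel workers than there are tasks. A minimal sketch of that rule; the function name and the floor of one worker are illustrative assumptions, not JindoData configuration.

```python
def effective_parallelism(configured: int, num_tasks: int) -> int:
    # Workers beyond the task count would sit idle, so cap at num_tasks,
    # but keep at least one worker.
    return max(1, min(configured, num_tasks))
```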

JindoData 4.6.5

Bug fixes and improvements

  • Added a ServiceLoader registration that maps the oss scheme to JindoOssFileSystem.

  • Optimized exception handling for the isDirectory() method. When called with a wildcard path such as *, the method now returns false instead of throwing an IllegalPath exception.

  • Optimized the Hadoop SDK to prevent a ConcurrentModificationException when Hadoop configurations are modified concurrently.

  • Optimized JindoMagicCommitter retry logic when writing to OSS to handle cases where temporary directories are abnormal or disks are damaged. This improves write success rates and prevents the InvalidPart error ("One or more of the specified parts could not be found or the specified entity tag might not have matched the part's entity tag.").
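The isDirectory() change amounts to treating a glob pattern as "not a directory" rather than as illegal input. The Python sketch below models that check against an in-memory set of directory paths. The set of wildcard characters is an assumption based on Hadoop glob syntax, and the function is an illustration, not the JindoSDK implementation.

```python
# Assumed wildcard characters, based on Hadoop glob syntax.
GLOB_CHARS = frozenset("*?[{")


def is_directory(path: str, directories: set[str]) -> bool:
    # A wildcard pattern cannot name a single directory, so report
    # False instead of raising an error.
    if any(c in GLOB_CHARS for c in path):
        return False
    return path in directories
```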

JindoData 4.6.4

New features

JindoData 4.6.4 adds multi-platform support.

  • For supported platforms, see Download JindoData.

  • On the Java platform, deploy multiple jindo-core packages to enable multi-platform support. By default, jindo-core supports mainstream Linux systems. To use it on other platforms, add the corresponding platform extension package.

  • Dependency packages for multi-platform support are available in the JindoData Maven repository. For Maven-based OSS access configurations, see jindosdk_ide_hadoop.md.

Example: deploying across platforms

  • Mainstream Linux: Add jindo-core-4.6.4.jar and jindo-sdk-4.6.4.jar to the classpath.

  • macOS: Add jindo-core-4.6.4.jar, jindo-sdk-4.6.4.jar, and the jindo-core-macos-10_14-x86_64-4.6.4.jar extension package.

Download jindosdk-4.6.10-macos-10_14-x86_64.tar.gz from the Download JindoData page. This package includes jindo-core-4.6.4.jar, jindo-sdk-4.6.4.jar, and jindo-core-macos-10_14-x86_64-4.6.4.jar.
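The platform rule in this example can be expressed as "base packages plus an optional extension package". The sketch below builds the JAR list for a given OS and CPU architecture. Only the macOS x86_64 entry comes from the example; the mapping structure and function name are hypothetical, and platforms without an entry simply fall back to the base packages, which is correct only for mainstream Linux.

```python
BASE_JARS = ["jindo-core-4.6.4.jar", "jindo-sdk-4.6.4.jar"]

# Hypothetical mapping from (OS, machine) to the extra extension package.
# Mainstream Linux needs no entry; other platforms need their own package.
EXTENSION_JARS = {
    ("Darwin", "x86_64"): "jindo-core-macos-10_14-x86_64-4.6.4.jar",
}


def classpath_jars(system: str, machine: str) -> list[str]:
    extra = EXTENSION_JARS.get((system, machine))
    return BASE_JARS + ([extra] if extra else [])
```

In practice, the system and machine values could come from Python's platform.system() and platform.machine(), which return "Darwin" and "x86_64" on an Intel-based Mac.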

JindoData 4.6.2

Bug fixes and improvements

JindoFS storage system:

  • Fixed an issue where the service became stuck when converting from Standard to Standard in tiered storage.

  • Fixed an issue where the service became stuck due to an empty manifest file generated during tiered storage.

  • Accelerated the execution of tiered storage tasks.

  • Fixed the RootPolicy feature logic.

  • Fixed an issue where the setAcl operation occasionally caused the service to crash.

  • Fixed a low-probability issue where DB manifest files filled up the disk.

  • Fixed the batch metadata import feature of the migration service.

JindoData 4.6.1

New features and improvements

JindoFS storage system:

  • Reduced redundant log output.

  • Fixed incorrect file sizes when exporting metadata inventory for unclosed files.

JindoFSx storage acceleration system:

  • Added automatic cleanup of temporary cache directories.

JindoSDK and tools:

  • Reduced oversized log output.

  • Enabled server-side path optimization for du and count operations by default.

  • Reduced STS token update frequency to prevent throttling from frequent requests.

  • Changed the RAM role name in credential-free URLs to lowercase to prevent token refresh failures in the ECS credential-free service.
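Reducing the STS token update frequency usually means caching the token and refreshing it only shortly before it expires. The sketch below shows that pattern. The fetch callable, the Token type, and the 300-second refresh margin are assumptions for illustration, not JindoSDK settings.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # expiry time in epoch seconds


class CachedTokenProvider:
    """Reuse a token until it is within refresh_margin seconds of expiry.

    fetch is a hypothetical callable that requests a new STS token.
    """

    def __init__(
        self,
        fetch: Callable[[], Token],
        refresh_margin: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> Token:
        now = self._clock()
        # Only call fetch when there is no token yet or it is about to expire.
        if self._token is None or now >= self._token.expires_at - self._margin:
            self._token = self._fetch()
        return self._token
```

Because every caller shares one cached token, the number of token requests depends on the token lifetime rather than on the request rate, which is what avoids throttling.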

JindoData 4.6.0

New features

JindoFS storage system:

  • Supports exporting file inventories from OSS-HDFS to help you understand data distribution and support custom development.

  • Significantly improves du and count performance through full and incremental server-side optimizations.

  • Supports smooth migration from HDFS to OSS-HDFS, simplifying the data migration process.

  • Supports multi-path protocol access, so you can access the same backend path through different protocols.

JindoFSx storage acceleration system:

  • Fixed an issue where the client exited unexpectedly when writing to the cache.

  • Fixed an issue where the client exited unexpectedly during metrics reporting.

  • Fixed a memory leak when using Ranger.

JindoSDK and tools:

  • Supports CRC and MD5 checksum verification for writes at the file and block levels.

  • Supports the Jindo Sync tool for data synchronization without a Hadoop environment.

  • Supports the OSS-HDFS TensorFlow connector.
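Write verification compares checksums computed on the client with checksums of the stored data, per block (CRC) and for the whole file (MD5). The sketch below uses zlib.crc32 and hashlib.md5 and assumes the server-side values are available as a list and a string. The CRC variant, block size, and configuration options that JindoSDK actually uses are not specified here; OSS, for example, reports CRC-64 rather than CRC-32.

```python
import hashlib
import zlib


def block_crcs(data: bytes, block_size: int) -> list[int]:
    # CRC-32 of each fixed-size block; the last block may be shorter.
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def file_md5(data: bytes) -> str:
    # MD5 digest of the whole file as a hex string.
    return hashlib.md5(data).hexdigest()


def verify_write(local: bytes, remote_crcs: list[int], remote_md5: str, block_size: int) -> bool:
    # The write is accepted only if every block CRC and the file MD5 match.
    return block_crcs(local, block_size) == remote_crcs and file_md5(local) == remote_md5
```

Block-level checks localize a corrupted range, while the file-level digest catches problems such as missing or reordered blocks.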

JindoData 4.5.x versions

JindoData 4.5.1

Overview

Version 4.5.1 is a minor upgrade to 4.5.0 that includes important fixes and improvements. JindoFS improves service stability and exception handling. Both JindoFS and JindoFSx further refine the adaptive prefetch algorithm for better prefetch efficiency. JindoDistCp includes numerous fixes and optimizations to improve data copy stability. JindoFuse is redesigned from the ground up for significantly higher performance.

New features and improvements

JindoFS storage system:

  • Improved memory usage.

  • Added exception handling and log-based alerting for ASSUME_ROLE errors.

  • Supports updating dynamic AccessKeys during retries.

  • Further improved the adaptive prefetch algorithm for higher prefetch efficiency.

  • Fixed read and write paths for random file write scenarios.

  • Supports the CheckAccess API.
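Updating dynamic AccessKeys during retries means that each retry re-reads the current credentials instead of reusing the key from the first attempt. The sketch below shows that loop with exponential backoff. The op and get_access_key callables are hypothetical, and PermissionError stands in for whatever authentication error the real client raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[str], T],
    get_access_key: Callable[[], str],
    attempts: int = 3,
    backoff: float = 0.5,
) -> T:
    """Run op(access_key), retrying on authentication failures.

    get_access_key is called before every attempt, so a key that is rotated
    while retries are in progress is picked up on the next attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        key = get_access_key()  # re-read each time instead of caching once
        try:
            return op(key)
        except PermissionError:  # stand-in for a credential or auth error
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise AssertionError("unreachable")
```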

JindoFSx storage acceleration system:

  • Further improved the adaptive prefetch algorithm for higher prefetch efficiency.

  • Supports spaces in paths.

  • Reduced hot spots during multi-replica reads.
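These notes do not describe how the adaptive prefetch algorithm works. For readers unfamiliar with the general idea, the sketch below shows a common adaptive read-ahead heuristic: the prefetch window grows while reads stay sequential and shrinks back after a random seek. This is a generic technique, not JindoFS or JindoFSx code, and the window sizes are arbitrary.

```python
from typing import Optional


class AdaptivePrefetcher:
    """Generic adaptive read-ahead. Not JindoData's algorithm."""

    def __init__(self, min_window: int = 1 << 20, max_window: int = 64 << 20) -> None:
        self.min_window = min_window
        self.max_window = max_window
        self.window = min_window
        self._next_offset: Optional[int] = None  # where a sequential read would start

    def on_read(self, offset: int, length: int) -> int:
        """Record a read and return how many bytes to prefetch after it."""
        if offset == self._next_offset:
            # Sequential access: double the window, up to the maximum.
            self.window = min(self.window * 2, self.max_window)
        else:
            # Random access: fall back to the smallest window.
            self.window = self.min_window
        self._next_offset = offset + length
        return self.window
```

The design trade-off is that large windows help sequential scans but waste bandwidth on random reads, so the window size is driven by the observed access pattern.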

JindoSDK and tools:

  • Jindo commands now cover all Hadoop commands.

  • Jindo commands include native HDFS support for significantly improved performance and usability.

  • JindoDistCp supports integration with Alibaba Cloud CloudMonitor.

  • JindoDistCp supports checksum verification for data migrated from OSS to an HDFS path.

  • JindoDistCp supports job splitting parameters.

  • JindoDistCp: Fixed the error handling logic for source files that are deleted during copy operations.

  • JindoSDK optimizes memory usage for random reads.

JindoFuse POSIX support:

  • Redesigned using low-level APIs for significantly improved readdir and related operation performance.

  • Fixed an issue where the program behaved abnormally when listing the root directory after JindoFSx was mounted.

JindoData 4.5.0

Overview

Version 4.5.0 focuses on metadata operation performance for the JindoFS storage system, delivering significant improvements. JindoFS tiered storage is enhanced to support Infrequent Access (IA) and Cold Archive storage types. Batch write support is added to optimize large-scale extract, transform, and load (ETL) job performance. A Hadoop-independent Java SDK is introduced for SDKs and ecosystem components.

New features and improvements

JindoFS storage system:

  • Optimized metadata operations for significantly improved performance.

  • Enhanced tiered storage to support IA and Cold Archive storage types.

  • Added batch write support to optimize large-scale ETL job performance.

  • Fixed an issue where accessing OSS caused a service exception due to a server-side authorization error.

JindoFSx storage acceleration system:

  • Fixed a file handle leak in the Storage service.

  • Fixed a thread safety issue in client-side metrics reporting.

  • Optimized the performance of recursively creating parent directories.

  • Optimized the performance of the path rewrite feature.

JindoSDK and tools:

  • Supports an adaptive prefetch algorithm for higher prefetch efficiency.

  • Supports atomic rename operations based on Tablestore.

  • JindoDistCp: Optimized the diff feature to support outputting diff files.

  • Implemented unified handling for retry errors, resolving client retry failures caused by server IP address changes.

  • Provides a Hadoop-independent Java SDK with functionality comparable to the Hadoop SDK and Object SDK.
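The diff feature compares source and destination and reports what still needs to be copied. A minimal sketch, assuming the comparison uses only path and file size and the diff file contains one path per line; JindoDistCp's real comparison criteria, output format, and command-line options may differ, and all names here are hypothetical.

```python
def diff_listings(src: dict[str, int], dst: dict[str, int]) -> list[str]:
    # Listings map relative path -> file size. A path is reported if it
    # is missing from dst or its size differs.
    return sorted(p for p, size in src.items() if dst.get(p) != size)


def render_diff(paths: list[str]) -> str:
    # One path per line, the format assumed for the diff file.
    return "".join(p + "\n" for p in paths)


def write_diff_file(paths: list[str], out_path: str) -> None:
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(render_diff(paths))
```

A diff file in this shape can drive a follow-up copy that transfers only the listed paths.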

JindoFuse POSIX support:

  • Fixed a memory leak caused by list operations when caching is enabled in JindoFSx.

JindoData 4.4.x versions

Overview

JindoFS now includes tiered storage and data archiving features. Leveraging OSS tiered storage capabilities and compatible with HDFS tiered storage policies, this feature lets you assign lower-cost storage policies to infrequently accessed data to reduce total storage costs. JindoFS also adds support for HDFS AuditLog, improving API compatibility, feature parity, and data migration capabilities with Apache HDFS. Rapid data import for OSS and migration from semi-managed JindoFS instances are also improved. The JindoFS features are delivered through the Alibaba Cloud OSS-HDFS service. For more information, see What is the OSS-HDFS service?
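The cost argument is easiest to see as a policy that moves data to cheaper storage the longer it goes unread. The sketch below picks a tier from the time since last access. The 30-day and 180-day thresholds are hypothetical, and the tier names follow OSS storage classes (Standard, IA, Archive) rather than actual JindoFS storage-policy identifiers. In JindoFS, tiers are assigned through HDFS-compatible storage policies, not by code like this.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, ordered from coldest to warmest tier.
TIERS = [
    (timedelta(days=180), "Archive"),
    (timedelta(days=30), "IA"),
]


def choose_tier(last_access: datetime, now: datetime) -> str:
    idle = now - last_access
    for threshold, tier in TIERS:
        if idle >= threshold:
            return tier
    return "Standard"  # recently accessed data stays in the default tier
```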

JindoFSx introduces client-side local cache (LocalCache), providing client-only cache acceleration. This significantly improves metadata caching and enhances cache acceleration for Alibaba Cloud NAS.

For SDKs and ecosystem components, performance and throughput are substantially improved across multiple operations. The Object SDK is now supported. It is compatible with OSS object storage APIs, delivers better performance, and integrates seamlessly with JindoFSx cache acceleration. The JindoDistJob tool is introduced to support full and incremental migration of file metadata from semi-managed JindoFS, letting you switch to the JindoFS service-based solution without migrating data blocks. JindoDistCp is greatly enhanced to achieve lossless migration from Apache HDFS to the JindoFS service, including file metadata.

New features and improvements

JindoFS storage system:

  • Supports tiered storage and data archiving, compatible with HDFS storage policies.

  • Supports BatchImport for importing file data in batches.

  • Supports HDFS AuditLog.

  • Supports Concat and SymLink APIs.

  • Optimized the background cleanup process for file data.

  • Optimized the performance of Lease and Lock related operations.

JindoFSx storage acceleration system:

  • Supports cache plugins with a client-side cache mode.

  • Supports plugin-based authorization. KRB5 and SASL library dependencies are not required by default.

  • Significantly improved metadata cache performance and cache acceleration for Alibaba Cloud NAS.
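A client-side cache keeps recently read data on the client so that repeated reads skip the network. The sketch below illustrates the idea with a least-recently-used (LRU) eviction policy. The LRU choice, the (path, block index) key layout, and the load callable are assumptions for illustration; the eviction policy and data layout of JindoFSx LocalCache are not described in these notes.

```python
from collections import OrderedDict
from typing import Callable


class LocalCache:
    """Client-side LRU cache of file blocks keyed by (path, block index)."""

    def __init__(self, capacity: int, load: Callable[[str, int], bytes]) -> None:
        self.capacity = capacity
        self._load = load  # hypothetical loader that reads from remote storage
        self._entries = OrderedDict()

    def get(self, path: str, block: int) -> bytes:
        key = (path, block)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = self._load(path, block)  # cache miss: read from remote storage
        self._entries[key] = data
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data
```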

JindoSDK and tools:

  • Improved HTTPS support and enhanced fault tolerance in weak network environments.

  • Removed the default dependency on KRB5 and SASL libraries for easier deployment.

  • Added support for OSS object storage APIs, improving operation performance and integrating with JindoFSx cache acceleration.

  • Added the JindoDistJob tool for rapid migration of data from semi-managed JindoFS in Block mode to the JindoFS service.

  • JindoDistCp now supports lossless migration of file metadata from Apache HDFS to the JindoFS service.

JindoFuse POSIX support:

  • Optimized sequential read performance for large files.

JindoData 4.3.x versions

Overview

JindoData 4.3.0 adds full multicloud architecture support, delivering a data lake storage solution that spans multiple clouds, storage systems, acceleration extensions, protocols, and programming languages. POSIX support in JindoFS is significantly improved. JindoFSx gains its first Kerberos+Ranger security extension support. JindoSDK and ecosystem tools see substantial improvements in test coverage.

New features and improvements

JindoSDK and tools:

  • Supports multicloud storage including Amazon S3, COS, and OBS.

  • Provides the JindoTable tool.

  • Optimized the Flink connector plugin.

  • Improved JindoDistCp.

JindoFSx storage acceleration system:

  • Supports multicloud storage including Amazon S3, COS, and OBS.

  • Optimized data caching and metadata caching.

  • Supports the Kerberos+Ranger authorization solution.

  • Significantly improved observability metrics.

  • Integrated with Fluid.

JindoFS storage system:

  • Supports POSIX Lock and Fallocate capabilities.

  • Supports upgrades for clusters running older JindoFS versions in Block mode.

JindoFuse POSIX support:

  • Added support for XAttr-related APIs: Setxattr, Getxattr, Listxattr, and Removexattr.

  • Supports POSIX Lock and Fallocate capabilities.

  • Supports appendable objects in OSS: Append, Flush, and read-while-writing.

JindoData 4.2.x versions

Overview

JindoData 4.2.0 significantly improves the JindoFSx storage acceleration system. It adds cache acceleration for Apache HDFS and Alibaba Cloud NAS storage, and enhances JindoFuse, JindoDistCp, and JindoTable.

New features and improvements

JindoFSx storage acceleration system:

  • Supports transparent cache acceleration for Apache HDFS (the hdfs:// scheme is unchanged) and unified mount acceleration (fsx://).

  • Supports unified mount acceleration (fsx://) for Alibaba Cloud NAS storage.

  • Fully integrates with the Alibaba Cloud OSS-HDFS service (JindoFS service) with improved write path support.

JindoSDK and tools:

  • Introduces the first C/C++ version of JindoSDK, providing POSIX-like API methods.

  • Supports JindoFuse POSIX. JindoFuse is rebuilt on top of the C/C++ JindoSDK.

  • Supports JindoDistCp data migration. JindoDistCp is refactored for improved usability and robustness, with less-used 3.x features removed.

  • Supports the JindoTable tool. JindoTable is refactored for improved usability and robustness, with less-used 3.x features removed.

JindoData 4.1.x versions

Overview

JindoData 4.1.0 introduces random writes on the Alibaba Cloud OSS-HDFS service (JindoFS service). It also launches the JindoFSx storage acceleration system, which supports distributed caching for native Alibaba Cloud OSS and the OSS-HDFS service.

New features

JindoFS storage system:

  • Supports random file writes, allowing files to be modified after creation.

  • Supports the HDFS recycle bin. The backend cleans up files based on their expiration time.

  • Improved the HDFS snapshot feature to support random file modifications.

  • Improved the directory deletion mechanism for significantly higher operation performance.

  • Implemented the NsWorker framework, which lets the global meta service offload heavy processing to Follower and Learner nodes.
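Expiration-based cleanup means the backend periodically removes recycle-bin entries that have been there longer than the retention period. A minimal sketch of the selection step, with hypothetical names; how OSS-HDFS records deletion times and schedules the cleanup is not described in these notes.

```python
from dataclasses import dataclass


@dataclass
class TrashEntry:
    path: str
    deleted_at: float  # deletion time in epoch seconds


def expired_entries(entries: list[TrashEntry], retention_seconds: float, now: float) -> list[str]:
    # Select the entries that have stayed in the recycle bin for at
    # least the retention period; these are eligible for cleanup.
    return [e.path for e in entries if now - e.deleted_at >= retention_seconds]
```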

JindoShell CLI support:

  • Set the expiration time for the HDFS recycle bin using commands.

  • Improved the dumpFile command to output information about random write files.

JindoFuse POSIX support:

  • Supports random file modification (Seek and Write).

JindoFSx storage acceleration system:

  • Supports transparent cache acceleration for Alibaba Cloud OSS (the oss:// scheme is unchanged).

  • Supports transparent cache acceleration for the Alibaba Cloud OSS-HDFS service (JindoFS service) (the oss:// scheme is unchanged).

  • Provides a unified namespace feature to mount OSS or OSS-HDFS to the same namespace for unified access using the fsx:// prefix.

  • Supports cache acceleration for large-scale file metadata.

  • Supports acceleration for small file training workloads.

  • Supports P2P acceleration for improved cache read performance when many training nodes prefetch and load model files simultaneously.
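A unified namespace resolves each fsx:// path to the backing OSS or OSS-HDFS location through a mount table. The sketch below uses longest-prefix matching so that a nested mount point takes precedence over its parent. The mount table format, the example bucket names, and the path layout after fsx:// are hypothetical, not JindoFSx configuration.

```python
def resolve(fsx_path: str, mounts: dict[str, str]) -> str:
    """Map fsx://<rel> to a backend path via the longest matching mount point.

    Mount points are given without leading or trailing slashes.
    """
    prefix = "fsx://"
    if not fsx_path.startswith(prefix):
        raise ValueError(f"not an fsx:// path: {fsx_path}")
    rel = fsx_path[len(prefix):].strip("/")
    # A mount point matches only on a whole path component.
    candidates = [m for m in mounts if rel == m or rel.startswith(m + "/")]
    if not candidates:
        raise KeyError(f"no mount point covers {fsx_path}")
    mount = max(candidates, key=len)  # longest prefix wins
    return mounts[mount].rstrip("/") + rel[len(mount):]
```

For example, with mounts for data and data/hot, fsx://data/hot/y resolves through the more specific data/hot entry.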

JindoSDK Hadoop support:

  • Provides JindoOssFileSystem for transparent cache acceleration for OSS and OSS-HDFS.

  • Provides JindoFsxFileSystem for use in unified namespace mode.

JindoShell CLI support:

  • Supports JindoFSx data cache commands.

  • Supports JindoFSx metadata cache commands.

  • Supports JindoFSx unified namespace management commands.

JindoFuse POSIX support:

  • Supports mounting an oss:// path with Fuse for JindoFSx cache reads and writes.

  • Supports mounting an fsx:// path with Fuse for JindoFSx cache reads and writes.

JindoData 4.0.x versions

Overview

JindoData 4.0.0 is the first release after the architecture upgrade of the original Alibaba Cloud EMR SmartData component (which reached major version 3.8.0). This version focuses on integration with Alibaba Cloud OSS and the OSS-HDFS service (JindoFS service).

The JindoFSx storage acceleration system is not included in JindoData 4.0.0.

New features

Alibaba Cloud OSS service:

JindoSDK Hadoop support:

  • Provides a Java Hadoop SDK for Alibaba Cloud OSS that is fully compatible with the Hadoop OSS connector and delivers significantly higher performance.

  • Supports multiple credential provider methods: configuration, ECS Role, and the EMR credential-free mechanism.

  • Supports archiving upon write, including Archive and Deep Cold Archive storage types.

JindoShell CLI support:

  • Provides additional command extensions for Hadoop and HDFS Shell for OSS-oriented operations.

  • Supports the ls2 extended command, which shows the storage status (Standard, IA, or Archive) of an object in OSS alongside the standard ls output.

  • Supports the archive command for directory-level archiving operations.

  • Supports the restore command for directory-level restoration operations.

JindoFuse POSIX support:

  • Provides an optimized Fuse client for Alibaba Cloud OSS, implemented in native code for significantly higher performance.

JindoDistCp data migration:

  • Supports migrating data from self-managed HDFS clusters to Alibaba Cloud OSS, with optimizations for large files and many small files.

Alibaba Cloud OSS-HDFS service (JindoFS service):

JindoFS service:

  • Adds a new bucket storage option for Alibaba Cloud OSS products. It provides metadata acceleration, is binary compatible, and is fully aligned with Apache HDFS features, supporting lift-and-shift migration from HDFS.

  • Natively supports file system directory semantics with significantly optimized directory operations, including atomic and millisecond-level rename for extra-large directories.

  • Natively supports file system file semantics: HDFS write leases, one-write-multiple-reads, and read-while-writing.

  • Supports append, flush, sync, and truncate operations on files.

  • Supports HDFS snapshots with a nearly unlimited number of snapshots, facilitating data backup, disaster recovery, and restoration.

  • Supports file permissions. Import and set user group information (UserGroupsMapping) using JindoShell commands.

  • Supports the Hadoop proxy user access control mechanism.

JindoSDK Hadoop support:

  • Built-in access to the Alibaba Cloud OSS-HDFS service (JindoFS service), providing a comprehensive HDFS API access experience.

JindoShell CLI support:

  • Provides additional command extensions for Hadoop and HDFS Shell for OSS-HDFS service operations.

  • Import and set user group information (UserGroupsMapping) using commands.

  • Set Hadoop proxy user rules using commands.

JindoFuse POSIX support:

  • Provides an optimized Fuse client for the Alibaba Cloud OSS-HDFS service (JindoFS service), implemented fully in native code for significantly higher performance.

Known issues

  • JindoSDK does not support writing files larger than 80 GB to OSS.

  • JindoSDK does not support writing to OSS in append mode.

  • JindoSDK does not support client-based encryption for OSS.

  • JindoSDK does not support older JindoFS versions in Block mode or Cache mode.

  • The Alibaba Cloud OSS-HDFS service (JindoFS service) does not support system upgrades from older JindoFS versions in Block mode. Use the JindoDistCp migration tool to migrate data from the old system to the new service.