This topic describes how to use Alibaba Cloud Object Storage Service (OSS) and OSS-HDFS, and summarizes their benefits and features.

Background information

OSS is a secure, cost-effective, and highly reliable cloud storage service that allows you to store large amounts of data. OSS is designed to provide 99.9999999999% (twelve 9's) data durability and 99.995% data availability. OSS provides multiple storage classes to help you manage and reduce storage costs. For more information, see What is OSS?.
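To make these design targets more tangible, the following sketch converts them into yearly figures. It assumes a 365-day year and reads the durability target as a per-object annual survival probability; neither interpretation is stated in this topic, and the object count of 10 billion is a hypothetical example.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 365-day year


def max_downtime_minutes(availability_percent: str) -> Fraction:
    """Yearly downtime budget implied by an availability target."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return unavailable * MINUTES_PER_YEAR


def expected_annual_losses(durability_percent: str, object_count: int) -> Fraction:
    """Expected number of objects lost per year.

    Assumption: durability is a per-object annual survival probability.
    """
    loss_probability = 1 - Fraction(durability_percent) / 100
    return loss_probability * object_count


if __name__ == "__main__":
    downtime = max_downtime_minutes("99.995")
    losses = expected_annual_losses("99.9999999999", 10**10)  # hypothetical count
    print(f"Downtime budget: {float(downtime):.2f} minutes per year")
    print(f"Expected losses: {float(losses):.2f} objects per year")
```

Exact fractions are used instead of floats so that the subtraction from 100% does not introduce rounding error at twelve 9's.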

OSS-HDFS (JindoFS) is a cloud-native data lake storage service. OSS-HDFS provides centralized metadata management, is fully compatible with the Hadoop Distributed File System (HDFS) API, and supports Portable Operating System Interface (POSIX). You can use OSS-HDFS to manage data in data lake-based computing scenarios in the big data and AI fields. For more information, see Overview of the OSS-HDFS service.

JindoData is a suite developed by the Alibaba Cloud big data team to accelerate storage for data lake systems. It provides end-to-end solutions for data lake systems of Alibaba Cloud and other vendors in big data and AI scenarios, and is built on a unified architecture and kernel. JindoData consists of the following components: JindoFS (the original JindoFS in block storage mode), JindoFSx (the original JindoFS in cache mode), and JindoSDK. JindoData also provides fully compatible tools, such as JindoFuse and Jindo DistCp, and plug-ins. For more information, see Overview.

Usage

  • By default, JindoSDK is deployed in E-MapReduce (EMR) clusters. You can use JindoSDK to access OSS or OSS-HDFS.
  • In other Alibaba Cloud services, download the latest version of the JindoSDK JAR package and install JindoSDK before you use it. For more information, see Deploy JindoSDK in an environment other than EMR.
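
Once JindoSDK is available, Hadoop-compatible tools typically address data through `oss://` paths. The following sketch shows how such paths might be assembled. The path layouts, the `oss-dls` endpoint pattern, the helper names, and the bucket and region values are assumptions for illustration and are not taken from this topic; check the JindoSDK documentation for the exact format in your environment.

```python
def oss_path(bucket: str, key: str) -> str:
    """Build a path for an object in a plain OSS bucket (assumed layout)."""
    if not bucket:
        raise ValueError("bucket must not be empty")
    return f"oss://{bucket}/{key.lstrip('/')}"


def oss_hdfs_path(bucket: str, region: str, key: str) -> str:
    """Build a path for a file in an OSS-HDFS bucket.

    Assumption: the OSS-HDFS endpoint has the form
    <region>.oss-dls.aliyuncs.com and is appended to the bucket name.
    """
    if not bucket or not region:
        raise ValueError("bucket and region must not be empty")
    endpoint = f"{region}.oss-dls.aliyuncs.com"
    return f"oss://{bucket}.{endpoint}/{key.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical bucket, region, and object names.
    print(oss_path("examplebucket", "logs/2024/app.log"))
    print(oss_hdfs_path("examplebucket", "cn-hangzhou", "logs/2024/app.log"))
```

A path built this way could then be passed to a Hadoop command such as `hadoop fs -ls`, provided that JindoSDK and the required credentials are configured.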

Benefits

OSS and OSS-HDFS provide the following benefits when used as the underlying storage service:
  • Ready to use. OSS and OSS-HDFS are cloud-native storage services. You can use them by calling RESTful APIs, without deploying the services yourself. By default, JindoSDK is deployed in EMR clusters, and you can use it to access OSS or OSS-HDFS.
  • Cost-effective. OSS and OSS-HDFS provide various storage classes, such as Infrequent Access (IA), Archive, and Cold Archive. Storing cold data in these classes reduces storage costs.
  • Highly scalable. The storage space of OSS and OSS-HDFS is not limited by hard disk capacity, so you do not need to manually expand storage capacity.

Features

The following table describes the differences between the features of OSS and OSS-HDFS.
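| Scenario | Feature | OSS | OSS-HDFS |
| --- | --- | --- | --- |
| Big data scenario (Hadoop) | Operations for files and directories, and related operations | Supported | Supported |
| Big data scenario (Hadoop) | Support for granting permissions on files and directories | Not supported | Supported |
| Big data scenario (Hadoop) | Atomic operations for directories and rename operations | Supported (poor performance) | Supported (millisecond-granularity rename operations) |
| Big data scenario (Hadoop) | Support for specifying a point in time by using setTimes | Not supported | Supported |
| Big data scenario (Hadoop) | Extended attributes (XAttrs) | Not supported | Supported |
| Big data scenario (Hadoop) | ACL | Not supported | Supported |
| Big data scenario (Hadoop) | Support for accelerating on-premises read caching | Supported | Supported |
| Big data scenario (Hadoop) | Snapshots | Not supported | Supported |
| Big data scenario (Hadoop) | File-related operations, such as flush, sync, truncate, and append | Not supported | Supported |
| Big data scenario (Hadoop) | Truncate operations on files | Not supported | Supported |
| Big data scenario (Hadoop) | Checksum verification | Supported | Supported |
| Big data scenario (Hadoop) | Automatic clean-up of the HDFS recycle bin | Not supported | Supported |
| AI scenario (POSIX) | Metadata consistency | Weak | Strong |
| AI scenario (POSIX) | File-related operations, such as flush, sync, truncate, and append | Supported (However, limits on the operations exist. For information, see Limits.) | Supported |
| AI scenario (POSIX) | Truncate operations on files | Not supported | Supported |
| AI scenario (POSIX) | Random writes to files | Not supported | Supported |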
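
The comparison above can be turned into a simple lookup when you need to decide which service fits a workload. The following sketch encodes a subset of the big data (Hadoop) rows; the short feature keys and the function name are hypothetical and exist only for this illustration.

```python
# Support matrix copied from the big data (Hadoop) rows of the table.
# The keys are hypothetical short names, not official identifiers.
SUPPORT = {
    "permissions":         {"OSS": False, "OSS-HDFS": True},
    "atomic_rename":       {"OSS": True,  "OSS-HDFS": True},  # poor performance on OSS
    "set_times":           {"OSS": False, "OSS-HDFS": True},
    "xattrs":              {"OSS": False, "OSS-HDFS": True},
    "acl":                 {"OSS": False, "OSS-HDFS": True},
    "read_cache":          {"OSS": True,  "OSS-HDFS": True},
    "snapshots":           {"OSS": False, "OSS-HDFS": True},
    "checksum":            {"OSS": True,  "OSS-HDFS": True},
    "recycle_bin_cleanup": {"OSS": False, "OSS-HDFS": True},
}


def services_supporting(required_features):
    """Return the services that support every feature in required_features."""
    return [
        service
        for service in ("OSS", "OSS-HDFS")
        if all(SUPPORT[feature][service] for feature in required_features)
    ]


if __name__ == "__main__":
    print(services_supporting(["checksum", "read_cache"]))
    print(services_supporting(["snapshots", "acl"]))
```

An unknown feature key raises `KeyError`, which is intentional here: a typo should fail loudly rather than silently match both services.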