Hadoop-compatible file storage - Apsara File Storage for HDFS - Object Storage Service

Apsara File Storage for HDFS is a cloud file storage service that exposes the same interface as a Hadoop Distributed File System (HDFS). MapReduce, Hive, Spark, and Flink jobs connect to it as the default file system without code changes or recompilation.

Key capabilities:

Unlimited capacity with linear performance scaling
High-throughput, high IOPS, low-latency access from Elastic Compute Service (ECS) instances and Container Service
A single namespace shared by multiple compute nodes
99.999999999% (eleven 9's) data durability
Network isolation, security groups, and RAM user authorization
Pay-as-you-go billing by default, with subscription resource plans for additional discounts

Use cases

Apsara File Storage for HDFS is suited for workloads that require sustained high throughput, such as big data analytics and machine learning. ECS instances and other compute resources access stored data directly, without first copying it to local storage. You can deploy Hadoop or machine learning applications across multiple compute nodes, run online or offline computing jobs against the same file system, and write the results back to it for persistent storage.

Performance

Throughput is the primary performance metric. The practical throughput of a file system is bounded by the maximum network bandwidth of the ECS instance that accesses it. For example, an ECS instance with 1.5 Gbit/s of bandwidth can reach at most 187.5 MB/s of file system throughput (1,500 Mbit/s divided by 8 bits per byte). Throughput scales linearly with file system capacity: provisioning more capacity increases the available throughput.
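The two limits in the paragraph above (the ECS bandwidth ceiling and capacity-linked throughput) can be combined into a short estimate. This is an illustrative sketch only: the per-TiB throughput rate is a hypothetical input rather than a published figure, and the code assumes decimal units (1 Gbit = 1,000 Mbit, 1 byte = 8 bits).

```java
/**
 * Illustrative throughput estimate. The per-TiB rate passed to
 * provisionedThroughputMBps is a hypothetical input, not a published value.
 */
final class ThroughputEstimate {
    private ThroughputEstimate() {}

    /** Converts an ECS bandwidth in Gbit/s to MB/s (decimal units, 8 bits per byte). */
    static double ecsCeilingMBps(double bandwidthGbps) {
        return bandwidthGbps * 1000.0 / 8.0;
    }

    /** Linear model: file system throughput grows in proportion to capacity. */
    static double provisionedThroughputMBps(double capacityTiB, double mbpsPerTiB) {
        return capacityTiB * mbpsPerTiB;
    }

    /** The achievable throughput for one instance is capped by whichever limit is lower. */
    static double effectiveThroughputMBps(double bandwidthGbps, double capacityTiB, double mbpsPerTiB) {
        return Math.min(ecsCeilingMBps(bandwidthGbps),
                        provisionedThroughputMBps(capacityTiB, mbpsPerTiB));
    }
}
```

Under this model, a 1.5 Gbit/s instance tops out at 187.5 MB/s: once the file system's provisioned throughput exceeds that value, adding capacity no longer speeds up that single instance, and more throughput has to come from additional compute nodes sharing the file system.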

Data durability and availability

Apsara File Storage for HDFS stores multiple replicas of every file. Replicas are placed on devices isolated across different fault domains for geo-redundancy, providing 99.999999999% (eleven 9's) data durability.
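To put eleven 9s in perspective, the arithmetic below treats the figure as the probability that a given file survives one year. That interpretation, the one-year window, and the file count $N = 10^{9}$ are assumptions for illustration; this section does not state the time period the durability figure covers.

```latex
\begin{aligned}
P_{\text{loss}} &= 1 - 0.99999999999 = 10^{-11} \\
\mathbb{E}[\text{files lost per year}] &= N \cdot P_{\text{loss}} = 10^{9} \times 10^{-11} = 10^{-2}
\end{aligned}
```

Under these assumptions, a file system holding one billion files would expect to lose on the order of one file per century.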

Security

Apsara File Storage for HDFS protects data using five complementary mechanisms:

Mechanism	Description
Network isolation	Isolates file systems within a Virtual Private Cloud (VPC)
Classic network user isolation	Controls access in classic network environments
File system permission control	Standard permission control for file systems
Security group access control	Restricts access at the network level using security groups
RAM user authorization	Grants fine-grained permissions to RAM users
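One way to read "complementary" is as independent layers that a request must clear together. The sketch below is a conceptual model of that idea, not the service's implementation: the class, method names, and evaluation order are hypothetical, only four of the five layers are modeled (classic network user isolation is left out), and the security group check is reduced to a single IPv4 CIDR allowlist entry.

```java
/**
 * Conceptual model of layered access control: a request is allowed only if every
 * modeled layer allows it. Names and checks are hypothetical and do not reflect
 * how the service evaluates requests internally.
 */
final class LayeredAccessCheck {
    private LayeredAccessCheck() {}

    /** Returns true if an IPv4 address falls inside a CIDR block such as "192.168.0.0/16". */
    static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // A /0 block matches everything; Java masks shift counts to 5 bits, so special-case it.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }

    /** Packs a dotted-quad IPv4 address into a 32-bit int. */
    private static int toInt(String ip) {
        int value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    /**
     * Checks the modeled layers in turn: network isolation (VPC membership),
     * security group (source address allowlist), RAM authorization, and
     * file system permission. Any single denial rejects the request.
     */
    static boolean isAllowed(boolean fromAllowedVpc, String sourceIp, String allowedCidr,
                             boolean ramUserAuthorized, boolean filePermissionGranted) {
        return fromAllowedVpc
                && inCidr(sourceIp, allowedCidr)
                && ramUserAuthorized
                && filePermissionGranted;
    }
}
```

The point of the layering is that no single check has to be perfect: a request from an allowed subnet is still rejected if the RAM user lacks permission, and vice versa.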

SDK and console

Apsara File Storage for HDFS provides two management interfaces:

SDK — The Apsara File Storage for HDFS SDK for Java (aliyun-sdk-dfs-x.y.z.jar) implements Hadoop-compatible file system operations. Applications built on MapReduce, Hive, Spark, and Flink can use the SDK to switch to Apsara File Storage for HDFS as the default file system without modifying or recompiling code.
Console — Use the Apsara File Storage for HDFS console to create and manage file systems through a graphical web interface.

Note: During public preview, only the file system SDK is available.
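In Hadoop, switching the default file system without recompiling is typically a configuration change: jobs that resolve paths through `FileSystem.get(conf)` use whatever URI the `fs.defaultFS` property names. The sketch below renders such a `core-site.xml` entry. The `dfs://` scheme and the mount point address are placeholders and assumptions; the real URI format, and any extra properties the SDK needs (for example, a Hadoop `fs.<scheme>.impl` mapping to the SDK's implementation class), should be taken from the SDK documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Renders Hadoop configuration properties as core-site.xml <property> entries.
 * The values in exampleProperties() are placeholders, not real endpoints.
 */
final class CoreSiteSnippet {
    private CoreSiteSnippet() {}

    /** Escapes the five XML special characters so rendered values stay well-formed. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Builds one <property> element per entry, preserving the map's iteration order. */
    static String render(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            xml.append("<property>\n")
               .append("  <name>").append(escapeXml(e.getKey())).append("</name>\n")
               .append("  <value>").append(escapeXml(e.getValue())).append("</value>\n")
               .append("</property>\n");
        }
        return xml.toString();
    }

    /** Hypothetical example: point fs.defaultFS at a placeholder dfs:// mount point. */
    static Map<String, String> exampleProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Placeholder URI: replace with the mount point address from the SDK documentation.
        props.put("fs.defaultFS", "dfs://your-mount-point-address:port/");
        return props;
    }
}
```

Under these assumptions, the rendered entry goes into `core-site.xml` on each compute node, with the SDK JAR on the classpath; MapReduce, Hive, Spark, or Flink jobs then pick up the new default file system without source changes.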

Billing

Apsara File Storage for HDFS is billed based on file system capacity and preset throughput.

Billing method	Description
Pay-as-you-go	Billed hourly for the resources used. This is the default billing method.
Subscription (resource plan)	Purchase capacity in advance for a lower per-unit price.
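Whether a subscription resource plan beats pay-as-you-go depends on how long the capacity is kept. The sketch below compares the two for a fixed capacity. All prices are hypothetical inputs, the throughput component of billing is ignored, and pay-as-you-go is simplified to a flat per-GiB-hour rate.

```java
/**
 * Compares pay-as-you-go and subscription costs for a fixed capacity.
 * All prices are hypothetical inputs; real unit prices come from the pricing page.
 */
final class BillingComparison {
    private BillingComparison() {}

    /** Pay-as-you-go cost: capacity (GiB) x hours x hourly unit price. */
    static double payAsYouGoCost(double capacityGiB, double hours, double pricePerGiBHour) {
        return capacityGiB * hours * pricePerGiBHour;
    }

    /** Number of pay-as-you-go hours that would cost exactly the plan price. */
    static double breakEvenHours(double capacityGiB, double pricePerGiBHour, double planPrice) {
        return planPrice / (capacityGiB * pricePerGiBHour);
    }

    /** True if the plan costs less than paying hourly for the same usage. */
    static boolean planIsCheaper(double capacityGiB, double hours,
                                 double pricePerGiBHour, double planPrice) {
        return planPrice < payAsYouGoCost(capacityGiB, hours, pricePerGiBHour);
    }
}
```

For example, with made-up prices of 0.5 per GiB-hour and a 250 plan for 100 GiB, the break-even point is 5 hours; usage beyond that favors the plan, while shorter or unpredictable workloads favor pay-as-you-go.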