Apsara File Storage for HDFS is a cloud file storage service that exposes the same interface as a Hadoop Distributed File System (HDFS). MapReduce, Hive, Spark, and Flink jobs connect to it as the default file system without code changes or recompilation.
Key capabilities:
Unlimited capacity with linear performance scaling
High-throughput, high-IOPS, low-latency access from Elastic Compute Service (ECS) instances and Container Service
A single namespace shared across multiple compute nodes
99.999999999% (eleven 9's) data durability
Network isolation, security groups, and RAM user authorization
Pay-as-you-go billing by default, with subscription resource plans for additional discounts
Use cases
Apsara File Storage for HDFS is suited to workloads that require sustained high throughput, such as big data analytics and machine learning. ECS instances and other compute resources access stored data directly, so there is no need to copy data to local storage before processing. You can deploy Hadoop or machine learning applications across multiple compute nodes, run online or offline computing jobs against the same file system, and write results back to the file system for permanent storage.
Performance
Throughput is the primary performance metric. The practical throughput of a file system is bounded by the maximum bandwidth of the attached ECS instance. For example, an ECS instance with 1.5 Gbit/s of bandwidth supports a maximum file system throughput of 187.5 MB/s, because 1.5 Gbit/s divided by 8 bits per byte is 0.1875 GB/s. Throughput scales linearly with file system capacity: provisioning more capacity directly increases available throughput.
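The bandwidth-to-throughput arithmetic can be sketched in a few lines of Java. This is a minimal illustration, not part of the service SDK: the class name `ThroughputBound`, its method names, and the `fileSystemMBps` input are hypothetical, and decimal (SI) units are assumed, so that 1 Gbit/s equals 1000 Mbit/s.

```java
// Sketch: convert an ECS network bandwidth (Gbit/s) into the byte-based
// throughput ceiling (MB/s) that it imposes on a file system.
final class ThroughputBound {
    private static final double BITS_PER_BYTE = 8.0;
    private static final double MEGA_PER_GIGA = 1000.0; // assumes decimal (SI) units

    /** Converts gigabits per second to megabytes per second. */
    static double gbitToMBytePerSec(double gbitPerSec) {
        return gbitPerSec * MEGA_PER_GIGA / BITS_PER_BYTE;
    }

    /**
     * Effective throughput is capped by the slower of the two sides.
     * fileSystemMBps is a hypothetical input, not a published figure.
     */
    static double effectiveMBps(double ecsGbitPerSec, double fileSystemMBps) {
        return Math.min(gbitToMBytePerSec(ecsGbitPerSec), fileSystemMBps);
    }

    public static void main(String[] args) {
        System.out.println(gbitToMBytePerSec(1.5));    // prints 187.5
        System.out.println(effectiveMBps(1.5, 500.0)); // prints 187.5 (ECS bandwidth is the limit)
        System.out.println(effectiveMBps(1.5, 100.0)); // prints 100.0 (file system is the limit)
    }
}
```

When the file system can deliver more than the instance's network link, the link is the bottleneck, so a larger instance type or additional compute nodes are needed to use the extra throughput.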
Data durability and availability
Apsara File Storage for HDFS stores multiple replicas of every file. Replicas are placed on devices isolated across different fault domains for geo-redundancy, providing 99.999999999% (eleven 9's) data durability.
Security
Apsara File Storage for HDFS protects data using five complementary mechanisms:
| Mechanism | Description |
|---|---|
| Network isolation | Isolates file systems within a Virtual Private Cloud (VPC) |
| Classic network user isolation | Controls access in classic network environments |
| File system permission control | Standard permission control for file systems |
| Security group access control | Restricts access at the network level using security groups |
| RAM user authorization | Grants fine-grained permissions to RAM users |
SDK and console
Apsara File Storage for HDFS provides two interfaces:
SDK: The Apsara File Storage for HDFS SDK for Java (aliyun-sdk-dfs-x.y.z.jar) implements Hadoop-compatible file system operations. Applications built on MapReduce, Hive, Spark, and Flink can use the SDK to switch to Apsara File Storage for HDFS as the default file system without modifying or recompiling code.
Console: Use the Apsara File Storage for HDFS console to create and manage file systems through a graphical web interface.
Billing
Apsara File Storage for HDFS is billed based on file system capacity and preset throughput.
| Billing method | Description |
|---|---|
| Pay-as-you-go | Billed hourly based on used resources. The default option. |
| Subscription (resource plan) | Purchase capacity in advance for a lower per-unit price. |