
Batch Compute: Access data on NAS

Last Updated: Feb 20, 2024

Alibaba Cloud Network Attached Storage (NAS) is a file storage service for computing nodes such as Alibaba Cloud ECS instances, HPC, and Docker. It provides standard file access protocols, so you do not need to modify existing applications to use it. NAS gives you a distributed file system with unlimited capacity, scalable performance, a single namespace, shared access from multiple clients, high reliability, and high availability.

Currently, Batch Compute users can use Alibaba Cloud NAS by specifying their NAS information through the Batch Compute API or SDKs.

1. Concepts


For security, performance, and convenience, Batch Compute currently allows you to use Alibaba Cloud NAS only in a VPC. The following are some basic concepts of VPC and Alibaba Cloud NAS:

  • VPC: A Virtual Private Cloud (VPC) is a private network that you build and customize on Alibaba Cloud. Different VPCs are fully logically isolated from each other. You can create and manage cloud product instances, such as ECS, Server Load Balancer, ApsaraDB for RDS, and NAS, in your own VPC.

  • File system: A file system is the basic unit in which you purchase NAS. Storage capacity, price, quota, and mount entries are all defined per file system. For more information, see How to create a file system.

  • Mount entry: A mount entry is the address through which a file system is accessed from a VPC or a classic network. Each mount entry maps to a domain name. When you run the mount command, you specify this domain name to mount the corresponding NAS file system to a local directory. For more information, see How to create a mount entry. Because Batch Compute supports NAS only in a VPC, you must set the mount entry type to VPC when you create a mount entry.

The preceding concepts are important for you to use NAS in Batch Compute. For more information about how to buy and use NAS and VPC instances, see NAS official documentation and VPC official documentation.
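As an illustration of how a mount entry domain name is used, the following Python sketch only builds (and does not execute) a Linux mount command. The function name build_nfs_mount_command is hypothetical, and the NFSv3 options are an assumption based on commonly recommended NAS settings; confirm them against the NAS documentation before use. Batch Compute performs the mount for you when you configure Mounts, so this mainly matters if you mount manually.

```python
import shlex
from typing import List


def build_nfs_mount_command(mount_address: str, local_dir: str, nas_dir: str = "/") -> List[str]:
    """Return the argv for mounting a NAS file system over NFS on Linux (hypothetical helper)."""
    # Assumed NFSv3 options; check the NAS documentation for the current recommendation.
    options = "vers=3,nolock,proto=tcp,noresvport"
    # The mount target is "<mount entry domain>:<directory in the file system>".
    return ["mount", "-t", "nfs", "-o", options, f"{mount_address}:{nas_dir}", local_dir]


cmd = build_nfs_mount_command("0266e49fea-yio75.cn-beijing.nas.aliyuncs.com", "/home/admin/mydir")
print(shlex.join(cmd))
```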

2. Use NAS instances


To use a NAS instance in Batch Compute, follow these steps:

2.1. Data preparation


The VPC, vSwitch, file system, and mount entry identifiers in the following example are for reference only. Replace them with the values from your own configuration.

  • Create a VPC: Select an existing VPC or create one by following the VPC documentation. The VPC ID is generated automatically; in this example it is assumed to be vpc-m5egk1jsm3qkbxxxxxxxx.

  • Create a vSwitch: Select an existing vSwitch or create one by following the VPC documentation. The vSwitch ID is generated automatically; in this example it is assumed to be vsw-2zeue3c2rciybxxxxxxxx. We recommend that the CIDR block of the vSwitch be a subnet of the VPC CIDR block.

  • Create a NAS file system: Create a file system in the NAS console as described in the NAS documentation. The file system ID is generated automatically by NAS; in this example it is assumed to be 0266e49fea. Currently, Batch Compute supports only NFS-type NAS file systems.

  • Create a mount entry: Set the mount entry type to VPC, and select the VPC and vSwitch created in the preceding steps. Unless you have special requirements, select the default permission group. After the mount entry is created, view its mount address on the mount entry management page. The address is generally similar to 0266e49fea-yio75.cn-beijing.nas.aliyuncs.com.
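Because the example mount address appears to embed the file system ID and the region, a quick consistency check before configuring Batch Compute can catch copy-and-paste mistakes. The following Python sketch assumes the format <file-system-id>-<suffix>.<region>.nas.aliyuncs.com, which is inferred from the single example address above and is not a documented guarantee; parse_mount_address and MountAddress are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MountAddress:
    file_system_id: str
    region: str


def parse_mount_address(address: str) -> MountAddress:
    """Split a NAS mount address into file system ID and region (assumed format)."""
    # Assumed format, inferred from the example: <file-system-id>-<suffix>.<region>.nas.aliyuncs.com
    suffix = ".nas.aliyuncs.com"
    if not address.endswith(suffix):
        raise ValueError(f"not a NAS mount address: {address!r}")
    # "0266e49fea-yio75.cn-beijing" -> ("0266e49fea-yio75", "cn-beijing")
    host_part, region = address[: -len(suffix)].rsplit(".", 1)
    # The file system ID is the part before the first hyphen.
    file_system_id = host_part.split("-", 1)[0]
    return MountAddress(file_system_id, region)


info = parse_mount_address("0266e49fea-yio75.cn-beijing.nas.aliyuncs.com")
print(info)  # MountAddress(file_system_id='0266e49fea', region='cn-beijing')
```

If the parsed file system ID does not match the file system you created, the mount address was probably copied from a different file system.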

2.2. Use your VPC to create a cluster


If your program needs to access files stored in the NAS file system from Batch Compute cluster nodes, the Batch Compute cluster must be in your VPC. Specify VpcId and CidrBlock when you create the cluster.

  • VpcId: The ID of the VPC where your NAS mount entry is located. Replace the ID in the example with your actual VPC ID.

  • CidrBlock: The private IP address range of the Batch Compute cluster within the VPC. It is generally a subnet of the VPC CIDR block. Choose a value that fits your network planning. For more information, see Network planning.
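Before submitting a cluster description, you may want to confirm that the CidrBlock you chose actually lies inside your VPC. The following Python sketch uses the standard ipaddress module; check_cluster_cidr is a hypothetical helper, and the VPC CIDR block 192.168.0.0/16 used in the example is an assumption, not a value from this document.

```python
import ipaddress


def check_cluster_cidr(vpc_cidr: str, cluster_cidr: str) -> bool:
    """Return True if the cluster CidrBlock lies inside the VPC CIDR block."""
    vpc = ipaddress.ip_network(vpc_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    # subnet_of raises TypeError for mixed IPv4/IPv6, so compare versions first.
    if cluster.version != vpc.version:
        return False
    return cluster.subnet_of(vpc)


print(check_cluster_cidr("192.168.0.0/16", "192.168.0.0/20"))  # True
print(check_cluster_cidr("192.168.0.0/16", "10.0.0.0/20"))     # False
```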

2.2.1. Use Python SDK

# Specify the network planning for the VPC and the cluster when creating a cluster.
cluster_desc['Configs']['Networks']['VPC']['VpcId'] = 'vpc-m5egk1jsm3qkbxxxxxxxx'
cluster_desc['Configs']['Networks']['VPC']['CidrBlock'] = '192.168.0.0/20'

# Alternatively, specify the VPC and cluster network planning for the AutoCluster when submitting a job.
job_desc['DAG']['Tasks']['my-task']['AutoCluster']['Configs']['Networks']['VPC']['VpcId'] = 'vpc-m5egk1jsm3qkbxxxxxxxx'
job_desc['DAG']['Tasks']['my-task']['AutoCluster']['Configs']['Networks']['VPC']['CidrBlock'] = '192.168.0.0/20'
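The chained assignments above assume that every intermediate key already exists. If you build the descriptions as plain dictionaries, a missing level raises KeyError, and a mistyped key such as a wrong capitalization of Networks silently creates a separate branch. The following sketch routes both assignments through one hypothetical helper, set_vpc, that creates missing levels with setdefault. The minimal dictionary skeletons are assumptions for illustration; the SDK's own description objects may already populate these fields.

```python
def set_vpc(configs: dict, vpc_id: str, cidr_block: str) -> dict:
    """Fill configs['Networks']['VPC'], creating missing levels instead of raising KeyError."""
    vpc = configs.setdefault("Networks", {}).setdefault("VPC", {})
    vpc["VpcId"] = vpc_id
    vpc["CidrBlock"] = cidr_block
    return configs


# Hypothetical minimal skeletons; real descriptions carry many more fields.
cluster_desc = {"Configs": {}}
set_vpc(cluster_desc["Configs"], "vpc-m5egk1jsm3qkbxxxxxxxx", "192.168.0.0/20")

job_desc = {"DAG": {"Tasks": {"my-task": {"AutoCluster": {"Configs": {}}}}}}
set_vpc(job_desc["DAG"]["Tasks"]["my-task"]["AutoCluster"]["Configs"],
        "vpc-m5egk1jsm3qkbxxxxxxxx", "192.168.0.0/20")
```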

2.2.2. Use Java SDK

// Network configuration: the VPC that contains the NAS mount entry, and the cluster's CIDR block within it.
Configs configs = new Configs();
Networks networks = new Networks();
VPC vpc = new VPC();
vpc.setCidrBlock("192.168.0.0/20");
vpc.setVpcId("vpc-m5egk1jsm3qkbxxxxxxxx");
networks.setVpc(vpc);
configs.setNetworks(networks);

// Specify the VPC network configuration when creating a cluster.
ClusterDescription clusterDescription = new ClusterDescription();
clusterDescription.setConfigs(configs);

// Alternatively, specify the VPC network configuration for the AutoCluster when submitting a job.
AutoCluster autoCluster = new AutoCluster();
autoCluster.setConfigs(configs);

2.3. Specify a mount entry


Batch Compute automatically mounts the NAS mount entry to a local directory based on your mount configuration. Specify the mapping between the NAS mount entry and the local directory in Mounts.Entries of the cluster description or the job description.

  • Source: The NAS location, which consists of the nas:// prefix followed by the mount entry domain name and a directory in the NAS file system, for example nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/. Note the following points:

    • Batch Compute supports multiple source types for mounting. To distinguish NAS sources from the others, NFS-type NAS file systems currently must be prefixed with nas://.

    • On Windows, because of the way the Windows NFS client works, an exclamation point (!) must be appended to the end of the source.

    • Alternatively, you can mount the file system manually in your program; the cluster must still be created in your VPC. For more information, see Manual mount.

  • Destination: The local directory on the Batch Compute cluster nodes. Batch Compute creates this directory automatically, so you do not need to create it in advance.

  • WriteSupport: Whether write operations are supported. If you set this parameter to False, Batch Compute uses its own distributed cache to improve NAS access performance.
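The Source rules above (the nas:// prefix, the mount domain followed by a directory, and the trailing ! on Windows) are easy to get wrong by hand. The following Python sketch assembles one entry as a plain dictionary; build_mount_entry is a hypothetical helper, and the requirement that the NAS directory start with / is an assumption based on the examples in this document.

```python
def build_mount_entry(mount_address: str, nas_dir: str, destination: str,
                      windows: bool = False, write_support: bool = True) -> dict:
    """Build one Mounts.Entries item (hypothetical helper)."""
    # Assumption: the directory inside the file system is an absolute path such as "/".
    if not nas_dir.startswith("/"):
        raise ValueError("nas_dir must be an absolute path inside the file system")
    source = f"nas://{mount_address}:{nas_dir}"
    if windows:
        source += "!"  # the Windows NFS client requires a trailing '!'
    return {"Source": source, "Destination": destination, "WriteSupport": write_support}


addr = "0266e49fea-yio75.cn-beijing.nas.aliyuncs.com"
linux_entry = build_mount_entry(addr, "/", "/home/admin/mydir/")
windows_entry = build_mount_entry(addr, "/", "E:", windows=True)
print(windows_entry["Source"])  # nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/!
```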

2.3.1. Use Python SDK

# Cluster-level mounting. Entries is a list; each item maps one NAS source to one local directory.
# For Linux
cluster_desc['Configs']['Mounts']['Entries'] = [{
    'Source': 'nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/',
    'Destination': '/home/admin/mydir/',
    'WriteSupport': True,
}]

# For Windows (note the trailing '!' in Source)
cluster_desc['Configs']['Mounts']['Entries'] = [{
    'Source': 'nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/!',
    'Destination': 'E:',
    'WriteSupport': True,
}]

# Job-level mounting
# For Linux
job_desc['DAG']['Tasks']['my-task']['Mounts']['Entries'] = [{
    'Source': 'nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/',
    'Destination': '/home/admin/mydir/',
    'WriteSupport': True,
}]

# For Windows (note the trailing '!' in Source)
job_desc['DAG']['Tasks']['my-task']['Mounts']['Entries'] = [{
    'Source': 'nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/!',
    'Destination': 'E:',
    'WriteSupport': True,
}]

2.3.2. Use Java SDK

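// Mount the root directory of the NAS file system to /home/admin/mydir/ (Linux).
Mounts mounts = new Mounts();
MountEntry mountEntry = new MountEntry();
mountEntry.setSource("nas://0266e49fea-yio75.cn-beijing.nas.aliyuncs.com:/");
mountEntry.setDestination("/home/admin/mydir/");
mountEntry.setWriteSupport(true);
mounts.addEntries(mountEntry);
// Attach mounts to the cluster configuration (cluster-level) or to the task description (job-level).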

3. NAS mounting precautions

3.1. Permission group capacity

Currently, a single NAS permission group supports up to 300 rules. In a Batch Compute classic network, each VM corresponds to one permission rule. We recommend that the total number of instances in the clusters that mount the NAS file system not exceed this quota; otherwise, the behavior is undefined.
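With one rule per VM, staying within the quota is a simple sum over all clusters that share the permission group. The following Python sketch assumes that every VM in every listed cluster consumes exactly one rule in the same permission group; fits_permission_group is a hypothetical helper, and the instance counts are made-up inputs.

```python
from typing import Sequence

PERMISSION_GROUP_RULE_LIMIT = 300  # per-group rule limit stated above


def fits_permission_group(instance_counts: Sequence[int],
                          limit: int = PERMISSION_GROUP_RULE_LIMIT) -> bool:
    """One rule per VM: the total VM count across all clusters must not exceed the limit."""
    return sum(instance_counts) <= limit


print(fits_permission_group([100, 150]))  # True  (250 rules)
print(fits_permission_group([200, 150]))  # False (350 rules)
```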

3.2. NAS service billing

Batch Compute does not charge additional fees for using NAS. However, the NAS service itself is billed. For more information about NAS billing rules and prices, see NAS billing rules and price.

3.3. Priority of Cluster Mounts and Job Mounts

You can specify NAS mounts in the Mounts field both when you create a cluster and when you submit a job. Currently, Batch Compute resolves Cluster Mounts and Job Mounts as follows:

  • Mount entries in Job Mounts override those in Cluster Mounts (including AutoCluster). File system and permission group information in Cluster Mounts is not affected.

  • After the job finishes, the overridden Cluster Mounts entries are not restored.