MaxCompute: External volume operations

Last Updated: Jan 25, 2024

In MaxCompute, an external volume acts as a distributed file system for storing unstructured data. By creating an external volume, you can use the MaxCompute engine to query and process files stored in Object Storage Service (OSS) without importing the data into MaxCompute tables, which reduces data redundancy and transmission overhead. This topic describes common operations that you can perform on external volumes.

The following table describes common operations that you can perform on external volumes.

| Operation | Description | Authorized user | Operation platform |
| --- | --- | --- | --- |
| Create an external volume | Creates an external volume in a project. | Owner of the external volume; project owner; users who are assigned the Super_Administrator or Admin role | |
| View the directory structure of an external volume | Views the directory structure of an external volume. | Same as for creating an external volume | |
| Delete an external volume | Deletes an external volume. | Same as for creating an external volume | |

Prerequisites

  • The MaxCompute client V0.43.0 or later is installed. For more information, see MaxCompute client (odpscmd). You can also run the commands in this topic on the DataStudio page or the SQL Query page of the DataWorks console. In that case, the MaxCompute client that is integrated with DataStudio or SQL Query must be V0.43.2 or later. To check the client version, run the Show version; command on the DataStudio page or the SQL Query page. For more information, see Use MaxCompute in DataWorks.

  • If you use the SDK for Java, the version of the SDK for Java must be V0.43.0 or later.

  • An application for trial use of the external volume feature is submitted, and the application is approved.

    Before you use the external volume feature, you must submit an application for enabling this feature at the project level. For more information, see Apply for trial use of new features.

  • The Alibaba Cloud account or RAM user is granted access permissions on OSS. For more information about authorization, see STS authorization for OSS.

Create an external volume

Syntax

vfs -create <volume_name>  
    -storage_provider <oss> 
    -url <oss://oss_endpoint/bucket/path>
    -acd <true|false>
    -role_arn <arn:aliyun:xxx/aliyunodpsdefaultrole> 

The following table describes the parameters.

Parameter

Required

Description

volume_name

Yes

The name of the external volume that you want to create.

storage_provider

Yes

The storage provider. Only OSS is supported. Therefore, you must set this parameter to oss.

url

Yes

The OSS directory in which data files are stored. The OSS directory is in the oss://<oss_endpoint>/<Bucket name>/<OSS directory name> format.

Important

You must specify both the bucket name and the level-2 directory name in the url parameter.

  • oss_endpoint: the endpoint of OSS. You must use an internal endpoint of OSS, such as the one in oss://oss-cn-beijing-internal.aliyuncs.com/xxx, to avoid the extra fees that Internet traffic incurs. For more information about the internal endpoints of OSS, see Regions and endpoints.

    Note

    We recommend that you store data files in an OSS bucket that resides in the same region as your MaxCompute project. MaxCompute is available only in some regions, so cross-region data connectivity issues may occur if the bucket resides in a different region.

  • Bucket name: the name of the OSS bucket. For more information about bucket names, see List buckets.

  • OSS directory name: the name of the OSS directory. You do not need to append a file name to the directory name.

acd

No

Specifies whether to automatically create a directory if the directory does not exist.

Valid values:

  • false: If the directory does not exist, an error is reported, indicating that the system fails to create the external volume. This is the default value.

  • true: If the directory does not exist, the system automatically creates the directory based on role_arn.

Note

If you set acd to true and the specified directory does not exist, MaxCompute uses its OSS permissions to create the directory. MaxCompute does not delete the created directory afterward, regardless of whether the external volume is successfully created. If you set acd to true and the specified directory already exists, MaxCompute uses the existing directory instead of creating another one.

role_arn

Yes

The Alibaba Cloud Resource Name (ARN) of the RAM role that has the permissions to access OSS. For more information about how to obtain the ARN, see Use temporary credentials provided by STS to access OSS.
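The url format described in the table can be illustrated with a small helper. The following is a minimal Python sketch, not part of MaxCompute or any SDK: the function names build_oss_url and uses_internal_endpoint are hypothetical, the trailing slash simply mirrors the example in this topic, and the internal-endpoint check is a naming heuristic based on the endpoints shown here.

```python
from urllib.parse import urlparse


def build_oss_url(endpoint: str, bucket: str, directory: str) -> str:
    """Compose a value for the url parameter in the
    oss://<oss_endpoint>/<Bucket name>/<OSS directory name> format.

    Hypothetical helper: it only assembles the string and does not
    contact OSS or verify that the bucket or directory exists.
    """
    endpoint = endpoint.strip("/")
    bucket = bucket.strip("/")
    directory = directory.strip("/")
    if not endpoint or not bucket or not directory:
        # Both the bucket name and the directory name are required.
        raise ValueError("endpoint, bucket, and directory are all required")
    # Trailing slash is an assumption that mirrors the example in this topic.
    return f"oss://{endpoint}/{bucket}/{directory}/"


def uses_internal_endpoint(url: str) -> bool:
    """Heuristic (assumption): internal endpoints shown in this topic
    contain '-internal.' in the host name."""
    parsed = urlparse(url)
    return parsed.scheme == "oss" and "-internal." in parsed.netloc


url = build_oss_url("oss-cn-hangzhou-internal.aliyuncs.com", "test", "ex_volume")
print(url)
print(uses_internal_endpoint(url))
```

Checking the endpoint before you create the volume is a cheap way to catch a public endpoint that would incur Internet traffic fees.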

The path of the created external volume is in the odps://[project_name]/[volume_name] format. project_name specifies the name of the MaxCompute project. volume_name specifies the name of the external volume. This path can be used by the Spark engine and MapReduce tasks.
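As an illustration of the path format above, the following minimal Python sketch composes the odps:// path from its two parts. The function name external_volume_path and the project name my_project are hypothetical and are used only for this example.

```python
def external_volume_path(project_name: str, volume_name: str) -> str:
    """Return the odps://[project_name]/[volume_name] path of an external volume.

    Hypothetical helper: it only formats the string described in this topic.
    """
    if not project_name or not volume_name:
        raise ValueError("project_name and volume_name are both required")
    return f"odps://{project_name}/{volume_name}"


# my_project is a placeholder project name, not a value from this topic.
print(external_volume_path("my_project", "test_ext_l"))
# prints: odps://my_project/test_ext_l
```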

Examples

Create an external volume named test_ext_l.

vfs -create test_ext_l -storage_provider oss -url oss://oss-cn-hangzhou-internal.aliyuncs.com/test/ex_volume/ -role_arn acs:ram::xxxxxxx:role/aliyunodpsdefaultrole;
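If you generate commands from scripts, the syntax above can be assembled programmatically. The following Python sketch is only an illustration under stated assumptions: build_create_command is a hypothetical helper, not an SDK call, and it only returns the command text for you to run in the MaxCompute client. It emits -acd only when acd is True, because false is the default value.

```python
def build_create_command(
    volume_name: str,
    url: str,
    role_arn: str,
    acd: bool = False,
    storage_provider: str = "oss",
) -> str:
    """Assemble a vfs -create command string from the documented parameters.

    Hypothetical helper: it does not execute the command.
    """
    if storage_provider != "oss":
        # Only OSS is supported as the storage provider.
        raise ValueError("storage_provider must be 'oss'")
    parts = ["vfs", "-create", volume_name,
             "-storage_provider", storage_provider,
             "-url", url]
    if acd:
        # Ask MaxCompute to create the OSS directory if it does not exist.
        parts += ["-acd", "true"]
    parts += ["-role_arn", role_arn]
    return " ".join(parts) + ";"


command = build_create_command(
    "test_ext_l",
    "oss://oss-cn-hangzhou-internal.aliyuncs.com/test/ex_volume/",
    "acs:ram::xxxxxxx:role/aliyunodpsdefaultrole",
)
print(command)
```

With the default acd=False, the returned string matches the example command above. Passing acd=True inserts -acd true before -role_arn, following the parameter order in the syntax.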

View the list of external volumes and the directory structure of an external volume

Syntax

-- View the list of external volumes.
vfs -ls /;

-- View the directory structure of an external volume.
vfs -ls [-R] /<volume_name>; 

The following table describes the parameters.

| Parameter | Required | Description |
| --- | --- | --- |
| volume_name | Yes | The name of the external volume that you want to view. |

Examples

  • View the list of external volumes.

    vfs -ls /;

    Sample response:

    > vfs -ls /;
    	Found 2 items
    	drwxrwxrwx - 0 2023-03-11 12:06 /test_ext_l -> oss://oss-cn-shanghai-internal.aliyuncs.com/test/ex_volume
    	drwxrwxrwx - 0 2023-03-21 07:33 /myfirst_volume4 -> oss://oss-cn-shanghai-internal.aliyuncs.com/paristech/data

    If a user does not have permissions on an external volume, no information about that volume is displayed in the returned result. For example, if a user named dev01 does not have permissions on the myfirst_volume4 external volume and needs to query data from it, run the following command to grant dev01 the Read permission on the volume:

    grant Read on volume myfirst_volume4 to RAM$xxxxxx:dev01;

    Note

    The following permissions on external volumes can be granted: Read, Write, and CreateVolume.

  • View the directory structure of an external volume named test_ext_l.

    vfs -ls -R /test_ext_l;

    Sample response:

    drwxrwxrwx - 0 2023-03-27 07:31 /test_ext_l/test -> oss://oss-cn-hangzhou-internal.aliyuncs.com/test/ex_volume/test
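If you need to process the output of vfs -ls in a script, the sample responses above suggest a simple line format: permissions, two further fields, a timestamp, the volume path, and the OSS location after ->. The following Python sketch parses that format. It is an assumption based only on the samples in this topic, not a documented output contract, and the names VolumeEntry and parse_vfs_ls are hypothetical.

```python
import re
from typing import List, NamedTuple


class VolumeEntry(NamedTuple):
    path: str       # path inside MaxCompute, for example /test_ext_l
    modified: str   # timestamp as printed, for example 2023-03-11 12:06
    target: str     # OSS location that the path maps to


# Layout inferred from the sample responses in this topic (assumption).
_ENTRY = re.compile(
    r"^\S+\s+\S+\s+\S+\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\S+)\s+->\s+(\S+)$"
)


def parse_vfs_ls(output: str) -> List[VolumeEntry]:
    """Extract entries from vfs -ls output, skipping lines that do not
    match the entry layout, such as the prompt echo and 'Found N items'."""
    entries = []
    for raw_line in output.splitlines():
        match = _ENTRY.match(raw_line.strip())
        if match:
            modified, path, target = match.groups()
            entries.append(VolumeEntry(path, modified, target))
    return entries


sample = (
    "Found 1 items\n"
    "drwxrwxrwx - 0 2023-03-27 07:31 /test_ext_l/test -> "
    "oss://oss-cn-hangzhou-internal.aliyuncs.com/test/ex_volume/test\n"
)
for entry in parse_vfs_ls(sample):
    print(entry.path, entry.target)
```

Because a volume on which the user has no permissions is not listed, an empty result from this parser does not necessarily mean that no external volumes exist.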

Delete an external volume

Syntax

  • Syntax 1:

    vfs -rm -r /<volume_name>

  • Syntax 2:

    vfs -rmv /<volume_name>

The following table describes the parameters.

| Parameter | Required | Description |
| --- | --- | --- |
| volume_name | Yes | The name of the external volume that you want to delete. |

Examples

Delete an external volume named test_ext_l.

vfs -rm -r /test_ext_l; 

References

  • For more information about how to manage external volumes by using SDKs, see Manage external volumes by using SDKs.

  • In MaxCompute, you can create an external volume and mount it to an OSS path. You can then use the MaxCompute permission management system to implement fine-grained access control on the external volume, and use the MaxCompute engine to process the files stored in it. For examples of how to use external volumes, see Use MaxCompute external volumes to process unstructured data.