Batch Compute: Access data on OSS

Last Updated: Feb 20, 2024

This article answers questions such as: how can I mount 10 GB of data stored in an OSS Bucket to a directory on a virtual machine (VM), so that a program running on Batch Compute can access the data as if it were local files?

Assume that the 10 GB of data is saved in the directory oss://mybucket/mydir/ in your OSS Bucket.

You can mount this OSS directory to the directory /home/admin/mydir/ on the VM. The task program then treats it as a local directory.

  • NOTE: If a Windows image is used, the OSS directory can be mounted only to a drive (such as “E:”), not to a folder.

OSS APIs are different from conventional file system APIs. To support quick migration of conventional programs, Batch Compute allows you to mount an OSS directory directly to the local file system of the VM, so that the application can access data in your OSS Bucket without calling the OSS APIs.

OSS mounting supports two modes: read-only mounting and writable mounting.

  • Read-only mounting is used to access the input data of a program. Data access requests at a read-only mount point are automatically converted to OSS access requests, so the data does not need to be downloaded to the local disk of the VM.

  • Result data written to a writable mount directory is first saved on the local disk of the VM and then automatically uploaded to the corresponding location in the OSS Bucket when the job finishes. In writable mounting mode, make sure that the VM has sufficient disk space for the result data.

1. Mount a data directory

Configure the mount mappings when you submit a job, as shown in the following examples.

1.1. Use Java SDK

TaskDescription taskDesc = new TaskDescription();
taskDesc.addInputMapping("oss://mybucket/mydir/", "/home/admin/mydir/"); //Read-only mounting
taskDesc.addOutputMapping("/home/admin/mydir/", "oss://mybucket/mydir/"); //Writable mounting

1.2. Use Python SDK

# Read-only mounting
job_desc['DAG']['Tasks']['my-task']['InputMapping'] = {
    "oss://mybucket/mydir/": "/home/admin/mydir/"
}

# Writable mounting
job_desc['DAG']['Tasks']['my-task']['OutputMapping'] = {
    "/home/admin/mydir/": "oss://mybucket/mydir/"
}

1.3. Use command line tool

bcs sub "python main.py" -r oss://mybucket/mydir/:/home/admin/mydir/   # Use commas to separate multiple mappings.

Note: If InputMapping is set, the directory is mounted in read-only mode. It can be read, but its files cannot be written or deleted. If OutputMapping is set, files and directories written to /home/admin/mydir/ are automatically uploaded to oss://mybucket/mydir/ after the job finishes. InputMapping and OutputMapping cannot use the same local directory.
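The rule that InputMapping and OutputMapping cannot share a local directory can be checked before a job is submitted. The following is a minimal sketch, not part of the Batch Compute SDK: the helper names normalize_dir and find_conflicting_mounts are hypothetical, and the check assumes Linux-style local paths (Windows drive letters are not handled).

```python
import posixpath


def normalize_dir(path):
    """Normalize a local path so '/a/b' and '/a/b/' compare equal."""
    return posixpath.normpath(path)


def find_conflicting_mounts(input_mapping, output_mapping):
    """Return local paths that appear in both mappings.

    input_mapping maps OSS paths to local paths (InputMapping).
    output_mapping maps local paths to OSS paths (OutputMapping).
    """
    input_dirs = {normalize_dir(local) for local in input_mapping.values()}
    output_dirs = {normalize_dir(local) for local in output_mapping.keys()}
    return sorted(input_dirs & output_dirs)


# Both mappings point at the same local directory, so a conflict is reported.
input_mapping = {"oss://mybucket/mydir/": "/home/admin/mydir/"}
output_mapping = {"/home/admin/mydir": "oss://mybucket/mydir/"}
conflicts = find_conflicting_mounts(input_mapping, output_mapping)
print(conflicts)
```

An empty result means that no local directory is used for both reading and writing. The sketch compares exact paths only and does not detect nested directories.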

2. Mount a program directory

Question: I have a program main.py in my OSS Bucket. How can I use this program on Batch Compute?

Assume that main.py is saved at oss://mybucket/myprograms/main.py in your OSS Bucket.

Mount oss://mybucket/myprograms/ to /home/admin/myprograms/.

Then run the command python /home/admin/myprograms/main.py.

2.1. Use Java SDK

TaskDescription taskDesc = new TaskDescription();
taskDesc.addInputMapping("oss://mybucket/myprograms/", "/home/admin/myprograms/"); //Read-only mounting

// Set the command line that runs the mounted program
Command cmd = new Command();
cmd.setCommandLine("python /home/admin/myprograms/main.py");

Parameters params = new Parameters();
params.setCommand(cmd);
taskDesc.setParameters(params);

2.2. Use Python SDK

# Read-only mounting
job_desc['DAG']['Tasks']['my-task']['InputMapping'] = {
    "oss://mybucket/myprograms/": "/home/admin/myprograms/"
}
job_desc['DAG']['Tasks']['my-task']['Parameters']['Command']['CommandLine']='python /home/admin/myprograms/main.py'
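The Python snippets in this article assume that job_desc already exists as a nested dictionary. The following sketch builds a skeleton that contains only the keys used in this article, so that the snippets can be tried in isolation. The helper name new_job_desc is hypothetical, and a real job description most likely requires additional fields that are not covered here.

```python
def new_job_desc(task_name):
    """Create a nested dict that holds only the keys used in this article."""
    return {
        "DAG": {
            "Tasks": {
                task_name: {
                    "InputMapping": {},
                    "OutputMapping": {},
                    "Parameters": {"Command": {"CommandLine": ""}},
                }
            }
        }
    }


job_desc = new_job_desc("my-task")
task = job_desc["DAG"]["Tasks"]["my-task"]

# Read-only mounting of the program directory
task["InputMapping"]["oss://mybucket/myprograms/"] = "/home/admin/myprograms/"
# Command line that runs the mounted program
task["Parameters"]["Command"]["CommandLine"] = "python /home/admin/myprograms/main.py"
```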

2.3. Use command line tool

bcs sub "python /home/admin/myprograms/main.py" -r oss://mybucket/myprograms/:/home/admin/myprograms/
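When several read-only mappings are needed, the value of the -r option can be assembled programmatically. The sketch below only builds the command string in the oss_path:local_path format, with commas between mappings as described for the command line tool. The helper name build_bcs_sub is hypothetical, and the sketch does not run the bcs tool.

```python
import shlex


def build_bcs_sub(command_line, read_only_mounts):
    """Build a `bcs sub` command string with a single -r option.

    read_only_mounts maps OSS paths to local paths; the mappings are
    joined with commas in insertion order.
    """
    mappings = ",".join(f"{src}:{dst}" for src, dst in read_only_mounts.items())
    return f"bcs sub {shlex.quote(command_line)} -r {mappings}"


cmd = build_bcs_sub(
    "python main.py",
    {
        "oss://bucket1/data/": "/home/data1/",
        "oss://bucket2/data/": "/home/data2/",
    },
)
print(cmd)
```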

3. Mount a file

Question: How can I mount multiple files with different prefixes in my OSS Buckets to the same directory of the VM on Batch Compute?

Assume that you want to mount two files, one in bucket1 and one in bucket2, to the local directory /home/data/ of the VM:

Mount oss://bucket1/data/file1 to /home/data/file1.

Mount oss://bucket2/data/file2 to /home/data/file2.

In addition, you want to upload two local files from the VM to bucket1 and bucket2:

Mount /home/output/output1 to oss://bucket1/output/file1.

Mount /home/output/output2 to oss://bucket2/output/file2.

The following examples show the corresponding configuration:

3.1. Use Java SDK

TaskDescription taskDesc = new TaskDescription();
taskDesc.addInputMapping("oss://bucket1/data/file1", "/home/data/file1");
taskDesc.addInputMapping("oss://bucket2/data/file2", "/home/data/file2");
taskDesc.addOutputMapping("/home/output/output1", "oss://bucket1/output/file1");
taskDesc.addOutputMapping("/home/output/output2", "oss://bucket2/output/file2");

3.2. Use Python SDK

# Read-only mounting
job_desc['DAG']['Tasks']['my-task']['InputMapping'] = {
    "oss://bucket1/data/file1": "/home/data/file1",
    "oss://bucket2/data/file2": "/home/data/file2"
}

# Writable mounting
job_desc['DAG']['Tasks']['my-task']['OutputMapping'] = {
    "/home/output/output1": "oss://bucket1/output/file1",
    "/home/output/output2": "oss://bucket2/output/file2"
}

4. Limits of InputMapping mounting

4.1. Naming of storage objects on OSS

  • A single file name can contain at most 255 bytes. The length is counted as the number of bytes that the name occupies after UTF-8 encoding. A Chinese character generally occupies three bytes.

  • The combined length of the path and file name, calculated from the root directory, cannot exceed 1023 bytes.

  • File name validity is checked against the UTF-8 character set. Names in other character sets must be converted to UTF-8 before they are checked.

  • All UTF-8 characters at 0x80 and above are supported.

  • The control characters 0x00-0x1F and 0x7F, and the characters \, /, :, *, ?, ", <, >, and | are not supported.
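The naming limits above can be checked on the client side before data is uploaded. The following is a rough sketch under stated assumptions: check_object_path is a hypothetical helper, "/" is treated as the separator between path components (so it is not flagged inside a path), and the caller decides which part of the object path counts as "from the root directory".

```python
# Characters listed as unsupported: \ : * ? " < > | plus control characters.
# '/' is handled separately because it separates path components.
FORBIDDEN_CHARS = set('\\:*?"<>|') | {chr(c) for c in range(0x20)} | {chr(0x7F)}
MAX_NAME_BYTES = 255
MAX_PATH_BYTES = 1023


def check_object_path(path):
    """Return a list of problems found in a path such as 'mydir/a.txt'."""
    problems = []
    if len(path.encode("utf-8")) > MAX_PATH_BYTES:
        problems.append("path exceeds 1023 bytes")
    for name in path.split("/"):
        if len(name.encode("utf-8")) > MAX_NAME_BYTES:
            problems.append("file name exceeds 255 bytes")
        bad = sorted(FORBIDDEN_CHARS.intersection(name))
        if bad:
            problems.append(f"unsupported characters in {name!r}: {bad!r}")
    return problems


print(check_object_path("mydir/data.txt"))  # no problems expected
print(check_object_path("mydir/a:b.txt"))   # ':' is not supported
```

Because lengths are measured in UTF-8 bytes, a name made of 85 Chinese characters (255 bytes) passes, while 86 characters (258 bytes) does not.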

4.2. Access permissions for InputMapping mounting files

To keep the OSS Bucket consistent when multiple nodes write files, InputMapping mounting is read-only with respect to the OSS Bucket. Operations that an application performs through the file system interface never modify or delete any file in the OSS Bucket.

  • Any modifications to the mounted directory are cached in local files. The application can read the modified content through the file system interface, but the content is not synchronized to the OSS Bucket. After the application finishes running, the file system is unmounted, the mounting service is stopped, and all locally cached modifications are discarded. When the mounting service is started and the file system is mounted again, the local files match the object content in OSS, and the earlier modifications are not retained.

  • In the mounted directory, files that already exist in the OSS Bucket are assigned read and execute permissions but no write permission. The application can read these files but cannot modify, truncate, rename, or delete them.

  • In the mounted directory, the application can freely create files and folders, which are assigned read, write, and execute permissions. The application can modify, truncate, rename, and delete these files.

  • In Windows operating systems, content in read-only folders cannot be deleted. Therefore, in a folder that already exists in the OSS Bucket, the application can create, modify, or truncate a file, but cannot delete or rename any file. For example, \\127.0.0.1\ossdata\bucket\dir is a folder in the OSS Bucket. If the application creates a file named “file” in this directory after mounting it, the file \\127.0.0.1\ossdata\bucket\dir\file cannot be deleted or renamed. However, if a folder named “local” is created under \\127.0.0.1\ossdata\bucket\dir, files created under \\127.0.0.1\ossdata\bucket\dir\local, such as \\127.0.0.1\ossdata\bucket\dir\local\file, can be deleted and renamed. This restriction does not exist in Linux operating systems.

4.3. Mounting language

All object names in an OSS Bucket are stored in UTF-8 encoding. The mounting service can convert between character sets. You must specify the character set used by the application in the cluster or job description. The application can access the correct files only after the mounting service has converted the character set.

Note that the character set used by the application may differ from the default character set of the operating system. For example, if a Traditional Chinese application runs on a Simplified Chinese operating system, the character set must be set to BIG5. The mounting service then automatically converts the UTF-8-encoded paths and file names in the OSS Bucket to BIG5. Although the file names on the mounted disk appear garbled in the operating system, the application sees the correct file names.

Character set conversion affects only file names. It has no impact on file content.
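The effect of the conversion on a file name can be illustrated with Python's built-in codecs. This sketch is only an illustration of the encodings involved and does not call the mounting service. The sample file name is an assumption, and GBK is used here to stand in for the default code page of a Simplified Chinese operating system.

```python
name = "資料.txt"  # sample Traditional Chinese file name (assumption)

# Object names are stored in OSS as UTF-8.
utf8_bytes = name.encode("utf-8")

# A BIG5 application expects the same name as BIG5 bytes.
big5_bytes = utf8_bytes.decode("utf-8").encode("big5")

# The application decodes the bytes as BIG5 and sees the correct name.
seen_by_app = big5_bytes.decode("big5")

# An operating system that interprets the same bytes as GBK shows a
# different, garbled name.
seen_by_os = big5_bytes.decode("gbk", errors="replace")

print(len(utf8_bytes), len(big5_bytes))
print(seen_by_app == name, seen_by_os == name)
```

Only the bytes of the name differ between the two encodings. File content is not touched by this kind of conversion.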

4.4. File lock

NFS-based DOS Share and file locking affect performance. If the application performs frequent I/O operations, we recommend that you disable file locking by adding the nolock option when mounting. However, some applications, such as 3ds Max, require file locking and may fail to run if it is disabled.

4.5. File mounting

To mount an individual file in the OSS Bucket to a local file on the VM, make sure that the local directory containing the mounted file is not a parent directory of the mount point of another file or directory.

4.6. Other conventions

  • While an application is running, do not modify the mounted folder that the application is accessing; otherwise, a conflict may occur. Any modification to data being accessed by the application (including, but not limited to, deleting a file, modifying or appending to file content, or truncating a file) may cause the application to fail.

  • The capacity of an OSS Bucket is effectively unlimited. The disk usage statistics collected by the operating system do not reflect the actual usage of the OSS Bucket and are therefore not meaningful.

  • Multiple directories in the same bucket can be mounted to different mount points at the same time.

  • Multiple buckets in the same account can be mounted to different mount points at the same time.

  • Only files that belong to a single account can be mounted at a time. Files from multiple accounts cannot be mounted together.