Before you run a task, upload the resources it requires, such as JAR packages, scripts, and data files, to EMR Serverless Spark, or mount a cloud storage path to your workspace. Tasks access these files at runtime through one of two mechanisms:
| Approach | How it works | Best for |
|---|---|---|
| Managed file directory | Upload local files directly through the console to a workspace-managed bucket | Files of up to 500 MB each that you want to reference as task dependencies or input data |
| Integrated file directory | Mount an Object Storage Service (OSS) bucket or General-purpose NAS file system to your workspace | Large datasets, shared storage, or files already stored in OSS or NAS |
Limits
The maximum size of a single file uploaded to a managed file directory is 500 MB. For larger files, store them in OSS and add an integrated file directory to mount the OSS path.
A workspace supports a maximum of 10 integrated file directories.
Tasks submitted through Livy Gateway or Kyuubi Gateway do not support integrated file directories.
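The limits above can be expressed as a small pre-flight check. The following Python sketch is illustrative only: the function names and the gateway labels are hypothetical and are not part of any EMR Serverless Spark API, and it assumes that "500 MB" means 500 MiB (500 * 1024 * 1024 bytes). It encodes only the documented size, count, and gateway limits; other reasons to prefer one approach, such as shared storage, are not modeled.

```python
import os

# Assumption: "500 MB" is interpreted as 500 MiB.
MAX_MANAGED_FILE_BYTES = 500 * 1024 * 1024
# A workspace supports at most 10 integrated file directories.
MAX_INTEGRATED_DIRECTORIES = 10
# Hypothetical labels for submission paths that do not support integrated directories.
GATEWAYS_WITHOUT_INTEGRATED_DIRS = {"livy", "kyuubi"}


def choose_storage(file_size_bytes: int) -> str:
    """Suggest 'managed' or 'integrated' based only on the single-file size limit."""
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    if file_size_bytes <= MAX_MANAGED_FILE_BYTES:
        return "managed"
    # Too large to upload: store the file in OSS and mount the OSS path instead.
    return "integrated"


def choose_storage_for_path(path: str) -> str:
    """Apply choose_storage to a local file on disk."""
    return choose_storage(os.path.getsize(path))


def can_add_integrated_directory(existing_count: int) -> bool:
    """True if the workspace still has room for another integrated directory."""
    return existing_count < MAX_INTEGRATED_DIRECTORIES


def integrated_dirs_supported(submission: str) -> bool:
    """False for tasks submitted through Livy Gateway or Kyuubi Gateway."""
    return submission.lower() not in GATEWAYS_WITHOUT_INTEGRATED_DIRS
```

A file of exactly 500 MiB still fits in the managed directory under this assumption; one byte more pushes it to an integrated directory.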
Managed file directory
Upload a file
1. Log on to the E-MapReduce console.
2. In the navigation pane on the left, choose EMR Serverless > Spark.
3. On the Spark page, click the name of the target workspace.
4. On the EMR Serverless Spark page, click Files in the navigation pane on the left.
5. On the Managed File Directory page, click Upload File.
6. In the Upload File dialog box, click the upload area to select a local file, or drag a file to the upload area.
Manage files and folders
On the Managed File Directory page, you can perform the following operations:
Files: Download File, Copy Address, Delete
Folders: Create Folder, Rename, Delete
Integrated file directory
After you add an integrated file directory, workspace members with file editing permissions can edit files and folders in the integrated OSS file directory from the file management interface. Members with Data Studio permissions can read and write files and folders using Data Studio tasks.
Add a file directory
1. On the Integrated File Directory page, click Create File Directory.
2. In the Create File Directory dialog box, configure the parameters for your storage type, then click OK.
OSS
| Parameter | Description |
|---|---|
| File directory name | Name of the file directory. |
| OSS path | An OSS path that you have permission to access. The workspace execution role must have access to this path. |
| Mount path | A custom path. Must be under /mnt. |

General-purpose NAS

| Parameter | Description |
|---|---|
| File directory name | Name of the file directory. |
| File system | A General-purpose NAS file system that you have permission to access. The workspace execution role must have access to this file system. |
| Mount target | The mount target used to access the NAS file system. |
| File system path | An existing storage path in NAS. Leave blank to mount the root directory. |
| Mount path | A custom path. Must be under /nas. |
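Because OSS and NAS mount paths must sit under different prefixes, a check like the following can catch a misconfigured mount path before you submit the dialog box. This is a hypothetical helper, not a console or API feature, and it requires Python 3.9 or later for `is_relative_to`. It compares path components rather than raw string prefixes, so /mntdata is rejected, and it assumes that the prefix directory itself (/mnt or /nas) is not a valid mount path.

```python
from pathlib import PurePosixPath

# Required mount-path prefixes from the parameter descriptions above.
REQUIRED_PREFIX = {
    "oss": PurePosixPath("/mnt"),
    "nas": PurePosixPath("/nas"),
}


def validate_mount_path(storage_type: str, mount_path: str) -> bool:
    """Return True if mount_path lies below the prefix required for storage_type."""
    prefix = REQUIRED_PREFIX[storage_type.lower()]
    path = PurePosixPath(mount_path)
    # Reject relative paths and '..' segments, which PurePosixPath does not resolve.
    if not path.is_absolute() or ".." in path.parts:
        return False
    # Assumption: the mount path must be a subdirectory, not the prefix itself.
    return path != prefix and path.is_relative_to(prefix)
```

An unknown storage type raises `KeyError`, which keeps a typo from silently passing validation.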
Delete a file directory
Deleting a file directory removes the association between the workspace and OSS or NAS. Files stored in OSS or NAS are not deleted.
1. On the Integrated File Directory page, click Delete in the Actions column for the directory you want to remove.
2. Click OK.
What's next
Use files as task dependencies: After uploading to a managed file directory, reference the files as JAR package dependencies or input data sources when configuring tasks.
Access files in Notebook sessions: After adding an integrated file directory, select the mounted path when configuring Notebook sessions to read and write files directly.
Access files in Data Studio tasks: Select the mounted directory when configuring Data Studio tasks to read and write files using the integrated storage path.
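As a rough illustration of reading and writing through a mounted directory, the following Python sketch assumes that the mount path appears to the Notebook session's Python process as an ordinary local directory; this document does not confirm that detail. The mount path /mnt/my-data and the helper names are hypothetical; substitute the Mount path you configured.

```python
from pathlib import Path

# Hypothetical mount path; replace it with the Mount path of your integrated directory.
MOUNT_PATH = Path("/mnt/my-data")


def write_text(mount_root: Path, relative: str, content: str) -> Path:
    """Write UTF-8 text to a file below the mount root, creating parent folders."""
    target = mount_root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


def read_text(mount_root: Path, relative: str) -> str:
    """Read a UTF-8 text file below the mount root."""
    return (mount_root / relative).read_text(encoding="utf-8")
```

For example, `write_text(MOUNT_PATH, "results/summary.txt", "done")` would create results/summary.txt below the mount, provided the execution role can write to the underlying OSS or NAS path.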