By default, when you create a Data Science Workshop (DSW) instance by using a pay-as-you-go public resource group, the system provides a system disk of 100 GB. If the system disk space is insufficient, you can expand the system disk or mount a dataset. This topic describes the advantages, disadvantages, and use scenarios of expanding a system disk and mounting a dataset.
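Before you choose between the two options, it can help to confirm how full the system disk actually is. The following is a minimal sketch that uses only the Python standard library. It assumes that the system disk is the filesystem mounted at `/` inside the instance, which this topic does not state. The function name `disk_usage_report` is illustrative.

```python
import shutil


def disk_usage_report(path="/"):
    """Return total, used, and free space (GiB) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "used_pct": 100 * usage.used / usage.total,
    }


if __name__ == "__main__":
    # Assumption: "/" is the system disk of the DSW instance.
    report = disk_usage_report("/")
    print(
        f"total={report['total_gib']:.1f} GiB, "
        f"free={report['free_gib']:.1f} GiB, "
        f"used={report['used_pct']:.1f}%"
    )
```

If the reported free space is consistently low, expanding the system disk or moving large files to a mounted dataset are the two options that this topic compares.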
Comparison overview
| Feature | System disk expansion | Dataset mounting |
| --- | --- | --- |
| Read and write speed | High. | Low. The read and write speed depends on the storage type of the dataset that you mount, such as Object Storage Service (OSS), File Storage NAS (NAS), and Cloud Parallel File Storage (CPFS). |
| Ease of expansion | High. The expansion operation is simple and does not interrupt services. | Dataset mounting requires specific configurations and technical operations. |
| Persistence | Low. After you delete a DSW instance, the system disk is reclaimed. | High. Data is stored in cloud storage such as OSS, NAS, and CPFS. |
| Data sharing | Not supported. | Data can be shared among multiple instances and cloud services. |
| Data security | Low. | High. Data is persistently stored in cloud storage such as OSS, NAS, and CPFS. |
| Use scenarios | Scenarios that require high-performance I/O operations or temporary storage. | Scenarios that require persistent storage, shared access, and high data security. |
Details
System disk expansion
Advantages:
High read and write speed: System disks are high-performance storage devices, which makes them suitable for scenarios that require high-performance I/O operations.
Ease of expansion: A system disk is easy to expand. You only need to update the related configurations to perform the expansion. After you expand a system disk, the system disk is not reclaimed until you delete the DSW instance to which the disk is attached, even if you do not use the instance for a long period of time.
Disadvantages:
Inconvenient data sharing: A system disk is attached to a single DSW instance, and data cannot be shared among multiple instances.
Non-persistent storage: After you delete a DSW instance, the system disk is reclaimed.
Note: For a DSW instance that you create by using a public resource group and whose system disk is not expanded, the system disk is cleared 15 days after the instance is stopped.
If a DSW instance that you create by using a dedicated resource group is stopped or deleted, the system disk is cleared.
Use scenarios:
Temporary storage: scenarios that require quick access to data.
High-performance I/O operations: applications that require high read and write speed, such as databases and applications that record and analyze logs.
Dataset mounting
Advantages:
Persistent storage: Data is stored in datasets, such as OSS, NAS, and CPFS datasets, and is independent of the lifecycle of a DSW instance. Even if an instance is stopped or deleted, data is not lost.
Data sharing: Data can be shared among DSW instances or cloud services, which can read data by accessing the mount path.
Data security: Compared with system disks, dataset storage is more reliable and secure.
Disadvantages:
Low read and write speed: Compared with system disks, the speed of accessing data in a dataset may be lower.
Additional configurations: Mounting a dataset requires additional configurations and management, and requires users to perform specific technical operations. For more information, see Mount datasets or OSS paths.
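The speed difference between the system disk and a mounted dataset depends on the storage type, so it is worth measuring in your own instance. The following is a rough sketch that writes and then reads back a temporary file in a given directory and reports throughput in MiB/s. The function name `measure_throughput` and the path `/mnt/data` are hypothetical. Replace the path with your actual mount path. The read figure may be inflated by the operating system page cache, and some mount types may not support `fsync`, so treat the numbers as indicative only.

```python
import os
import tempfile
import time


def measure_throughput(directory, size_mib=64, chunk_mib=4):
    """Write then read a temporary file in `directory`.

    Returns (write_mib_per_s, read_mib_per_s).
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    n_chunks = size_mib // chunk_mib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Time the sequential write, including the flush to storage.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Time the sequential read of the same file.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)

    total_mib = n_chunks * chunk_mib
    return total_mib / write_s, total_mib / read_s


if __name__ == "__main__":
    # Assumption: the home directory is on the system disk and
    # "/mnt/data" is a hypothetical dataset mount path.
    for label, directory in [("system disk", os.path.expanduser("~")),
                             ("dataset mount", "/mnt/data")]:
        if os.path.isdir(directory):
            w, r = measure_throughput(directory)
            print(f"{label}: write {w:.0f} MiB/s, read {r:.0f} MiB/s")
```

A single sequential test does not capture small-file or random-access behavior, which is often where object storage mounts differ most from local disks.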
Use scenarios:
Persistent storage: scenarios in which you want to persistently store data such as training datasets, model parameters, and result data.
Shared access: scenarios that require multiple instances or users to access and work on the same dataset, such as collaborative projects and team workflows.
Data security: applications that require high data durability and security.
References
Upload and download data files: If the size of a file exceeds 5 GB, you can upload the file to an OSS bucket, create an OSS dataset, and mount the dataset to a DSW instance. This way, you can directly read the OSS data in the DSW instance.