By default, when you create a Data Science Workshop (DSW) instance by using a pay-as-you-go public resource group, the system provides a system disk of 100 GB. If the system disk space is insufficient, you can expand the system disk or mount a dataset. This topic describes the advantages, disadvantages, and use scenarios of expanding a system disk and mounting a dataset.
Quick comparison
Feature | System disk expansion | Dataset mounting |
Read and write speed | High. | Low. The read and write speed depends on the storage type of the dataset that you mount, such as Object Storage Service (OSS), File Storage NAS (NAS), and Cloud Parallel File Storage (CPFS). |
Ease of expansion | High. The expansion operation is simple and does not interrupt services. | Low. Dataset mounting requires specific configurations and technical operations. |
Persistence | Low. After you delete a DSW instance, the system disk is reclaimed. | High. Data is stored in cloud storage such as OSS, NAS, and CPFS. |
Data sharing | Not supported. | Supported. Data can be shared among multiple instances and cloud services. |
Data security | Low. | High. Data is persistently stored in cloud storage such as OSS, NAS, and CPFS. |
Use scenarios | Scenarios that require high-performance I/O operations or temporary storage. | Scenarios that require persistent storage, shared access, and high data security. |
Details
Method 1: System disk expansion
Advantages:
High read and write speed: System disks are high-performance storage devices that have high read and write speed and are suitable for scenarios that require high-performance I/O operations.
Ease of expansion: A system disk is easy to expand. You only need to update the related configuration. After you expand a system disk, it is not reclaimed until you delete the DSW instance to which it is attached, even if you do not use the instance for a long period of time.
Disadvantages:
No cross-instance sharing: A system disk is attached to a single instance and cannot be shared with other instances.
Data loss on instance deletion: Data on the system disk is retained when an instance is stopped. However, all data on the system disk is permanently deleted when the DSW instance is deleted.
No downscaling after expansion: Once the system disk is expanded, its capacity cannot be reduced.
Continued billing after the instance is stopped: When you stop a pay-as-you-go DSW instance, billing for compute resources stops. However, if you have expanded the system disk, billing for its storage space continues because the disk resource is still occupied. To stop all charges, back up your data and then delete the instance.
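The backup step mentioned above can be sketched as a plain directory copy from the system disk to a location that outlives the instance, such as the mount path of a dataset. This is a minimal sketch, not an official procedure: the function name `back_up_directory` and the example paths in the comments are hypothetical, and it assumes Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path


def back_up_directory(source: str, destination: str) -> int:
    """Copy everything under `source` into `destination`.

    Returns the number of files found in `destination` after the copy.
    """
    src = Path(source)
    dst = Path(destination)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    # dirs_exist_ok=True (Python 3.8+) lets a repeated backup reuse the same target.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())


# Hypothetical usage; replace both paths with your own working directory
# and the mount path of your dataset:
#   back_up_directory("/mnt/workspace/project", "/mnt/data/backup/project")
```

Verify that the copy is complete before you delete the instance, because data on the system disk cannot be recovered afterwards.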
Use scenarios:
Temporary storage: scenarios that require quick access to data.
High I/O operations: applications that require high read and write speeds, such as databases, logging, and data analysis.
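Before you decide to expand the system disk, it can help to check how much of it is already in use. The following is a minimal sketch that uses the Python standard library. The function name `disk_usage_report` is hypothetical, and it assumes that the system disk is the file system that contains the path you pass in (`/` by default), which may not match how your instance is laid out.

```python
import shutil


def disk_usage_report(path: str = "/") -> dict:
    """Return total, used, and free space in GiB plus the used percentage."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    used_percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "used_percent": used_percent,
    }


report = disk_usage_report("/")
print(f"{report['used_gib']:.1f} GiB used of {report['total_gib']:.1f} GiB "
      f"({report['used_percent']:.0f}%)")
```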
Method 2: Dataset mounting
Advantages:
Persistent storage: Data is stored in datasets, such as OSS, NAS, and CPFS datasets, and is independent of the lifecycle of a DSW instance. Even if an instance is stopped or deleted, data is not lost.
Data sharing: Data can be shared among DSW instances or cloud services, which can read data by accessing the mount path.
Data security: Compared with system disks, dataset storage is more reliable and secure.
Disadvantages:
Low read and write speed: Compared with system disks, the speed of accessing data in a dataset may be lower.
Additional configurations: Mounting a dataset requires additional configuration and management, including specific technical operations. For more information, see Mount a dataset, OSS, NAS, or CPFS.
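Because the speed of a mounted dataset depends on its storage type, a rough measurement on your own instance can show how large the gap is. The following is a minimal sketch, not a rigorous benchmark: the function name `measure_throughput` and the example paths in the comments are hypothetical. The read pass may be served from the operating system page cache, so the read figure can look better than the storage actually is.

```python
import os
import tempfile
import time
from typing import Tuple

MIB = 1024 * 1024


def measure_throughput(directory: str, size_mib: int = 64) -> Tuple[float, float]:
    """Write and then read a temporary file in `directory`.

    Returns (write_speed, read_speed) in MiB/s.
    """
    chunk = os.urandom(MIB)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mib):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to storage
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(MIB):
                pass
        read_seconds = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return size_mib / write_seconds, size_mib / read_seconds


# Hypothetical usage; replace the paths with a directory on the system disk
# and the mount path of your dataset:
#   print(measure_throughput("/mnt/workspace"))
#   print(measure_throughput("/mnt/data"))
```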
Use scenarios:
Persistent storage: scenarios in which you want to persistently store data such as training datasets, model parameters, and result data.
Shared access: scenarios that require multiple instances or users to access and work on the same dataset, such as collaborative projects and team workflows.
Data security: applications that require high data durability and security.
References
Upload and download data files: If the size of a file exceeds 5 GB, you can upload the file to an OSS bucket, create an OSS dataset, and mount the dataset to a DSW instance. This way, you can directly read the OSS data in the DSW instance.
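The 5 GB threshold above can be checked before you choose an upload method. The following is a minimal sketch. The function name `exceeds_upload_limit` is hypothetical, and treating 5 GB as 5 × 1024³ bytes is an assumption; pass a different `limit_bytes` value if the limit is defined differently.

```python
import os

# Assumption: the 5 GB limit is interpreted as 5 GiB (5 * 1024**3 bytes).
FIVE_GB = 5 * 1024 ** 3


def exceeds_upload_limit(path: str, limit_bytes: int = FIVE_GB) -> bool:
    """Return True if the file at `path` is larger than `limit_bytes`."""
    return os.path.getsize(path) > limit_bytes
```

If the function returns True for a file, the OSS dataset route described above is the better fit.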