Data Science Workshop (DSW) provides a cloud-based IDE for AI development. If you are familiar with tools like Jupyter Notebook or Visual Studio Code, you can quickly start developing models. This document shows you how to create a DSW instance and troubleshoot common issues.
Quickly create a basic DSW instance
Log on to the PAI console, select a Region, and in the left-side navigation pane, click Workspace List. Select and enter the target workspace.
In the left-side navigation pane, click Interactive Modelling (DSW) > Create Instance. Configure the following key parameters and leave the others at their default settings. For a complete list of console parameters, see Full list of console parameters.
Parameter | Description |
Instance Name | Example: dsw_test. |
Resource Type | Select Public Resources. This resource type uses the pay-as-you-go billing method. |
Instance Type | Example: ecs.gn7i-c8g1.2xlarge (1 × A10 GPU, 8 vCPUs, 30 GiB memory). If this instance type is out of stock, try selecting another one from the list. |
Image Configuration | Select Official Image, then search for and select modelscope:1.31.0-pytorch2.8.0-gpu-py311-cu124-ubuntu22.04 (Python 3.11, CUDA 12.4). We recommend using ModelScope images for their broad compatibility and comprehensive set of third-party libraries. |
Click OK to create the instance. When the instance status changes to Running, it is ready.
If the instance fails to start, see Common issues when you start a DSW instance.
On the DSW instance list page, find the instance and click Open in the Actions column to go to the DSW instance and start developing models.
For more information about the features of the DSW instance interface and how to stop, release, or change a DSW instance, see Access and manage DSW instances in the console.
Warning After you create a DSW instance using public resources, billing starts based on the instance's uptime. To avoid unnecessary charges, stop or delete the instance when not in use.
The system disk for the DSW instance created in this example is a free cloud disk. If the instance remains stopped for more than 15 consecutive days, the data on the cloud disk is permanently deleted and cannot be recovered. Back up important data in a timely manner, or mount a cloud storage service and transfer your data.
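Once the instance is open, a quick check from a notebook or terminal can confirm that the GPU of the selected instance type is visible. The following is a minimal sketch, assuming the image provides the `nvidia-smi` tool that ships with the NVIDIA driver; the function name `list_gpus` is hypothetical and not part of DSW.

```python
import shutil
import subprocess


def list_gpus():
    """Return one text line per visible GPU, or an empty list if none can be queried."""
    # nvidia-smi is absent on CPU-only instances, so check for it first.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(gpus if gpus else "No GPU detected")
```

An empty result on a GPU instance type usually suggests a problem with the image or driver rather than with the code.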
Configure for common use cases
A basic DSW instance might not meet all AI development needs. The following table summarizes configurations for common use cases.
Use case | Need / Pain point | Key configuration | Related documentation |
Persistently store code and data | The system disk of a DSW instance provides temporary storage. Data is deleted when the instance is deleted or remains stopped for an extended period. Save important files for long-term use or share data between multiple instances. | Use Mount Dataset or Mount Storage Path to mount cloud storage, such as Object Storage Service (OSS), to a specified folder on an instance. | Mount a dataset, OSS bucket, NAS file system, or CPFS file system |
Increase public network download speed | DSW instances use a shared gateway by default. Due to bandwidth limitations, download speeds for large files might be insufficient. | In the network information section, configure a Virtual Private Cloud (VPC) and use a Dedicated Gateway. This also requires a NAT Gateway and an Elastic IP Address (EIP) for the VPC. | Use a dedicated gateway to increase the public network access speed |
Develop remotely using SSH | Use local tools like VSCode or PyCharm for development and debugging instead of being limited to a web-based IDE. | In the access configuration, select Enable SSH, enter the SSH Public Key, and select Public Network Access. Associate an existing NAT Gateway and Elastic IP Address (EIP). | Remote connection: Direct SSH |
Access web services within the instance | Publish a web application running inside the instance to the public internet so it can be accessed or shared via a URL. | In the access configuration, add a Custom Service, configure the service port, and enable public network access. Add an inbound rule to the security group to allow traffic on that port. | Access services in an instance over the Internet |
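For the last use case in the table, the service inside the instance has to listen on an address that the port mapping can reach, not only on the loopback interface. The following is a minimal sketch using the Python standard library; the port 8000, the handler name, and the response text are assumptions for illustration, and the port must match the one configured in the Custom Service and allowed by the security group.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"hello from DSW\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; remove this override to see access logs.
        pass


def make_server(port=8000):
    # Bind to 0.0.0.0 so requests forwarded to the instance's private IP are accepted.
    return HTTPServer(("0.0.0.0", port), HelloHandler)

# To run the service: make_server().serve_forever()
```

Calling `make_server().serve_forever()` in a terminal keeps the service running until it is interrupted.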
Full list of console parameters
Basic information
Parameter | Description |
Instance Name | Enter a unique, descriptive name for the instance. |
Tags | Add tags to the instance based on business needs to facilitate multi-dimensional search, location, batch operations, and billing. |
Resource information
Parameter | Description |
Resource Type | |
Environment information
Parameter | Description |
Image Configuration | In addition to Official Image, the following image types are supported: Custom Image: You can use a custom image that has been added to PAI. The image repository must be set to allow public pulls, or the image must be stored in Container Registry (ACR). For more information, see Custom images. Image URL: You can configure the URL of a custom or official image that is accessible on the public network. If it is a private image URL, click Enter Username And Password and configure the image repository username and password. To increase the image pull speed, see Image acceleration.
|
System Disk | Used to store files during development. When you set Resource Type to Public Resources, or when you set Resource Quota to subscription general computing resources (CPU cores ≥ 2 and memory ≥ 4 GB, or equipped with a GPU), each instance receives a 100 GiB free disk as a system disk. You can expand the disk. The expansion price is subject to the console interface.
Warning If you only use the free quota for the cloud disk, its contents are deleted if the instance is stopped for more than 15 consecutive days. After expansion, the entire cloud disk (free and paid portions) is no longer subject to the 15-day stop limit but will continue to incur charges while stopped. Downgrading the disk size after expansion is not supported. Expand the disk as needed. When an instance is deleted, the cloud disk is also deleted. Ensure you back up all necessary data before deletion.
To use permanent storage, configure Dataset Mounting or Storage Path Mounting. |
Dataset Mounting | Stores datasets for reading or persists files created during development. The following two dataset types are supported: Custom Dataset: Create a custom dataset to store your training data files. You can set it to read-only and select a specific version. Public Dataset: PAI provides pre-built public datasets, which can only be mounted in read-only mode.
Mount Path: The path where the dataset is mounted in the DSW instance, for example, /mnt/data. Access the dataset from your code using this path.
Note The mount paths for multiple datasets cannot be the same. If you configure a CPFS type dataset, you must configure the network settings and ensure the selected VPC is the same as the one used by CPFS. Otherwise, the DSW instance will fail to create. When the resource group is a dedicated resource group, the first dataset must be a NAS type, and it will be mounted to both your specified path and the default DSW working directory /mnt/workspace/.
For more information about mounting, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system. |
Storage Mounting | Use storage mounting to access datasets or persist files. For more information about mounting, see Mount a dataset, OSS bucket, NAS file system, or CPFS file system. |
Working Directory | The startup directory for JupyterLab and the Web IDE. The default is /mnt/workspace. |
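After a dataset is mounted, code reads it like any local directory under the mount path. The following is a small sketch that lists a few files to confirm the mount is present; `/mnt/data` is the example path from the table above, and the function name `list_dataset_files` is hypothetical.

```python
from pathlib import Path


def list_dataset_files(mount_path="/mnt/data", limit=10):
    """Return up to `limit` file paths found under the mount path."""
    root = Path(mount_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Mount path not found: {root}")
    files = []
    # Stop early so that large datasets are not scanned in full.
    for path in root.rglob("*"):
        if path.is_file():
            files.append(path)
            if len(files) >= limit:
                break
    return files
```

A `FileNotFoundError` here typically means the mount path in the code does not match the Mount Path configured for the instance.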
Expand for more configurations
Parameter | Description |
Custom Startup Script | Customizes the environment or performs initialization tasks during instance startup. The custom script runs after the image and resources are ready but before development applications like JupyterLab and Web IDE start.
Note The custom script increases instance startup time and has a timeout of 3 minutes. Do not perform long-running tasks, such as image downloads, in the script. After the instance starts, find the logs generated by the custom script in the /var/log/user-command/ directory.
|
Environment Variables | Used for the main container startup, system processes, and user processes. Add custom environment variables or override system defaults as needed. Note: Do not modify the following environment variables.
Modification does not take effect:
USER_NAME: Overwritten by the logic in the service.
System variables that are not recommended for modification (modification may affect normal use):
JUPYTER_NAME: Constructed from instance information by default. Can be used to modify the jupyterlab URL access path.
JUPYTER_COMMAND: Jupyter startup command. Default is set to lab to start jupyterlab.
JUPYTER_SERVER_ADDR: JupyterLab service listening address. Default is 0.0.0.0.
JUPYTER_SERVER_PORT: JupyterLab service listening port. Default is 8088.
JUPYTER_SERVER_AUTH: JupyterLab access password. Default is empty.
JUPYTER_SERVER_ROOT: Jupyter working directory. Priority is lower than WORKSPACE_DIR.
CODE_SERVER_ADDR: code-server service listening address. Default is 0.0.0.0.
CODE_SERVER_PORT: code-server service listening port. Default is 8082.
CODE_SERVER_AUTH: code-server access password. Default is empty.
WORKSPACE_DIR: The system sets this environment variable based on the working directory parameter set when the instance is created. It can change the startup directory of jupyter and code-server. An error may occur if the path does not exist.
|
Advanced Configuration | Adjusts certain secure kernel parameters required by your services. This is currently supported only for Lingjun resource group instances. For parameter details, see the table below. |
Advanced configuration parameter | Default value | Description | Notes |
VmMaxMapCount | 65530 | Sets the maximum number of memory map areas a process can have. For example, it can be set to 1024000. | Values below 65530 do not take effect. Excessively high values can lead to wasted memory resources. |
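The system environment variables listed under Environment Variables can be read from inside the instance to see which values are in effect. The following is a sketch that falls back to the documented defaults when a variable is unset; the helper name `read_dsw_settings` and the `DOCUMENTED_DEFAULTS` mapping are hypothetical, and the fallback for WORKSPACE_DIR assumes the default working directory.

```python
import os

# Defaults taken from the parameter descriptions in this section.
DOCUMENTED_DEFAULTS = {
    "JUPYTER_SERVER_ADDR": "0.0.0.0",
    "JUPYTER_SERVER_PORT": "8088",
    "CODE_SERVER_ADDR": "0.0.0.0",
    "CODE_SERVER_PORT": "8082",
    "WORKSPACE_DIR": "/mnt/workspace",
}


def read_dsw_settings(environ=None):
    """Return the effective value of each variable, using the default if it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DOCUMENTED_DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in read_dsw_settings().items():
        print(f"{name}={value}")
```

Values are returned as strings because environment variables carry no type; convert ports with `int()` where a number is needed.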
Network information
Parameter | Description |
VPC Configuration | This parameter is available only when Resource Type is set to Public Resources. To use a DSW instance within a Virtual Private Cloud (VPC), create a VPC in the same region as the DSW instance and configure this parameter. You also need to configure a VSwitch and a Security Group. For details on configuration policies for different scenarios, see DSW network access and configuration. |
Public Network Gateway | The following configuration methods are supported: Public Gateway: The network bandwidth is limited. During periods of high user concurrency or when downloading large files, the network speed might be insufficient. Dedicated Gateway: To solve the bandwidth limitations of the public gateway, create a public NAT Gateway in the DSW instance's VPC, bind an EIP, and configure SNAT entries. For more information, see Improve public network access speed with a dedicated gateway.
The following parameters are available only when a CPFS dataset is mounted:
Note If a CPFS dataset is mounted, you must configure a VPC, and the selected VPC must be the same as the one used by CPFS. |
Access configuration
Parameter | Description |
Enable SSH | For remote connection to the instance. This option is available only after you select a VPC. When enabled, a Custom Service named SSH appears. If you use a custom image, ensure that sshd is installed. |
SSH Public Key | You can configure this parameter after you turn on Enable SSH.
Note To support both VPC and public network login, add public keys from multiple clients. Add each public key on a new line. You can add up to 10 public keys. |
Custom Service | Used to configure SSH remote access or access services in an instance over the Internet. |
Create Domain Name For VPC Access | Creates an internal authoritative domain (PrivateZone). Use this domain within the VPC to access the instance's SSH service or other custom services, which avoids the inconvenience of a changing instance IP address. Creating a PrivateZone domain incurs charges. For more information, see Alibaba Cloud DNS Product Billing. |
NAT Gateway | When accessing a service in the instance from the public network, this gateway maps public requests (EIP:Port) to the private DSW instance (Private IP:Port). |
Elastic IP Address | Provides a public IP address for accessing services in the instance from the public network. |
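Before pasting several keys into SSH Public Key, it can help to check that the text has one key per line and stays within the limit of 10. The following is a rough sketch that assumes the common OpenSSH single-line format `<type> <base64> [comment]`; the function name `check_public_keys` is hypothetical, and the check does not verify that a key is cryptographically valid.

```python
import base64
import binascii

MAX_KEYS = 10  # limit stated in the SSH Public Key note


def check_public_keys(text):
    """Return a list of problems found; an empty list means the input looks usable."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    problems = []
    if not lines:
        problems.append("no keys provided")
    if len(lines) > MAX_KEYS:
        problems.append(f"{len(lines)} keys provided, at most {MAX_KEYS} allowed")
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            problems.append(f"key {number}: expected '<type> <base64> [comment]'")
            continue
        try:
            # The second field of an OpenSSH public key is base64-encoded.
            base64.b64decode(fields[1], validate=True)
        except (binascii.Error, ValueError):
            problems.append(f"key {number}: key body is not valid base64")
    return problems
```

A key that was wrapped across several lines when copied typically shows up here as extra entries with a missing or invalid body.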
Roles and permissions
Parameter | Description |
Visibility | Choose Visible Only To Instance Owner or Publicly Visible Within Workspace. |
Instance Owner | Only the workspace administrator can change the instance owner. |
Expand For More Configurations
Parameter | Description |
Instance RAM Role | Associate a RAM role with the instance to grant it access to other cloud resources. This method uses temporary credentials from STS to access other cloud resources, which avoids using long-term AccessKeys and reduces the risk of key exposure. The following options are available: Default PAI Role: Has permissions to access internal PAI products, MaxCompute, and OSS. Temporary access credentials issued based on the default PAI role have permissions equivalent to the DSW instance owner when accessing internal PAI products and MaxCompute tables. When accessing OSS, it can only access the default storage path bucket configured for the current workspace. Custom Role: Configure a custom role for customized or more fine-grained permission management. Do Not Associate Role: Select this if you want to access other cloud products directly using an AccessKey.
For more information on configuring instance RAM roles, see Configure an instance RAM role for a DSW instance. |
FAQ
Common issues when starting a DSW instance
Q: DSW instance fails to start
Troubleshooting: Click the DSW instance name. The error message is displayed on the Events tab.

The following are common errors and the corresponding solutions:
Your requested resource type [ecs.******] is not enough currently, please try other regions or other resource types
Cause: The instance type you selected has insufficient inventory in the current region, which prevents the instance from being created.
Solution: You can try to create the instance again later or switch to a different instance type or region.
Your resource usage has exceeded the default limitation. Please contact us via ticket system to raise the limitation.
Cause: Each Alibaba Cloud account is limited to creating DSW instances with a maximum of two GPUs in each region. If the selected instance type exceeds this limit, the creation fails.
Solution: To increase your quota, you can submit a ticket.
Sales of this resource are temporarily suspended in the specified zone. We recommend that you use the multi-zone creation function to avoid the risk of insufficient resource.
Solution: You can try the following operations to avoid the risk of insufficient resources:
Switch to another region.
Adjust the instance type.
Try to start the instance during off-peak hours.
CommodityInstanceNotAvailableError: Commodity instance has been released due to prolonged arrears at past. Please create a new instance for use
The charge of current ECI instance has been stopped, but the related resources are still being cleaned.
Cause: Trial resources are public resources. If you start a DSW instance during peak hours, it may take more than 30 minutes to start. If the system cannot retrieve the resource within one hour, a message appears indicating that the selected instance type is unavailable.
Solution: You can try the following operations:
Switch the region.
Change the instance type. You cannot change the instance type of a pending instance. You must stop the instance and then change the instance type.
Use the instance during off-peak hours, such as outside of working hours.
If none of the preceding methods resolve the issue, you can contact your business manager.
The cluster resources are fully utilized. Please try later or other regions.
Create ECI failed because the specified instance is out of stock. It is recommended to use the multi-zone creation function to avoid the risk of stockout.
Cause: The specified computing resource is out of stock.
Solution: You can try the following operations:
Switch the region.
Change the instance type. You cannot change the instance type of a pending instance. You must stop the instance and then change the instance type.
Use the instance during off-peak hours, such as outside of working hours.
If none of the preceding methods resolve the issue, you can contact your business manager.
back-off 10s restarting failed container=dsw-notebook pod
Cause: The system disk is full. You must expand the system disk.
To view the system disk usage:


Solution: Click Change Configuration to expand the system disk.

Important After you expand the system disk, billing for the system disk continues regardless of whether the instance is running. To stop all billing for a DSW instance, you must delete the instance. Before you delete the instance, make sure that you have backed up all necessary data.
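While an instance is still running, disk usage can also be checked from a notebook or terminal so that a nearly full disk is noticed before the next start fails. The following is a minimal sketch using the Python standard library; checking the root path `/` is an assumption about where the system disk is mounted, and the function name is hypothetical.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the used share of the file system containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return round(usage.used / usage.total * 100, 1)


if __name__ == "__main__":
    print(f"Disk usage: {disk_usage_percent()}%")
```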
the available zone with vSwitch is out of stock
Cause: A VPC is configured for the DSW instance. The vSwitch in the VPC has a zone property. After the vSwitch is configured, the search for computing resources is limited to the zone where the vSwitch is located, which may cause a resource shortage.
Solution: You can change the configuration of the DSW instance and set the VPC to empty.

Note If you want to use a VPC, we recommend that you switch to another zone and create a new vSwitch and DSW instance. This expands the range of available resources and prevents shortages caused by a limited resource scope.
Startup failed with the message "Workspace member not found"
Solution: Contact your workspace administrator to add your account as a member of the workspace.
failed to create containerd container: failed to prepare layer from archive: failed to validate archive quota ...
Cause: The available disk space is insufficient for the instance image.
Solution: Go to the instance details page and scale out the system disk. Note that scaling out the system disk incurs additional fees based on its capacity.

Other reasons for startup failure:
Creation failure due to overdue payment
If your account has an overdue payment, you cannot create a DSW instance. Vouchers cannot be used to offset overdue payments. You can log on to the User Center to check if your account has an overdue payment.
Q: Can I run a Python file when a DSW instance starts?
Yes, you can set a Custom Startup Script when you create a DSW instance or change the instance configuration.

You can use this feature to customize the environment or run initialization tasks when an instance starts. The custom script runs after the image and resources are ready and before developer applications such as JupyterLab and Code Server start.
Note A custom script increases the instance startup time and has a 3-minute timeout. To prevent the script from timing out, do not run long-running tasks, such as downloading large files or images, in the custom script.
After the instance starts, you can find the logs generated by the custom script in the /var/log/user-command/ path.
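A Python file started from the custom script should keep its own work well inside the 3-minute limit. The following is a sketch of one way to do that, assuming the custom script calls a Python file that you provide; the names `run_init_tasks` and `SAFETY_MARGIN` and the 30-second margin are hypothetical. It only stops starting new tasks once the budget is used up and does not interrupt a task that is already running.

```python
import time

TIMEOUT_SECONDS = 180  # 3-minute limit of the custom startup script
SAFETY_MARGIN = 30     # hypothetical margin left for the rest of the script


def run_init_tasks(tasks, budget=TIMEOUT_SECONDS - SAFETY_MARGIN, clock=time.monotonic):
    """Run (name, callable) pairs in order and skip those that no longer fit the budget."""
    start = clock()
    done, skipped = [], []
    for name, task in tasks:
        if clock() - start >= budget:
            # Out of time: record the task instead of risking the timeout.
            skipped.append(name)
            continue
        task()
        done.append(name)
    return done, skipped


if __name__ == "__main__":
    done, skipped = run_init_tasks([("print greeting", lambda: print("init ok"))])
    print("done:", done, "skipped:", skipped)
```

Anything reported as skipped can be run manually after the instance has started.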
Q: Cannot find a DSW instance?
On the overview page, you can view the different types of instances created in different regions. Try switching between regions to find your instance.

Q: What should I do if the DSW page is abnormal or unresponsive?
Problems such as a blank page, a Notebook that keeps loading, or a Terminal that does not accept commands are usually related to your local environment. Try the following troubleshooting steps:
Clear your browser cache and try again.
Use your browser's incognito or private mode to access the page.
Change your network environment. For example, switch from your company's internal network to a mobile hotspot to check for firewall restrictions.
Try using another browser, such as Chrome or Firefox.
Q: Will data on the system disk be lost when a DSW instance that uses a cloud disk as its system disk is stopped, restarted, has its instance type changed, or has its image replaced?
DSW instances that use a cloud disk as the system disk include instances created in a public resource group and general-purpose resource instances for which you select Disk as the system disk. The data on the system disks of these instances is affected as follows:
Stopping an instance: Data might be lost. If the disk has not been expanded and the instance remains stopped for more than 15 days, the data is deleted and cannot be recovered. If the disk has been expanded, or if the instance is stopped for 15 days or less, data is not lost.
Restarting an instance: Data is not lost. After an instance is stopped or restarted, all packages installed using pip, code files, and other data on the system disk are retained.
Changing the instance type: Data is not lost. Adjusting the instance type, such as the CPU, memory, or GPU configuration, does not affect data on the system disk.
Replacing the image: Some data might be lost. Changing the image does not affect data in mounted datasets or in OSS. However, content on the system disk might be reset. Therefore, save your instance data before you change the image. For example, you can copy or move the data to a dataset or to OSS. For more information, see Mount a dataset, OSS, NAS, or CPFS.
For general-purpose resource instances that use Temporary Storage as the system disk, all data on the system disk is lost when the instances are stopped, restarted, or have their specifications or image changed, regardless of whether their AI resource group is configured with subscription disks.
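The 15-day rule for stopped instances with a cloud disk as the system disk can be turned into a simple date calculation. The following is a sketch that treats the limit as exactly 15 days from the time the instance was stopped; the precise cutoff used by the service is not stated here, so that is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 15  # stop limit for a cloud disk that has not been expanded


def data_deletion_deadline(stopped_at, disk_expanded):
    """Return when the stop limit is reached, or None if the limit does not apply."""
    if disk_expanded:
        # An expanded cloud disk is no longer subject to the 15-day stop limit.
        return None
    return stopped_at + timedelta(days=RETENTION_DAYS)


if __name__ == "__main__":
    stopped = datetime(2024, 1, 1, 12, 0)
    print(data_deletion_deadline(stopped, disk_expanded=False))
```

Restart the instance or back up the data before the returned date to stay on the safe side.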
Q: Can DSW instances created using public resources be recovered if they are released after not being logged into for more than 15 days?
For DSW instances created with public resources, if the cloud disk system disk has not been expanded and the instance has not been started for more than 15 consecutive days, its system disk is automatically cleared and cannot be recovered.
Common issues when stopping or releasing a DSW instance
Q: How do I release a DSW instance?
On the DSW instance list page, click Stop or Delete for the instance.

Note: If you expanded the system disk when you created the DSW instance, billing for the system disk continues regardless of whether the instance is running. To stop all billing for a DSW instance, you must delete the instance.
Q: Why can't I find my DSW instance?
If you cannot find an instance, try switching to a different region and workspace.

Q: How do I release a free trial resource plan?
Free trial resource plans do not need to be released or stopped.
Q: How do I completely stop billing for a DSW instance? What is the difference between "Stop" and "Delete"?
Stop instance: This operation releases the instance's computing resources (CPU/GPU) and pauses billing for computing. Note: The expanded system disk continues to be billed.
Delete instance: This operation permanently deletes the instance and all its resources, including the system disk. All related billing stops.
How to choose:
Stop: Use this if you do not need the instance temporarily but want to keep the data and environment for future use.
Delete: Use this if you no longer need the instance and want to stop all billing. You must back up your data before you perform this operation.
Q: Why is my DSW instance stuck in the "Stopping" or "Deleting" state and the operation cannot be completed?
Stopping or deleting an instance takes time because the system needs to safely terminate tasks, save the state, and reclaim resources, so the status might not change immediately.
In this situation, wait for a few moments and then refresh the page. The instance status should change to Stopped.
Q: Will my data and code be lost after stopping or deleting a DSW instance?
Whether data is retained depends on your operation and the instance's resource group type.
Stop instance:
The data retention policy varies by resource group type.
For most pay-as-you-go and general-purpose instances that use a cloud Disk as the system disk, data is deleted and cannot be recovered if the disk has not been expanded and the instance is stopped for more than 15 days. The data is retained if the disk has been expanded or the instance is stopped for 15 days or less.
Instances using temporary storage as the system disk: Data is stored in temporary storage. Stopping the instance deletes the data, and it cannot be recovered.
Delete instance:
All data on its system disk is permanently erased and cannot be recovered. Therefore, you must back up all important data before deletion.
Q: Why does my running DSW instance stop automatically?
The instance is configured with an idle auto-shutdown policy. This policy is designed to save resources and is enabled by default for free trial instances.
Q: I have stopped or deleted all my DSW instances. Why does it still show "Running" or why do I receive billing notifications?
Check for the following common reasons:
Confusing resource plans with instances. The "Running" status you see may refer to a resource plan (such as "250 billable hours per month"), not an instance. A resource plan is always active within its validity period, and its status is independent of the instance.
The expanded system disk is still being billed. Stopping an instance only pauses computing fees. An expanded system disk continues to incur storage fees.
There is a delay in billing. Billing is not in real time. A bill may be generated several hours after you use the resource. For example, fees incurred in the morning may not appear on the bill until the afternoon.