Platform for AI: Create a DSW instance

Last Updated: Apr 27, 2025

Data Science Workshop (DSW) is a cloud integrated development environment (IDE) for Platform for AI (PAI). DSW incorporates multiple development environments, including Notebook, VS Code, and Terminal. It eliminates the need to manually purchase, install, and start ECS instances and allows you to get started quickly with AI model-based code writing, debugging, and running.

Prerequisites

  1. You have activated PAI and created a workspace by using an Alibaba Cloud account.

    Log on to the PAI console, select a region where you want to activate PAI in the upper-left corner of the page, and then complete authentication, authorization, and service activation.

  2. You have authorized the operation account.

    If you use an Alibaba Cloud account, skip this step. If you use a RAM user, you must assign the algorithm developer, algorithm O&M engineer, or administrator role to the RAM user.

Create a DSW instance in the PAI console

Important

After you create an instance by using public resources, you are charged based on how long the instance runs. The billing stops only after you stop or delete the instance. For more information about billing rules, see Billing of DSW.

  1. Go to the Data Science Workshop (DSW) page.

    1. Log on to the PAI console.

    2. On the Overview page, select a region in the top navigation bar.

    3. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    4. In the left-side navigation pane, choose Model Training > Data Science Workshop (DSW).

  2. Click Create Instance.

  3. On the Create Instance page, configure the following key parameters.

    Basic Information

    Parameter

    Description

    Instance Name

    The name of the DSW instance.

    Tag

    The instance tag. You can tag the instance based on your business requirements. This facilitates multi-dimensional resource searching, resource locating, batch operation, and cost allocation.

    Resource Information

    Parameter

    Description

    Resource Type

    • Public Resources: supports the pay-as-you-go billing method. You cannot change the billing method from pay-as-you-go to subscription.

      Note

      Each Alibaba Cloud account can purchase up to two GPUs per region. If the resource usage exceeds the limit, an error may occur. If you want to increase the quota, submit a ticket.

      • Instance Type: You can select a CPU, GPU, or free instance type. For more information, see Overview of instance families.

      • Bidding Purchase: You can use preemptible instances to reduce costs. For more information, see Configure a preemptible DSW instance.

        This parameter is available only in the China (Hangzhou), China (Shanghai), China (Beijing), China (Ulanqab), China (Shenzhen), China (Guangzhou), Japan (Tokyo), and Singapore regions.

    • Resource Quota: supports the subscription billing method.

      • Resource Quota: You can select general computing resources or Lingjun resources. If no resources can be selected, click Associate Resource Quota.

      • Instance Type: You can configure the GPUs, vCPUs, and Memory parameters based on your business requirements.

      • Priority: Valid values: 1 to 9. A larger value indicates a higher priority.

      • CPU Affinity: If you enable CPU affinity, processes in containers or pods can be bound to specific CPU cores. This reduces CPU cache misses and context switches and increases CPU utilization and performance in scenarios that require high performance and low latency.

        This parameter is available only in the China (Beijing) and China (Shenzhen) regions.

    Environment Information

    Parameter

    Description

    Image

    You can select the following image types in addition to Alibaba Cloud images:

    • Custom Image: a custom image that is added to PAI. You need to configure the image repository to support public downloading or you can store an image in Alibaba Cloud Container Registry. For more information about how to add a custom image, see Custom images.

    • Image Address: supports the addresses of custom images that you can access over the Internet or the official image addresses.

      • If you enter the address of a private image, you must click enter username and password to configure the username and password of the image repository.

      • For information about how to improve the image loading speed, see Use an accelerated image in PAI.

    System Disk

    Used to store files generated during development. If you set Resource Type to Public Resources, or select subscription general computing resources (with at least 2 CPU cores and 4 GB of memory, or with GPUs) for Resource Quota:

    Each instance is provided with a free system disk of 100 GiB. The disk storage space can be expanded. For information about the disk storage expansion pricing, go to the PAI console.

    Warning
    • When you use only the free system disk, if an instance stops for more than 15 days, the disk data is deleted.

    • After the expansion, you cannot reduce the storage space. Proceed with caution.

    • After the expansion, the disk is not cleared if the instance is stopped and not recovered for more than 15 days. However, you continue to be charged for data storage.

    • If you delete the instance, the system disk is also released and the data stored in the disk is deleted. Make sure that you have backed up your data before you delete the instance.

    If you want to permanently store the data, you can configure Dataset Mounting or Storage Path Mounting.
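The 15-day rule in the warning above can be expressed as a small check. This is a minimal sketch, not a PAI API: the function name free_disk_data_at_risk and its arguments are hypothetical, and the exact moment at which the platform deletes the data is an assumption based only on the "more than 15 days" wording.

```python
from datetime import datetime, timedelta

# Retention period for the free system disk, as stated in the warning above.
FREE_DISK_RETENTION = timedelta(days=15)


def free_disk_data_at_risk(stopped_at, now):
    """Return True if the instance has been stopped for more than 15 days.

    Applies only to instances that use the free system disk without expansion.
    """
    return now - stopped_at > FREE_DISK_RETENTION
```

A check like this could run in a scheduled reminder so that data is backed up or the instance is restarted before the retention period ends.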

    Dataset Mounting

    Used to store the dataset to be read or permanently store files generated during development. The following dataset types are supported:

    • Custom Dataset: You can create a custom dataset to store data files for training. You can configure whether to enable the read-only mode. You can also select the desired dataset version from the Versions list.

    • Public Dataset: PAI provides built-in public datasets that support only the read-only mount mode.

    Mount Path is the location where the dataset is mounted in the DSW instance. For example, if the mount path is /mnt/data/, your code can read the dataset from that path.

    Note
    • You cannot mount multiple datasets to the same path.

    • If you use a Cloud Parallel File Storage (CPFS) dataset, specify a virtual private cloud (VPC) for the instance. The VPC must be the same as the VPC of the CPFS dataset. Otherwise, the DSW instance may fail to be created.

    • If you set the Resource Quota parameter to a dedicated resource group, the first dataset that you mount to the instance must be a NAS dataset. The dataset is simultaneously mounted to the path that you specify and the default working directory /mnt/workspace/.

    For more information about mounting, see Mount datasets or OSS paths.
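As a minimal sketch of reading a mounted dataset from code, the following Python snippet lists the files under a mount path. The default path /mnt/data/ is the example mount path used above. The helper name list_dataset_files and its max_files argument are hypothetical and are not part of any PAI SDK.

```python
from pathlib import Path


def list_dataset_files(mount_path="/mnt/data/", max_files=10):
    """Return up to max_files file paths found under the mount path, sorted by path."""
    root = Path(mount_path)
    if not root.is_dir():
        # The dataset is not mounted at this path, or the path is wrong.
        raise FileNotFoundError(f"Mount path not found: {root}")
    # Walk the mounted directory recursively and keep regular files only.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return files[:max_files]
```

Inside a DSW instance, calling list_dataset_files() with no arguments would list the first files of a dataset mounted at /mnt/data/, assuming that mount path was configured.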

    Storage Path Mounting

    You can select a storage type to store the dataset to be read or permanently store the files generated during development.

    • Supported storage types: OSS, general-purpose NAS, extreme NAS, CPFS, and CPFS for Lingjun.

    • Mount Path is the location where the storage is mounted in the DSW instance. For example, if the mount path is /mnt/data/, your code can read the data from that path.

    For more information about mounting, see Mount datasets or OSS paths.

    Working Directory

    The working directory is the startup directory of Notebook and WebIDE. The working directory is mounted to the /mnt/workspace directory.

    Network Information

    Parameter

    Description

    VPC

    This parameter is available only if you set the Resource Type parameter to Public Resources.

    To connect to a DSW instance over VPC, you must configure this parameter together with the vSwitch and Security Group parameters. For more information about configurations in different scenarios, see DSW network configuration.

    Internet Gateway

    You can select one of the following options for Internet Gateway:

    • Public Gateway: The network bandwidth is limited. The network speed may not meet your requirements in high-concurrency scenarios or when you download large files.

    • Private Gateway (recommended): To resolve the issue of limited network bandwidth for public gateways, you can create an Internet NAT gateway in the VPC of a DSW instance, associate an elastic IP address (EIP) with the DSW instance, and configure an SNAT entry. For more information, see Improve Internet access rate by using a private gateway.

    You need to configure the following parameters if you select a CPFS dataset for the Custom Dataset parameter:

    • Enable All Options: By default, this option is not selected, which indicates that the system disables the VPCs that cannot connect to the CPFS dataset.

    Note

    If you use a CPFS dataset, you must specify a VPC for the DSW instance, and the VPC must be the same as the VPC of the CPFS dataset.

    Access Configuration

    Parameter

    Description

    Enable SSH

    Used to connect to an instance remotely. You can configure this parameter if you use a VPC. If you configure a custom image, you must make sure that sshd is installed on the custom image.

    SSH Public Key

    You can configure this parameter after you turn on Enable SSH.

    Note

    If you want to enable VPC and Internet logon, you must add the public keys of multiple clients. Separate public keys by pressing the Enter key. You can add up to 10 public keys.
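Before pasting several keys into the SSH Public Key field, it can help to check them locally. The sketch below splits a newline-separated block of keys and enforces the limit of 10 keys stated above. The function name split_public_keys is hypothetical, and the list of accepted key types is an assumption about common OpenSSH key formats, not a documented DSW requirement.

```python
MAX_PUBLIC_KEYS = 10  # limit stated in the note above

# Common OpenSSH key type prefixes. This list is an assumption, not a DSW specification.
KNOWN_KEY_TYPES = ("ssh-rsa", "ssh-ed25519", "ecdsa-sha2-nistp256")


def split_public_keys(text):
    """Split a newline-separated block of public keys and apply basic checks."""
    # One key per line; blank lines are ignored.
    keys = [line.strip() for line in text.splitlines() if line.strip()]
    if len(keys) > MAX_PUBLIC_KEYS:
        raise ValueError(f"Too many public keys: {len(keys)} > {MAX_PUBLIC_KEYS}")
    for key in keys:
        parts = key.split()
        # An OpenSSH public key line has at least a type and a base64 body.
        if len(parts) < 2 or parts[0] not in KNOWN_KEY_TYPES:
            raise ValueError(f"Not a recognized OpenSSH public key: {key[:30]}")
    return keys
```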

    SSH Access Method

    You can configure this parameter after you turn on Enable SSH.

    • Access over VPC: the default access method. You can remotely connect to the DSW instance by using SSH from another terminal, such as an ECS instance in the VPC.

    • Access over Internet: After you select Access over Internet, the Internet access method is added. You can configure the following parameters and connect to the DSW instance over SSH by using an on-premises CLI or another terminal.

      • NAT Gateway: Select the Internet NAT gateway that you created for the VPC.

      • EIP: Select the EIP that you created on the Internet NAT gateway.

    Custom Services

    Allow access to custom services running in the instance over the Internet. For more information, see Access service over Internet.

    Create Private Zone in VPC

    Create a private domain (Private Zone). You can use the Private Zone in this VPC to access the SSH service or other custom services of the current instance. This avoids the inconvenience caused by the changing IP address of the instance. Note that the Private Zone will incur fees. For more information, see Billing.

    Advanced Information

    Parameter

    Description

    Visibility

    You can select Visible to the Instance Owner or Visible to the Current Workspace.

    Instance Owner

    Only the workspace administrator can change the instance owner.

    Show More: Instance RAM Role

    Parameter

    Description

    Instance RAM Role

    When you access other cloud resources from a DSW instance, you can associate a RAM role with the instance. This method allows you to use temporary Security Token Service (STS) tokens instead of AccessKey pairs to access the resources, which effectively reduces the risk of AccessKey pair leaks.

    You can select one of the following options for this parameter:

    • Default Roles of PAI: The default roles of PAI have the permissions to access PAI services, MaxCompute, and OSS. If you use the temporary credentials issued by the default roles of PAI, you are granted the same permissions as the DSW instance owner when you access PAI services and MaxCompute tables. When you access OSS, you can access only the bucket that is configured as the default storage path for the current workspace.

    • Custom Roles: If you want to perform customized or fine-grained permission control, you can configure custom roles.

    • Does Not Associate Role: If you want to directly access resources of other cloud services by using an AccessKey pair, you can choose not to associate a role with the instance.

    For more information about how to configure an instance RAM role, see Configure RAM roles for a DSW instance.

  4. After you confirm the configurations, click Yes.

Common causes of instance startup failures

Failed to start a DSW instance

Troubleshooting method: Click the name of the DSW instance and check the error information on the Events tab.

Here are the common errors for startup failure:

  • Your requested resource type [ecs.******] is not enough currently, please try other regions or other resource types

    • Cause: The selected instance type is out of stock in this region and the instance cannot be created.

    • Solution: Try to create the instance later, change the instance type, or select another region.

  • Your resource usage has exceeded the default limitation. Please contact us via ticket system to raise the limitation.

    • Cause: Each Alibaba Cloud account can use up to two GPUs per region. If you request more than two GPUs, the creation fails.

    • Solution: If you need to increase the quota, submit a ticket.

  • Sales of this resource are temporarily suspended in the specified zone. We recommend that you use the multi-zone creation function to avoid the risk of insufficient resource.

    Cause: The sale of this resource is suspended in the specified zone.

    Solution:

    • Switch to another region.

    • Select another instance type.

    • Run the instance during off-peak hours.

  • The charge of the current ECI instance has been stopped, but the related resources are still being cleaned.

    • Cause: You are using public resources. Starting a DSW instance during peak hours may take more than half an hour. If resources are not obtained within an hour, the system will indicate that the selected specifications are not available in the current region.

    • Solution:

      • Switch to another region.

      • Select another instance type. You cannot change the type of an instance in the Waiting state. Stop the instance first.

      • Run the instance during off-peak hours, such as non-working hours.

      • If the issue persists, contact your account manager.

  • The cluster resources are fully utilized. Please try later or in other regions.

    • Cause: The current computing resources are fully occupied.

    • Solution:

      • Switch to another region.

      • Select another instance type. You cannot change the type of an instance in the Waiting state. Stop the instance first.

      • Run the instance during off-peak hours, such as non-working hours.

      • If the issue persists, contact your account manager.

  • Create ECI failed because the specified instance is out of stock. It is recommended to use the multi-zone creation function to avoid the risk of stockout.

    Cause: The specified computing resources are out of stock.

    Solution:

    • Switch to another region.

    • Select another instance type. You cannot change the type of an instance in the Waiting state. Stop the instance first.

    • Run the instance during off-peak hours, such as non-working hours.

    • If the issue persists, contact your account manager.

  • Back-off 10s restarting failed container=dsw-notebook pod

    • Cause: The system disk is full and needs to be expanded.

      Check the system disk usage to confirm that the disk is full.

    • Solution: Expand the system disk by using Change Settings.

      Important

      After you expand the system disk, you are charged for it even if the DSW instance is not running. To stop billing for the instance, you must delete it. Make sure that you back up the necessary data before deletion.
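To catch a full system disk before it blocks a restart, disk usage can be checked from a terminal or notebook while the instance is still running. The following is a minimal sketch that uses Python's standard shutil.disk_usage. The helper names, the default path "/", and the 90% threshold are assumptions for illustration. Which path corresponds to the system disk in a given image is not specified here.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the used share, in percent, of the file system that contains path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path="/", threshold_percent=90.0):
    """Return True if usage of the file system at path is at or above the threshold."""
    return disk_usage_percent(path) >= threshold_percent
```

If is_nearly_full() returns True, consider cleaning up files or expanding the system disk before you stop the instance.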

  • The available zone with vSwitch is out of stock

    • Cause: When creating a DSW instance, a VPC is configured. The vSwitch under the VPC has zone attributes, limiting the search scope of computing resources to the zone. This may cause a resource shortage.

    • Solution: Change the settings of the instance and clear the VPC configuration.

      Note

      If a VPC is required, consider switching to another zone and creating a new vSwitch and DSW instance. This expands the range of available resources and helps avoid stockout issues.

  • Other startup failures:

    • Instance creation failures due to overdue payment

      If you have overdue payments, DSW instances fail to be created. Overdue payments cannot be offset by vouchers. You can log on to the Expenses and Costs console to check whether your account has overdue payments.
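The error messages listed above can be matched against text copied from the Events tab. The sketch below is a simple lookup written for illustration only: the table KNOWN_STARTUP_ERRORS and the function explain_startup_error are hypothetical, and the message fragments and causes are taken from the list above, not from an official error catalog.

```python
# (message fragment, likely cause) pairs taken from the troubleshooting list above.
KNOWN_STARTUP_ERRORS = [
    ("is not enough currently", "The selected instance type is out of stock in this region."),
    ("exceeded the default limitation", "The per-region GPU quota of the account is exceeded."),
    ("temporarily suspended", "The sale of this resource is suspended in the specified zone."),
    ("still being cleaned", "Public resources could not be obtained in time during peak hours."),
    ("cluster resources are fully utilized", "The current computing resources are fully occupied."),
    ("specified instance is out of stock", "The specified computing resources are out of stock."),
    ("restarting failed container", "The system disk is full and needs to be expanded."),
    ("zone with vswitch is out of stock", "The zone of the configured vSwitch has no available resources."),
]


def explain_startup_error(message):
    """Return the likely cause for an event message, or None if it is not recognized."""
    lowered = message.lower()
    for fragment, cause in KNOWN_STARTUP_ERRORS:
        # Case-insensitive substring match against the known fragments.
        if fragment.lower() in lowered:
            return cause
    return None
```

For example, passing the message "The cluster resources are fully utilized. Please try later or in other regions." returns the cause for fully occupied computing resources.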
