Platform For AI: DSW FAQ

Last Updated: Nov 03, 2025

This topic answers frequently asked questions about DSW.

Instance startup

Q: DSW instance fails to start

Troubleshooting: Click the DSW instance name. The error message is displayed on the Events tab.

The following are common errors and the corresponding solutions:

  • Your requested resource type [ecs.******] is not enough currently, please try other regions or other resource types

    • Cause: The instance type you selected has insufficient inventory in the current region, which prevents the instance from being created.

    • Solution: You can try to create the instance again later or switch to a different instance type or region.

  • Your resource usage has exceeded the default limitation. Please contact us via ticket system to raise the limitation.

    • Cause: Each Alibaba Cloud account can create DSW instances with a maximum of 2 GPUs in each region. If the selected instance type exceeds this limit, the creation fails.

    • Solution: To increase your quota, you can submit a ticket.

  • Sales of this resource are temporarily suspended in the specified zone. We recommend that you use the multi-zone creation function to avoid the risk of insufficient resource.

    Solution: You can try the following operations to avoid the risk of insufficient resources:

    • Switch to another region.

    • Adjust the instance type.

    • Try to start the instance during off-peak hours.

  • CommodityInstanceNotAvailableError: Commodity instance has been released due to prolonged arrears at past. Please create a new instance for use

    • Cause: The instance was released by the system because of a prolonged overdue payment.

    • Solution: You can create a new instance.

  • The charge of current ECI instance has been stopped, but the related resources are still being cleaned.

    • Cause: Trial resources are public resources. If you start a DSW instance during peak hours, it may take more than 30 minutes to start. If the system cannot retrieve the resource within one hour, a message appears indicating that the selected instance type is unavailable.

    • Solution: You can try the following operations:

      • Switch the region.

      • Change the instance type. You cannot change the instance type of a pending instance. You must stop the instance and then change the instance type.

      • Use the instance during off-peak hours, such as outside of working hours.

      • If none of the preceding methods resolve the issue, you can contact your business manager.

  • The cluster resources are fully utilized. Please try later or other regions.

    • Cause: The current computing resources are fully occupied.

    • Solution: You can try the following operations:

      • Switch the region.

      • Change the instance type. You cannot change the instance type of a pending instance. You must stop the instance and then change the instance type.

      • Use the instance during off-peak hours, such as outside of working hours.

      • If none of the preceding methods resolve the issue, you can contact your business manager.

  • Create ECI failed because the specified instance is out of stock. It is recommended to use the multi-zone creation function to avoid the risk of stockout.

    Cause: The specified computing resource is out of stock.

    Solution: You can try the following operations:

    • Switch the region.

    • Change the instance type. You cannot change the instance type of a pending instance. You must stop the instance and then change the instance type.

    • Use the instance during off-peak hours, such as outside of working hours.

    • If none of the preceding methods resolve the issue, you can contact your business manager.

  • back-off 10s restarting failed container=dsw-notebook pod

    • Cause: The system disk is full. You must expand the system disk.

      To check the system disk usage, click the DSW instance name and view the Environment Context area.

    • Solution: Click Change Configuration to expand the system disk.

      Important

      After you expand the system disk, billing for the system disk continues regardless of whether the instance is running. To stop all billing for a DSW instance, you must delete the instance. Before you delete the instance, make sure that you have backed up all necessary data.

  • the available zone with vSwitch is out of stock

    • Cause: A VPC is configured for the DSW instance. The vSwitch in the VPC has a zone property. After the vSwitch is configured, the search for computing resources is limited to the zone where the vSwitch is located, which may cause a resource shortage.

    • Solution: You can change the configuration of the DSW instance and set the VPC to empty.

      Note

      If you want to use a VPC, we recommend that you switch to another zone and create a new vSwitch and DSW instance. This expands the range of available resources and prevents shortages caused by a limited resource scope.

  • Startup failed with the message "Workspace member not found"

    Solution: Contact your workspace administrator to add your account as a member of the workspace.

  • failed to create containerd container: failed to prepare layer from archive: failed to validate archive quota ...

    • Cause: The available disk space is insufficient for the instance image.

    • Solution: Go to the instance details page and scale out the system disk. Note that scaling out the system disk incurs additional fees based on its capacity.

Other reasons for startup failure:

  • Creation failure due to overdue payment

    If your account has an overdue payment, you cannot create a DSW instance. Vouchers cannot be used to offset overdue payments. You can log on to the User Center to check if your account has an overdue payment.

Q: Can I run a Python file when a DSW instance starts?

Yes, you can set a Custom Startup Script when you create a DSW instance or change the instance configuration.

You can use this feature to customize the environment or run initialization tasks when an instance starts. The custom script runs after the image and resources are ready and before developer applications such as JupyterLab and Code Server start.

Note
  • A custom script increases the instance startup time and has a 3-minute timeout. To prevent the script from timing out, do not run long-running tasks, such as downloading large files or images, in the custom script.

  • After the instance starts, you can find the logs generated by the custom script in the /var/log/user-command/ path.
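
Because the custom script is cut off after 3 minutes, it can help to have the script track its own time budget. The following Python sketch is illustrative only: the 150-second budget, the run_steps helper, and the example command are assumptions and are not part of DSW.

```python
import subprocess
import sys
import time

# Assumed safety margin: stay below the 3-minute startup script timeout.
BUDGET_SECONDS = 150


def run_steps(steps, budget_seconds=BUDGET_SECONDS):
    """Run each command in order; skip whatever no longer fits in the budget."""
    deadline = time.monotonic() + budget_seconds
    results = []
    for command in steps:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            results.append((command, "skipped"))
            continue
        try:
            # The per-command timeout is whatever is left of the overall budget.
            subprocess.run(command, check=True, timeout=remaining)
            results.append((command, "ok"))
        except subprocess.TimeoutExpired:
            results.append((command, "timed out"))
        except (subprocess.CalledProcessError, OSError):
            results.append((command, "failed"))
    return results


if __name__ == "__main__":
    # Hypothetical initialization step; replace it with your own short task.
    steps = [[sys.executable, "-c", "print('environment ready')"]]
    for command, status in run_steps(steps):
        print(status, " ".join(command))
```

Anything the script prints should then appear in the logs under /var/log/user-command/ after the instance starts.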

Q: Cannot find a DSW instance?

On the overview page, you can view the different types of instances created in different regions. Try switching between regions to find your instance.

Q: What should I do if the DSW page is abnormal or unresponsive?

Problems such as a blank page, a Notebook that keeps loading, or a Terminal that does not accept commands are usually related to your local environment. Try the following troubleshooting steps:

  1. Clear your browser cache and try again.

  2. Use your browser's incognito or private mode to access the page.

  3. Change your network environment. For example, switch from your company's internal network to a mobile hotspot to check for firewall restrictions.

  4. Try using another browser, such as Chrome or Firefox.

Q: Will data on the system disk be lost when a DSW instance that uses a cloud disk as its system disk is stopped, restarted, has its instance type changed, or has its image replaced?

DSW instances that use a cloud disk as the system disk include instances created in a public resource group and general-purpose resource instances for which you select Disk as the system disk. The data on the system disks of these instances is affected as follows:

  • Stopping an instance: Data might be lost. If the disk has not been expanded and the instance remains stopped for more than 15 days, the data is deleted and cannot be recovered. If the disk has been expanded, or if the instance is stopped for 15 days or less, data is not lost.

  • Restarting an instance: Data is not lost. After an instance is stopped or restarted, all packages installed using pip, code files, and other data on the system disk are retained.

  • Changing the instance type: Data is not lost. Adjusting the instance type, such as the CPU, memory, or GPU configuration, does not affect data on the system disk.

  • Replacing the image: Some data might be lost. Changing the image does not affect data in mounted datasets or in OSS. However, content on the system disk might be reset. Therefore, save your instance data before you change the image. For example, you can copy or move the data to a dataset or to OSS. For more information, see Mount a dataset, OSS, NAS, or CPFS.

For general-purpose resource instances that use Temporary Storage as the system disk, all data on the system disk is lost when the instances are stopped, restarted, or have their specifications or image changed, regardless of whether their AI resource group is configured with subscription disks.
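
The stop rule for cloud-disk instances can be written as a single condition. The following Python sketch only restates the 15-day rule from this answer. The function name and parameters are hypothetical, and it does not query any DSW API.

```python
# From the rule above: data is cleared after more than 15 days in the stopped state.
RETENTION_DAYS = 15


def cloud_disk_data_retained(days_stopped, disk_expanded):
    """Return True if system disk data survives a stop.

    Applies only to instances that use a cloud disk as the system disk.
    """
    return disk_expanded or days_stopped <= RETENTION_DAYS


print(cloud_disk_data_retained(days_stopped=20, disk_expanded=False))  # False
print(cloud_disk_data_retained(days_stopped=20, disk_expanded=True))   # True
print(cloud_disk_data_retained(days_stopped=15, disk_expanded=False))  # True
```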

Q: Can DSW instances created using public resources be recovered if they are released after not being logged into for more than 15 days?

For DSW instances created with public resources, if the cloud disk system disk has not been expanded and the instance has not been started for more than 15 consecutive days, its system disk is automatically cleared and cannot be recovered.

Instance stop, deletion, and release

Q: How do I release a DSW instance?

On the DSW instance list page, click Stop or Delete for the instance.

Note: If you expanded the system disk when you created the DSW instance, billing for the system disk continues regardless of whether the instance is running. To stop all billing for a DSW instance, you must delete the instance.

Q: Why can't I find my DSW instance?

If you cannot find an instance, try switching to a different region and workspace.

Q: How do I release a free trial resource plan?

Free trial resource plans do not need to be released or stopped.

Q: How do I completely stop billing for a DSW instance? What is the difference between "Stop" and "Delete"?

  • Stop instance: This operation releases the instance's computing resources (CPU/GPU) and pauses billing for computing. Note: The expanded system disk continues to be billed.

  • Delete instance: This operation permanently deletes the instance and all its resources, including the system disk. All related billing stops.

How to choose:

  • Stop: Use this if you do not need the instance temporarily but want to keep the data and environment for future use.

  • Delete: Use this if you no longer need the instance and want to stop all billing. You must back up your data before you perform this operation.

Q: Why is my DSW instance stuck in the "Stopping" or "Deleting" state and the operation cannot be completed?

Stopping or deleting an instance takes time because the system needs to safely terminate tasks, save the state, and reclaim resources. If an instance is unresponsive for a long time, the common reasons are as follows:

  • The instance has processes that have not terminated properly.

  • High memory usage prevents the instance from responding to the shutdown command.

In this situation, wait for a few moments and then refresh the page. The instance status should change to Stopped.

Q: Will my data and code be lost after stopping or deleting a DSW instance?

Whether data is retained depends on your operation and the instance's resource group type.

  • Stop instance:

    The data retention policy varies by resource group type.

    • Instances using a cloud disk as the system disk (most pay-as-you-go and general-purpose instances): Data is deleted and cannot be recovered if the disk has not been expanded and the instance is stopped for more than 15 days. The data is retained if the disk has been expanded or the instance is stopped for 15 days or less.

    • Instances using temporary storage as the system disk: Data is stored in temporary storage. Stopping the instance deletes the data, and it cannot be recovered.

  • Delete instance:

    All data on its system disk is permanently erased and cannot be recovered. Therefore, you must back up all important data before deletion.

Q: Why does my running DSW instance stop automatically?

The instance is configured with an idle auto-shutdown policy. This policy is designed to save resources and is enabled by default for free trial instances.

  • Trigger condition: The instance's CPU and GPU usage are continuously below the set threshold for 3 hours.

  • Recommended action:

    • Manual stop: To ensure resource savings, manually stop the instance when it is not in use. The auto-shutdown policy is not guaranteed to be triggered every time.

    • Modify policy: To run long-term tasks, you can modify or disable this policy. The steps are as follows:

      Modify the DSW auto-shutdown policy

      1. Go to the Workspace Details page and click Workspace Configuration > Scheduling Configuration.

      2. Find the DSW configuration area, where you can modify the DSW shutdown policy and exclusion policy.

Q: I have stopped or deleted all my DSW instances. Why does it still show "Running" or why do I receive billing notifications?

Check for the following common reasons:

  • Confusing resource plans with instances. The "Running" status you see may refer to a resource plan (such as "250 billable hours per month"), not an instance. A resource plan is always active within its validity period, and its status is independent of the instance.

  • The expanded system disk is still being billed. Stopping an instance only pauses computing fees. An expanded system disk continues to incur storage fees.

  • There is a delay in billing. Billing is not in real time. A bill may be generated several hours after you use the resource. For example, fees incurred in the morning may not appear on the bill until the afternoon.

Billing and bills

Q: How is DSW billed? Why am I charged even when my instance is on but not running any code?

  • DSW supports subscription and pay-as-you-go billing methods. You can choose a billing method as needed. For billing details, see DSW billing.

  • Pay-as-you-go billing is based on the runtime of your instance. Because a running instance continuously occupies computing resources, you are charged as long as the instance is in the "Running" state, even if no code is being executed.

Q: How do I view my DSW bill?

For pay-as-you-go users, you can go to the Expenses and Costs page to view bill details. For more information, see View bill details.

Q: Why is my account still being charged after I have stopped my DSW instance?

There are usually two main reasons for continued charges after you stop an instance:

  • System disk expansion: If you expanded the system disk when you created the DSW instance, storage fees for the system disk continue to be incurred even if the instance is in the "Stopped" state.

  • Billing delay: If DSW uses the pay-as-you-go mode, there is a certain delay in bill generation and deduction. The billing notification you receive may be for the actual usage before you stopped the instance, not for fees incurred after it was stopped.

Q: How can I completely stop all billing related to a DSW instance?

  • To completely stop all billing for a DSW instance, the most thorough method is to delete the instance. Make sure to back up all necessary data before deletion because data cannot be recovered after the instance is deleted.

  • You can switch to different workspaces and regions to ensure that all instances are deleted.

Q: How is the fee calculated if a pay-as-you-go DSW instance is used for less than an hour?

The fee for a pay-as-you-go instance is calculated based on the actual number of minutes used. The formula is: Bill amount = (Unit price / 60) × Actual service duration (minutes).
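
As a worked example of the formula, the following Python sketch computes the bill amount. It assumes that the unit price is an hourly price, which the division by 60 implies. The price of 6.00 and the rounding to two decimal places are illustrative assumptions, not actual DSW pricing or billing behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def pay_as_you_go_fee(unit_price_per_hour, minutes_used):
    """Bill amount = (unit price / 60) x actual service duration in minutes."""
    price = Decimal(str(unit_price_per_hour))
    fee = price / Decimal(60) * Decimal(minutes_used)
    # Rounding to two decimal places is an assumption for display only.
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical unit price of 6.00 per hour, used for 25 minutes.
print(pay_as_you_go_fee("6.00", 25))  # 2.50
```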

Model pulling

Q: An error occurs when pulling a model: Failed to pull image "crpi-****-vpc.cn-hangzhou.personal.cr.aliyuncs.com/apo/cat:full"

When you create a DSW instance, if you configure an image address and the image repository is private, you must enter the username and password for the image repository when you enter the image address.

Image usage

Q: An error occurs when creating an image: insufficient capacity of ephemeral storage

Cause: When you create an image, the system checks whether the remaining free space on the system disk is greater than the size of the write layer. If the free space is insufficient, this error is reported.

Solution: In the DSW Terminal, run df -h to view the disk space usage of the file system. Ensure that the overlay usage does not exceed the free space on /dev/vda4. If it does, you can resolve this issue by setting a Custom Exclusion Path when you create the image.
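
The comparison described above can also be scripted. The following Python sketch parses the output of df -P -k (POSIX format, sizes in KiB, used here instead of df -h so that the numbers are easy to compare) and checks whether the space used by overlay fits in the free space on /dev/vda4. The helper names are hypothetical, and the filesystem names are taken from this answer and may differ on your instance.

```python
import subprocess


def read_df():
    """Run `df -P -k` and return its text output."""
    result = subprocess.run(
        ["df", "-P", "-k"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_df(df_output):
    """Map filesystem name -> (used_kib, available_kib)."""
    usage = {}
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        # Keep the first entry if a filesystem is listed more than once.
        usage.setdefault(fields[0], (int(fields[2]), int(fields[3])))
    return usage


def image_layer_fits(df_output, layer_fs="overlay", disk_fs="/dev/vda4"):
    """True if the overlay usage does not exceed the free space on the disk."""
    usage = parse_df(df_output)
    overlay_used = usage[layer_fs][0]
    disk_available = usage[disk_fs][1]
    return overlay_used <= disk_available
```

In the DSW Terminal, calling image_layer_fits(read_df()) returns False when the overlay usage exceeds the free space, and raises KeyError if either filesystem name is not present in the df output.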

Q: How do I use a Docker image in DSW?

  • Use a Docker image to start a DSW instance: You can push the Docker image to Alibaba Cloud Container Registry (ACR) and then add it to the custom images of your PAI workspace. This lets you select the corresponding image to start the instance when you create a DSW instance.

  • To package the current DSW image environment for launching other instances or deploying models, see Create a DSW instance image.

  • Install and use Docker in the DSW cloud IDE: Instances created with public resources and general computing resources do not support installing and using Docker in DSW. Lingjun resources support it.

Q: Why does creating a DSW image fail or time out?

  • Image size exceeds the limit: When you create a DSW image, the data volume in a single-layer image should not exceed 10 GiB. Otherwise, the build fails. We recommend that you try to reduce the image size.

  • Region mismatch: The DSW instance and the Container Registry (ACR) instance must be in the same region. Otherwise, the corresponding image repository cannot be found when you create the image.

  • Insufficient system disk space: When you create an image, if the remaining free space on the system disk is less than the size of the data to be written to the image layer, an "insufficient capacity of ephemeral storage" error is reported.

  • Network issues: When you use a Personal Edition ACR instance, the image is pushed over the Internet. A larger image may fail due to network fluctuations or a long transmission time. If an Enterprise Edition ACR instance is attached to the same VPC as the DSW instance, the image can be pushed through the internal network, which is faster and more stable.

Q: Why is the "Create Image" button grayed out, or why can't I find my image repository when creating an image?

  1. Incorrect instance status: The "Create Image" feature is only available for DSW instances that are in the "Running" state. If the instance is "Stopped" or in another state, the button is grayed out and unavailable.

  2. Prerequisites not met or incorrect configuration:

    • You must first create an ACR instance in the same region as the DSW instance, and create a namespace and image repository in it.

    • Make sure that the DSW instance and the ACR instance are in the same region.

Q: An error occurs when creating an image: Push image registry-vpc.cn-****.aliyuncs.com/****/lm-mirrors:**** Failed: Push container failed, Container Name: dsw-notebook

When you create an image, ensure that the data volume in a single-layer image does not exceed 10 GiB. Otherwise, the build fails. For DSW instances in a public resource group, you can set a custom exclusion path to exclude specific files or directories from the final image. Alternatively, you can mount a storage path, such as an OSS path, to store and access data.
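
To decide which paths to exclude, it can help to see which directories hold the most data. The following Python sketch lists the largest immediate subdirectories of a given root. The helper names and the 1 GiB default threshold are assumptions, and the sizes it reports only approximate what ends up in an image layer.

```python
import os

GIB = 1024 ** 3


def directory_sizes(root):
    """Return {immediate subdirectory: total size in bytes} under root."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # lstat measures a symlink itself instead of following it.
                    total += os.lstat(path).st_size
                except OSError:
                    pass  # file vanished or is unreadable; ignore it
        sizes[entry.path] = total
    return sizes


def exclusion_candidates(root, threshold_bytes=GIB):
    """List subdirectories at or above the threshold, largest first."""
    sizes = directory_sizes(root)
    large = [(path, size) for path, size in sizes.items() if size >= threshold_bytes]
    return sorted(large, key=lambda item: item[1], reverse=True)
```

For example, exclusion_candidates("/mnt/workspace") returns the subdirectories of the working directory that hold at least 1 GiB, which you can then consider for the custom exclusion path or move to mounted storage.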

System disk expansion

Q: How large is the DSW instance system space, and what should I do if the disk is full?

Files and data in a DSW instance are stored on the system disk by default, and a certain amount of free quota is provided.

  • View free quota

    Instances created in a public resource group have a free quota of 100 GiB. For general computing resources, a system disk with a free quota is provided only after the specification requirements are met. Lingjun resources do not provide free cloud disks. You can view the specific free disk space size in the system disk option on the instance configuration page. Steps:

    1. Click the instance name on the instance list page.

    2. In the upper-right corner, click Change Configuration and scroll down to the System Disk section.

  • View system disk usage

    Click the DSW instance name. In the Environment Context area, you can view the system disk usage.

  • How to expand the system disk when it is full

    If the system disk space usage exceeds the free quota, you can choose to expand the system disk or mount a dataset.

Q: Does the system disk support scale-in?

No, it does not. The DSW system disk cannot be scaled in after it has been expanded. If the system disk of an existing DSW instance is larger than you need, back up the important data in the instance to mounted storage, such as a dataset, OSS, NAS, or CPFS. Then, delete the DSW instance to stop the ongoing billing and create a new DSW instance with an appropriately sized system disk.

Mount configuration

Q: How do I mount and use my own file system in a DSW instance?

You can mount OSS, NAS, CPFS, or Intelligent Computing CPFS when you create a new instance. You can enter the mount directory through the DSW Terminal to view and use the files.

Currently, DSW only supports mounting a file system in the same region when you create an instance. For more information, see Create a DSW instance.

Q: When mounting a NAS dataset in PAI-DSW, an error is reported when starting the instance: The specified MountTarget 3b79d4a2ac-xmk97.cn-shanghai.nas.aliyuncs.com is not in VPC vpc(VPC-connected instance)

  • Cause: This is caused by adding and configuring a mount target when you create the NAS dataset.

  • Solution: Set the mount target to empty when you create the dataset.

Q: When using an ECS to set up FTP to upload and download files to NAS, an error is reported when executing the mount command: mount:wrong fs type,bad option,bad superblock

  • Solution: Install the nfs-utils package before you run the mount command.

    yum install nfs-utils

Q: If an "Input/output error" is reported when accessing a mounted directory after mounting an OSS dataset, how should I resolve it?

This problem occurs because the role used to access OSS (AliyunPAIDLCAccessingOSSRole) has not been authorized. For the authorization procedure, see PAI service account authorization.

Q: How can I reduce the risk of OOM (Out of Memory) when using jindo to mount an OSS dataset?

You can solve this in the following two ways:

  • Method 1: Use Jindo version 6.8.1, which has optimized memory usage.

    {
        "fs.jindo.fuse.pod.image.tag":"6.8.1"
    }

  • Method 2: Use ossfs.

    When you submit the task, specify:

    {
        "mountType": "ossfs"
    }

    Turning off the readdirplus optimization with the following configuration reduces metadata cache usage when folder contents are listed, which helps mitigate the OOM problem:

    {
        "mountType": "ossfs",
        "fs.ossfs.args": "-oreaddirplus=false"
    }
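
If you generate these settings programmatically, the three JSON fragments above can be built from one helper. The following Python sketch is an illustration only: the function name and its parameters are hypothetical (including the "jindo" label used to select Method 1), while the keys and values are copied from the configurations above.

```python
import json


def mount_advanced_config(mount_type="jindo", jindo_tag="6.8.1", readdirplus=True):
    """Build one of the advanced mount configurations shown above as a dict."""
    if mount_type == "ossfs":
        config = {"mountType": "ossfs"}
        if not readdirplus:
            # Turn off the readdirplus optimization to reduce metadata caching.
            config["fs.ossfs.args"] = "-oreaddirplus=false"
        return config
    if mount_type == "jindo":
        return {"fs.jindo.fuse.pod.image.tag": jindo_tag}
    raise ValueError(f"unsupported mount type: {mount_type}")


print(json.dumps(mount_advanced_config("ossfs", readdirplus=False), indent=4))
```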

Q: I have successfully mounted OSS, why can't I see it in the file browser on the left side of the JupyterLab interface?

This is because the DSW file browser displays the instance's working directory by default, which is usually /mnt/workspace. The mount path you specified when you mounted OSS (for example, /mnt/data) is not in the default working directory, so it is not directly displayed in the file list on the left.

Solution:

  • Access through code: Your files have actually been successfully mounted. In your code, you need to use the full mount path to access them, for example, open('/mnt/data/my_file.csv').

  • Change the mount target: To easily see the files in the UI, you can set the mount path to a subdirectory of the working directory when you configure the mount, for example, /mnt/workspace/my_oss_data. After the mount is complete, you can see your OSS files in the my_oss_data folder in the file browser.

  • Access through the terminal: You can enter the mount directory in the DSW Terminal using the cd /mnt/data command, and then use commands such as ls to view and operate the files.
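
The code-based access described above can be sketched in Python as follows. The helper names are hypothetical, and /mnt/data and my_file.csv are the example values from this answer, so replace them with your own mount path and file.

```python
from pathlib import Path


def list_mounted_files(mount_path, pattern="*"):
    """Return sorted file paths under the mount path that match the pattern."""
    root = Path(mount_path)
    if not root.is_dir():
        raise FileNotFoundError(f"mount path not found: {mount_path}")
    return sorted(str(p) for p in root.rglob(pattern) if p.is_file())


def read_text_head(file_path, max_lines=5):
    """Read the first few lines of a text file using its full path."""
    with open(file_path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for _, line in zip(range(max_lines), handle)]
```

For example, list_mounted_files("/mnt/data", "*.csv") lists the CSV files under the mount path, and read_text_head("/mnt/data/my_file.csv") previews one of them. Both work even though the path is not shown in the JupyterLab file browser.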

Q: What do I do if the program reports "Transport endpoint is not connected" or "Input/output error" when I use a mounted OSS path?

This error indicates that the mount connection between the DSW instance and OSS has been disconnected. The following are possible causes and troubleshooting methods:

  1. RAM role permission issue: Check whether the RAM role configured for your DSW instance has been granted permission to access OSS (for example, AliyunPAIDLCAccessingOSSRole). Insufficient permissions are a common cause of being unable to read from OSS.

  2. Insufficient mount service resources: When you perform high-intensity random read/write operations or many small file operations, the ossfs or JindoFuse process responsible for mounting may crash due to insufficient memory (OOM). In the advanced configuration of the mount, you can disable the metadata cache or increase the memory configuration. For more information, see JindoFuse.

  3. Restore connection:

    • For mounts at startup, the easiest way to restore is to restart the DSW instance, and the system automatically re-executes the mount.

    • You can also use the PAI SDK to execute a dynamic mount command to remount the path without restarting the instance.

Q: What types of data does DSW support for mounting? Can it directly mount Alibaba Cloud Drive or MaxCompute tables?

DSW supports using cloud storage services such as OSS, NAS, and CPFS by creating datasets or directly mounting paths. 

  • Alibaba Cloud Drive is not supported: Currently, DSW does not directly support mounting a personal Alibaba Cloud Drive. We recommend that you store data that needs to be processed in OSS.

  • Mounting MaxCompute tables is not supported: MaxCompute (formerly ODPS) table data cannot be directly "mounted" to a DSW directory like a file system. You can read and write it in DSW code through the SDK or API provided by PAI. For more information, see Read and write MaxCompute tables using PyODPS.

Q: Will my code and data be lost after a DSW instance is shut down or deleted? How can I achieve data persistence and migration?

The system disk of a DSW instance is temporary storage. For public resource groups, data is cleared if the instance is stopped for more than 15 days. For dedicated resource groups, system disk data is also cleared after the instance is stopped or deleted.

To achieve persistent storage of data and code, and to migrate them between different instances, you must use external mounted storage. 

  • Persistence solution: Save all your important data, code, and models to a mounted OSS or NAS path. This way, even if the DSW instance is deleted, all your assets remain safely stored in your own OSS or NAS.

  • Migration solution: When you need to migrate data from one DSW instance to another, simply mount the same OSS or NAS path that contains this data in the new instance. This is the most convenient way to migrate data.

Q: Why are files in the working directory not visible in OSS after a successful mount?

Files in the working directory are not visible in OSS because of a path mismatch. The default mount path for OSS is /mnt/data, while the default working directory for DSW is /mnt/workspace. You can use the following command to copy the files from the working directory to /mnt/data. The files will then become visible in OSS.

cp -r /mnt/workspace/. /mnt/data/
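
The same copy can be done from a notebook cell. The following Python sketch uses shutil.copytree from the standard library. The function name is hypothetical, and the default paths are the ones named in this answer.

```python
import shutil


def copy_workspace_to_mount(source="/mnt/workspace", target="/mnt/data"):
    """Copy everything under source into target, like `cp -r source/. target/`."""
    # dirs_exist_ok requires Python 3.8 or later; files that already exist
    # in target are overwritten.
    return shutil.copytree(source, target, dirs_exist_ok=True)
```

Calling copy_workspace_to_mount() with no arguments copies the default working directory into the default OSS mount path and returns the target path.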

Data reading, upload, and download

Q: How do I use DSW to read OSS data?

You can use a Python SDK or API to read OSS data. For more information, see Read and write data in Object Storage Service (OSS).

Q: How do I upload and download folders?

Currently, DSW does not support directly uploading and downloading folders. However, you can upload and download folders by packaging them into compressed files. The DSW Terminal provides a Linux environment where you can use standard Linux command-line tools such as tar, gzip, and unzip to compress and decompress files. The following is an example using tar.

  1. Use tar --version to check if tar is installed. If not, you can install it using the following commands.

    # Installation command for Debian-based systems (such as Ubuntu)
    sudo apt install tar
    
    # Installation command for Red Hat-based systems (such as CentOS, Fedora)
    sudo yum install tar

  2. Compress or decompress the folder.

    # Compress a folder, where /path/to/directory is the folder to be compressed
    tar -cvf archive_name.tar /path/to/directory
    
    # Decompress a folder
    tar -xvf archive_name.tar
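
If you prefer to package folders from a notebook instead of the terminal, the Python standard library offers the tarfile module. The following is a minimal sketch with hypothetical helper names. Unlike the tar -cvf command above, it also applies gzip compression.

```python
import os
import tarfile


def pack_folder(folder, archive_path):
    """Create a gzip-compressed tar archive that contains the folder."""
    folder = os.path.abspath(folder)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the folder name, not the absolute path.
        archive.add(folder, arcname=os.path.basename(folder))
    return archive_path


def unpack_archive(archive_path, target_dir):
    """Extract a tar archive (compressed or not) into target_dir."""
    with tarfile.open(archive_path, "r:*") as archive:
        # Only extract archives that you created or otherwise trust.
        archive.extractall(target_dir)
```

For example, pack_folder("/path/to/directory", "archive_name.tar.gz") produces a single file that you can download, and unpack_archive("archive_name.tar.gz", ".") restores the folder after you upload the archive.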

Q: How do I transfer and share data between two DSW instances?

You can use the following two methods:

  • Mount a dataset, OSS, NAS, or CPFS: Both DSW instances mount the same dataset or OSS path, and then store the data in that dataset or storage path to achieve data sharing.

  • Upload and download files: Download the data to be shared from the source DSW instance, and then upload it to the other DSW instance.

Q: What should I do if there is no response or the download fails after clicking "Download"?

This is usually caused by network congestion or browser issues. You can try the following steps:

  1. Wait a moment. Large files require a longer response time to download.

  2. Change your browser or use your browser's incognito mode to try again.

  3. For larger files (such as those over 200 MB) or in cases of an unstable network, we recommend that you download by mounting OSS.

Q: What should I do if a message indicates that the "File Transfer Station" space is insufficient?

The total capacity of the File Transfer Station is 10 GB. You need to go to the transfer station management page and clear the transfer station files to release space. If the page does not refresh promptly, try refreshing your browser.

Q: Why am I always redirected to the "File Transfer Station" when I upload files?

This is normal. To ensure upload stability and speed, all files larger than 10 MB are automatically transferred through the File Transfer Station and saved to your instance upon completion.

Q: How do I upload a large on-premises file (such as a model over 5 GB) or a large amount of data to DSW and use it?

The system disk space of a DSW instance is limited and is temporary storage. We do not recommend that you directly upload large files or large amounts of data. You can first upload the data to Alibaba Cloud Object Storage Service (OSS) and then mount it to the DSW instance for use. For more information, see Mount a dataset, OSS, NAS, or CPFS.

Remote connection

Q: When connecting to a DSW instance with ProxyClient, a disconnection error is reported: client_loop: send disconnect: Broken pipe

When you use ProxyClient to connect to a DSW instance over SSH and the session is idle for a long time, the connection is dropped and the client_loop: send disconnect: Broken pipe message appears.

To fundamentally solve this problem, we recommend that you use the more stable Remote connection: Direct SSH connection method to connect to the DSW instance.

Q: Failed to open an on-premises folder after remotely connecting to an instance using VSCode

This problem is generally caused by the VSCode client. We recommend that you upload the on-premises files to the DSW instance instead. For specific operations, see Upload and download files.

Q: SSH direct connection configuration fails with the following error: Failed to update private zone items: Failed to add zone?

The error is caused by the internal DNS resolution service not being enabled. You can enable this service by following the instructions in Enable internal DNS resolution.

Network issues

Q: How do I resolve slow network download speeds?

Because DSW and DLC instances use a shared gateway by default, bandwidth limits can make large-file downloads slower than you need. To increase the download speed, you can switch to a dedicated gateway. For more information, see Improve public network access speed through a dedicated gateway.

Q: Does a DSW instance have a public IP address?

A DSW instance is not assigned a public IP address by default. To access the external network or allow external access to your DSW instance, we recommend that you configure a NAT Gateway or use an EIP. For more information, see DSW network access and configuration.

Q: Can the same public port be reused when DSW instances expose public access through a NAT gateway?

When you use a DSW custom service to expose an interface, all custom services that share the same NAT Gateway must use unique ports, even if they are in different DSW instances.

Q: Why can't a DSW instance access the Internet?

By default, a DSW instance uses the Public Gateway for Internet access. If you cannot access the Internet, check the instance configuration page to see if Dedicated Gateway is selected for Internet Access Gateway. If a dedicated gateway is selected, you must configure an elastic IP address and an SNAT entry. For more information, see Improve public network access speed through a dedicated gateway. Alternatively, you can select the public gateway.

image

Third-party library installation

Q: How to use third-party libraries in DSW

DSW supports installing third-party libraries. You can enter the following commands in the DSW Terminal to complete the installation.

# Python 3 environment.
pip install --user xxx
# Python 2 environment.
source activate python2
pip install --user xxx

Replace xxx with the name of the third-party library. After the library is installed, click Restart Kernel to restart the service.

Q: Why is the installed third-party package not taking effect?

After you install a third-party package using the pip command, if you cannot find the package when you import it using the import command, try to restart the service. If the error persists, check the current environment. By default, DSW installs third-party packages to the Python 3 environment. To install a package in another environment, you must manually switch to the environment first. The following is an example.

# Install a third-party library in the Python 2 environment.
source activate python2
pip install --user xxx
# Install a third-party library in the TensorFlow 2.0 environment.
source activate tf2
pip install --user xxx

Replace xxx with the name of the third-party package that you want to install.

Q: The code reports that the CUDA driver version is too low. Do I need to manually upgrade the NVIDIA driver in DSW?

Do not upgrade the driver version. The driver and CUDA in a DSW instance are pre-installed and locked. They cannot and should not be manually modified because this can easily damage the instance and make it unrecoverable. The correct approach is to replace the DSW image. Stop the current instance, create a new instance, and select an official image with a higher version of CUDA and driver.

For example, the official image: modelscope:1.9.4-pytorch2.0.1tensorflow2.13.0-gpu-py38-cu118-ubuntu20.04. Here, cu118 represents CUDA version 11.8.
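To check which CUDA version an image provides before you create the instance, you can read it from the image name. The following is a minimal sketch that assumes the cuXYZ field always encodes the version as major digits followed by one minor digit (cu118 for 11.8, as in the example above). The variable names are illustrative only.

```shell
# Image name taken from the example above.
image_tag="modelscope:1.9.4-pytorch2.0.1tensorflow2.13.0-gpu-py38-cu118-ubuntu20.04"

# Extract the first "cu<digits>" field from the image name.
cuda_field="$(printf '%s\n' "$image_tag" | grep -oE 'cu[0-9]+' | head -n 1)"

# Assumption: the last digit is the minor version, the rest is the major version.
digits="${cuda_field#cu}"      # strip the "cu" prefix
major="${digits%?}"            # everything except the last digit
minor="${digits#"$major"}"     # the last digit
cuda_version="$major.$minor"

echo "CUDA version in image name: $cuda_version"
```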

Q: I failed to install a package in DSW using pip install, and it reported a dependency conflict or version error. What should I do?

This is usually caused by an incompatible environment. Troubleshoot and resolve it in the following order:

  1. Preferred solution: Replace the image. Stop the current instance, create a new DSW instance, and select a different official image. For example, if the current PyTorch 2.1 image does not work, you can try the PyTorch 2.3 image, or try the modelscope series of images, which usually have better compatibility.

  2. Install a specific version. Check the official documentation of the package, find a version that supports your current DSW environment (Python/CUDA version), and then execute pip install package_name==x.y.z.

  3. Change the download source. Try a different PyPI mirror, such as the Tsinghua source: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple package_name.

Q: I have clearly installed a library in the DSW Terminal, why can't I find it when I import it in Jupyter Notebook?

This may be because the Terminal and Jupyter are using two different Python environments. You can use the which python command to confirm which Python environment is currently being used, or install the required library in the Notebook, for example:

image
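One way to compare the two environments is sketched below. It assumes that python3 or python is on the PATH in the Terminal. The variable name py is illustrative, and xxx is the same placeholder for the library name that is used elsewhere in this topic.

```shell
# In the DSW Terminal: find the interpreter that is first on PATH.
py="$(command -v python3 || command -v python)"
echo "Terminal interpreter: $py"

# Print the path as Python itself reports it. Running the same two Python
# statements (import sys; print(sys.executable)) in a Notebook cell shows
# which interpreter the Notebook kernel uses.
"$py" -c 'import sys; print(sys.executable)'

# If the two paths differ, install the library with the interpreter that the
# Notebook reported, so that pip and import refer to the same environment:
#   /path/reported/by/the/notebook -m pip install --user xxx
```

Calling pip as a module of a specific interpreter (-m pip) avoids relying on whichever pip executable happens to be first on the PATH.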

Q: Can I use Docker in DSW to deploy my application?

To use Docker in Lingjun resources, you can submit a ticket to be added to the whitelist. For DSW instances that are not Lingjun resources, running Docker inside the instance container is not currently supported.

Q: There is no unzip or 7z command in my DSW instance. How do I decompress files?

You can use the apt-get command to install them.

  • Install unzip: In the Terminal, run apt-get update && apt-get install -y unzip, and then use unzip your_file.zip.

  • Install p7zip (for 7z): In the Terminal, run apt-get update && apt-get install -y p7zip-full, and then use 7z x your_file.7z.

Q: After I shut down (stop) my DSW instance, will the packages I installed with pip and the code I wrote be lost?

If a cloud disk is used as the system disk, they will not be lost. The instance's disk data (including the environments in /mnt/workspace and /root) is retained. The next time you start the instance, all environments and files are still there. Only deleting the instance completely clears all data.

Q: Why does the installation keep getting stuck or timing out when using pip?

This may be a network issue.

  1. Check if the instance is configured for Internet access. If you selected "No Internet Access" when you created the instance or configured a VPC without setting up a NAT gateway, you cannot connect to external download sources.

  2. Try changing the download source, for example, from the default Alibaba source to the Tsinghua source: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package_name>.

  3. If the network is not working, you can download the .whl format installation package on your own computer, upload it to DSW, and perform an offline installation.
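The offline installation in step 3 can be sketched as follows. The function names and the ./wheels directory are hypothetical, and xxx again stands for the package name. Wheels can be specific to a platform and Python version, so the files that you download on your computer must match the DSW image. The pip download command accepts options such as --platform, --python-version, and --only-binary=:all: for this purpose.

```shell
# On your own computer (with Internet access): download the package and its
# dependencies into a local directory.
#   $1: package name, $2: target directory
download_wheels() {
  python3 -m pip download --dest "$2" "$1"
}

# In the DSW Terminal, after you upload the directory to the instance:
# install from the local files only, without contacting any package index.
#   $1: package name, $2: directory that contains the uploaded files
install_offline() {
  python3 -m pip install --user --no-index --find-links "$2" "$1"
}

# Usage:
#   download_wheels xxx ./wheels     # on your own computer
#   install_offline xxx ./wheels     # in the DSW Terminal
```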

Q: How do I get root permissions in the DSW WebIDE?

Most of DSW's official images run as the root user by default. When you open the Terminal, if you see the command prompt is root@..., it means you are already root. The warning message "It is not recommended to run as the root user" that appears during pip installation can be safely ignored. If your image does not log on as root, this is a setting of the image itself, and you need to switch to an image that supports root.

Q: How do I start xserver in DSW?

DSW does not support starting xserver.

Model deployment

Q: How do I deploy a model generated by DSW?

  • Use the EAS model deployment service

    After you have finished modeling, you can use PAI-EAS to deploy the model as an online service. For more information, see Deploy models as online services.

  • Download the model for on-premises deployment

    You can right-click the model generated by DSW to download it to your on-premises device.

Instance running

Q: When running machine learning code, why does the page prompt for re-login after being idle for a period of time?

For security reasons, the DSW login session is valid for 3 hours. After it expires, you need to log on again, but this does not affect the execution of the task. To run a task for a long time, we recommend that you use the nohup command to run the task in the background in the DSW Terminal.
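A minimal sketch of running a task in the background with nohup is shown below. The short sh -c command stands in for a real task such as python train.py. The file names train.log and train.pid are hypothetical.

```shell
# Placeholder task. Replace the sh -c '...' part with your own command,
# for example: python train.py
nohup sh -c 'echo "training started"; sleep 1; echo "training finished"' > train.log 2>&1 &

# Save the process ID so that you can find the task again after you log on again.
echo $! > train.pid

# Later, from any Terminal session on the instance:
#   tail -f train.log            # follow the output
#   kill "$(cat train.pid)"      # stop the task if needed
```

Redirecting both standard output and standard error to a file keeps the task's messages available after the browser session expires.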

Q: After closing the browser or shutting down the computer, will the training task running in DSW continue?

Yes, it will. A DSW instance runs in the cloud, and closing your on-premises device does not affect its running state. However, note that some instances, especially free trial instances, may be configured with an idle auto-shutdown policy. If the utilization of the instance's CPU, GPU, and other resources stays below a certain threshold for a period of time, the system may consider the instance idle and automatically stop it, which interrupts your task.

Q: Why can't DSW start Docker?

Because DSW itself runs in a container, DSW does not support installing Docker. The corresponding CUDA version is pre-installed on the underlying virtual machine and cannot be changed. You can use nvidia-smi to view the corresponding CUDA version.

Q: Why are there no bash features like tab auto-completion in the Terminal?

Because some images have usage restrictions, you need to manually type bash in the Terminal and press Enter to enable bash features such as tab auto-completion.

image

Q: What do I do if the DSW instance specifications do not meet my requirements during AI development?

You can update the DSW instance specifications by following these steps:

  1. In the DSW instance list, click the instance name to go to the instance details page.

  2. On the Instance Configuration tab, click Change Configuration.

  3. In the Change Instance Configuration panel, you can update the instance specifications.

    Note

    When you update the DSW instance specifications, if the instance is running, the update operation immediately restarts the instance. Make sure that you have saved the content in the instance.

Q: My memory usage is high. How can I release it?

image

If your memory usage is too high and affects normal use, you can release memory in either of the following ways.

  • If the command line stops responding because of high memory usage, click Stop Instance in the upper-right corner. Alternatively, return to the DSW console and click the Stop button in the instance's row. Wait for the instance to stop before restarting it.

  • If you can interact through the command line in the instance, run the top command in the instance's Terminal to view the memory usage of all current processes. %MEM is the percentage of memory that a process occupies, and PID is the process ID.

    image

    If you want to end a process with high memory usage, enter the following in the command line:

    kill PID

    Replace PID with the ID of the process that you want to end. After the process ends, the memory usage decreases.

    image
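As a non-interactive alternative to top, the following sketch lists the processes that use the most memory and then ends one of them. It assumes the Linux procps version of ps, which may not be present in every image. The sleep process is only a harmless stand-in for the process that you actually want to end, and target_pid is an illustrative variable name.

```shell
# Show the header plus the five processes that use the most memory:
# PID, memory percentage, resident memory in KB, and command name.
ps -eo pid,%mem,rss,comm --sort=-%mem | awk 'NR <= 6'

# Stand-in for the real process. In practice, set target_pid to a PID
# from the list above.
sleep 300 &
target_pid=$!

# Ask the process to exit (kill sends SIGTERM by default).
kill "$target_pid"
wait "$target_pid" 2>/dev/null || true

# kill -0 sends no signal and only checks whether the process still exists.
# Force termination only if the process ignored the first request.
if kill -0 "$target_pid" 2>/dev/null; then
  kill -9 "$target_pid"
fi
```

Trying a plain kill first gives the process a chance to clean up. kill -9 cannot be handled by the process, so unsaved state is lost.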

Q: An error is reported during runtime: RuntimeError: CUDA error: too many resources requested for launch

Cause: This error indicates that the CUDA kernel requested more resources than the GPU can provide. It is usually related to the hardware limits of the GPU.

Solution: Try restarting the instance and running the program again. If it still does not work, you need to choose a GPU-accelerated instance with higher specifications.

Q: Can a swap space be created to use virtual memory when DSW is out of memory?

No, it cannot. DSW itself is a container and does not support creating or managing swap space.

The reasons are as follows:

  • Permission restrictions: The kernel permissions of the container are restricted, so the container cannot enable a swap file. Even if you obtain root permissions in the container, you cannot bypass the resource policies of the host.

  • Platform policy: The platform uniformly schedules and restricts resources to ensure the stability and security of the multi-tenant environment.

Recommendation: If memory is insufficient, optimize your code or upgrade the instance type.