
Platform for AI: FAQ about DSW

Last Updated: Mar 05, 2024

This topic provides answers to some frequently asked questions about Data Science Workshop (DSW).

What is DSW?

DSW is a cloud-native machine learning and data science platform provided by Platform for AI (PAI). You can use the built-in JupyterLab, WebIDE, and Terminal in DSW. You can also connect your on-premises device to DSW over SSH to use the computing resources and environments that DSW provides. DSW allows you to write and run code online, submit the code as offline tasks, and download the trained models that are generated.

How do I mount and use my own Apsara File Storage NAS (NAS) file system on a DSW instance?

A DSW instance comes with a system disk that serves as temporary storage. This storage is cleared after the instance is stopped or deleted. To store data persistently, you must mount your own NAS file system on the instance. Your NAS files are stored in the /nas directory, and you can view and use them in DSW Terminal.

In the latest version of DSW, you can mount your own NAS file system on a DSW instance only when you create the instance. For more information, see Create and manage DSW instances. After a DSW instance is created, you cannot modify the instance information or change the mounted NAS file system.

Note

If a NAS file system is mounted on a DSW instance, the NAS file system is used for storage. The temporary storage of the DSW instance is no longer used.
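Before you write data that must persist, you can check from Terminal whether the NAS directory is actually a mount point. The following shell sketch reads /proc/mounts to do this. It assumes the /nas path described above; if your instance uses a different mount path, pass that path instead. The function name is_mounted is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if the given directory is listed as a mount point in /proc/mounts.
# Linux-only sketch. Mount paths that contain spaces appear escaped (as \040)
# in /proc/mounts and are not handled here.
is_mounted() {
  local dir="${1%/}"
  # Stripping the trailing slash turns "/" into "", so restore it.
  [ -z "$dir" ] && dir="/"
  # Field 2 of each /proc/mounts line is the mount point.
  awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' /proc/mounts
}

# /nas is the NAS path described above (assumption: adjust it if your
# instance uses a different mount path).
if is_mounted /nas; then
  echo "/nas is mounted: files written there are stored on NAS"
else
  echo "/nas is not mounted: files are kept only on temporary storage"
fi
```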

How do I use a third-party library in DSW?

You can install a third-party library in DSW by running commands on the Terminal interface in DSW. The following code provides an example:

# Install a third-party library in the Python 3 environment.
pip install --user xxx
# Install a third-party library in the Python 2 environment.
source activate python2
pip install --user xxx

Replace xxx with the name of the third-party library that you want to install. After the installation is complete, choose Kernel > Restart Kernel to restart the kernel so that it can load the new library.
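If you want to confirm from Terminal that an interpreter can import the library, a check like the following may help. This is a minimal sketch: check_import is a hypothetical helper, and it assumes that the interpreter you pass (python3 by default) is the same one that your notebook kernel uses.

```shell
#!/usr/bin/env bash
# Check whether a Python interpreter can import a module.
# Usage: check_import <module> [python_executable]
# The import name can differ from the pip package name
# (for example, the scikit-learn package is imported as sklearn).
check_import() {
  local module="$1" py="${2:-python3}"
  if "$py" -c "import ${module}" >/dev/null 2>&1; then
    echo "OK: ${module} can be imported by $(command -v "$py")"
  else
    echo "MISSING: ${module} cannot be imported by ${py}" >&2
    return 1
  fi
}
```

For example, check_import numpy reports whether the numpy module is importable. A MISSING result usually means that the library was installed for a different interpreter.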

Why am I asked to log on again after I leave DSW idle for a period of time while machine learning code is running?

For security purposes, a DSW logon session is valid for 3 hours. After the session expires, you must log on again. Session timeout does not affect running tasks. To run a task that takes a long time, we recommend that you use the nohup command in DSW Terminal to run the task in the background.
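A minimal sketch of starting a long task with nohup, assuming a hypothetical training script named train.py. run_detached is a hypothetical helper: it starts the command in the background, writes stdout and stderr to a log file, and prints the process ID.

```shell
#!/usr/bin/env bash
# Start a command in the background with nohup so that it keeps running
# after the logon session expires or the Terminal tab is closed.
# Usage: run_detached <log_file> <command> [args...]
run_detached() {
  local log_file="$1"
  shift
  # nohup makes the command ignore SIGHUP; stdout and stderr go to the log.
  nohup "$@" > "$log_file" 2>&1 &
  # $! is the PID of the background job; use it with ps or kill later.
  echo "$!"
}

# Example with a hypothetical training script:
# run_detached train.log python train.py
```

You can follow progress with tail -f train.log and stop the job with kill followed by the printed PID.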

I use an ECS instance to upload files to and download files from a NAS file system over FTP. What do I do if the error message "mount: wrong fs type, bad option, bad superblock" appears when I run the mount command?

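  • Problem description

    The mount command fails and returns the preceding error message.

  • Solution

    Before you run the mount command, install the nfs-utils package, which provides the NFS client that the mount command requires.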

    yum install nfs-utils
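The package that provides the NFS client differs by Linux distribution. The sketch below maps a package manager to the package name: nfs-utils for yum or dnf, as stated above, and nfs-common for apt-get on Debian or Ubuntu. The helper names are hypothetical, and the install step must run as root or with sudo.

```shell
#!/usr/bin/env bash
# Print the package that provides the NFS client tools (mount.nfs)
# for a given package manager.
nfs_client_package() {
  case "$1" in
    yum|dnf) echo "nfs-utils" ;;        # CentOS, RHEL, Alibaba Cloud Linux
    apt-get|apt) echo "nfs-common" ;;   # Debian, Ubuntu
    *) return 1 ;;
  esac
}

# Install the NFS client with the first supported package manager found.
# Must run as root (or be prefixed with sudo).
install_nfs_client() {
  local pm
  for pm in yum dnf apt-get; do
    if command -v "$pm" >/dev/null 2>&1; then
      "$pm" install -y "$(nfs_client_package "$pm")"
      return
    fi
  done
  echo "No supported package manager found" >&2
  return 1
}
```

On Debian or Ubuntu, you may need to run apt-get update before the installation.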

How do I use DSW to read data from Object Storage Service (OSS)?

In the Terminal of a DSW instance, use ossutil to upload and download objects. Perform the following steps:

  1. Download, install, and configure ossutil in the Terminal of a DSW instance. For more information, see Install ossutil.

  2. Upload an object to an OSS bucket or download an object from an OSS bucket to a DSW instance. For more information, see ossutil command reference.
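After ossutil is configured, both uploads and downloads use ossutil cp. The following wrappers are a sketch, assuming the binary is available on PATH as ossutil (some installation methods name it differently, such as ossutil64) and that your credentials are already configured. The function names, examplebucket, and the object paths are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Thin wrappers around "ossutil cp". Assumes ossutil is on PATH under that
# name and has already been configured with your credentials.

# Download one object from OSS to a local path in the DSW instance.
oss_download() {
  local bucket="$1" key="$2" dest="$3"
  ossutil cp "oss://${bucket}/${key}" "$dest"
}

# Upload a local file or directory to an OSS prefix.
# Directories are uploaded recursively with -r.
oss_upload() {
  local src="$1" bucket="$2" prefix="$3"
  if [ -d "$src" ]; then
    ossutil cp -r "$src" "oss://${bucket}/${prefix}"
  else
    ossutil cp "$src" "oss://${bucket}/${prefix}"
  fi
}

# Example (placeholder bucket and paths):
# oss_download examplebucket data/train.csv ./train.csv
# oss_upload ./output examplebucket results/
```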

Why does the third-party library I installed fail to take effect?

If a library that you installed by running the pip command cannot be found when you import it, restart the kernel. If the error persists, check whether you installed the library in the correct environment. By default, third-party libraries in DSW are installed in the Python 3 environment. To install a library in another environment, you must first switch to that environment. The following code provides an example:

# Install a third-party library in the Python 2 environment.
source activate python2
pip install --user xxx
# Install a third-party library in the TensorFlow 2.0 environment.
source activate tf2
pip install --user xxx

Replace xxx with the name of the third-party library that you want to install.
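To confirm that source activate actually switched environments, you can print which interpreter and pip the shell resolves to. This sketch uses a hypothetical helper, show_python_env. Running pip as python -m pip, for example python -m pip install --user xxx, installs the package for that specific interpreter.

```shell
#!/usr/bin/env bash
# Show which interpreter and which pip the current shell resolves to.
# Usage: show_python_env [python_executable]
show_python_env() {
  local py="${1:-python}"
  echo "interpreter: $(command -v "$py")"
  "$py" -c 'import sys; print("sys.executable:", sys.executable)'
  # "python -m pip" always runs the pip that belongs to this interpreter.
  "$py" -m pip --version 2>/dev/null || echo "pip: not available for $py"
}

# Example: compare the output before and after switching environments.
# show_python_env
# source activate tf2
# show_python_env
```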

How do I deploy a model that is generated by DSW?

  • Use Elastic Algorithm Service (EAS) to deploy a model service

    You can run commands in DSW Terminal to use the built-in EASCMD client to deploy a model service. For more information, see Create and manage DSW instances.

  • Download a model to an on-premises device for deployment

    You can right-click a model that is generated by DSW and download the model to an on-premises device.
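As a rough sketch of the EAS path, the following writes a minimal JSON service description and passes it to eascmd create. The field names reflect our understanding of the EAS service configuration format, the helper deploy_eas_service is hypothetical, and every value (service name, OSS model path, processor) is a placeholder. Check the EAS and EASCMD documentation for the fields and processors that your model requires.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal EAS service description and create the service
# with the EASCMD client. Field names reflect our understanding of the EAS
# JSON format; all values passed in are placeholders.
# Usage: deploy_eas_service <name> <model_path> <processor> [desc_file]
deploy_eas_service() {
  local name="$1" model_path="$2" processor="$3" desc_file="${4:-service.json}"
  cat > "$desc_file" <<EOF
{
  "name": "${name}",
  "model_path": "${model_path}",
  "processor": "${processor}",
  "metadata": {
    "instance": 1,
    "cpu": 1
  }
}
EOF
  eascmd create "$desc_file"
}

# Example (hypothetical values):
# deploy_eas_service demo_service oss://examplebucket/models/demo/ <processor>
```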

How is DSW billed?

DSW can be billed on a subscription or pay-as-you-go basis. You can select a billing method based on your needs. For more information, see Billing of DSW.

How do I view the bills of DSW?

If you use the pay-as-you-go billing method, you can view billing details by choosing Expenses > User Center in the top navigation bar of the Alibaba Cloud Management Console. For more information, see View bills and usage details.

Why can't I start Docker in DSW?

DSW runs in a container, so you cannot install or run Docker in DSW. A specific CUDA version is pre-installed on the underlying virtual machine before delivery and cannot be changed. You can run the NVIDIA System Management Interface (nvidia-smi) to query the CUDA version.
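nvidia-smi prints the CUDA version in its banner. The sketch below extracts that value with sed; cuda_version_from_smi is a hypothetical helper. Note that this field reports the highest CUDA version that the installed driver supports. If the CUDA toolkit is installed in the image, nvcc --version reports the toolkit version.

```shell
#!/usr/bin/env bash
# Extract the "CUDA Version" value from nvidia-smi output read on stdin.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# On a GPU instance:
# nvidia-smi | cuda_version_from_smi
```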

What do I do if I fail to start a DSW instance and the message "The cluster resources are fully utilized" appears?

If DSW instances fail to start and the message "The cluster resources are fully utilized. Please try later or other regions." appears, try the following methods:

  • Select a different instance type. Other instance types may have sufficient resources.

  • Select a different region. Other regions may have sufficient resources.

  • Create DSW instances during off-peak hours, such as evenings or weekends, when resources are more likely to be available.

  • If the issue persists, contact your account manager.

What do I do if I fail to start a DSW instance and the message "available zone with vSwitch is out of stock" appears?

The DSW instance that you created is configured with a VPC. The computing resources of the instance must reside in the same zone as the vSwitch of the VPC. If that zone does not have sufficient resources, the instance fails to start.

We recommend that you do not configure a VPC when you create the DSW instance. If you need to use a VPC, select a vSwitch that resides in another zone so that the instance can use the computing resources in that zone.

What do I do if I fail to start a DSW instance and the message "Your resource usage has exceeded the default limitation. Please contact us via ticket system to raise the limitation." appears?

Each Alibaba Cloud account can use at most two GPUs in each region. This error occurs when your resource usage exceeds the limit. To increase the quota, submit a ticket.

Why can't I use bash features such as auto-completion in Terminal?

Bash features are limited in some images. To enable them, enter bash in Terminal and press Enter.
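To check whether the current Terminal shell is bash, you can test the BASH_VERSION variable, which only bash sets. This is a minimal sketch with a hypothetical function name, written in POSIX syntax so that it can be pasted into a non-bash shell. If the check fails, run bash as described above, or run exec bash to replace the current shell.

```shell
# Report whether the current shell is bash. BASH_VERSION is set only by bash.
current_shell_is_bash() {
  if [ -n "${BASH_VERSION:-}" ]; then
    echo "bash ${BASH_VERSION}"
  else
    echo "not bash: run 'bash' (or 'exec bash' to replace this shell)"
    return 1
  fi
}
```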

What do I do if the specifications of a DSW instance do not meet my requirements for AI development?

Perform the following steps to update DSW instance specifications:

  1. On the Interactive Modeling (DSW) page, find the DSW instance that you want to manage and click the instance name to go to the Instance Details page.

  2. On the Instance Settings tab, click Change Settings.

  3. In the Change Instance Settings panel, update the instance specification.

    Note

    When you update the specifications of a running DSW instance, the update operation immediately restarts the instance. Make sure that you have saved the data in the instance.

What do I do if an "Input/output error" occurs when PAI accesses the mount directory of an OSS dataset?


This issue occurs because PAI has not been granted permissions to access OSS (AliyunPAIDLCAccessingOSSRole). For more information about how to grant these permissions, see Authorize the service-linked role.