
Platform For AI: Single-node PyTorch transfer learning with NAS

Last Updated: Jun 20, 2026

This topic describes how to use Deep Learning Containers (DLC), Data Science Workshop (DSW), and a NAS file system to perform offline transfer learning based on PyTorch.

Prerequisites

Create a General-purpose NAS file system in your desired region. For more information, see Create a General-purpose NAS file system.

Limitations

The steps in this topic apply only to jobs that run on compute clusters in a public resource group.

Step 1: Create a dataset

  1. Go to the Datasets page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose AI Asset Management > Datasets.

  2. Create a basic dataset and set Storage Type to NAS.

Step 2: Create a DSW instance

Create a DSW instance and configure the following key parameters. For details about the other parameters, see Create a DSW instance.

Environment Information

  • Dataset Mounting: Click Custom Dataset, select the NAS dataset that you created in Step 1, and specify the mount path. This topic uses /mnt/data/, which matches the Working Directory setting.

  • Working Directory: Select Dataset-/mnt/data/.

Network Information

  • VPC Settings: No VPC configuration is required.

Step 3: Prepare the data

The dataset used in this topic is stored in a public location. Download the archive and decompress it as described in the following steps.

  1. Go to the development environment of a DSW instance.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the upper-left corner of the page, select the region where you want to use PAI.

    4. In the left-side navigation pane, choose Model Training > Data Science Workshop (DSW).

    5. Optional: On the Data Science Workshop (DSW) page, enter the name of a DSW instance or a keyword in the search box to search for the DSW instance.

    6. Click Open in the Actions column of the instance.

  2. In the top menu bar of the DSW environment, click the Notebook tab.

  3. Download the data.

    1. In the upper-left toolbar, click the Create Folder icon to create a folder. For this example, name the folder pytorch_transfer_learning.

    2. In the top menu bar of the DSW environment, click the Terminal tab to open a terminal.

    3. In the terminal, use the cd command to change to the pytorch_transfer_learning folder, then use the wget command to download the dataset:

      cd /mnt/workspace/pytorch_transfer_learning/
      wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz

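      The URL https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz is the download location of the dataset. The output is similar to the following sample. The directory listing at the end of the sample may differ from yours.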

      ~/workspace> cd pytorch_transfer_learning/
      ~/workspace/pytorch_transfer_learning> wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz
      --2021-01-28 10:55:55--  https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz
      Resolving pai-public-data.oss-cn-beijing.aliyuncs.com (pai-public-data.oss-cn-beijing.aliyuncs.com)...
      Connecting to pai-public-data.oss-cn-beijing.aliyuncs.com (pai-public-data.oss-cn-beijing.aliyuncs.com)|xxx|:443... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 47237380 (45M) [application/x-gzip]
      Saving to: 'data.tar.gz'
      data.tar.gz                100%[=======================================================================>]  45.05M  16.0MB/s    in 2.8s
      2021-01-28 10:55:58 (16.0 MB/s) - 'data.tar.gz' saved [47237380/47237380]
      ~/workspace/pytorch_transfer_learning> ls
      data.tar.gz  hol-transfer_learning_tutorial.py  input  LICENSE  main.py  output  README.md
      ~/workspace/pytorch_transfer_learning>
    4. Use the tar -xf ./data.tar.gz command to decompress the dataset.

    5. Switch to the Notebook tab. In the directory tree on the left, navigate to the pytorch_transfer_learning directory. Right-click the decompressed data folder (hymenoptera_data), select Rename from the context menu, and rename the folder to input.
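The download, decompress, and rename steps above can also be scripted from the terminal or a notebook cell. The following is a minimal sketch that uses only the Python standard library. The helper names fetch_archive and extract_and_rename are hypothetical, and the example paths assume the /mnt/workspace/pytorch_transfer_learning folder used in this topic.

```python
import tarfile
import urllib.request
from pathlib import Path

# Dataset location given in this topic.
DATA_URL = "https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz"


def fetch_archive(url: str, dest: Path) -> Path:
    """Download url to dest (hypothetical helper, mirrors the wget step)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_and_rename(archive: Path, extracted_name: str, new_name: str) -> Path:
    """Unpack the archive next to itself, then rename the extracted folder."""
    workdir = archive.parent
    with tarfile.open(str(archive)) as tar:
        tar.extractall(str(workdir))  # only do this for archives you trust
    target = workdir / new_name
    (workdir / extracted_name).rename(target)
    return target


# Example, to be run inside the DSW instance:
# tarball = fetch_archive(DATA_URL, Path("/mnt/workspace/pytorch_transfer_learning/data.tar.gz"))
# extract_and_rename(tarball, "hymenoptera_data", "input")
```

The rename fails if a non-empty input folder already exists, which guards against silently mixing two copies of the data.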

Step 4: Prepare the code and output folder

  1. In the terminal of your DSW instance, use the wget command to download the training code into the pytorch_transfer_learning folder.

    cd /mnt/workspace/pytorch_transfer_learning/
    wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/main.py

    The URL https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/main.py is the download location of the training code.

  2. In the pytorch_transfer_learning folder, create a new folder named output to store the trained model.

    mkdir output
  3. Ensure that the pytorch_transfer_learning folder contains the following files and directories:

    • input: the folder for the training data.

    • main.py: the training script.

    • output: the folder for storing the output model.
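Before you create the job, it can help to confirm the folder layout programmatically. The following is a small sketch of such a check. The function name missing_items and the EXPECTED mapping are illustrative, not part of the training code.

```python
from pathlib import Path

# Expected entries from the list above: name -> "dir" or "file".
EXPECTED = {"input": "dir", "main.py": "file", "output": "dir"}


def missing_items(root: str) -> list:
    """Return the expected entries that are absent or of the wrong type."""
    base = Path(root)
    problems = []
    for name, kind in EXPECTED.items():
        path = base / name
        ok = path.is_dir() if kind == "dir" else path.is_file()
        if not ok:
            problems.append(name)
    return problems
```

For example, missing_items("/mnt/workspace/pytorch_transfer_learning") returns an empty list when all three items are in place.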

Step 5: Create a job

  1. Log on to the PAI console. In the top navigation bar, select your target region and workspace, and then click Go to DLC.

  2. On the DLC page, click Create Task.

  3. On the Create job page, configure the following parameters.

    Basic Information

    • Job Name: Enter a name for the deep learning training job.

    Environment Information

    • Node Image: Select Alibaba Cloud Image and choose a PyTorch image. For example, you can select pytorch-training:1.12-gpu-py39-cu113-ubuntu20.04.

    • Dataset: Click Custom Dataset and select the NAS dataset that you created in Step 1.

    • Start Command: Set this parameter to the following command:

      python /mnt/data/pytorch_transfer_learning/main.py -i /mnt/data/pytorch_transfer_learning/input -o /mnt/data/pytorch_transfer_learning/output

    • Third-party Libraries: Select Third-Party Libraries and enter the following content in the text box:

      numpy==1.16.4
      absl-py==0.11.0

    • Code Build: No configuration is required.

    Resource Information

    • Source: Select Public Resources.

    • Framework: Select PyTorch.

    • Job Resource: Select a server. For example, set Resource Type to ecs.g6.xlarge under CPU and set Nodes to 1.

  4. Click Confirm.
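The Start Command passes the data folder and the output folder to main.py through the -i and -o flags. This topic does not show the contents of main.py, so the following is only a sketch of how a script could accept those two flags with argparse. The long option names and the parse_args helper are assumptions, not the actual script.

```python
import argparse


def parse_args(argv=None):
    """Parse -i/-o as supplied by the Start Command (illustrative only)."""
    parser = argparse.ArgumentParser(description="Transfer learning job (sketch)")
    # Assumed long names; the real main.py may name these differently.
    parser.add_argument("-i", "--input_dir", required=True,
                        help="folder that holds the training data")
    parser.add_argument("-o", "--output_dir", required=True,
                        help="folder where the trained model is written")
    return parser.parse_args(argv)


# Same arguments as the Start Command in this topic.
args = parse_args([
    "-i", "/mnt/data/pytorch_transfer_learning/input",
    "-o", "/mnt/data/pytorch_transfer_learning/output",
])
print(args.input_dir, args.output_dir)
```

Both paths sit under /mnt/data/, so the job reads and writes on the mounted NAS dataset rather than on the local disk of the node.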

Step 6: View job details and logs

  1. On the Deep Learning Containers (DLC) page, click the name of your job.

  2. On the job details page, you can view the Basic Information and Resource Information of the job.

  3. At the bottom of the job details page, find the target instance in the Instance section and click Log in the Actions column.

    The following is a sample of the log output.

    Epoch 5/9
    ----------
    train Loss: 0.4959 Acc: 0.7951
    val Loss: 0.2213 Acc: 0.9150
    Epoch 6/9
    ----------
    train Loss: 0.6845 Acc: 0.7664
    val Loss: 0.5303 Acc: 0.8301
    Epoch 7/9
    ----------
    train Loss: 0.4233 Acc: 0.8156
    val Loss: 0.2569 Acc: 0.9150
    Epoch 8/9
    ----------
    train Loss: 0.4147 Acc: 0.8443
    val Loss: 0.2397 Acc: 0.9346
    Epoch 9/9
    ----------
    train Loss: 0.3133 Acc: 0.8770
    val Loss: 0.2333 Acc: 0.9346
    Training complete in 3m 50s
    Best val Acc: 0.934641
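If you copy the log text out of the console, the per-epoch metrics can be extracted for comparison across runs. The following is a minimal sketch that assumes the log keeps exactly the line format shown in the sample above. The parse_log function and the regular expressions are illustrative.

```python
import re

EPOCH_RE = re.compile(r"^\s*Epoch (\d+)/(\d+)\s*$")
METRIC_RE = re.compile(r"^\s*(train|val) Loss: ([\d.]+) Acc: ([\d.]+)\s*$")


def parse_log(text: str) -> list:
    """Return one dict per epoch with train/val loss and accuracy."""
    epochs = []
    for line in text.splitlines():
        m = EPOCH_RE.match(line)
        if m:
            epochs.append({"epoch": int(m.group(1))})
            continue
        m = METRIC_RE.match(line)
        if m and epochs:
            phase = m.group(1)
            epochs[-1][phase + "_loss"] = float(m.group(2))
            epochs[-1][phase + "_acc"] = float(m.group(3))
    return epochs


# Last two epochs of the sample log above.
sample = """\
Epoch 8/9
----------
train Loss: 0.4147 Acc: 0.8443
val Loss: 0.2397 Acc: 0.9346
Epoch 9/9
----------
train Loss: 0.3133 Acc: 0.8770
val Loss: 0.2333 Acc: 0.9346
"""
records = parse_log(sample)
best = max(records, key=lambda r: r["val_acc"])
print(best["epoch"], best["val_acc"])  # prints: 8 0.9346
```

When two epochs tie on validation accuracy, max returns the earlier one, which is why epoch 8 is reported here.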