
Platform For AI: Submit standalone PyTorch transfer learning jobs with NAS

Last Updated: Mar 11, 2026

This topic describes how to use Data Science Workshop (DSW) and Deep Learning Containers (DLC) to run a standalone PyTorch transfer learning job with training data stored in a NAS file system.

Prerequisites

Create a General-purpose NAS file system in your target region. For more information, see Create a General-purpose NAS file system.

Limitations

This document applies only to general computing resources in public resource groups.

Create a dataset

  1. Go to the Datasets page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose AI Asset Management > Datasets.

  2. Create a basic dataset. Set Storage Class to General-purpose NAS file system.

Create a DSW instance

Create a DSW instance. For more information, see Create a DSW instance. Configure the following parameters:

  • Environment Context

    • Dataset Mount: Click Custom Dataset, select the created NAS dataset, and set Mount Path to /mnt/data/.

    • Working Directory: Select Dataset-/mnt/data/.

  • Network Information

    • VPC Configuration: Not required for this procedure.

Prepare training data

Download and decompress training data from this public dataset.

  1. Go to the development environment of a DSW instance.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the upper-left corner of the page, select the region where you want to use PAI.

    4. In the left-side navigation pane, choose Model Training > Data Science Workshop (DSW).

    5. Optional: On the Data Science Workshop (DSW) page, enter the name of a DSW instance or a keyword in the search box to search for the DSW instance.

    6. Click Open in the Actions column of the instance.

  2. In the menu bar, select Notebook.

  3. Download training data.

    1. Click the Create Folder icon to create a folder named pytorch_transfer_learning.

    2. Select Terminal to open the terminal.

    3. Navigate to the folder and download the dataset.

      cd /mnt/workspace/pytorch_transfer_learning/
      wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/data.tar.gz


    4. Decompress the downloaded dataset.

      tar -xf ./data.tar.gz
    5. Navigate to the pytorch_transfer_learning directory in the Notebook tab, right-click the decompressed data folder (hymenoptera_data), select Rename, and rename it to input.
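The download, decompress, and rename steps above can be exercised end to end in the terminal. The sketch below is a self-contained dry run in a scratch directory: a tiny placeholder archive stands in for the real data.tar.gz, so it does not depend on the network download or the DSW instance.

```shell
# Dry run of steps 3-5 above in a scratch directory. A placeholder archive
# stands in for the real data.tar.gz download.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p hymenoptera_data/train hymenoptera_data/val   # stand-in for the dataset
tar -czf data.tar.gz hymenoptera_data                  # stand-in for the wget step
rm -r hymenoptera_data                                 # pretend we start from the archive
tar -xf ./data.tar.gz                                  # step 4: decompress
mv hymenoptera_data input                              # step 5: rename to "input"
ls input                                               # shows the train and val subdirectories
```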

Prepare training code

  1. Download training code to the pytorch_transfer_learning folder.

    cd /mnt/workspace/pytorch_transfer_learning/
    wget https://pai-public-data.oss-cn-beijing.aliyuncs.com/hol-pytorch-transfer-cv/main.py
  2. Create an output folder to store the trained model.

    mkdir output
  3. Verify that the folder contains the following files:

    • input: training data

    • main.py: training code

    • output: model storage location


Create a job

  1. Go to the Create Job page.

    1. Log on to the PAI console, select the target region and workspace, and click Deep Learning Containers (DLC).

    2. On the Deep Learning Containers (DLC) page, click Create Job.

  2. Configure the following parameters on the Create Job page:

    • Basic Information

      • Job Name: Specify the job name.

    • Environment Context

      • Node Image: Select Alibaba Cloud Image and a PyTorch image. Example: pytorch-training:1.12-gpu-py39-cu113-ubuntu20.04.

      • Dataset: Select the created NAS dataset.

      • Start Command: Enter the following command to specify the input and output paths:

        python /mnt/data/pytorch_transfer_learning/main.py -i /mnt/data/pytorch_transfer_learning/input -o /mnt/data/pytorch_transfer_learning/output

      • Third-party Library Configuration: Enter the following third-party libraries:

        numpy==1.16.4
        absl-py==0.11.0

      • Code Configuration: Not required for this procedure.

    • Resource Information

      • Resource Source: Select Public Resources.

      • Framework: Select PyTorch.

      • Job Resources: Select a resource specification. For example, under Resource Specification, select CPU > ecs.g6.xlarge and set Nodes to 1.

  3. Click OK to create the job.
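The start command passes the data and model directories to the training script through -i and -o flags. The sketch below is a hypothetical illustration of the command-line interface such an entry point exposes; the actual main.py downloaded earlier may be structured differently, and the fine-tuning flow is only outlined in comments.

```python
# Hypothetical sketch of the -i/-o interface of a transfer-learning entry
# point such as main.py. The real script may differ.
import argparse


def parse_args(argv=None):
    """Parse the -i/-o flags used in the DLC start command."""
    parser = argparse.ArgumentParser(description="PyTorch transfer learning job")
    parser.add_argument("-i", "--input", required=True,
                        help="directory that holds the training data")
    parser.add_argument("-o", "--output", required=True,
                        help="directory where the trained model is saved")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Typical flow (omitted here): load an ImageNet-pretrained backbone such
    # as torchvision.models.resnet18, swap its final fully connected layer
    # for a fresh classification head, fine-tune on the images under
    # args.input, and save the trained weights under args.output.
    return args


# Same flags as the DLC start command above:
args = main(["-i", "/mnt/data/pytorch_transfer_learning/input",
             "-o", "/mnt/data/pytorch_transfer_learning/output"])
print(args.input, args.output)
```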

View job status

  1. On the Deep Learning Containers (DLC) page, select the job name to view details.

  2. View Basic Information and Resource Information on the job details page.

  3. In the Instance section, select Log in the Actions column to view training logs.

    The training logs are displayed.