Platform for AI: DSW quick start

Last Updated: May 27, 2026

Train and deploy a deep learning model in DSW using the MNIST handwritten digit recognition example.

Note

MNIST handwritten digit recognition is a classic introductory deep learning task: train a model to recognize handwritten digits (0–9).

image

Prerequisites

Activate PAI and create a workspace. Log on to the PAI console, select a region, and follow the prompts to grant permissions and activate the service.

Billing

This example uses pay-as-you-go public resources for a DSW instance and an EAS model service. For billing details, see DSW billing and EAS billing.

Create a DSW instance

  1. Go to the DSW page.

    1. Log on to the PAI console.

    2. Select the target region.

    3. In the left navigation pane, click Workspaces, and then click the name of the workspace that you want to use.

    4. In the left navigation pane, choose Model Training > Interactive Modeling (DSW), and then click Create an instance.

      image

  2. On the Configure Instance page, configure these key parameters. Use defaults for the rest.

    • Resource Type: Select Public Resources, a pay-as-you-go option.

    • Instance Type: Select ecs.gn7i-c8g1.2xlarge.

      If unavailable, select another GPU-accelerated instance type.
    • Image config: Select Alibaba Cloud Image, and then search for and select the following image: modelscope:1.26.0-pytorch2.6.0-gpu-py311-cu124-ubuntu22.04.

      Use this exact image to avoid environment issues.
    • Mount storage: Stores files persistently during development. This tutorial uses Object Storage Service (OSS). Click OSS, click the image icon, select a Bucket, and create a directory such as pai_test. Example configuration:

      • URI: oss://**********oss-cn-hangzhou-internal.aliyuncs.com/pai_test/

      • Mount Path: /mnt/data/

      If OSS is not activated or you have no bucket in the current region, activate OSS and create a bucket:

      1. Activate the OSS service.

      2. Log on to the OSS console, click Create Bucket, enter a Bucket Name, set Region to the same region as PAI, accept default values for other parameters, and then click Create.

        image

  3. Click OK to create the DSW instance.

    If the instance fails to start, see Create a DSW instance for troubleshooting.
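
After the instance starts, it can be useful to confirm that the OSS mount is usable before training. The following is a minimal sketch, not part of the tutorial notebook: the helper name mount_is_writable is hypothetical, and /mnt/data/ is the Mount Path configured above.

```python
import os
import tempfile


def mount_is_writable(mount_path: str) -> bool:
    """Return True if mount_path is a directory and a temporary file can be created in it."""
    if not os.path.isdir(mount_path):
        return False
    try:
        # The probe file is removed automatically when the context exits.
        with tempfile.NamedTemporaryFile(dir=mount_path) as probe:
            probe.write(b"ok")
            probe.flush()
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # /mnt/data/ is the Mount Path used in this tutorial.
    print(mount_is_writable("/mnt/data/"))
```

If the check prints False inside DSW, revisit the Mount storage settings of the instance.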

Develop a model in DSW

  1. In the instance list, click Open to open DSW. On the Launcher page, click Create Notebook.

    image

  2. Download the MNIST training notebook mnist.ipynb, then click the upload icon in the upper-left corner of DSW to upload the file.

    image

  3. Open mnist.ipynb and click the run button to run the training cell. The code downloads the MNIST dataset to the dataSet directory and saves the best checkpoint to the output directory. Training takes about 10 minutes.

    image

    image

    A validation accuracy of 98% indicates that the model performs well. Proceed to the next step.

  4. Monitor training progress in TensorBoard. Run the following cell and open the URL http://localhost:6006/.

    image

    View the train_loss (training) and validation_loss (validation) curves.

    image

    After you view the charts, click the stop icon in the cell to stop TensorBoard before running the subsequent cells.

    (Optional) Adjust hyperparameters based on the loss curve to improve model performance

    Evaluate training performance from the loss trends:

    • Underfitting: The train_loss and validation_loss values are still decreasing when the training ends.

      Increase num_epochs or learning_rate, then retrain.

    • Overfitting: The train_loss value continues to decrease, but the validation_loss value starts to increase before the training ends.

      Decrease num_epochs or learning_rate, then retrain.

    • Good fit: Both the train_loss and validation_loss values stabilize before the training ends.

      Proceed to the next steps.

  5. Test the model. Run the following cell to display 20 test images with their true labels and predictions.

    image

    Sample output:

    image

  6. Copy model files to OSS for persistent storage. The DSW instance uses a free cloud disk. Content on the cloud disk is deleted if the instance remains stopped for more than 15 days. Storing files in OSS also enables deployment with EAS.

    image

    Log on to the OSS console to view the copied files:

    image
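
The copy in step 6 can be sketched in plain Python as follows. This is an illustration under assumptions, not the notebook's actual cell: the helper name copy_to_mount is hypothetical, and the destination layout (a subdirectory named after the source directory) is one possible choice.

```python
import shutil
from pathlib import Path


def copy_to_mount(src_dir: str, mount_dir: str) -> list[str]:
    """Copy src_dir into mount_dir/<name of src_dir>, keeping the directory layout.

    Returns the sorted relative paths of all files now present in the destination.
    """
    src = Path(src_dir)
    dst = Path(mount_dir) / src.name
    # dirs_exist_ok=True allows the copy to be rerun after retraining (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
```

For example, copy_to_mount("output", "/mnt/data/") would place the checkpoint directory from step 3 under the OSS mount path, assuming both paths exist in the instance.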

Model development is complete. To use this model in production, deploy it as an online service with EAS.
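
Before deploying, the optional loss-curve check from step 4 can be approximated in code. The sketch below is a rough heuristic, not the tutorial's method: the function name diagnose_fit and the window and tol thresholds are assumptions chosen for illustration, and the inputs are per-epoch train_loss and validation_loss values that you collect yourself.

```python
def diagnose_fit(train_loss, val_loss, window=3, tol=0.01):
    """Classify loss curves as 'underfitting', 'overfitting', 'good fit', or 'inconclusive'.

    train_loss / val_loss: per-epoch loss values, oldest first.
    window: number of final epochs compared against the epochs just before them.
    tol: relative change below which a curve counts as stable.
    """
    if len(train_loss) != len(val_loss) or len(train_loss) < 2 * window:
        raise ValueError("need two equal-length curves with at least 2 * window epochs")

    def trend(values):
        # Relative change of the mean over the last `window` epochs
        # versus the mean over the `window` epochs before them.
        prev = sum(values[-2 * window:-window]) / window
        last = sum(values[-window:]) / window
        return (last - prev) / abs(prev) if prev else 0.0

    train_trend, val_trend = trend(train_loss), trend(val_loss)
    if val_trend > tol and train_trend < 0:
        return "overfitting"    # validation loss rises while training loss falls
    if train_trend < -tol and val_trend < -tol:
        return "underfitting"   # both curves are still decreasing at the end
    if abs(train_trend) <= tol and abs(val_trend) <= tol:
        return "good fit"       # both curves have stabilized
    return "inconclusive"       # none of the three patterns applies clearly
```

An "underfitting" or "overfitting" result corresponds to the num_epochs and learning_rate adjustments described in step 4; treat "inconclusive" as a prompt to inspect the TensorBoard curves directly.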

Important

This DSW instance uses pay-as-you-go billing. To avoid further charges, stop or delete the instance when done.

image

Deploy the model with EAS

Elastic Algorithm Service (EAS) allows you to quickly deploy trained models as online inference services or AI web applications. EAS supports heterogeneous resources and integrates automatic scaling, one-click stress testing, canary releases, and real-time monitoring to ensure service stability in high-concurrency scenarios at a lower cost.

  1. Write the model service API and copy it to OSS. The training code includes the API and copy commands. Run the following cell.

    image

  2. (Optional) Verify the API in DSW. Run the following cell to install dependencies and start the service.

    image

    Test the API: click WebIDE at the top of the page, open request_web.py in the left pane, and click the run button to send a test request.

    image

    The following result is returned:

    {"prediction": 7}

    Note

    To access the DSW web service from the public internet, configure a VPC, NAT Gateway, and EIP for the instance. For details, see Access a service in an instance over the internet.

  3. Configure EAS. In the left navigation pane of the PAI console, click Elastic Algorithm Service (EAS) > Deploy Service > Custom Deployment.

    image

    Configure these key parameters and use defaults for the rest:

    • Deployment Method: Image-based Deployment

    • Image Configuration: Select Image Address, and then copy and paste the image address used in the DSW instance.

      Using the same image as DSW avoids runtime environment issues.

      image

    • Mount storage: The model files and service API code have been copied to OSS. Click OSS and select the corresponding OSS path.

      image

    • Command to Run: Use the same startup command as in DSW, but update the path because web.py is now mounted at /mnt/data/. Run command: python /mnt/data/web.py

    • Port Number: Set this to 9000, the port that is used in web.py.

    • Third-party Library Settings: The selected image lacks the bottle library. Add it in the third-party library configuration.

      image

    • Resource Type: Select Public Resources. For Instance Type, select ecs.gn7i-c8g1.2xlarge.

    • Configure a system disk: Set the size to 20 GB.

      The large image requires additional disk space to start successfully.

    Click Deploy to create the service. Deployment takes about 5 minutes. The service is ready when the status shows Running.

  4. On the service details page, click View Endpoint Information to obtain the Internet Endpoint and Token.

    image

  5. Call the service. Run the following code, replacing <your_public_endpoint> and <your_token> with the Internet Endpoint and Token from the previous step.

    import requests
    
    """
    Test image URLs:
    Label is 7
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_7_No_0.jpg
    Label is 2
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_2_No_1.jpg
    Label is 1
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_1_No_2.jpg
    Label is 0
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_0_No_3.jpg
    Label is 4
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_4_No_4.jpg
    Label is 9
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_9_No_5.jpg
    """
    
    image_url = 'http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_7_No_0.jpg'
    
    # The client downloads the image and obtains the binary data.
    img_response = requests.get(image_url, timeout=10)
    # Automatically check if the request was successful based on the status code.
    img_response.raise_for_status()
    img_bytes = img_response.content
    
    # Header information. Replace <your_token> with your actual token.
    # In production environments, we recommend that you set the token as an environment variable to prevent sensitive information leaks.
    # For more information about how to configure environment variables, see https://www.alibabacloud.com/help/en/sdk/developer-reference/configure-the-alibaba-cloud-accesskey-environment-variable-on-linux-macos-and-windows-systems.
    headers = {"Authorization": "<your_token>"}
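    # Send the binary data as the body of a POST request to the model service.
    resp = requests.post('<your_public_endpoint>/predict_image', data=img_bytes, headers=headers, timeout=10)
    # Fail fast on HTTP errors (for example, an invalid token) before parsing the JSON body.
    resp.raise_for_status()
    print(resp.json())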

    The following result is returned:

    {"prediction": 7}

Important

This EAS service uses pay-as-you-go billing. To avoid further charges, stop or delete the service when done.
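
The comment in the step 5 code recommends keeping the token in an environment variable. A minimal sketch of that approach follows. The variable name EAS_TOKEN and the helper names build_headers and predict are hypothetical; the /predict_image path and the Authorization header come from the example above.

```python
import os

import requests


def build_headers(env_var: str = "EAS_TOKEN") -> dict:
    """Read the service token from an environment variable and build the request headers."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"Authorization": token}


def predict(endpoint: str, image_bytes: bytes, timeout: float = 10.0) -> dict:
    """POST raw image bytes to <endpoint>/predict_image and return the parsed JSON reply."""
    url = endpoint.rstrip("/") + "/predict_image"
    resp = requests.post(url, data=image_bytes, headers=build_headers(), timeout=timeout)
    resp.raise_for_status()  # surface HTTP errors such as a rejected token
    return resp.json()
```

With EAS_TOKEN exported in the shell, predict("<your_public_endpoint>", img_bytes) sends the same request as step 5 without placing the token in source code.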

image

References