Platform For AI: Quick Start for Data Science Workshop (DSW)

Last Updated: Nov 28, 2025

Data Science Workshop (DSW) is a cloud-based integrated development environment (IDE) for AI development with multiple built-in development environments. If you are familiar with Notebook or VSCode, you can quickly start developing models. This topic uses the MNIST handwriting recognition task as an example to demonstrate how to develop a model in DSW.

Note

The MNIST handwriting recognition task is one of the most classic introductory tasks in deep learning. The goal is to build a machine learning model to recognize 10 handwritten digits (0 to 9).

image

Prerequisites

Before you begin, you must activate PAI and create a workspace using your Alibaba Cloud account. To do this, log on to the PAI console, select a region in the upper-left corner, and then grant the required permissions to activate the product.

Billing information

This example uses public resources to create a DSW instance and an Elastic Algorithm Service (EAS) model service. These resources are billed on a pay-as-you-go basis. For more information about billing rules, see DSW billing and EAS billing.

Create a DSW instance

  1. Go to the DSW page.

    1. Log on to the PAI console.

    2. In the upper-left corner of the page, select the destination region.

    3. In the navigation pane on the left, click Workspace, and then click the name of the workspace that you want to manage.

    4. In the navigation pane on the left, choose Model Training > Data Science Workshop (DSW). Then, click Create Instance.

      image

  2. On the Create Instance page, configure the following key parameters and use the default values for the other parameters.

    • Resource Type: Select Public Resources. The billing method for this resource type is pay-as-you-go.

    • Instance Type: Select ecs.gn7i-c8g1.2xlarge.

      If the inventory for this instance type is insufficient, you can select another GPU-accelerated instance type.
    • Image config: Select Alibaba Cloud Image, and then search for and select the following image: modelscope:1.26.0-pytorch2.6.0-gpu-py311-cu124-ubuntu22.04.

      To avoid environment issues, select the same image as the one used in this topic.
    • Storage Path Mounting: To persistently store files from the model development process, this topic uses Object Storage Service (OSS). Click OSS, click the image icon, select a bucket, and create a folder, such as pai_test. The complete parameter configuration is as follows:

      • Uri: oss://**********oss-cn-hangzhou-internal.aliyuncs.com/pai_test/.

      • Mount Path: /mnt/data/.

      (Optional) Activate OSS and create a bucket

      If you have not activated OSS or do not have an available bucket in the current region, follow these steps:

      1. Activate the OSS service.

      2. Log on to the OSS console, click Create Bucket, enter a Bucket Name, select the same Region as your PAI workspace, keep the other parameters at their default values, and then click Create.

        image

  3. Click OK to create the DSW instance.

    If the instance fails to start, see Common Issues with Instance Startup and Release for troubleshooting.

Develop a model in DSW

  1. Open the DSW instance

    Click Open to go to the development environment of the DSW instance that you created.

    image

    The PAI-DSW interface is shown in the following figure:

    image

  2. Write the model development code. This topic uses the Notebook development environment as an example and provides the training code for MNIST handwriting recognition. Click mnist.ipynb to download the code. Then, in the upper-left corner of the DSW page, click the upload icon to upload the code file.

    image

  3. Run the model training code. Open the mnist.ipynb file, find the cell that contains the training code, and then click the run button to run the code. The code automatically downloads the MNIST dataset to the dataSet directory and, after training, saves the best checkpoint to the output directory. The training process takes about 10 minutes.

    image

    image

    During training, the accuracy of the model on the validation set is displayed. This value estimates how well the model generalizes to data that it has not seen. In this example, the accuracy on the validation set is 98%, which indicates that the model performs well. You can proceed to the next steps.

  4. View the loss curve in TensorBoard to understand the training status. Run the following cell and click the TensorBoard URL: http://localhost:6006/.

    image

    In TensorBoard, you can view the train_loss curve, which reflects the loss on the training dataset, and the validation_loss curve, which reflects the loss on the validation set.

    image

    After you view the graph, click the stop icon in the cell to stop TensorBoard.

    (Optional) Adjust hyperparameters based on the loss graph to improve model performance

    You can evaluate the training performance of the current model based on the trend of the loss value:

    • The train_loss and validation_loss values are still decreasing when the training ends (underfitting).

      You can increase the value of num_epochs, which sets the number of passes over the training data, or increase the learning_rate, and then retrain the model. Either change improves the model's fit to the training data.

    • The train_loss value continues to decrease, but the validation_loss value starts to increase before the training ends (overfitting).

      To prevent the model from being overtrained, you can decrease the value of num_epochs or learning_rate and then retrain the model.

    • Both the train_loss and validation_loss values stabilize before the training ends (good fit).

      If the model is in this state, you can proceed to the next step.
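The three cases above can be turned into a rough check on the per-epoch loss values. The following is a minimal sketch, not part of mnist.ipynb: the function name diagnose_fit, the window of 3 epochs, and the tolerance of 0.01 are all assumptions chosen for illustration, and a real curve is usually noisier than this heuristic allows for.

```python
def diagnose_fit(train_loss, val_loss, window=3, tol=0.01):
    """Classify the end of a training run from per-epoch loss values.

    Hypothetical helper: compares the last value of each curve with the
    value `window` epochs earlier and applies the three rules above.
    """
    if len(train_loss) < window + 1 or len(val_loss) < window + 1:
        raise ValueError("need at least window + 1 epochs of loss values")

    # Change over the last `window` epochs; negative means still decreasing.
    d_train = train_loss[-1] - train_loss[-1 - window]
    d_val = val_loss[-1] - val_loss[-1 - window]

    if d_train < -tol and d_val > tol:
        return "overfitting"   # training loss falls, validation loss rises
    if d_train < -tol and d_val < -tol:
        return "underfitting"  # both curves are still falling
    if abs(d_train) <= tol and abs(d_val) <= tol:
        return "good fit"      # both curves have flattened
    return "unclear"           # anything else needs a look at the plot


print(diagnose_fit([1.0, 0.8, 0.6, 0.4, 0.2], [1.1, 0.9, 0.7, 0.5, 0.3]))
```

A result of "unclear" is deliberate: the heuristic only covers the three patterns described above, so any other shape is better judged from the TensorBoard graph.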

  5. Invoke the trained model to test its performance. Run the cell shown in the figure. The cell displays 20 test images and outputs their true labels and the model's prediction results.

    image

    Sample output:

    image
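The comparison of true labels and predictions in step 5 can be summarized with a couple of small helpers. This is a sketch under assumptions: the function names are hypothetical, and the label lists are made-up values, not output from the notebook.

```python
def accuracy(true_labels, predictions):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predictions):
        raise ValueError("label and prediction lists differ in length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(t == p for t, p in zip(true_labels, predictions))
    return correct / len(true_labels)


def mismatches(true_labels, predictions):
    """Return (index, true_label, prediction) for every wrong prediction."""
    return [
        (i, t, p)
        for i, (t, p) in enumerate(zip(true_labels, predictions))
        if t != p
    ]


# Illustrative values only: four images, one of them misclassified.
labels = [7, 2, 1, 0]
preds = [7, 2, 1, 9]
print(accuracy(labels, preds), mismatches(labels, preds))
```

Listing the mismatches alongside the accuracy makes it easier to spot which digits the model confuses.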

  6. Copy the model file to Object Storage Service (OSS) for persistent storage. The DSW instance in this topic is created using public resources, and its files are stored on a free disk. If the instance remains stopped for more than 15 days, the content on the disk is deleted. Therefore, you must copy the model file to OSS for persistent storage. This also makes it easier to deploy the model using PAI-EAS later.

    image

    Log on to the OSS console to view the file:

    image
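Because the OSS folder is mounted at /mnt/data/, copying to OSS amounts to an ordinary file copy into that path. The notebook's own copy command is not reproduced here. The following sketch uses only the Python standard library, and the helper name copy_tree_to_mount and the destination subfolder in the usage comment are assumptions.

```python
import shutil
from pathlib import Path


def copy_tree_to_mount(src_dir, dst_dir):
    """Copy every file under src_dir into dst_dir, keeping subfolders.

    Hypothetical helper. Returns the list of destination paths written.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"source directory not found: {src}")

    copied = []
    for path in sorted(src.rglob("*")):
        if path.is_file():
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            # copyfile copies contents only, without file metadata.
            shutil.copyfile(path, target)
            copied.append(target)
    return copied


# Assumed usage in DSW: "output" is the checkpoint directory from step 3,
# and "/mnt/data/output" is an illustrative location under the mount path.
# copy_tree_to_mount("output", "/mnt/data/output")
```

The function copies file contents only, on the assumption that file metadata does not need to be preserved on the mounted path.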

This completes the model development. If you want to invoke the model in other applications in a production environment, see Deploy the model as an online service using EAS.

Important

The DSW instance in this topic is created using public resources and is billed on a pay-as-you-go basis. When you no longer need the DSW instance, stop or delete it to avoid further charges.

image

Deploy the model as an online service using EAS

Once a model is trained, you can use Elastic Algorithm Service (EAS) to quickly deploy it as an online inference service or an AI web application. EAS supports heterogeneous resources and combines features like automatic scaling, one-click stress testing, canary release, and real-time monitoring to ensure stable, continuous service in high-concurrency scenarios at a lower cost.

  1. Write a web interface for the model service and copy it to OSS. The web interface code and the copy command are provided. You can run the following cell to perform these operations.

    image

  2. (Optional) Verify that the web interface can be started in DSW. Run the following cell to install the missing third-party packages and start the service.

    image

    Run the code to test the service interface. At the top of the page, click WebIDE. On the left, click the request_web.py code file. Then, click the run icon to run the code and send a request to the service interface.

    image

    The following result is returned:

    {"prediction": 7}

    Note

    If you want to directly access the web service that is running in DSW from the internet, you must configure a virtual private cloud (VPC), a NAT Gateway, and an Elastic IP Address (EIP) for DSW. For more information, see Access a service in an instance over the internet.
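To make the shape of the service interface concrete, the following sketch accepts image bytes in a POST request to /predict_image and answers with JSON in the form shown above. It is not the web.py from this topic: that file uses the bottle library and the trained model, whereas this sketch swaps in the standard-library http.server module and a stub predictor. The names predict_digit, PredictHandler, and make_server are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_digit(image_bytes):
    """Hypothetical stand-in for the trained model; always answers 7."""
    # A real service would decode image_bytes and run the checkpoint here.
    return 7


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict_image":
            self.send_error(404, "unknown route")
            return
        # The request body is the raw image, as sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = json.dumps({"prediction": predict_digit(image_bytes)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove to see request logs


def make_server(host="0.0.0.0", port=9000):
    """Build a server on the port that this topic configures for web.py."""
    return HTTPServer((host, port), PredictHandler)
```

Calling make_server().serve_forever() starts the service. Listening on 0.0.0.0 rather than localhost is an assumption made so that the service stays reachable when it runs inside a container.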

  3. Configure EAS. In the PAI console, in the navigation pane on the left, click Elastic Algorithm Service (EAS) > Deploy Service > Custom Deployment.

    image

    Configure the following key parameters and use the default values for the other parameters:

    • Deployment Method: Image-based Deployment

    • Image Configuration: Select Image Address. Copy and paste the URL of the image that is used for the DSW instance.

      You already verified in DSW that this image runs the model service code in this topic correctly. Use the same image for deployment to avoid unexpected runtime environment issues.

      image

    • Mount storage: The model file and service interface code have been copied to OSS. Therefore, click OSS and select the corresponding OSS path.

      image

    • Command: The command is the same as the service startup command in DSW. However, because web.py is now mounted to /mnt/data/, you must modify the path of web.py accordingly. The final command is: python /mnt/data/web.py

    • Port: Configure the port that is used in web.py, which is 9000.

    • Third-party Library Configuration: During testing in DSW, the selected image was found to be missing the bottle library. Therefore, you must add this library in the third-party library configuration.

      image

    • Resource Type: Select Public Resources. For Resource Specification, select ecs.gn7i-c8g1.2xlarge.

    • Configure a system disk: Click Show More and set Extra System Disk to 20 GB.

      The image is large. If you do not set an extra system disk, the service cannot start because of insufficient space.

    Click Deploy to create the service. The creation process takes about 5 minutes. When the status changes to Running, the service is deployed.

  4. View the invocation information. On the model service details page, click View Invocation Information to obtain the Public Endpoint and Token.

    image

  5. Invoke the service. Run the following service request code. Replace the Endpoint and Token in the code with the actual information that you obtained in the previous step.

    import requests
    
    """
    Test image URLs:
    label is 7
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_7_No_0.jpg
    label is 2
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_2_No_1.jpg
    label is 1
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_1_No_2.jpg
    label is 0
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_0_No_3.jpg
    label is 4
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_4_No_4.jpg
    label is 9
    http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_9_No_5.jpg
    """
    
    image_url = 'http://aliyun-document-review.oss-cn-beijing.aliyuncs.com/dsw_files/mnist_label_7_No_0.jpg'
    
    # The client downloads the image to get the binary data.
    img_response = requests.get(image_url, timeout=10)
    # Automatically check if the request is successful based on the status code.
    img_response.raise_for_status()
    img_bytes = img_response.content
    
    # Header information. Replace YOUR_TOKEN with the actual token.
    # In a production environment, we recommend that you set the token as an environment variable to prevent sensitive information leaks.
    # For more information about how to configure environment variables, see https://www.alibabacloud.com/help/en/sdk/developer-reference/configure-the-alibaba-cloud-accesskey-environment-variable-on-linux-macos-and-windows-systems
    headers = {"Authorization": "YOUR_TOKEN"}
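    # Send the binary data as the body of a POST request to the model service.
    resp = requests.post('YOUR_ENDPOINT/predict_image', data=img_bytes, headers=headers, timeout=10)
    # Fail early on an HTTP error instead of parsing an error response as JSON.
    resp.raise_for_status()
    print(resp.json())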

    The following result is returned:

    {"prediction": 7}
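The comment in the request code recommends keeping the token in an environment variable. One way to do that is sketched below, using only the standard library. The variable names EAS_ENDPOINT and EAS_TOKEN and the helper name load_eas_settings are assumptions, not names defined by EAS.

```python
import os


def load_eas_settings(env=None):
    """Read the endpoint and token from environment variables.

    Hypothetical helper. Returns the request URL and the headers to pass
    to requests.post in the code above.
    """
    env = os.environ if env is None else env
    endpoint = env.get("EAS_ENDPOINT")  # assumed variable name
    token = env.get("EAS_TOKEN")        # assumed variable name
    if not endpoint or not token:
        raise RuntimeError("set EAS_ENDPOINT and EAS_TOKEN before running")

    # Avoid a double slash if the endpoint already ends with one.
    url = endpoint.rstrip("/") + "/predict_image"
    headers = {"Authorization": token}
    return url, headers


# Assumed usage with the request code above:
# url, headers = load_eas_settings()
# resp = requests.post(url, data=img_bytes, headers=headers, timeout=10)
```

Raising an error when either value is missing surfaces a configuration problem before any request is sent.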
Important

The EAS service in this topic is created using public resources and is billed on a pay-as-you-go basis. When you no longer need the service, stop or delete it to avoid further charges.

image
