Platform for AI: Create and manage TensorBoard instances

Last Updated: Dec 08, 2025

You can create and manage TensorBoard instances on the TensorBoard tab of the Jobs page in the Platform for AI (PAI) console. A TensorBoard instance can be associated with a dataset or a Deep Learning Containers (DLC) job. After the instance starts, you can use TensorBoard to view a visual analysis of the model training results. This topic describes how to create and manage TensorBoard instances.

Limits

You cannot use the TensorBoard feature for DLC jobs that are created in the Malaysia (Kuala Lumpur) region.

Account and permission requirements

  • Alibaba Cloud account: An Alibaba Cloud account can perform all operations without requiring additional authorization.

  • RAM user: Add the Resource Access Management (RAM) user to the workspace as a member that has specific roles, and assign permissions to those roles. For more information, see Appendix: Roles and permissions.

Create a TensorBoard instance

To create a TensorBoard instance, perform the following steps:

  1. Go to the Distributed Training Jobs page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose AI Asset Management > Jobs to go to the Distributed Training Jobs page.

  2. On the TensorBoard tab, click Create TensorBoard.

  3. On the Create TensorBoard page, configure the following parameters and click OK.

    • Basic Information

      • Name: The name of the TensorBoard instance.

      • Configuration Type: Configure the following items.

        • Mount Type: Select Mount Dataset, Mount OSS, or By Task. We recommend that you select Mount Dataset.

        • Summary Path: The path in which the TensorBoard summary logs are stored. You can obtain the complete path from the SummaryWriter class in the training code.

        Sample configurations for each mount type:

        • Mount Dataset: Select a dataset and enter the relative path of the summary directory in the dataset.

        • Mount OSS: Select an OSS storage path and enter the relative path of the summary directory in OSS.

        • By Task: Select the DLC job and enter the complete path of the log files in the container.

    • Resource Configuration

      The following resource types are supported.

      • Free Quota: The system provides a limited amount of free resources. Each instance can use up to 2 vCPUs and 4 GiB of memory. If the free quota does not meet your business requirements, stop other instances that run on the free quota to release free resources, and then use the released resources to create the TensorBoard instance.

      • General Computing:

        • Public Resources: Uses the pay-as-you-go billing method. Only general computing uses public resources. Select an instance type based on your business requirements.

        • Resource Quota: Uses the subscription billing method. You must purchase computing resources and create quotas before you specify this parameter. You must also configure the following parameters together with this parameter:

          • Priority: The priority of the TensorBoard instance. Valid values: 1 to 9. The value 1 indicates the lowest priority.

          • Job Resource: The resources that are used to run the TensorBoard instance, specified as the number of vCPUs and the memory size. The unit of the memory size is GiB.

          Note

          This feature is available only to users in the whitelist. If you want to use this feature, contact your account manager to configure the whitelist.

    • VPC

      If you use Public Resources to create a TensorBoard instance, the virtual private cloud (VPC)-related parameters are available.

      • If you do not configure a VPC, the instance uses an Internet connection. Because the bandwidth of the Internet connection is limited, the TensorBoard instance may respond slowly during startup or when you view reports.

      • To ensure sufficient network bandwidth and stable performance, we recommend that you configure a VPC.

        Select a VPC, a vSwitch, and a security group in the current region. After you complete the configuration, the cluster in which the TensorBoard instance runs can access the services in the selected VPC and use the security group that you specified to control access.

        Important

        If the TensorBoard instance uses a dataset that requires a VPC, such as a Cloud Parallel File Storage (CPFS) dataset or a NAS dataset that has a mount target in the VPC, you must configure a VPC.

  4. After the TensorBoard instance enters the Running state, find the instance that you created and click View TensorBoard in the Actions column.

    The TensorBoard page appears. TensorBoard visualizes the dataset or the summary log files that are generated during training, which helps you understand, debug, and improve the training process.
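
The relationship between the log directory in the training code and the Summary Path parameter in step 3 can be sketched as follows. This is a minimal Python sketch, not part of the PAI console or any PAI SDK. The helper relative_summary_path, the mount point /mnt/data, and the runs/exp1 directory are hypothetical. The sketch assumes that the training code passes an absolute log directory to SummaryWriter and that the dataset or OSS path is mounted at the stated location in the training container. Under those assumptions, the value to enter for Mount Dataset or Mount OSS is the log directory relative to the mounted root. For By Task, enter the complete path in the container instead.

```python
import posixpath


def relative_summary_path(mount_root: str, log_dir: str) -> str:
    """Return log_dir relative to mount_root, using POSIX separators.

    Hypothetical helper for illustration only.
    Raises ValueError if log_dir is not located inside mount_root.
    """
    root = posixpath.normpath(mount_root)
    target = posixpath.normpath(log_dir)
    # commonpath also raises ValueError if absolute and relative paths are mixed.
    if posixpath.commonpath([root, target]) != root:
        raise ValueError(f"{log_dir!r} is not inside {mount_root!r}")
    return posixpath.relpath(target, start=root)


# Assumed values: the dataset is mounted at /mnt/data in the training container,
# and the training code creates SummaryWriter(log_dir="/mnt/data/runs/exp1/").
summary_path = relative_summary_path("/mnt/data", "/mnt/data/runs/exp1/")
print(summary_path)  # prints runs/exp1
```

If the log directory is outside the mounted root, the helper raises ValueError, which signals that the summary logs cannot be reached through that dataset or OSS path.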

Manage a TensorBoard instance

  • Start a TensorBoard instance

    Click Start in the Actions column to restart a stopped TensorBoard instance.

  • View the details of a TensorBoard instance

    Click the name of the TensorBoard instance. On the TensorBoard instance details page, view Basic Information and Configuration Information.

  • View associated DLC jobs

    View the number of DLC jobs that are associated with the TensorBoard instance. On the TensorBoard tab, move the pointer over the icon in the Associated Task column to view the ID of the associated DLC job. Click the ID to go to the details page of the DLC job.

  • View associated datasets

    View the number of datasets that are associated with the TensorBoard instance. On the TensorBoard tab, move the pointer over the icon in the Associated Dataset column to view the ID of the associated dataset. Click the ID to go to the details page of the dataset.

  • View the running duration

    On the TensorBoard tab, view the running duration of the TensorBoard instance in the Running Duration column. The running duration starts when the instance is started and is reset after you stop the instance.

  • Stop a TensorBoard instance

    • Click Stop in the Actions column of the TensorBoard instance.

    • Click Auto-stop Settings in the Actions column of the TensorBoard instance to specify the time at which you want the instance to automatically stop.
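
The running duration behavior described in this section (the duration starts when the instance is started and is reset when the instance is stopped) can be illustrated with a small model. This is a hypothetical Python sketch for illustration only. RunningDurationModel and its methods are assumptions and are not part of PAI. The console calculates the actual value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunningDurationModel:
    """Hypothetical model of the Running Duration column."""

    started_at: Optional[datetime] = None

    def start(self, now: datetime) -> None:
        # The running duration starts when the instance is started.
        self.started_at = now

    def stop(self) -> None:
        # Stopping the instance resets the running duration.
        self.started_at = None

    def duration(self, now: datetime) -> timedelta:
        # A stopped instance reports a duration of zero.
        if self.started_at is None:
            return timedelta(0)
        return now - self.started_at


t0 = datetime(2025, 1, 1, 8, 0, 0)
instance = RunningDurationModel()
instance.start(t0)
first_run = instance.duration(t0 + timedelta(minutes=90))
instance.stop()
after_stop = instance.duration(t0 + timedelta(hours=2))
```

In this model, restarting a stopped instance begins a new duration from zero rather than resuming the previous one.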

References

You can also create and manage TensorBoard instances on the Deep Learning Containers (DLC) page. For more information, see TensorBoard.