Platform For AI: TensorBoard

Last Updated: May 10, 2024

You can create a TensorBoard instance for a Deep Learning Containers (DLC) job in Platform for AI (PAI) and view a visualized analysis report of the model training results in TensorBoard. This topic describes how to create and manage TensorBoard instances.

Prerequisites

A DLC job is created and associated with a dataset. For more information, see Submit training jobs.

Limits

You can use TensorBoard to view analysis reports only for training jobs that are associated with a dataset.
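
Because the report is built from summary files stored in the associated dataset, it can help to confirm that the summary directory actually contains event files before you create an instance. The following is a minimal sketch, not part of PAI: the function name find_event_files is hypothetical, and it assumes that the training job writes standard TensorBoard event files, whose file names contain "tfevents".

```python
import os
import tempfile
from typing import List


def find_event_files(summary_dir: str) -> List[str]:
    """Return the sorted paths of files under summary_dir whose names contain 'tfevents'."""
    found = []
    # os.walk yields nothing for a missing directory, so the result is then empty.
    for root, _dirs, files in os.walk(summary_dir):
        for name in files:
            if "tfevents" in name:
                found.append(os.path.join(root, name))
    return sorted(found)


if __name__ == "__main__":
    # Build a throwaway directory that mimics a summary directory (illustrative only).
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = os.path.join(tmp, "run1")
        os.makedirs(run_dir)
        open(os.path.join(run_dir, "events.out.tfevents.1700000000.host"), "w").close()
        open(os.path.join(tmp, "notes.txt"), "w").close()
        for path in find_event_files(tmp):
            print(path)
```

An empty result suggests that the job has not written summaries to that directory yet, or that the path points to the wrong location.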

Create a TensorBoard instance

  1. Go to the Deep Learning Containers (DLC) page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Development and Training > Deep Learning Containers (DLC).

  2. In the Actions column of the job that you want to view, click TensorBoard. In the TensorBoard panel, click Create TensorBoard.

  3. On the Create TensorBoard page, configure the parameters and click OK. The following tables describe the parameters.

    • Basic Information

      Parameter

      Description

      TensorBoard Name

      The name of the TensorBoard instance.

      TensorBoard Configuration

      The following configuration types are supported:

      • Mount Dataset

        • Datasets: Select the dataset that is associated with the DLC job.

        • Summary Path: Enter the path of the summary directory in the dataset.

      • By Task

        • DLC Job: Select an existing DLC job.

        • Summary Path: Enter the absolute path of the summary directory in the task. For example, if the summary files are in the /tensorboards/summary directory of the dataset and the mount path of the dataset in the DLC job is /mnt/data, the absolute path of the summary directory in the DLC job is /mnt/data/tensorboards/summary.

      You can click Add to mount multiple summary paths for each TensorBoard instance to compare metrics across multiple jobs.

    • Resource Configuration

      Valid values:

      Resource type

      Description

      Free Quota

      PAI provides a free quota for TensorBoard instances.

      Paid Resources

      If the free quota is exhausted, you can use paid resources to start TensorBoard instances. To reduce costs, stop running instances that are idle. For information about the billing of resource types, see the "Appendix: Pricing details of the public resource group" section in the Billing of general computing resources topic.

    • VPC

      This parameter is available only if you use Paid Resources to create a TensorBoard instance.

      • If you do not configure a virtual private cloud (VPC), an Internet connection is used. Because the bandwidth of the Internet connection is limited, TensorBoard instances may respond slowly when they start or when you view reports.

      • To ensure sufficient network bandwidth and stable performance, we recommend that you configure a VPC.

        Select a VPC, a vSwitch, and a security group in the current region. After you complete the configuration, the cluster in which the TensorBoard instance runs can access the services in the selected VPC and use the security group that you specified to control access.

        Important

        If the TensorBoard instance uses a dataset that requires a VPC, such as a CPFS dataset or a NAS dataset that has a mount target in the VPC, you must configure a VPC.

  4. Go to the TensorBoard page to view the analysis report.

    1. In the left-side navigation pane of the workspace page, choose AI Computing Asset Management > Jobs.

    2. On the TensorBoard tab, if the Status of the TensorBoard instance is Running, click View Tensorboard in the Actions column.

      The TensorBoard page appears.
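
The Summary Path rule for the By Task configuration type (mount path of the dataset plus the directory inside the dataset) can be sketched as follows. This is an illustrative helper only: the function name resolve_summary_path is hypothetical and is not part of any PAI SDK, and the paths are the example values from the parameter description.

```python
import posixpath


def resolve_summary_path(mount_path: str, dataset_relative_path: str) -> str:
    """Join the dataset mount path in the job with the summary directory inside the dataset."""
    # Drop the leading slash so that the join does not discard the mount path.
    relative = dataset_relative_path.lstrip("/")
    # normpath removes duplicate and trailing slashes.
    return posixpath.normpath(posixpath.join(mount_path, relative))


if __name__ == "__main__":
    # Example values from the Summary Path description.
    print(resolve_summary_path("/mnt/data", "/tensorboards/summary"))
    # prints /mnt/data/tensorboards/summary
```

For the Mount Dataset configuration type, the value that you enter is the path inside the dataset (/tensorboards/summary in this example), so no join is needed.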

Manage a TensorBoard instance

Perform the following steps to manage existing TensorBoard instances.

  1. Go to the Distributed Training Jobs page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose AI Computing Asset Management > Jobs to go to the Distributed Training Jobs page.

  2. Manage TensorBoard instances.

    • On the TensorBoard tab, click the name of the TensorBoard instance. The Details page appears. On the Details page, you can view the Basic Information and Configuration Information of the TensorBoard instance.

    • View associated tasks

      Move the pointer over the icon in the Associated Task column to view the ID of the associated DLC job. You can also click the ID to go to the task details page.

    • View associated datasets

      Move the pointer over the icon in the Associated Dataset column to view the IDs of the associated datasets. You can click an ID to go to the details page of the dataset.

    • View the running duration

      You can view the running duration of the TensorBoard instance. The running duration starts when the instance is started. After you stop the instance, the running duration is reset.

    • Stop a TensorBoard instance

      • To stop an instance immediately, click Stop in the Actions column of the instance.

      • To stop an instance at a scheduled time, click Auto-stop Settings in the Actions column of the instance and specify the time at which the instance is automatically stopped.

References

You can create a TensorBoard instance for a DLC job on the AI Computing Asset Management > Tasks page. For more information, see Create and manage TensorBoard tasks.