This topic describes how to manage clusters on the Deep Learning Containers (DLC) platform and how to manage deep learning jobs in DLC Dashboard.

DLC platform
Column name Description
DLC Cluster ID/Name Information about a DLC cluster, including the ID and name of the DLC cluster.
ACK Cluster ID/Name Information about the Alibaba Cloud Container Service for Kubernetes (ACK) cluster that has been added to DLC. You can click the ID of the ACK cluster in this column to log on to the ACK console.
Status
  • Initializing: DLC is initializing the ACK cluster.
  • Deploying: DLC is deploying components in the ACK cluster.
  • Running: DLC has deployed components in the ACK cluster. The ACK cluster is running.
  • Deployment Failed: DLC failed to deploy components in the ACK cluster. To check the cause, click Logs in the Actions column. To deploy the components again, click Retry in the Actions column.
Type Indicates how the ACK cluster is added to DLC.
  • Bind: The ACK cluster is created in the ACK console and then added to DLC.
  • Create: The ACK cluster is created from the DLC platform.
Creation Time The time when the DLC cluster was created.
Description The description of the DLC cluster that you entered after you clicked Add Cluster.
Actions
  • Cluster Console: Log on to DLC Dashboard, where you can submit training jobs and view the progress of training jobs.
  • Components: View DLC components that have been deployed in the ACK cluster.
  • Logs: View log data generated when the system adds the ACK cluster and deploys the DLC components.
  • Retry: If the Status column displays Deployment Failed, you can click Retry to deploy the DLC components again.
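
The relationship between the Status column and the Retry action can be sketched in a few lines of Python. This is only an illustrative model of the table above, not a DLC API: the names ClusterStatus, can_retry, and actions_after_failure are hypothetical, and the sketch assumes that the status strings match the values shown in the console.

```python
from enum import Enum
from typing import List


class ClusterStatus(Enum):
    """Status values as listed in the Status column (hypothetical model)."""
    INITIALIZING = "Initializing"
    DEPLOYING = "Deploying"
    RUNNING = "Running"
    DEPLOYMENT_FAILED = "Deployment Failed"


def can_retry(status: ClusterStatus) -> bool:
    """Retry is described only for clusters whose deployment failed."""
    return status is ClusterStatus.DEPLOYMENT_FAILED


def actions_after_failure(status: ClusterStatus) -> List[str]:
    """Return the Actions-column entries the table recommends after a failure.

    For every other status this returns an empty list, because the table
    does not tie specific actions to those statuses.
    """
    if can_retry(status):
        # Check the cause first, then redeploy the components.
        return ["Logs", "Retry"]
    return []


# Look up a status by the text shown in the console.
status = ClusterStatus("Deployment Failed")
print(status.name, actions_after_failure(status))
```

Modeling the statuses as an enumeration keeps the four documented values in one place, so a script that reads the Status column rejects any unexpected string with a ValueError instead of silently ignoring it.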