This topic describes how to submit a TensorFlow training job and a cron job in the AI development console.

Prerequisites

Submit a TensorFlow training job

  1. In the left-side navigation pane of the AI development console, click Submit Job.
  2. In the Basic Information section, set Job Name, Job Type (the default is TF Stand-alone), Namespace, Data Configuration, and Code Configuration, and then enter the command to execute in the Execution Command field.
    Notice: For Namespace, you can select only a namespace that the cluster administrator has allocated to you. Set the other parameters based on your requirements.
  3. In the Resources section, specify the number of instances and the image used for model training. Then, specify CPU (Cores) (the default is 4), Memory (GB) (the default is 8 GB), and GPU (Card Numbers) (the default is 0) for the training job.
  4. Click Submit.
  5. After the training job is submitted, click Job List in the left-side navigation pane of the AI development console. On the page that appears, you can view information about the job, such as its name and status.
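
Before filling in the form, it can help to write down the values you plan to enter. The following is a minimal sketch only, not a console API: the class name TrainingJobForm, its field names, and the check_form helper are hypothetical and exist purely for illustration. The defaults mirror the ones stated in the steps above, and Data Configuration and Code Configuration are omitted because their format is not described here.

```python
from dataclasses import dataclass


@dataclass
class TrainingJobForm:
    """Hypothetical mirror of the Submit Job form; not a console API."""
    job_name: str
    namespace: str
    execution_command: str
    image: str
    instances: int
    job_type: str = "TF Stand-alone"  # default stated in step 2
    cpu_cores: int = 4                # default stated in step 3
    memory_gb: int = 8                # default stated in step 3
    gpu_cards: int = 0                # default stated in step 3


def check_form(form: TrainingJobForm, allowed_namespaces: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the form looks complete."""
    problems = []
    if not form.job_name:
        problems.append("Job Name is empty")
    if form.namespace not in allowed_namespaces:
        # Only namespaces allocated by the cluster administrator can be selected.
        problems.append(f"Namespace {form.namespace!r} is not allocated to you")
    if not form.execution_command.strip():
        problems.append("Execution Command is empty")
    if form.instances < 1:
        problems.append("instance count must be at least 1")
    if form.cpu_cores < 1 or form.memory_gb < 1 or form.gpu_cards < 0:
        problems.append("resource values are out of range")
    return problems
```

Calling check_form with the set of namespaces allocated to you returns an empty list when every required value is present, so the same values can then be entered in the console.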

Submit a cron job

  1. In the left-side navigation pane of the AI development console, click Submit Job.
  2. In the Basic Information section, set Job Name, Job Type (the default is TF Stand-alone), Namespace, Data Configuration, and Code Configuration, and then enter the command to execute in the Execution Command field.
    Notice: For Namespace, you can select only a namespace that the cluster administrator has allocated to you. Set the other parameters based on your requirements.
  3. Turn on dlc-dashboard-cron and set the following parameters for the cron job:
    • Cron Schedule: Enter a standard cron expression. For example, 0 2 * * * runs the job at 02:00 every day. For more information about how to use cron expressions, see how-use-cron-linux.
    • If a scheduled run is due while the current training job is still in progress, the concurrency policy determines what happens. Select a policy from the dlc-dashboard-cron-co drop-down list. Valid values:
      • Allow: creates a new training job while the current training job is still running.
      • Forbid: does not create a new training job until the current training job is finished.
      • Replace: replaces the current training job with a new training job.
    • History Record Limit: the maximum number of TensorFlow training jobs created by the cron job that are retained in the cluster. If the number of retained jobs exceeds the limit, the system deletes the oldest jobs first.
  4. In the Resources section, specify the number of instances and the image used for model training. Then, specify CPU (Cores) (the default is 4), Memory (GB) (the default is 8 GB), and GPU (Card Numbers) (the default is 0) for the training job.
  5. Click Submit.
  6. After the cron job is submitted, click Job List in the left-side navigation pane of the AI development console. On the page that appears, you can view information about the job, such as its name and status.
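
The concurrency policy and the History Record Limit in step 3 are easier to reason about with a small simulation. The sketch below is illustrative only and does not call the console or the cluster. It assumes that Allow, Forbid, and Replace behave like the Kubernetes CronJob concurrency policies of the same names, and the function names on_schedule_tick and retain are hypothetical.

```python
from collections import deque


def on_schedule_tick(running: list[str], new_job: str, policy: str) -> list[str]:
    """Return the jobs that are running after one scheduled run comes due."""
    if not running or policy == "Allow":
        # Nothing is in progress, or concurrent runs are allowed: start the new job.
        return running + [new_job]
    if policy == "Forbid":
        # Assumption: the new run is skipped and the current job continues.
        return running
    if policy == "Replace":
        # The current job is replaced by the new one.
        return [new_job]
    raise ValueError(f"unknown concurrency policy: {policy}")


def retain(history: list[str], finished_job: str, limit: int) -> list[str]:
    """Keep at most `limit` jobs, dropping the earliest-created ones first."""
    kept = deque(history, maxlen=limit)  # a bounded deque discards from the left
    kept.append(finished_job)
    return list(kept)
```

For example, with a History Record Limit of 2, retaining a third job drops the earliest one, which matches the deletion order described in step 3.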