Container Service for Kubernetes: Submit a TensorFlow training job and a cron job

Last Updated: Oct 23, 2023

This topic describes how to submit a TensorFlow training job and a cron job in AI Developer Console.

Prerequisites

Submit a TensorFlow training job

  1. Log on to AI Developer Console. For more information, see Step 2: Log on to AI Developer Console.

  2. In the left-side navigation pane of AI Developer Console, click Submit Job.

  3. In the Basic Information section:

    • Configure parameters such as Job Name, Job Type (default type: TF-Stand-alone), Namespace, and Execution Command.

      Important

      Namespace: You can select only a namespace that the cluster administrator has allocated to you. Set the other parameters based on your requirements.

    • Optional: Turn on Tensorboard to visualize the training results.

    • Optional: Turn on Cron to configure a cron job.

      • Cron Schedule: Enter a standard cron expression. For example, 0 2 * * * runs the job every day at 02:00. For more information about how to use cron expressions, see How I use cron in Linux.

      • Concurrency Policy: Select how the system handles a scheduled run that is triggered while the previous training job is still in progress. Valid values:

        • Allow: creates the new training job even though the current training job is still running.

        • Forbid: does not create a new training job until the current training job is finished.

        • Replace: replaces the current training job with a new training job.

      • History Record Limit: The maximum number of TensorFlow training jobs created by the cron job that are retained in the cluster. If the number of retained jobs exceeds this limit, the system deletes the oldest jobs first.

  4. In the Resources section, configure the following parameters for the training job: Instances Count, Image, CPU (Cores) (default value: 4), Memory (GB) (default value: 8 GB), and GPU (Card Numbers) (default value: 0).

  5. In the Advance Configuration section, configure the Label, Annotation, and NodeSelector parameters for Kubernetes objects.

  6. Click Submit Job.

  7. In the left-side navigation pane of AI Developer Console, click Job List to view the information about a job, such as the name and status of the job.
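
The concurrency policies and the History Record Limit described in step 3 can be easier to reason about with a small model. The following Python sketch is only an illustration of the behavior as this topic describes it, not the implementation used by AI Developer Console. The class CronJobModel, its fields, and the job-N naming are hypothetical. It also assumes that a job replaced under the Replace policy is simply discarded and that only finished jobs count toward the history limit, which this topic does not specify.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CronJobModel:
    """Toy model of a cron job's bookkeeping (hypothetical, for illustration only)."""

    concurrency_policy: str  # "Allow", "Forbid", or "Replace"
    history_limit: int       # maximum number of finished jobs to retain
    running: List[str] = field(default_factory=list)
    finished: List[str] = field(default_factory=list)  # oldest first
    counter: int = 0

    def trigger(self) -> str:
        """Handle one scheduled trigger and return the new job name or "skipped"."""
        if self.running and self.concurrency_policy == "Forbid":
            # Forbid: no new job is created while the current one is in progress.
            return "skipped"
        if self.running and self.concurrency_policy == "Replace":
            # Replace: the in-progress job is dropped in favor of the new one
            # (assumption: the replaced job is not kept in the history).
            self.running.clear()
        # Allow (or nothing running): create a new job next to any running ones.
        self.counter += 1
        name = f"job-{self.counter}"
        self.running.append(name)
        return name

    def finish(self, name: str) -> None:
        """Mark a running job as finished and prune the history if needed."""
        self.running.remove(name)
        self.finished.append(name)
        # Delete the oldest retained jobs until the history fits the limit.
        while len(self.finished) > self.history_limit:
            self.finished.pop(0)


if __name__ == "__main__":
    for policy in ("Allow", "Forbid", "Replace"):
        model = CronJobModel(policy, history_limit=2)
        model.trigger()           # first scheduled run starts job-1
        second = model.trigger()  # second run fires while job-1 is still running
        print(policy, "-> second trigger:", second, "| running:", model.running)
```

In this model, a second trigger that fires while job-1 is still running leaves both jobs running under Allow, is skipped under Forbid, and leaves only job-2 running under Replace. With a history limit of 2, finishing three jobs in sequence retains only the two most recent ones.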