This topic describes how to submit a TensorFlow training job and a cron job in the
AI development console.
Submit a TensorFlow training job
- In the left-side navigation pane of the AI development console, click Submit Job.
- In the Basic Information section, set Job Name, Job Type (the default is TF Stand-alone), Namespace, Data Configuration, and Code Configuration, and then enter the command to execute in the Execution Command field.
Notice: For Namespace, you can select only a namespace that the cluster administrator has allocated to you. Set the other parameters
based on your requirements.
- In the Resources section, specify the number of instances and the image used to train the model. Then, specify CPU (Cores) (the default is 4), Memory (GB) (the default is 8 GB), and GPU (Card Numbers) (the default is 0) for the training job.
- Click Submit.
- After the training job is submitted, click Job List in the left-side navigation pane of the AI development console. On the page that appears, you can view information about the job, such as its name and status.
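
The form fields in the preceding steps can be summarized as a small data structure. The following Python sketch is illustrative only: the JobForm class, its field names, the sample image name, and the validation rules are hypothetical and are not part of any console API. The defaults for job type, CPU, memory, and GPU are the ones stated above. The default of one instance is an assumption, because the text does not state one.

```python
from dataclasses import dataclass


@dataclass
class JobForm:
    """Hypothetical mirror of the Submit Job form; not a console API."""

    job_name: str
    namespace: str
    execution_command: str
    image: str
    instances: int = 1                # assumption: the text states no default
    job_type: str = "TF Stand-alone"  # default given in the text
    cpu_cores: int = 4                # default given in the text
    memory_gb: int = 8                # default given in the text
    gpu_cards: int = 0                # default given in the text

    def problems(self, allowed_namespaces):
        """Return a list of problems; an empty list means ready to submit."""
        found = []
        if not self.job_name:
            found.append("Job Name is empty")
        if self.namespace not in allowed_namespaces:
            found.append("Namespace is not allocated to you")
        if not self.execution_command:
            found.append("Execution Command is empty")
        if self.instances < 1 or self.cpu_cores < 1 or self.memory_gb < 1:
            found.append("Instances, CPU, and memory must be positive")
        if self.gpu_cards < 0:
            found.append("GPU count cannot be negative")
        return found


# Illustrative values only; replace them with your own settings.
form = JobForm("mnist-demo", "team-a", "python train.py",
               "tensorflow/tensorflow:latest")
print(form.problems(allowed_namespaces={"team-a"}))
```

Checking the namespace against the set of allocated namespaces reflects the restriction in the notice above: a namespace outside that set is reported as a problem instead of being silently accepted.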
Submit a cron job
- In the left-side navigation pane of the AI development console, click Submit Job.
- In the Basic Information section, set Job Name, Job Type (the default is TF Stand-alone), Namespace, Data Configuration, and Code Configuration, and then enter the command to execute in the Execution Command field.
Notice: For Namespace, you can select only a namespace that the cluster administrator has allocated to you. Set the other parameters
based on your requirements.
- Turn on dlc-dashboard-cron and set the following parameters for the cron job:
- Cron Schedule: Enter a standard cron expression. For more information about how to use cron expressions,
see how-use-cron-linux.
- Concurrency policy: From the dlc-dashboard-cron-co drop-down list, select what happens when a scheduled run is due while the current training job is still in progress. Valid values:
- Allow: a new training job is created even if the current training job is still running.
- Forbid: no new training job is created until the current training job is finished.
- Replace: the current training job is replaced with a new training job.
- History Record Limit: the maximum number of TensorFlow training jobs created by the cron job that are retained in the cluster. If the number of retained jobs exceeds this limit, the system deletes the earliest-created TensorFlow training jobs first.
- In the Resources section, specify the number of instances and the image used to train the model. Then, specify CPU (Cores) (the default is 4), Memory (GB) (the default is 8 GB), and GPU (Card Numbers) (the default is 0) for the training job.
- Click Submit.
- After the job is submitted, click Job List in the left-side navigation pane of the AI development console. On the page that appears, you can view information about the job, such as its name and status.
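
The interaction between the concurrency policy and History Record Limit can be easier to follow as a small simulation. The following Python sketch is a toy model, not the console's scheduler: the CronModel class and its method names are hypothetical. It also assumes that only finished jobs count toward the history limit and that a replaced job is discarded rather than retained. The actual system may behave differently on both points.

```python
from dataclasses import dataclass, field


@dataclass
class CronModel:
    """Toy model of the cron job settings; the real scheduler may differ."""

    policy: str = "Allow"      # "Allow", "Forbid", or "Replace"
    history_limit: int = 2     # History Record Limit
    next_id: int = 1           # job IDs increase in creation order
    running: list = field(default_factory=list)
    retained: list = field(default_factory=list)

    def fire(self):
        """A scheduled time arrives. Return the new job ID, or None if skipped."""
        if self.running:
            if self.policy == "Forbid":
                return None            # skip: the current job is not finished
            if self.policy == "Replace":
                self.running.clear()   # assumption: replaced jobs are discarded
        job_id = self.next_id
        self.next_id += 1
        self.running.append(job_id)
        return job_id

    def finish(self, job_id):
        """Mark a job as finished, then prune the earliest-created jobs."""
        self.running.remove(job_id)
        self.retained.append(job_id)
        self.retained.sort()           # lowest ID = earliest created
        while len(self.retained) > self.history_limit:
            self.retained.pop(0)


def demo(policy):
    """Fire twice without finishing the first job in between."""
    model = CronModel(policy=policy)
    first = model.fire()
    second = model.fire()
    return first, second, list(model.running)


for name in ("Allow", "Forbid", "Replace"):
    print(name, demo(name))
```

In this model, Allow leaves both jobs running, Forbid skips the second run, and Replace keeps only the second job. If four jobs run to completion one after another with a limit of 2, only the last two remain in the retained list.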