This topic describes how to configure datasets and source code repositories for a training job.
Prerequisites
AI Developer Console and the scheduling component of the cloud-native AI suite are installed in the ACK Pro cluster. The cluster runs Kubernetes 1.20 or later.
A Resource Access Management (RAM) user is created in the RAM console by the cluster administrator. A quota group is added and associated with the RAM user. For more information, see Step 1: Create a quota group for the RAM user.
A persistent volume claim (PVC) is created. For more information, see Mount a statically provisioned NAS volume in the console and Use the console to mount a statically provisioned OSS volume.
Note: In most cases, data used to train models is stored in Object Storage Service (OSS) volumes or Apsara File Storage NAS (NAS) volumes.
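For readers who prefer to prepare the PVC outside the console, the following is a minimal sketch of what such a claim might look like. It only builds a standard Kubernetes PersistentVolumeClaim manifest and prints it as JSON, which kubectl also accepts. The names training-data, demo-ns, and training-data-pv, the 20Gi size, and the helper build_pvc_manifest are hypothetical placeholders, and the sketch assumes that a matching persistent volume (PV) with no storage class already exists in the cluster.

```python
import json


def build_pvc_manifest(name, namespace, pv_name, storage="20Gi"):
    """Return a PVC manifest that binds to an existing, statically provisioned PV.

    All argument values are illustrative; replace them with the names
    used in your own cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # NAS and OSS volumes are typically shared by several pods.
            "accessModes": ["ReadWriteMany"],
            # An empty storage class prevents dynamic provisioning, so the
            # claim can only bind to the pre-created PV named below.
            "storageClassName": "",
            "volumeName": pv_name,
            "resources": {"requests": {"storage": storage}},
        },
    }


if __name__ == "__main__":
    manifest = build_pvc_manifest("training-data", "demo-ns", "training-data-pv")
    # Save the output to a file and apply it with: kubectl apply -f <file>
    print(json.dumps(manifest, indent=2))
```

The namespace of the PVC should match the namespace that you later select for the dataset.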
Configure a dataset
Log on to AI Developer Console. For more information, see Step 2: Log on to AI Developer Console.
In the left-side navigation pane of AI Developer Console, click Data Config.
On the Data Config page, click New Data Configuration.
On the New Data Configuration page, set Name, Namespace, and Persistent Volume Claim for the dataset, and specify a local directory based on your requirements.
For more information about PVCs, see Create a PVC.
When the ACK cluster runs the job, it mounts the local directory to the container in which the job runs, so that the job can access the data and models stored in that directory.
Click Submit.
After you complete the configuration, you can view the detailed information about the dataset on the Data tab of the Data Config page.
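From the job's point of view, the dataset is simply a directory inside the container. The following sketch shows one way a training script might discover its input files there. The path /data/train, the file suffixes, and the helper list_training_files are assumptions for illustration only; use the local directory that you specified in the data configuration.

```python
from pathlib import Path

# Hypothetical local directory from the data configuration.
DATA_DIR = "/data/train"


def list_training_files(data_dir=DATA_DIR, suffixes=(".csv", ".tfrecord")):
    """Return the sorted training files found under the mounted directory."""
    root = Path(data_dir)
    if not root.is_dir():
        # Failing early makes a missing or misconfigured mount easy to spot.
        raise FileNotFoundError(f"dataset directory is not mounted: {root}")
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes
    )
```

Checking for the directory before training starts usually gives a clearer error than a failure deep inside a data loader.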
Configure a source code repository
In the left-side navigation pane of AI Developer Console, click Data Config.
On the Data Config page, click New Git configuration.
In the New Code Configuration dialog box, set Name, Git Repository, and Default Branch for the source code repository, and specify a local directory based on your requirements. When the ACK cluster runs the job, it mounts the local directory to the container in which the job runs, so that the job can access the source code stored in that directory.
Click Submit.
After you complete the configuration, you can view the detailed information about the source code repository on the Data Config page.
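Because the source code is available in the local directory inside the container, a job can locate its entry script there before it starts training. The following sketch shows one possible check. The path /root/code, the file name train.py, and the helper resolve_entry_script are hypothetical; substitute the local directory from your code configuration and the script in your own repository.

```python
from pathlib import Path

# Hypothetical local directory from the code configuration.
CODE_DIR = "/root/code"


def resolve_entry_script(code_dir=CODE_DIR, entry="train.py"):
    """Return the absolute path of the entry script inside the code directory."""
    root = Path(code_dir).resolve()
    script = (root / entry).resolve()
    # Reject paths such as "../other.py" that escape the code directory.
    if root not in script.parents:
        raise ValueError(f"entry script must be inside {root}: {entry}")
    if not script.is_file():
        raise FileNotFoundError(f"entry script not found: {script}")
    return script
```

A job command could then run the returned path with the Python interpreter, for example python /root/code/train.py under the assumptions above.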