Machine Learning Platform for AI lets you register a dataset in one of two ways: create a dataset from your source data, or import an existing dataset file. All registered datasets are managed through manifest files.

Create a dataset

If your source data (such as image, text, video, and audio files) is stored in Object Storage Service (OSS), you can create a dataset directly in the console. The system scans files of all types in the specified directory and then generates a manifest file in the specified OSS path.

  1. Navigate to the Register Dataset page.
    1. Log on to the Machine Learning Platform for AI console.
    2. In the left-side navigation pane, choose Data Preprocessing > Dataset Manager.
    3. On the Dataset Manager page, click Register Dataset.
  2. On the Register Dataset page, set the following parameters.
    Parameter     Description
    Dataset Name  The name must be 1 to 24 characters in length and can contain underscores (_) and hyphens (-). It must start with a letter, digit, or Chinese character.
    Method        Set Method to New Dataset.
    Storage Type  Only OSS is supported.
    Path          Set Path to the OSS folder where your source data is stored.
    Data Type     Only Image is supported.
    Tags          You can attach a maximum of 10 tags to each dataset. Each tag must be 1 to 10 characters in length and can contain underscores (_) and hyphens (-). It must start with a letter, digit, or Chinese character.
  3. Click Submit. The system generates a manifest file in which each line is a JSON object that describes one source file. The following is an example of the manifest file.
    {"data":{"picUrl":"oss://****/pics/fruit/apple-1.jpg"}}
    {"data":{"picUrl":"oss://****/pics/fruit/apple-10.jpg"}}
    {"data":{"picUrl":"oss://****/pics/fruit/apple-11.jpg"}}
    ...
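
A generated manifest of this shape can be read line by line. The following is a minimal Python sketch, assuming the file follows the one-JSON-object-per-line layout shown in the example and that every record carries a data.picUrl field. The function name read_pic_urls is hypothetical and is not part of the platform. The "..." line in the example only marks omitted records and is not expected in a real file.

```python
import json
from typing import Iterable, List


def read_pic_urls(lines: Iterable[str]) -> List[str]:
    """Collect the picUrl value from each manifest line (one JSON object per line)."""
    urls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumes the {"data": {"picUrl": ...}} layout shown in the example manifest.
        urls.append(record["data"]["picUrl"])
    return urls


# Two lines copied from the example manifest above.
sample = [
    '{"data":{"picUrl":"oss://****/pics/fruit/apple-1.jpg"}}',
    '{"data":{"picUrl":"oss://****/pics/fruit/apple-10.jpg"}}',
]
urls = read_pic_urls(sample)
```

To process a manifest that you have downloaded locally, pass an open file object instead of the sample list, because iterating over a text file yields one line at a time.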

Import a dataset file

If you have already created a local CSV file or manifest file, you can register a dataset by importing the dataset file. If you import a CSV file, the system automatically converts it to a manifest file.
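
The exact rules that the system uses to convert a CSV file are not described here. As an illustration only, the following Python sketch assumes that the CSV file has a header row and that each data row becomes one manifest line whose data object maps column names to values. The function name csv_to_manifest_lines and the picUrl column are assumptions made for this example, so the output of the platform may differ.

```python
import csv
import io
import json
from typing import List


def csv_to_manifest_lines(csv_text: str) -> List[str]:
    """Turn CSV text with a header row into manifest-style JSON lines.

    Hypothetical mapping: each row becomes {"data": {column: value, ...}}.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Compact separators match the spacing of the example manifest.
        json.dumps({"data": row}, ensure_ascii=False, separators=(",", ":"))
        for row in reader
    ]


csv_text = "picUrl\noss://****/pics/fruit/apple-1.jpg\n"
manifest_lines = csv_to_manifest_lines(csv_text)
```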

  1. Navigate to the Register Dataset page.
    1. Log on to the Machine Learning Platform for AI console.
    2. In the left-side navigation pane, choose Data Preprocessing > Dataset Manager.
    3. On the Dataset Manager page, click Register Dataset.
  2. On the Register Dataset page, set the following parameters.
    Parameter     Description
    Dataset Name  The name must be 1 to 24 characters in length and can contain underscores (_) and hyphens (-). It must start with a letter, digit, or Chinese character.
    Method        Set Method to Import Dataset.
    Storage Type  Only OSS is supported.
    Path          Select an OSS path.
    Data Type     Drag and drop a local CSV or manifest file to the area on the right side of Data Type.
                  Note: If the imported file is used in a labeling job, the field names in the file must comply with the data schema of the template that is used to create the labeling job. For more information, see Data labeling templates.
    Tags          You can attach a maximum of 10 tags to each dataset. Each tag must be 1 to 10 characters in length and can contain underscores (_) and hyphens (-). It must start with a letter, digit, or Chinese character.
  3. Click Submit.
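
The naming rules for Dataset Name and Tags can be checked locally before you fill in the form. The following Python sketch rests on several assumptions: "letter" is taken to mean an ASCII letter, "Chinese character" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF, and no characters other than letters, digits, Chinese characters, underscores, and hyphens are allowed. The console may apply different checks, and the function names are hypothetical.

```python
import re
from typing import List

# Assumed character set: ASCII letters, digits, and CJK Unified Ideographs.
_BASE = r"A-Za-z0-9\u4e00-\u9fff"

# One leading base character, then up to 23 (names) or 9 (tags) further
# characters that may also be underscores or hyphens.
NAME_RE = re.compile(rf"[{_BASE}][{_BASE}_-]{{0,23}}")
TAG_RE = re.compile(rf"[{_BASE}][{_BASE}_-]{{0,9}}")


def is_valid_name(name: str) -> bool:
    """Check the 1 to 24 character rule for a dataset name."""
    return NAME_RE.fullmatch(name) is not None


def are_valid_tags(tags: List[str]) -> bool:
    """Check that there are at most 10 tags, each 1 to 10 characters long."""
    return len(tags) <= 10 and all(TAG_RE.fullmatch(t) is not None for t in tags)
```

For example, is_valid_name("fruit_images-1") passes, whereas is_valid_name("_fruit") fails because the name starts with an underscore.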