
Platform For AI: Create a dataset for a labeling job

Last Updated: Apr 18, 2026

To create a labeling job, you must select a dataset. This topic describes how to create a dataset for data labeling and explains the required data formats.

Background information

Before labeling data in iTAG, you must create a dataset from the files you want to label. PAI allows you to create a labeling job using either a common template or a custom template. The data preparation and dataset creation methods vary depending on the template you choose. For more information, see the following sections:

  • Common template

    Common templates are available for four data types: image, text, video, and audio. For the steps and format requirements to create these datasets, see Create a text dataset and Create an image, video, or audio dataset.

  • Custom template

    A custom template provides more flexibility. For example, you can label multiple data types, such as images and text, in a single labeling job. For the steps and format requirements to create a dataset for this use case, see Create a custom dataset.

Prerequisites

Object Storage Service (OSS) must be activated. For details, see Get started with the OSS console.

Create a text dataset

Procedure

Method 1: From cloud service

  1. Create a local .manifest or .txt file according to the format requirements in this topic.

  2. Upload the file to OSS. For more information, see Upload files.

  3. Create a dataset from a cloud service. For more information, see Create a dataset: From an Alibaba Cloud cloud service.

Method 2: Local upload

  1. Create a local .csv or .xlsx file according to the format requirements in this topic.

  2. Go to iTAG.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the target workspace.

    3. In the left-side navigation pane, choose Data Preparation>iTAG.

  3. On the iTAG page, click Go to Task Center or Go to Management Page.

  4. On the Data Management tab, click Create Original Dataset.

  5. On the Create Original Dataset page, configure the following key parameters:

    • For Import Data, select Local Upload.

    • For Import Format, select File.

    • Configure OSS Bucket and File Path in OSS.

    • Click Upload file and select the .csv or .xlsx file that you created.

  6. Click Create.

File name extension

  • Method 1: A .manifest or .txt file.

  • Method 2: A .csv or .xlsx file.

File format

  • Method 1: Each line of the file is a JSON object in the following format:

    {"data":{"source":"text sample 1"}}
    {"data":{"source":"text sample 2"}}
    {"data":{"source":"text sample 3"}}

    The source parameter specifies the sample to be labeled. Replace each value of source with the text that you want to label.

  • Method 2: One column of the .csv or .xlsx file contains the text to be labeled.

File example

  • Method 1: textDemo1.manifest

  • Method 2: textDemo2.csv
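If you have many samples, a short script can produce either file. The following Python sketch builds the Method 1 format (one compact JSON object per line, matching the examples above) and a Method 2 .csv file. It is not an iTAG tool. The function names, the UTF-8 encoding, and the single header row named text are assumptions, because this topic only requires that one column contains the text to be labeled.

```python
import csv
import io
import json
from pathlib import Path


def build_text_manifest(texts):
    """Method 1: return one {"data":{"source": ...}} JSON object per line."""
    lines = []
    for text in texts:
        record = {"data": {"source": text}}
        # Compact separators reproduce the layout of the examples in this topic;
        # ensure_ascii=False keeps non-ASCII text human-readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + "\n"


def build_text_csv(texts, column="text"):
    """Method 2: put every sample into a single CSV column.

    The header name "text" is a hypothetical placeholder; this topic only
    requires that one column holds the text to be labeled.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([column])  # header row (assumption)
    for text in texts:
        # csv.writer quotes fields that contain commas, quotes, or newlines.
        writer.writerow([text])
    return buf.getvalue()


def save(content, path):
    """Write the generated content to a local file as UTF-8 (assumed encoding)."""
    Path(path).write_text(content, encoding="utf-8")
```

For example, save(build_text_manifest(["text sample 1", "text sample 2"]), "textDemo1.manifest") writes a file in the same shape as textDemo1.manifest. Keeping generation and file writing separate makes the output easy to inspect before you upload it.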

Create an image, video, or audio dataset

This section uses images as an example. The procedure is the same for video and audio files.

Procedure

Method 1: Scan folder

  1. Upload the image files to an OSS bucket to generate their URLs. For more information, see Upload files.

  2. Create a dataset by scanning a folder, which automatically generates a .manifest file. For more information, see Create and manage datasets.

Method 2: Local upload

  1. Create a local folder that contains the images.

  2. Go to iTAG.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the target workspace.

    3. In the left-side navigation pane, choose Data Preparation>iTAG.

  3. On the iTAG page, click Go to Task Center or Go to Management Page.

  4. On the Data Management tab, click Create Original Dataset. In the Create Original Dataset panel, configure the following parameters:

    • For Import Data, select Local Upload.

    • For Import Format, select Folder.

    • Configure OSS Bucket and File Path in OSS.

    • Click Upload Folder and upload the local folder.

  5. Click Create.

File format

{"data":{"source":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/1.jpg"}}
{"data":{"source":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/10.jpg"}}
{"data":{"source":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/11.jpg"}}

The source parameter specifies the sample to be labeled. Its value is the OSS path URL of the corresponding image, video, or audio file.
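Scanning a folder (Method 1) generates this file for you. If you want to reproduce or check the format yourself, the following Python sketch builds the same kind of lines from a list of OSS object keys. The URL layout oss://<bucket>.<endpoint>/<key> is inferred from the example lines above, and the function name and the bucket name examplebucket in the usage note are hypothetical. The sketch does not contact OSS or verify that the objects exist, and this section does not state whether a hand-written file can replace the generated one.

```python
import json


def build_media_manifest(bucket, endpoint, keys):
    """Return one {"data":{"source":"oss://..."}} line per OSS object key.

    The oss://<bucket>.<endpoint>/<key> layout mirrors the example lines in
    this topic (assumption); bucket and endpoint are your own values.
    """
    lines = []
    for key in keys:
        # Drop a leading slash so the URL has exactly one "/" after the endpoint.
        url = f"oss://{bucket}.{endpoint}/{key.lstrip('/')}"
        record = {"data": {"source": url}}
        lines.append(json.dumps(record, separators=(",", ":")))
    return "\n".join(lines) + "\n"
```

For example, build_media_manifest("examplebucket", "oss-cn-hangzhou.aliyuncs.com", ["iTAG/pic/1.jpg", "iTAG/pic/10.jpg"]) returns two lines with the same structure as the examples above. The same function applies to video and audio keys, because only the file paths differ.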


Create a custom dataset

Procedure

Method: From cloud service

  1. Create a local .manifest or .txt file according to the format requirements in this topic.

  2. Upload the file to OSS. For more information, see Upload files.

  3. Create a dataset from a cloud service. For more information, see Create and manage datasets.

File name extension

A .manifest or .txt file.

File format

{"data":{"picture_url":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/1.jpg","text":"Jack Ma and 17 other founders established Alibaba Group in a Hangzhou apartment. The group's first website was Alibaba.com, an English-language global wholesale marketplace."}}
{"data":{"picture_url":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/10.jpg","text":"Alibaba Group held the first West Lake Summit, bringing together business and thought leaders from the internet industry to discuss key topics."}}
{"data":{"picture_url":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/11.jpg","text":"Alibaba Group raised USD 82 million from several top-tier investment firms, which was the largest private equity financing in China's internet industry at the time."}}

Each line's "data" object represents a data item to be labeled. This object can contain multiple key-value pairs, allowing you to include different data types, such as an image and text, in a single labeling job.

For example, the following line defines a data item that contains both an image, stored at oss://****.oss url 01, and the text "text sample1".

{"data":{"picture_url":"oss://****.oss url 01","text":"text sample1"}}
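To pair images with text at scale, you can generate these lines from your own records. The following Python sketch turns each dict of fields into one line of a custom .manifest file. The function name is hypothetical. The key names picture_url and text come from the example above. This section does not say which key names a given custom template expects, so treat them as placeholders that presumably need to match the fields your template reads.

```python
import json


def build_custom_manifest(items):
    """Return one manifest line per item, using the item's fields as "data".

    items: an iterable of dicts such as {"picture_url": ..., "text": ...}.
    Key names are placeholders taken from this topic's example.
    """
    lines = []
    for fields in items:
        if not fields:
            # A data item with no key-value pairs has nothing to label.
            raise ValueError("each data item needs at least one key-value pair")
        record = {"data": dict(fields)}  # dict insertion order is preserved
        lines.append(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + "\n"
```

For example, build_custom_manifest([{"picture_url": "oss://****.oss url 01", "text": "text sample1"}]) reproduces the example line above exactly.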

File example

multiModal.manifest
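Every .manifest format in this topic has the same structure: one JSON object per line, each with a data object of key-value pairs. A quick local check before uploading can therefore catch formatting mistakes. The following Python sketch checks only those rules, plus any keys you require, such as source for text, image, video, and audio datasets. It is not part of iTAG, and the function name is hypothetical. It treats blank lines as errors, which is a conservative assumption because this topic does not say whether iTAG tolerates them.

```python
import json


def validate_manifest(text, required_keys=()):
    """Return (line_number, problem) tuples; an empty list means no problems found.

    Checks that each line is a JSON object whose "data" member is a
    non-empty object, and that "data" contains every key in required_keys.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((lineno, "blank line"))  # conservative assumption
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        data = record.get("data") if isinstance(record, dict) else None
        if not isinstance(data, dict) or not data:
            problems.append((lineno, 'missing or empty "data" object'))
            continue
        for key in required_keys:
            if key not in data:
                problems.append((lineno, f'"data" has no "{key}" key'))
    return problems
```

For example, run validate_manifest(open("textDemo1.manifest", encoding="utf-8").read(), required_keys=("source",)) before uploading the file to OSS. For a custom dataset, pass the keys your template uses, such as ("picture_url", "text").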

Next steps

You can use a created dataset to create a labeling job. For more information, see Create a labeling job.