Platform for AI: Create a dataset for a labeling job

Last Updated: Mar 06, 2026

You must select a dataset when you create a labeling job. This topic describes how to create a dataset and the format requirements for its files.

Background information

Before you create a labeling job with iTAG, create a dataset from the files to label. iTAG in Platform for AI (PAI) supports labeling jobs that use a common template or a custom template. How you prepare the data and create the dataset depends on the template you select.

Prerequisites

Activate Object Storage Service (OSS). For more information, see Get started with the OSS console.

Create a text dataset

You can create a text dataset in either of the following ways:

  • Method 1: Use data stored in an Alibaba Cloud storage service

  • Method 2: Upload data from an on-premises machine

Procedure

Method 1:

  1. Create a .manifest or .txt file based on the format requirements.

  2. Upload the .manifest or .txt file to OSS. For more information, see Simple upload.

  3. Create a dataset based on data stored in an Alibaba Cloud storage service. For more information, see Create a dataset based on data that is stored in an Alibaba Cloud storage service.

Method 2:

  1. Create a .csv or .xlsx file based on the format requirements.

  2. Go to the iTAG page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the workspace name.

    3. In the left-side navigation pane, choose Data Preparation > iTAG.

  3. On the iTAG page, click Go to Task Center or Go to Management Page.

  4. On the page that appears, click the Data Management tab. In the upper-right corner of the Data Management tab, click Create Original Dataset.

  5. In the Create Original Dataset dialog box, configure parameters.

    • Select Local Upload for Import Data.

    • Select File for Import Format.

    • Configure the OSS Bucket and OSS File Path parameters.

    • Click Upload File and select the .csv or .xlsx file.

  6. Click Create.

File name extension

Method 1: A .manifest or .txt file.

Method 2: A .csv or .xlsx file.

File format

Method 1 (.manifest or .txt file), one JSON object per line:

{"data":{"source":"text sample 1"}}
{"data":{"source":"text sample 2"}}
{"data":{"source":"text sample 3"}}

source indicates the sample content to label. Replace the source value with the text content to label.
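As a minimal sketch, a file in this layout can be generated from a list of strings with Python's standard json module. The function name build_text_manifest is hypothetical, and writing non-ASCII characters unescaped (UTF-8) is an assumption, since this topic does not state the expected encoding.

```python
import json


def build_text_manifest(texts):
    """Return manifest content: one {"data": {"source": ...}} object per line."""
    lines = []
    for text in texts:
        record = {"data": {"source": text}}
        # json.dumps escapes quotes and line breaks inside the text, so each
        # sample always stays on a single line of the manifest.
        # ensure_ascii=False keeps non-ASCII text readable (assumes UTF-8 is accepted).
        lines.append(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
    return "\n".join(lines) + "\n"


manifest = build_text_manifest(["text sample 1", "text sample 2", "text sample 3"])
print(manifest, end="")
```

The returned string can then be saved under a .manifest or .txt name with UTF-8 encoding and uploaded to OSS as described in the procedure.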

Method 2 (.csv or .xlsx file): A column in the file can be text content or an image URL.
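A .csv file for local upload could be produced as sketched below with Python's standard csv module. The column name text and the presence of a header row are assumptions made for illustration; this topic does not say whether iTAG expects a header, so compare the output with textDemo2.csv before relying on it.

```python
import csv
import io


def build_text_csv(texts, column_name="text"):
    """Return CSV content with a single column of text samples.

    The header row and its name are illustrative assumptions.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow([column_name])
    for text in texts:
        # csv.writer quotes values that contain commas, quotes, or line breaks.
        writer.writerow([text])
    return buffer.getvalue()


print(build_text_csv(["text sample 1", "text sample 2"]), end="")
```

Using the csv module rather than joining strings by hand keeps samples that contain commas or quotation marks in a single cell.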

File demo

Method 1: textDemo1.manifest

Method 2: textDemo2.csv

Create an image dataset, a video dataset, or an audio dataset

This section uses an image dataset as an example. The procedure for video and audio datasets is the same.

You can create an image dataset in either of the following ways:

  • Method 1: Scan a folder

  • Method 2: Upload data from an on-premises machine

Procedure

Method 1:

  1. Upload image files to an OSS bucket. For more information, see Simple upload.

  2. Create a dataset by scanning a folder. For more information, see Create and manage datasets.

Method 2:

  1. Create a folder containing image files.

  2. Go to the iTAG page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the workspace name.

    3. In the left-side navigation pane, choose Data Preparation > iTAG.

  3. On the iTAG page, click Go to Task Center or Go to Management Page.

  4. On the page that appears, click the Data Management tab. In the upper-right corner of the Data Management tab, click Create Original Dataset. In the Create Original Dataset dialog box, configure parameters.

    • Select Local Upload for Import Data.

    • Select Folder for Import Format.

    • Configure the OSS Bucket and OSS File Path parameters.

    • Click Upload Folder to upload the folder.

  5. Click Create.

File content

{"data":{"source":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/1.jpg"}}
{"data":{"source":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/10.jpg"}}
{"data":{"source":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/11.jpg"}}

source indicates the sample to label. Replace the source value with the OSS path of the image file.
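The lines above follow the pattern oss://<bucket>.<endpoint>/<object key>. A minimal sketch that builds such lines from a list of object keys is shown below. The function name build_image_manifest and the bucket name my-bucket are hypothetical; the endpoint and keys are taken from the example above.

```python
import json


def build_image_manifest(bucket, endpoint, object_keys):
    """Return one {"data": {"source": "oss://..."}} line per object key."""
    lines = []
    for key in object_keys:
        # Strip a leading slash so the path does not contain "//" after the host.
        source = f"oss://{bucket}.{endpoint}/{key.lstrip('/')}"
        lines.append(json.dumps({"data": {"source": source}}, separators=(",", ":")))
    return "\n".join(lines) + "\n"


content = build_image_manifest(
    "my-bucket",  # hypothetical bucket name
    "oss-cn-hangzhou.aliyuncs.com",
    ["iTAG/pic/1.jpg", "iTAG/pic/10.jpg", "iTAG/pic/11.jpg"],
)
print(content, end="")
```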

File demo

Create a custom dataset

Method: Use data stored in an Alibaba Cloud storage service

Procedure

  1. Create a .manifest or .txt file based on format requirements.

  2. Upload the .manifest or .txt file to OSS. For more information, see Simple upload.

  3. Create a dataset based on data stored in an Alibaba Cloud storage service. For more information, see Create a dataset based on data that is stored in an Alibaba Cloud storage service.

File name extension

A .manifest or .txt file.

File format

{"data":{"picture_url":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/1.jpg","text":"Jack Ma established Alibaba Group in an apartment in Hangzhou with 18 founders. The first website of Alibaba Group is Alibaba.com, which is an English website that focuses on the global wholesale trade market."}}
{"data":{"picture_url":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/10.jpg","text":"Alibaba Group held the first West Lake Cybersecurity Conference. During the conference, commercial and opinion leaders of the Internet industry came together to discuss major issues of the industry."}}
{"data":{"picture_url":"oss://****.oss-cn-hangzhou.aliyuncs.com/iTAG/pic/11.jpg","text":"Alibaba Group raised USD 82 million from multiple investment agencies. This event became the largest private equity financing in the China Internet industry at that time."}} 

Each line is a JSON object whose "data" key represents one labeling job. A labeling job can contain samples of several types; list them inside "data" as comma-separated name-value pairs.

The following example shows a labeling job that contains one image and one piece of text. The sample image storage path is oss://****.oss url 01, and the sample text is text sample1.

{"data":{"picture_url":"oss://****.oss url 01","text":"text sample1"}}
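Before uploading a custom manifest, it can help to check that every line parses as JSON and carries the expected fields. The sketch below is one way to do that locally; check_manifest is a hypothetical helper, not part of iTAG, and skipping blank lines is an assumption about what the service tolerates. The field names picture_url and text come from the example above.

```python
import json


def check_manifest(content, required_fields=()):
    """Return a list of (line_number, problem) tuples; an empty list means no problems."""
    problems = []
    for number, line in enumerate(content.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored here (assumption)
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        data = record.get("data") if isinstance(record, dict) else None
        if not isinstance(data, dict):
            problems.append((number, 'missing "data" object'))
            continue
        for field in required_fields:
            if field not in data:
                problems.append((number, f'missing field "{field}"'))
    return problems


sample = '{"data":{"picture_url":"oss://****.oss url 01","text":"text sample1"}}\n'
print(check_manifest(sample, required_fields=("picture_url", "text")))
```

An empty list means that each line has a "data" object containing the required fields; the check does not confirm that the OSS paths exist.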

File demo

multiModal.manifest

What to do next

After creating a dataset, create a labeling job based on the dataset. For more information, see Create a labeling job.