When you create a labeling job, you must select a dataset. This topic describes how to create datasets and the format requirements for their files.
Background information
Before you create a labeling job in iTAG, create a dataset from the file that contains the data to be labeled. iTAG of Platform for AI (PAI) supports labeling jobs based on either a common template or a custom template. How you prepare the data and create the dataset depends on the template you select.
- Common templates
  iTAG provides the following types of common templates: image, text, video, and audio. For dataset creation and format requirements when using common templates, see Create a text dataset and Create an image dataset, a video dataset, or an audio dataset.
- Custom templates
  Custom templates enable flexible data labeling. You can label multiple sample types, such as images and text, in a single labeling job. For dataset creation and format requirements when using custom templates, see Create a custom dataset.
Prerequisites
Activate Object Storage Service (OSS). For more information, see Get started with the OSS console.
Create a text dataset
| Item | Method 1: Use data stored in an Alibaba Cloud storage service | Method 2: Upload data from an on-premises machine |
| Procedure | | |
| File name extension | A .manifest or .txt file. | A .csv or .xlsx file. |
| File format | | A column in the .csv or .xlsx file can be text content or an image URL. |
| File demo | | |
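As a rough illustration of the Method 2 file described above, the following Python sketch builds a one-column .csv file whose rows are either text content or an image URL. The column name `content`, the presence of a header row, and the sample values are assumptions made for this example; they are not taken from the iTAG requirements, so check the file demo in the console before uploading.

```python
import csv
import io


def build_text_dataset_csv(values, column_name="content"):
    """Return CSV text with one column; each row is text or an image URL.

    `column_name` is a hypothetical header chosen for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column_name])      # header row (assumed, not confirmed)
    for value in values:
        writer.writerow([value])        # csv quotes values containing commas
    return buffer.getvalue()


# Hypothetical samples: one text row and one image URL row.
samples = [
    "Great value, would buy again.",
    "https://example.com/images/sample-1.jpg",
]
csv_text = build_text_dataset_csv(samples)
print(csv_text)
```

To produce a file for upload, the returned string can be written to disk with UTF-8 encoding, for example with `pathlib.Path("dataset.csv").write_text(csv_text, encoding="utf-8")`.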
Create an image dataset, a video dataset, or an audio dataset
This section uses an image dataset as an example. Video and audio datasets are created in the same way.
| Item | Method 1: Scan a folder | Method 2: Upload data from an on-premises machine |
| Procedure | | |
| File content | | |
| File demo | | |
Create a custom dataset
| Item | Use data stored in an Alibaba Cloud storage service |
| Procedure | |
| File name extension | A .manifest or .txt file. |
| File format | The following sample code shows an image and text in a labeling job. The sample image storage path is |
| File demo | |
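The sketch below shows one way a .manifest file that pairs an image with text could be generated. It assumes the file holds one JSON object per line; this section does not confirm that layout. The `data` wrapper, the `picture_url` and `text` keys, and the `oss://example-bucket/...` paths are all hypothetical placeholders, so replace them with the names and paths that your custom template actually expects.

```python
import json


def build_manifest_text(records):
    """Serialize records as one JSON object per line (assumed layout).

    Each record is wrapped in a hypothetical top-level "data" key.
    """
    lines = [json.dumps({"data": record}, ensure_ascii=False) for record in records]
    return "\n".join(lines) + "\n"


# Hypothetical samples: each pairs an image path with a piece of text.
records = [
    {
        "picture_url": "oss://example-bucket/images/1.jpg",
        "text": "A red bicycle leaning against a wall.",
    },
    {
        "picture_url": "oss://example-bucket/images/2.jpg",
        "text": "Two cats sleeping on a sofa.",
    },
]
manifest_text = build_manifest_text(records)
print(manifest_text)
```

Generating the file from a script rather than by hand keeps every line valid JSON and makes it straightforward to regenerate the manifest when samples are added.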
What to do next
After creating a dataset, create a labeling job based on the dataset. For more information, see Create a labeling job.