Create a labeling job in iTAG

After you create a dataset, you can create a labeling job and use iTAG to complete the labeling job. Platform for AI (PAI) provides common labeling templates for you to create labeling jobs. If the common labeling templates do not meet your business requirements, you can combine content and topic components to create custom labeling templates based on your business scenarios. This topic describes how to create a labeling job by using a common labeling template.

Prerequisites

PAI is activated, and a workspace is created.
You can use the default workspace or create a workspace based on your business plan. For more information about how to create a default workspace, see Activate PAI and create a default workspace. For more information about how to create a regular workspace, see Create a workspace.
Object Storage Service (OSS) is activated. The file that contains the data that you want to label is uploaded to an OSS bucket, and a dataset is generated for the file. For more information, see Create a dataset for a labeling job.

Limits

Only workspace administrators and labeling administrators can manage labeling jobs. If you do not have the required permissions, contact a workspace administrator to assign the labeling administrator role to your account. For more information, see Manage members of the workspace.

Procedure

Go to the iTAG page.
1. Log on to the PAI console.
2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
3. In the left-side navigation pane, choose Data Preparation > iTAG.
On the Jobs tab of the iTAG page, click Create Task.

On the Create Labeling Job page, configure the parameters. The following table describes key parameters. Configure other parameters based on your business requirements.

Parameter	Description
Task Name	The name of the labeling job. The name must be 1 to 100 characters in length, and can contain letters, digits, underscores (_), and hyphens (-). It must start with a letter or a digit.
Input Dataset	Select a dataset that is created on the Dataset management page in the PAI console.
Template Type	Select a type of labeling template. Valid values: Common Template: common labeling templates provided by PAI. Custom Template: custom labeling templates that you created. You can combine content components and topic components as prompted to create custom labeling templates. Custom labeling templates are suitable for scenarios in which you have diversified requirements. For more information about the input and output data formats of custom labeling templates, see Custom labeling templates.
Template	If you select Common Template for the Template Type parameter, specify the common labeling template that you want to use. Valid values if you select Image (for the suitable scenarios and the input and output data formats, see Image labeling templates): OCR: extracts text from selected parts of images by using optical character recognition (OCR). Object Detection: finds objects in images. Image Classification: classifies images by adding one or more labels to the images. PDF: performs OCR and label classification on PDF files. Moderation and Matting: performs moderation and matting tasks on images. Table Recognition: recognizes the essential information of a table and edits the content as required. Valid values if you select Text (for the suitable scenarios and the input and output data formats, see Text labeling templates): Named Entity Recognition: recognizes named entities. Text Classification: classifies text by adding one or more labels to the text. Relationship Analysis for Named Entities: analyzes relationships between named entities, which is suitable for scenarios such as knowledge graph creation. Valid values if you select Video (for the suitable scenarios and the input and output data formats, see Video labeling template): Video Classification: classifies videos by adding one or more labels to the videos. Valid values if you select Audio (for the suitable scenarios and the input and output data formats, see Audio labeling template): Audio Classification: classifies audio files by adding one or more labels to the audio files. Audio Segmentation: divides an audio file into several audio clips and adds labels to the audio clips. Automatic Speech Recognition: converts the content of audio files to text.
OCR Identification Result Configuration	This parameter is available only if you set Template to Image and Image to OCR. By default, OCR Identification Result is selected, which specifies that text is extracted from selected parts of images by using OCR.
Label Configuration	Enter the names of the labels that labeling workers need to recognize, select, and apply in the labeling job. Press the Enter key after each label name to add the label. For example, when you create a labeling job that is used to recognize cats in images, you can enter label names such as Cat, American Shorthair, and British Shorthair. In this section, you can also specify whether labeling workers can add more than one label to an object that they select. To allow only one label per object, select Single Choice. To allow more than one label per object, select Multiple Choice. In this example, if you select Multiple Choice, a labeling worker can add both the Cat and American Shorthair labels to a selected cat image. Note: The Single Choice or Multiple Choice option controls only the number of labels that can be added to an object at a time, not the number of times that an object can be selected and labeled.
Enable or disable smart labeling	In the Intelligent Labeling Configurations step, configure data pre-labeling. For more information, see Configure intelligent pre-labeling in iTAG.
Task Description	The description of the labeling job, which is used to distinguish different labeling jobs.
Assign Subtask Packages	The rule based on which the labeling job is divided into multiple job packages. After the job packages are distributed, labeling workers claim job packages and label the data entries in them. Valid values: Fixed Size: specifies a fixed number of data entries for each job package. The allowed number of data entries per job package depends on the size of the dataset: If the dataset has 0 to 20,000 data entries, a job package can contain 1 to 200 data entries. If the dataset has 20,000 to 100,000 data entries, a job package can contain 5 to 200 data entries. If the dataset has 100,000 to 500,000 data entries, a job package can contain 25 to 200 data entries. If the dataset has 500,000 to 1,000,000 data entries, a job package can contain 50 to 200 data entries. Based on Imported Field: distributes the labeling job based on the value of the field that you specify. Data entries that have the same field value are placed in the same job package. Targeted Distribution: distributes the labeling job to specific labeling workers or teams.
Check Proportion	The percentage of job packages in the labeling job that you want to review. This parameter is required if you select Labeling - Checking or Labeling - Checking - Acceptance for the Task Workflow parameter. The default percentage is 100%.
User Configuration	The user configuration. You can specify labeling workers, reviewers, acceptance staff, and job administrators based on the value that you specify for Task Workflow. You can specify multiple members in the current workspace to cooperate on the labeling job. For more information about the roles involved in iTAG labeling jobs, see Overview.
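
The Task Name rules in the preceding table can be checked locally before you fill in the form. The following Python snippet is a minimal sketch, not part of PAI or iTAG. The function name is_valid_task_name is hypothetical, and the sketch assumes that "letters" means ASCII letters. The console may accept a wider character set.

```python
import re

# Rules taken from the Task Name row: 1 to 100 characters; letters, digits,
# underscores (_), and hyphens (-); must start with a letter or a digit.
# Assumption: "letters" is interpreted as ASCII letters only.
_TASK_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,99}")


def is_valid_task_name(name: str) -> bool:
    """Return True if name satisfies the Task Name rules sketched above."""
    return _TASK_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["cat_detection-01", "_cat", "cat job", ""]:
        print(repr(candidate), is_valid_task_name(candidate))
```

The pattern uses fullmatch so that a name with trailing characters outside the allowed set is rejected as a whole.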

After you configure the parameters, click Create.
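
Before you click Create, it can help to sanity-check the Assign Subtask Packages settings described in the parameter table. The following Python snippet is a rough sketch under stated assumptions, not iTAG code. All function names are hypothetical. Because the documented dataset ranges share their boundary values (for example, 20,000 appears in two ranges), the sketch assumes that a boundary value belongs to the lower range. It also assumes that the last package holds the remaining entries when the dataset size is not a multiple of the package size.

```python
import math
from collections import defaultdict

# (upper bound of the dataset size, minimum package size) per documented range.
# Assumption: each upper bound is inclusive, so 20,000 entries fall in the
# first range. The documentation does not say which range a boundary belongs to.
_TIERS = [(20_000, 1), (100_000, 5), (500_000, 25), (1_000_000, 50)]
MAX_PACKAGE_SIZE = 200  # the documented maximum for every range


def package_size_range(dataset_size):
    """Return (min, max) data entries per job package for Fixed Size."""
    if dataset_size < 0:
        raise ValueError("dataset_size must not be negative")
    for upper, min_size in _TIERS:
        if dataset_size <= upper:
            return (min_size, MAX_PACKAGE_SIZE)
    # The documentation gives no limits above one million entries.
    raise ValueError("no documented range for more than 1,000,000 entries")


def package_count(dataset_size, package_size):
    """Estimate the number of Fixed Size packages.

    Assumption: the last package contains the remainder.
    """
    low, high = package_size_range(dataset_size)
    if not low <= package_size <= high:
        raise ValueError(f"package_size must be between {low} and {high}")
    return math.ceil(dataset_size / package_size)


def group_by_field(entries, field):
    """Mimic Based on Imported Field: same field value, same package."""
    packages = defaultdict(list)
    for entry in entries:
        packages[entry[field]].append(entry)
    return dict(packages)


if __name__ == "__main__":
    print(package_size_range(250_000))
    print(package_count(1_001, 200))
    rows = [{"id": 1, "batch": "a"}, {"id": 2, "batch": "b"}, {"id": 3, "batch": "a"}]
    print(group_by_field(rows, "batch"))
```

The field name batch in the usage example is made up for illustration. In a real job, the field comes from the imported dataset.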

View the job list

After you create a labeling job, you can go to the Task Center tab to view the job list. In the job list, you can view the status of all labeling jobs and select options in the Actions column to manage them. For example, you can view the job details and labeling results.

Area	Task	Description
1	Process labeling jobs	In the upper-right corner of the iTAG page, click Go to the iTAG Page to go to the iTAG console. In the iTAG console, you can process, review, and accept the job packages that you claim. For more information, see Process labeling jobs.
2	View the status of a task	On the Task Center tab, you can view the status of all labeling jobs.
3	Manage subtask packages	If a labeling job is not complete, you can click Subtask Details in the Actions column to view the details of job packages in the labeling job. If a job package is not complete, you can click Transfer to transfer the job package to other labeling workers. You can also click Release to release the job package. Then, the job package can be claimed by other labeling workers.
4	Export labeling results and view the export progress	After a labeling job is complete, you can click Export Labeling Result in the Actions column to export the labeling results as prompted. You can click Obtain data record in the upper-right corner of the Task Center tab to view the export progress. For more information, see Export labeling results.
5	Related operations	You can click the icon to perform other operations on a labeling job, such as unpublishing or publishing the job.

What to do next

You can claim and process job packages in a labeling job. For more information, see Process labeling jobs.