iTAG provides labeling templates for audio classification, audio segmentation, and automatic speech recognition (ASR). This topic describes the input and output data formats for each template.
Supported templates
The following audio labeling templates are available:
Audio classification
Audio classification assigns one or more predefined labels to an audio clip. This template supports both single-label and multi-label classification.
Use case
Ambient sound classification.
Data structure
Input data
Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain a data object with a source field that specifies the location of the audio file.
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
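The input format described above can be generated with a short script. The following is a minimal sketch, not an official iTAG tool: the function names build_manifest_lines and write_manifest are hypothetical, and the only requirement taken from this topic is one JSON object per line with a data.source field.

```python
import json


def build_manifest_lines(sources):
    """Return one compact JSON string per audio file location."""
    return [
        json.dumps({"data": {"source": src}}, separators=(",", ":"))
        for src in sources
    ]


def write_manifest(path, sources):
    """Write the manifest with exactly one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_manifest_lines(sources):
            f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical output file name; the OSS path is the example from this topic.
    write_manifest(
        "audio_input.manifest",
        ["oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"],
    )
```

The same input format applies to all three templates, so the sketch is not specific to audio classification.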
Output data
Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.
{ "data": { "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/6.wav" }, "label-1432993193909231616": { "results": [ { "questionId": "1", "data": "Label 1", "markTitle": "single-choice", "type": "survey/value" } ] } }
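To show how the classification output might be consumed, the following sketch reads one output line and collects the assigned labels. It rests on assumptions drawn only from the sample above: the annotation key is located by its "label-" prefix, and the data value is either a string (single-choice) or a list of strings (as the ASR sample shows for multiple-choice questions). The function name extract_labels is hypothetical.

```python
import json


def extract_labels(line):
    """Return (source, labels) for one line of a classification output manifest."""
    record = json.loads(line)
    source = record["data"]["source"]
    labels = []
    for key, value in record.items():
        # Assumption: annotation results live under keys that start with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            data = result.get("data")
            if isinstance(data, list):
                # A list of labels, as in multiple-choice results.
                labels.extend(data)
            elif data is not None:
                # A single label stored as a string.
                labels.append(data)
    return source, labels
```

Applied to the sample line above, this returns the 6.wav source path and the list ["Label 1"].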
Audio segmentation
Audio segmentation identifies and labels specific time segments within an audio file. Use the sound wave graph to define start and end times for each segment.
Use case
Conversation analysis.
Data structure
Input data
Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain a data object with a source field that specifies the location of the audio file.
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
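Before uploading an input manifest, it can help to check each line against the requirement stated above. The following is a minimal sketch of such a check, under the assumption that a non-empty string at data.source is all that is required. The function name validate_manifest_line is hypothetical and is not part of iTAG.

```python
import json


def validate_manifest_line(line):
    """Return None if the line is a usable input record, else an error message."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(record, dict):
        return "line is not a JSON object"
    data = record.get("data")
    # The source must be a non-empty string nested under "data".
    if (
        not isinstance(data, dict)
        or not isinstance(data.get("source"), str)
        or not data["source"]
    ):
        return "missing data.source"
    return None
```

Running the function over every line of a file and reporting the line numbers with a non-None result gives a quick pre-upload check.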
Output data
Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.
{ "data": { "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/21.wav" }, "label-1435480301706092544": { "results": [ { "duration": 0, "objects": [ { "result": { "Audio recognition result": "This is the transcribed content for segment 1.", "single-choice": "Label 1" }, "color": null, "id": "wavesurfer_ei0aet9uvp8", "start": 2.3886218302094817, "end": 4.635545755237045 }, { "result": { "Audio recognition result": "This is the transcribed content for segment 2.", "single-choice": "Label 2" }, "color": null, "id": "wavesurfer_kl39gnlb2k", "start": 5.698280044101433, "end": 7.348048511576626 } ], "empty": false } ] } }
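The nested segmentation output can be flattened into one record per segment. The following sketch is based only on the sample above: it assumes the annotation key starts with "label-", that each entry in objects carries id, start, end, and result, and that start and end are offsets in seconds (the unit is not stated in this topic). The function name extract_segments is hypothetical.

```python
import json


def extract_segments(line):
    """Return (source, segments) for one line of a segmentation output manifest."""
    record = json.loads(line)
    segments = []
    for key, value in record.items():
        # Assumption: annotation results live under keys that start with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            for obj in result.get("objects", []):
                segments.append(
                    {
                        "id": obj["id"],
                        "start": obj["start"],
                        "end": obj["end"],
                        # Segment length, in the same unit as start and end.
                        "length": obj["end"] - obj["start"],
                        # Per-segment answers, such as the transcript and label.
                        "fields": obj.get("result", {}),
                    }
                )
    # Order segments by start time for downstream processing.
    segments.sort(key=lambda seg: seg["start"])
    return record["data"]["source"], segments
```

For the sample line above, this yields two segments, the first carrying the label "Label 1" and the second "Label 2".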
Automatic speech recognition (ASR)
ASR converts spoken audio into written text. This template supports transcription with label assignment.
Use case
Dialect recognition.
Data structure
Input data
Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain a data object with a source field that specifies the location of the audio file.
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
Output data
Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.
{ "data": { "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/14.wav" }, "label-1435448359497441280": { "results": [ { "questionId": "1", "data": "This is the transcribed content.", "markTitle": "Audio recognition result", "type": "survey/value" }, { "questionId": "3", "data": [ "Label 1", "Label 2" ], "markTitle": "multiple-choice", "type": "survey/multivalue" } ] } }
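The ASR output mixes a transcript and a set of labels in the same results list. The following sketch separates the two, using only what the sample above shows: results of type survey/multivalue are treated as labels, and the result whose markTitle matches the transcript question is treated as the transcript. The title "Audio recognition result" is taken from the sample and probably depends on how the template is configured, so it is exposed as a parameter. The function name extract_transcript is hypothetical.

```python
import json


def extract_transcript(line, transcript_title="Audio recognition result"):
    """Return (source, transcript, labels) for one line of an ASR output manifest."""
    record = json.loads(line)
    transcript = None
    labels = []
    for key, value in record.items():
        # Assumption: annotation results live under keys that start with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            if result.get("type") == "survey/multivalue":
                # Multiple-choice answers are stored as a list of strings.
                labels.extend(result.get("data", []))
            elif result.get("markTitle") == transcript_title:
                # The transcript is stored as a plain string.
                transcript = result.get("data")
    return record["data"]["source"], transcript, labels
```

For the sample line above, the transcript is "This is the transcribed content." and the labels are ["Label 1", "Label 2"].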