iTAG provides labeling templates for audio classification, audio segmentation, and audio recognition. When you create a labeling job, select the template that is appropriate for your scenario. This topic describes the scenarios and data structures for these audio templates.
Background information
This topic describes the scenarios and data structures for the following audio labeling templates: audio classification, audio segmentation, and audio recognition.
Audio classification
Audio classification assigns one or more labels from a predefined set to an audio input. This template supports both single-label and multi-label audio classification.
Scenarios
Example scenarios include environmental sound classification.
Data structures
Input data
Each row in the manifest file represents an object and must contain the source field.
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
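A manifest file of this form can be produced with a short script. This is a minimal sketch: the OSS URIs and the `audio.manifest` file name are placeholders, and you would substitute the paths of your own uploaded audio objects.

```python
import json

# Placeholder OSS URIs; replace these with the paths of your own audio objects.
audio_uris = [
    "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav",
    "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/2.wav",
]

# Each line of the manifest is a standalone JSON object whose
# data.source field points to one audio object.
with open("audio.manifest", "w", encoding="utf-8") as f:
    for uri in audio_uris:
        f.write(json.dumps({"data": {"source": uri}}) + "\n")
```

Because every row is an independent JSON object, the file can be appended to or filtered line by line without re-parsing the whole manifest.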
Output data
Each row in the manifest file contains the object and its labeling result. The following code shows the JSON structure of each row.
{
  "data": {
    "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/6.wav"
  },
  "label-1432993193909231616": {
    "results": [
      {
        "questionId": "1",
        "data": "Label 1",
        "markTitle": "Single-choice",
        "type": "survey/value"
      }
    ]
  }
}
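A row like this can be consumed programmatically. The following sketch parses one output row; note that the `label-...` key is specific to each labeling job, so the code looks it up by prefix rather than hard-coding it.

```python
import json

# One output row from the audio classification template, as in the example above.
row = json.loads('''{
  "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/6.wav"},
  "label-1432993193909231616": {"results": [
    {"questionId": "1", "data": "Label 1",
     "markTitle": "Single-choice", "type": "survey/value"}]}}''')

# The label key differs per job, so find it by its "label-" prefix.
label_key = next(k for k in row if k.startswith("label-"))
for result in row[label_key]["results"]:
    print(result["markTitle"], "->", result["data"])
```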
Audio segmentation
Audio segmentation divides an audio clip into multiple segments based on a waveform graph and then assigns a label to each segment.
Scenarios
Example scenarios include conversation content analysis.
Data structures
Input data
Each row in the manifest file represents an object and must contain the source field.
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
Output data
Each row in the manifest file contains the object and its labeling result. The following code shows the JSON structure of each row.
{
  "data": {
    "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/21.wav"
  },
  "label-1435480301706092544": {
    "results": [
      {
        "duration": 0,
        "objects": [
          {
            "result": {
              "Audio recognition result": "Recognized content 1.",
              "Single-choice": "Label 1"
            },
            "color": null,
            "id": "wavesurfer_ei0aet9uvp8",
            "start": 2.3886218302094817,
            "end": 4.635545755237045
          },
          {
            "result": {
              "Audio recognition result": "Recognized content 2.",
              "Single-choice": "Label 2"
            },
            "color": null,
            "id": "wavesurfer_kl39gnlb2k",
            "start": 5.698280044101433,
            "end": 7.348048511576626
          }
        ],
        "empty": false
      }
    ]
  }
}
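Each entry in `objects` is one labeled segment, with `start` and `end` given as offsets in seconds within the clip. The following sketch (using the example row above, with the job-specific `label-...` key looked up by prefix) extracts the segments:

```python
import json

# One output row from the audio segmentation template (from the example above).
row = json.loads('''{
  "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/21.wav"},
  "label-1435480301706092544": {"results": [{"duration": 0, "empty": false, "objects": [
    {"result": {"Audio recognition result": "Recognized content 1.", "Single-choice": "Label 1"},
     "color": null, "id": "wavesurfer_ei0aet9uvp8",
     "start": 2.3886218302094817, "end": 4.635545755237045},
    {"result": {"Audio recognition result": "Recognized content 2.", "Single-choice": "Label 2"},
     "color": null, "id": "wavesurfer_kl39gnlb2k",
     "start": 5.698280044101433, "end": 7.348048511576626}]}]}}''')

label_key = next(k for k in row if k.startswith("label-"))
segments = row[label_key]["results"][0]["objects"]
for seg in segments:
    # start/end are second offsets within the audio clip.
    print(f'{seg["start"]:.2f}s-{seg["end"]:.2f}s: {seg["result"]["Single-choice"]}')
```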
Audio recognition
Audio recognition, also known as automatic speech recognition (ASR), converts an audio clip into text and matches the text with corresponding labels.
Scenarios
Example scenarios include dialect recognition.
Data structures
Input data
Each row in the manifest file represents an object and must contain the source field.
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
Output data
Each row in the manifest file contains the object and its labeling result. The following code shows the JSON structure of each row.
{
  "data": {
    "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/14.wav"
  },
  "label-1435448359497441280": {
    "results": [
      {
        "questionId": "1",
        "data": "ASR result.",
        "markTitle": "ASR result",
        "type": "survey/value"
      },
      {
        "questionId": "3",
        "data": [
          "Label 1",
          "Label 2"
        ],
        "markTitle": "Multiple-choice",
        "type": "survey/multivalue"
      }
    ]
  }
}
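In this template the `results` array mixes a free-text answer (the transcription, `type` = `survey/value`) with a multiple-choice answer (`type` = `survey/multivalue`). A sketch that separates the two, using the example row above and looking up the job-specific `label-...` key by prefix:

```python
import json

# One output row from the audio recognition (ASR) template (from the example above).
row = json.loads('''{
  "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/14.wav"},
  "label-1435448359497441280": {"results": [
    {"questionId": "1", "data": "ASR result.", "markTitle": "ASR result",
     "type": "survey/value"},
    {"questionId": "3", "data": ["Label 1", "Label 2"], "markTitle": "Multiple-choice",
     "type": "survey/multivalue"}]}}''')

label_key = next(k for k in row if k.startswith("label-"))
transcript, labels = None, []
for result in row[label_key]["results"]:
    if result["type"] == "survey/value":          # free-text answer: the transcription
        transcript = result["data"]
    elif result["type"] == "survey/multivalue":   # multiple-choice answer: a list of labels
        labels = result["data"]
print(transcript, labels)
```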