iTAG of Machine Learning Platform for AI (PAI) provides labeling templates for audio classification, audio segmentation, and Automatic Speech Recognition (ASR). When you create an audio labeling job, you can select a labeling template based on your business scenario. This topic describes the scenarios to which each audio labeling template applies and the structures of the input and output data for these templates.
Background information
iTAG provides audio labeling templates that support the following features:
Audio classification
Audio classification assigns one or more labels from a predefined label set to an input audio file. This template supports both single-label and multi-label audio classification.
Scenarios
This labeling template applies to scenarios such as environmental sound classification.
Data structures
Input data
Each row in the .manifest file of input data contains an object. Each row must contain the source field.
{"data":{"source":"oss://examplebucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}} ...
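As a minimal sketch of how such an input file can be produced, the following Python snippet writes one JSON object per line, each with the required source field. The OSS URIs are placeholders; replace them with the paths to your own audio files.

```python
import json

# Hypothetical OSS URIs; substitute the audio files in your own bucket.
audio_uris = [
    "oss://examplebucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav",
    "oss://examplebucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/2.wav",
]

# Each row of the .manifest file is one JSON object with a "source" field.
with open("input.manifest", "w", encoding="utf-8") as f:
    for uri in audio_uris:
        f.write(json.dumps({"data": {"source": uri}}) + "\n")
```

The same input format is used by all three audio labeling templates described in this topic.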
Output data
Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code shows an example of the JSON string in each row, formatted here for readability; in the file, each row occupies a single line:
{
  "data": {
    "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/examplebucket/6.wav"
  },
  "label-1432993193909231616": {
    "results": [
      {
        "questionId": "1",
        "data": "Label 1",
        "markTitle": "Single-choice",
        "type": "survey/value"
      }
    ]
  }
}
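A sketch of how one such output row can be parsed in Python. The label field name (label-&lt;ID&gt;) differs from job to job, so the snippet looks it up dynamically rather than hard-coding it; the sample line matches the example above.

```python
import json

# One output row, as it appears in the .manifest file (a single line).
line = ('{"data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/examplebucket/6.wav"}, '
        '"label-1432993193909231616": {"results": [{"questionId": "1", "data": "Label 1", '
        '"markTitle": "Single-choice", "type": "survey/value"}]}}')

row = json.loads(line)
source = row["data"]["source"]
# The "label-<ID>" key suffix is job-specific, so find it dynamically.
label_key = next(k for k in row if k.startswith("label-"))
labels = [r["data"] for r in row[label_key]["results"]]
print(source, labels)
```

For a multi-label classification job, the results list would simply contain more than one entry.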
Audio segmentation
Audio segmentation divides a piece of audio into multiple clips and labels each clip. You can use the waveform graph to decide where to split the audio.
Scenarios
This labeling template applies to scenarios such as dialogue analysis.
Data structures
Input data
Each row in the .manifest file of input data contains an object. Each row must contain the source field.
{"data":{"source":"oss://examplebucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}} ...
Output data
Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code shows an example of the JSON string in each row, formatted here for readability; in the file, each row occupies a single line:
{
  "data": {
    "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/examplebucket/21.wav"
  },
  "label-1435480301706092544": {
    "results": [
      {
        "duration": 0,
        "objects": [
          {
            "result": {
              "Audio segmentation result": "Result 1",
              "Single-choice": "Label 1"
            },
            "color": null,
            "id": "wavesurfer_ei0aet9uvp8",
            "start": 2.3886218302094817,
            "end": 4.635545755237045
          },
          {
            "result": {
              "Audio segmentation result": "Result 2",
              "Single-choice": "Label 2"
            },
            "color": null,
            "id": "wavesurfer_kl39gnlb2k",
            "start": 5.698280044101433,
            "end": 7.348048511576626
          }
        ],
        "empty": false
      }
    ]
  }
}
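A sketch of how clip boundaries and their answers can be extracted from one segmentation output row. The field names follow the example above; the start and end values are in seconds, and the helper function name is hypothetical.

```python
# Sketch: pull (start, end, answers) tuples out of one audio segmentation
# output row, using the field layout shown in the example output.
def extract_clips(row):
    label_key = next(k for k in row if k.startswith("label-"))
    clips = []
    for result in row[label_key]["results"]:
        for obj in result.get("objects", []):
            # start/end are clip boundaries in seconds; "result" holds
            # the answers entered for the clip.
            clips.append((obj["start"], obj["end"], obj["result"]))
    return clips

# Sample row, abridged from the example output above.
row = {
    "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/examplebucket/21.wav"},
    "label-1435480301706092544": {
        "results": [{
            "duration": 0,
            "objects": [
                {"result": {"Audio segmentation result": "Result 1",
                            "Single-choice": "Label 1"},
                 "color": None, "id": "wavesurfer_ei0aet9uvp8",
                 "start": 2.3886218302094817, "end": 4.635545755237045},
            ],
            "empty": False,
        }]
    },
}

clips = extract_clips(row)
```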
ASR
ASR transcribes the content of audio into text and labels the text.
Scenarios
This labeling template applies to scenarios such as dialect recognition.
Data structures
Input data
Each row in the .manifest file of input data contains an object. Each row must contain the source field.
{"data":{"source":"oss://examplebucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}} ...
Output data
Each row in the .manifest file of output data contains an object and the labeling results for the object. The following code shows an example of the JSON string in each row, formatted here for readability; in the file, each row occupies a single line:
{
  "data": {
    "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/examplebucket/14.wav"
  },
  "label-1435448359497441280": {
    "results": [
      {
        "questionId": "1",
        "data": "ASR result",
        "markTitle": "ASR result",
        "type": "survey/value"
      },
      {
        "questionId": "3",
        "data": [
          "Label 1",
          "Label 2"
        ],
        "markTitle": "Multiple-choice",
        "type": "survey/multivalue"
      }
    ]
  }
}
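A sketch of how the transcript and the chosen labels can be separated in one ASR output row: survey/value entries carry a single string (the transcribed text), while survey/multivalue entries carry a list of selected labels. The helper function name is hypothetical; the sample row matches the example above.

```python
# Sketch: split one ASR output row into the transcript (survey/value)
# and the multiple-choice labels (survey/multivalue).
def parse_asr_row(row):
    label_key = next(k for k in row if k.startswith("label-"))
    transcript, labels = None, []
    for r in row[label_key]["results"]:
        if r["type"] == "survey/value":
            transcript = r["data"]        # transcribed text, a string
        elif r["type"] == "survey/multivalue":
            labels.extend(r["data"])      # selected labels, a list
    return transcript, labels

# Sample row, taken from the example output above.
row = {
    "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/examplebucket/14.wav"},
    "label-1435448359497441280": {
        "results": [
            {"questionId": "1", "data": "ASR result",
             "markTitle": "ASR result", "type": "survey/value"},
            {"questionId": "3", "data": ["Label 1", "Label 2"],
             "markTitle": "Multiple-choice", "type": "survey/multivalue"},
        ]
    },
}

transcript, labels = parse_asr_row(row)
```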