Scenarios and data structures of audio templates

Supported templates

The following audio labeling templates are available:

Audio classification
Audio segmentation
Automatic speech recognition (ASR)

Audio classification

Audio classification assigns one or more predefined labels to an audio clip. This template supports both single-label and multi-label classification.

Use case

Ambient sound classification.

Data structure

Input data
Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain the data.source field, which holds the location of the audio file.
```
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
```
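
The one-object-per-line layout above can be produced with a few lines of Python. This is a minimal sketch: the helper names `build_manifest_lines` and `write_manifest` are hypothetical and not part of any Alibaba Cloud SDK, and uploading the resulting file to OSS is out of scope here.

```python
import json


def build_manifest_lines(sources):
    """Return one compact JSON string per audio file, in the order given."""
    return [
        json.dumps({"data": {"source": src}}, separators=(",", ":"))
        for src in sources
    ]


def write_manifest(sources, path):
    """Write a .manifest file with one JSON object per line.

    `path` is a local file path chosen by the caller (an assumption of this sketch).
    """
    with open(path, "w", encoding="utf-8") as f:
        for line in build_manifest_lines(sources):
            f.write(line + "\n")
```

Calling `write_manifest([...], "audio.manifest")` with a list of `oss://` paths yields a file in which every line has the same shape as the example above.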

Output data

Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.

```
{
    "data": {
        "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/6.wav"
    },
    "label-1432993193909231616": {
        "results": [
            {
                "questionId": "1",
                "data": "Label 1",
                "markTitle": "single-choice",
                "type": "survey/value"
            }
        ]
    }
}
```
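
To read the labels back from an output line, a sketch along the following lines may help. It assumes that the annotation key always starts with `label-` (as in the example) and that a multi-label result stores a list in `data`, as the `survey/multivalue` entry of the ASR template does. The function name `classification_labels` is hypothetical.

```python
import json


def classification_labels(line):
    """Return (source, labels) for one line of a classification output manifest."""
    record = json.loads(line)
    source = record["data"]["source"]
    labels = []
    for key, value in record.items():
        # Assumption: annotation results live under keys that start with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            data = result.get("data")
            # A single-choice answer is a string; a multi-choice answer is assumed to be a list.
            labels.extend(data if isinstance(data, list) else [data])
    return source, labels
```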

Audio segmentation

Audio segmentation identifies and labels specific time segments within an audio file. Use the waveform display to define the start and end times of each segment.

Use case

Conversation analysis.

Data structure

Input data
Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain the data.source field, which holds the location of the audio file.
```
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
```
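
Before submitting a manifest, it can be worth checking that every line parses as JSON and carries a non-empty `data.source` string. The following is a minimal local check, not an official validator: the function name `find_invalid_lines` is hypothetical, and skipping blank lines is a choice made for this sketch.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, reason) pairs for lines that lack a usable data.source."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored in this sketch
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        data = record.get("data") if isinstance(record, dict) else None
        source = data.get("source") if isinstance(data, dict) else None
        if not isinstance(source, str) or not source:
            problems.append((number, "missing data.source"))
    return problems
```

An empty return value means that every non-blank line matches the structure shown above.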

Output data

Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.

```
{
    "data": {
        "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/21.wav"
    },
    "label-1435480301706092544": {
        "results": [
            {
                "duration": 0,
                "objects": [
                    {
                        "result": {
                            "Audio recognition result": "This is the transcribed content for segment 1.",
                            "single-choice": "Label 1"
                        },
                        "color": null,
                        "id": "wavesurfer_ei0aet9uvp8",
                        "start": 2.3886218302094817,
                        "end": 4.635545755237045
                    },
                    {
                        "result": {
                            "Audio recognition result": "This is the transcribed content for segment 2.",
                            "single-choice": "Label 2"
                        },
                        "color": null,
                        "id": "wavesurfer_kl39gnlb2k",
                        "start": 5.698280044101433,
                        "end": 7.348048511576626
                    }
                ],
                "empty": false
            }
        ]
    }
}
```
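
The nested `objects` array can be flattened into a simple list of segments. The sketch below assumes that `start` and `end` are offsets in seconds from the beginning of the audio (the example does not state the unit) and that the annotation key starts with `label-`. The function name `extract_segments` and the derived `length` field are hypothetical.

```python
import json


def extract_segments(line):
    """Return the segments of one output line, sorted by start time."""
    record = json.loads(line)
    segments = []
    for key, value in record.items():
        # Assumption: annotation results live under keys that start with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            for obj in result.get("objects", []):
                segments.append({
                    "id": obj["id"],
                    "start": obj["start"],
                    "end": obj["end"],
                    # Derived value; assumes start and end share one time unit.
                    "length": obj["end"] - obj["start"],
                    # Per-segment answers, keyed by question title.
                    "fields": obj.get("result", {}),
                })
    return sorted(segments, key=lambda s: s["start"])
```

Each returned dictionary keeps the per-segment answers under `fields`, so the transcription of a segment is available as `segment["fields"]["Audio recognition result"]` when the labeling job uses that title.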

Automatic speech recognition (ASR)

ASR converts spoken audio into written text. This template supports transcription with label assignment.

Use case

Dialect recognition.

Data structure

Input data
Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain the data.source field, which holds the location of the audio file.
```
{"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
...
```

Output data

Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.

```
{
    "data": {
        "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/14.wav"
    },
    "label-1435448359497441280": {
        "results": [
            {
                "questionId": "1",
                "data": "This is the transcribed content.",
                "markTitle": "Audio recognition result",
                "type": "survey/value"
            },
            {
                "questionId": "3",
                "data": [
                    "Label 1",
                    "Label 2"
                ],
                "markTitle": "multiple-choice",
                "type": "survey/multivalue"
            }
        ]
    }
}
```
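
Because each entry in `results` carries its own `markTitle`, one convenient way to consume an ASR output line is to index the answers by that title. The sketch below assumes that titles are unique within a line and that the annotation key starts with `label-`. The function name `answers_by_title` is hypothetical.

```python
import json


def answers_by_title(line):
    """Return (source, {markTitle: data}) for one line of an ASR output manifest."""
    record = json.loads(line)
    answers = {}
    for key, value in record.items():
        # Assumption: annotation results live under keys that start with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            # "survey/value" entries hold a string; "survey/multivalue" entries hold a list.
            answers[result["markTitle"]] = result["data"]
    return record["data"]["source"], answers
```

With the example above, the transcription is then found under the `Audio recognition result` title and the selected labels under `multiple-choice`.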