Platform for AI: Audio labeling templates

Last Updated: Apr 09, 2026

iTAG supports labeling templates for audio classification, audio segmentation, and automatic speech recognition (ASR). This topic describes the input and output data formats for each template.

Supported templates

The following audio labeling templates are available:

Audio classification

Audio classification assigns one or more predefined labels to an audio clip. This template supports both single-label and multi-label classification.

  • Use case

    Ambient sound classification.

  • Data structure

    • Input data

      Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain the source field, nested under data, which gives the location of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

      Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.

      {
          "data": {
              "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/6.wav"
          },
          "label-1432993193909231616": {
              "results": [
                  {
                      "questionId": "1",
                      "data": "Label 1",
                      "markTitle": "single-choice",
                      "type": "survey/value"
                  }
              ]
          }
      }
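
The input and output formats above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function names `build_input_manifest` and `read_choice_labels` are hypothetical, and the code assumes that annotation results always sit under a top-level key starting with `label-`, as in the sample output. It also assumes that a multi-label answer is stored as a list in `data`, by analogy with the ASR sample.

```python
import json


def build_input_manifest(sources):
    """Return manifest text with one {"data": {"source": ...}} object per line."""
    return "\n".join(
        json.dumps({"data": {"source": src}}, separators=(",", ":"))
        for src in sources
    )


def read_choice_labels(manifest_line):
    """Return (source, labels) for one line of a classification output manifest."""
    record = json.loads(manifest_line)
    source = record["data"]["source"]
    labels = []
    for key, value in record.items():
        # Assumption: results live under keys prefixed with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            data = result.get("data")
            if data is None:
                continue
            # A single choice is a string; a multi-label answer is
            # assumed to be a list of strings.
            labels.extend(data if isinstance(data, list) else [data])
    return source, labels
```

To process a whole output file, call `read_choice_labels` once per non-empty line.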

Audio segmentation

Audio segmentation identifies and labels specific time segments within an audio file. Annotators use the waveform view to define the start and end times of each segment.

  • Use case

    Conversation analysis.

  • Data structure

    • Input data

      Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain the source field, nested under data, which gives the location of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

      Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.

      {
          "data": {
              "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/21.wav"
          },
          "label-1435480301706092544": {
              "results": [
                  {
                      "duration": 0,
                      "objects": [
                          {
                              "result": {
                                  "Audio recognition result": "This is the transcribed content for segment 1.",
                                  "single-choice": "Label 1"
                              },
                              "color": null,
                              "id": "wavesurfer_ei0aet9uvp8",
                              "start": 2.3886218302094817,
                              "end": 4.635545755237045
                          },
                          {
                              "result": {
                                  "Audio recognition result": "This is the transcribed content for segment 2.",
                                  "single-choice": "Label 2"
                              },
                              "color": null,
                              "id": "wavesurfer_kl39gnlb2k",
                              "start": 5.698280044101433,
                              "end": 7.348048511576626
                          }
                      ],
                      "empty": false
                  }
              ]
          }
      }
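
As a rough sketch of how the segmentation output above might be consumed, the following Python function flattens the nested `results` and `objects` arrays into a list of segments ordered by start time. The function name `read_segments` is hypothetical, and the `label-` key prefix is an assumption taken from the sample. The computed `length` is in whatever unit `start` and `end` use, which the sample does not state.

```python
import json


def read_segments(manifest_line):
    """Return the segments of one output line, sorted by start time."""
    record = json.loads(manifest_line)
    segments = []
    for key, value in record.items():
        # Assumption: results live under keys prefixed with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            for obj in result.get("objects", []):
                segments.append({
                    "id": obj["id"],
                    "start": obj["start"],
                    "end": obj["end"],
                    # Same unit as "start" and "end".
                    "length": obj["end"] - obj["start"],
                    # Per-segment answers, e.g. transcript and choice label.
                    "fields": obj.get("result", {}),
                })
    return sorted(segments, key=lambda seg: seg["start"])
```

Each returned entry keeps the per-segment answers under `fields`, so the transcript in the sample would be available as `seg["fields"]["Audio recognition result"]`.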

Automatic speech recognition (ASR)

ASR converts spoken audio into written text. This template supports transcription with label assignment.

  • Use case

    Dialect recognition.

  • Data structure

    • Input data

      Each line in the input .manifest file is a JSON object that represents a single audio file. Each object must contain the source field, nested under data, which gives the location of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

      Each line in the output .manifest file is a JSON object that contains the source audio file location and annotation results.

      {
          "data": {
              "source": "oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/audio/14.wav"
          },
          "label-1435448359497441280": {
              "results": [
                  {
                      "questionId": "1",
                      "data": "This is the transcribed content.",
                      "markTitle": "Audio recognition result",
                      "type": "survey/value"
                  },
                  {
                      "questionId": "3",
                      "data": [
                          "Label 1",
                          "Label 2"
                      ],
                      "markTitle": "multiple-choice",
                      "type": "survey/multivalue"
                  }
              ]
          }
      }
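
The ASR output above mixes a transcript (`survey/value`) with a list of labels (`survey/multivalue`) in the same `results` array. One possible way to read it, sketched in Python, is to index each entry by its `markTitle`. The function name `read_asr_results` is hypothetical, and the `label-` key prefix is an assumption based on the sample.

```python
import json


def read_asr_results(manifest_line):
    """Return (source, answers) where answers maps markTitle to data."""
    record = json.loads(manifest_line)
    answers = {}
    for key, value in record.items():
        # Assumption: results live under keys prefixed with "label-".
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            # "data" is a string for survey/value entries and a list
            # for survey/multivalue entries, as in the sample.
            answers[result["markTitle"]] = result["data"]
    return record["data"]["source"], answers
```

With the sample line, `answers["Audio recognition result"]` would hold the transcript and `answers["multiple-choice"]` the list of labels. Note that the titles are whatever was configured in the labeling job, so they may differ from the ones shown here.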