Platform For AI: Audio labeling data format

Last Updated: Mar 06, 2026

iTAG provides labeling templates for audio classification, audio segmentation, and audio recognition. When you create a labeling job, select the template that is appropriate for your scenario. This topic describes the scenarios and data structures for these audio templates.

Background information

The following sections describe the input and output data structures for each audio labeling template: audio classification, audio segmentation, and audio recognition.

Audio classification

Audio classification assigns one or more labels from a predefined set to an audio input. This template supports both single-label and multi-label audio classification.

  • Scenarios

    Example scenarios include environmental sound classification.

  • Data structures

    • Input data

      Each row in the manifest file is a JSON object that represents one audio file. The row must contain the source field, nested under data, which holds the OSS path of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

      Each row in the manifest file contains the object and its labeling result. The following code shows the JSON structure of each row.

      {
          "data": {
              "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/6.wav"
          },
          "label-1432993193909231616": {
              "results": [
                  {
                      "questionId": "1", 
                      "data": "Label 1", 
                      "markTitle": "Single-choice", 
                      "type": "survey/value"
                  }
              ]
          }
      }
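The two structures above can be exercised with a short script. The following is a minimal sketch, not an official iTAG client: the helper names `make_manifest_line` and `classification_labels` are hypothetical, and the code assumes that every labeling result sits under a top-level key that starts with `label-`, as in the sample row.

```python
import json


def make_manifest_line(oss_uri):
    """Serialize one input row: a JSON object whose data.source holds the audio path."""
    return json.dumps({"data": {"source": oss_uri}}, separators=(",", ":"))


def classification_labels(row):
    """Map each question title (markTitle) to its answer (data) for one output row."""
    labels = {}
    for key, value in row.items():
        # Assumption: result keys share the "label-" prefix seen in the sample.
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            labels[result["markTitle"]] = result["data"]
    return labels


# Output row copied from the sample above, written on a single line.
sample_output = (
    '{"data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/6.wav"}, '
    '"label-1432993193909231616": {"results": [{"questionId": "1", "data": "Label 1", '
    '"markTitle": "Single-choice", "type": "survey/value"}]}}'
)

print(make_manifest_line("oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"))
print(classification_labels(json.loads(sample_output)))
```

Looking up results by the `label-` prefix rather than by the full key avoids hard-coding the numeric suffix, which differs between the sample rows in this topic.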

Audio segmentation

Audio segmentation divides an audio clip into multiple segments based on a waveform graph and then assigns a label to each segment.

  • Scenarios

    Example scenarios include conversation content analysis.

  • Data structures

    • Input data

      Each row in the manifest file is a JSON object that represents one audio file. The row must contain the source field, nested under data, which holds the OSS path of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

      Each row in the manifest file contains the object and its labeling result. The following code shows the JSON structure of each row.

      {
          "data": {
              "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/21.wav"
          }, 
          "label-1435480301706092544": {
              "results": [
                  {
                      "duration": 0, 
                      "objects": [
                          {
                              "result": {
                                  "Audio recognition result": "Recognized content 1.", 
                                  "Single-choice": "Label 1"
                              }, 
                              "color": null, 
                              "id": "wavesurfer_ei0aet9uvp8", 
                              "start": 2.3886218302094817, 
                              "end": 4.635545755237045
                          }, 
                          {
                              "result": {
                                  "Audio recognition result": "Recognized content 2.", 
                                  "Single-choice": "Label 2"
                              }, 
                              "color": null, 
                              "id": "wavesurfer_kl39gnlb2k", 
                              "start": 5.698280044101433, 
                              "end": 7.348048511576626
                          }
                      ], 
                      "empty": false
                  }
              ]
          }
      }
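To read the segments back out of an output row, you can walk the nested `results` and `objects` arrays. The sketch below is illustrative only: `iter_segments` is a hypothetical helper, the `label-` key prefix is inferred from the sample, and the code treats `start` and `end` as offsets in seconds, which this topic does not state explicitly.

```python
import json


def iter_segments(row):
    """Yield (start, end, result) for each labeled segment in one output row."""
    for key, value in row.items():
        # Assumption: result keys share the "label-" prefix seen in the sample.
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            for obj in result.get("objects", []):
                yield obj["start"], obj["end"], obj["result"]


# Output row copied from the sample above.
sample_row = json.loads("""
{
    "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/21.wav"},
    "label-1435480301706092544": {
        "results": [{
            "duration": 0,
            "objects": [
                {"result": {"Audio recognition result": "Recognized content 1.",
                            "Single-choice": "Label 1"},
                 "color": null, "id": "wavesurfer_ei0aet9uvp8",
                 "start": 2.3886218302094817, "end": 4.635545755237045},
                {"result": {"Audio recognition result": "Recognized content 2.",
                            "Single-choice": "Label 2"},
                 "color": null, "id": "wavesurfer_kl39gnlb2k",
                 "start": 5.698280044101433, "end": 7.348048511576626}
            ],
            "empty": false
        }]
    }
}
""")

# Sort by start offset so that segments are processed in playback order.
segments = sorted(iter_segments(sample_row), key=lambda seg: seg[0])
for start, end, result in segments:
    # Assumption: start and end are in seconds.
    print(f"{start:.2f}-{end:.2f} ({end - start:.2f} long): {result['Single-choice']}")
```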

Audio recognition

Audio recognition, also known as Automatic Speech Recognition (ASR), converts an audio clip into text and matches the text with corresponding labels.

  • Scenarios

    Example scenarios include dialect recognition.

  • Data structures

    • Input data

      Each row in the manifest file is a JSON object that represents one audio file. The row must contain the source field, nested under data, which holds the OSS path of the audio file.

      {"data":{"source":"oss://example-bucket.oss-cn-hangzhou.aliyuncs.com/iTAG/audio/1.wav"}}
      ...
    • Output data

      Each row in the manifest file contains the object and its labeling result. The following code shows the JSON structure of each row.

      {
          "data": {
              "source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/14.wav"
          }, 
          "label-1435448359497441280": {
              "results": [
                  {
                      "questionId": "1", 
                      "data": "ASR result.", 
                      "markTitle": "ASR result", 
                      "type": "survey/value"
                  }, 
                  {
                      "questionId": "3", 
                      "data": [
                          "Label 1", 
                          "Label 2"
                      ], 
                      "markTitle": "Multiple-choice", 
                      "type": "survey/multivalue"
                  }
              ]
          }
      }
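In the sample above, a result of type `survey/value` carries a single string in `data`, while a result of type `survey/multivalue` carries a list. The following sketch normalizes both into lists so that downstream code can treat them the same way. It is an illustration under assumptions: `recognition_answers` is a hypothetical helper, the `label-` key prefix is inferred from the sample, and only the two result types shown in this topic are considered.

```python
import json


def recognition_answers(row):
    """Map each question title (markTitle) to a list of answers for one output row."""
    answers = {}
    for key, value in row.items():
        # Assumption: result keys share the "label-" prefix seen in the sample.
        if not key.startswith("label-"):
            continue
        for result in value.get("results", []):
            data = result["data"]
            # survey/multivalue already holds a list; wrap a survey/value string in one.
            answers[result["markTitle"]] = data if isinstance(data, list) else [data]
    return answers


# Output row copied from the sample above.
sample_row = json.loads("""
{
    "data": {"source": "oss://itag.oss-cn-hangzhou.aliyuncs.com/example-bucket/14.wav"},
    "label-1435448359497441280": {
        "results": [
            {"questionId": "1", "data": "ASR result.",
             "markTitle": "ASR result", "type": "survey/value"},
            {"questionId": "3", "data": ["Label 1", "Label 2"],
             "markTitle": "Multiple-choice", "type": "survey/multivalue"}
        ]
    }
}
""")

answers = recognition_answers(sample_row)
print(answers["ASR result"][0])     # transcribed text
print(answers["Multiple-choice"])   # labels selected for the clip
```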