Platform For AI:Overview

Last Updated:Nov 02, 2023

To convert an existing file to a TFRecord file, you must first convert the file to a labeled dataset by using Machine Learning Platform for AI (PAI). Then, you can convert the labeled dataset to a TFRecord file. This topic describes the formats of entries in labeled datasets for single-label image classification, multi-label image classification, object detection, image segmentation, text recognition, text detection, and end-to-end text recognition.

The following table describes the elements in a CSV file that contains labeled data.

Element          Data type  Description
Entry ID         INT        The ID of the entry.
Raw data         JSON       The URL of the source image.
Labeling result  JSON       The labeling result.
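
The three elements can be separated with a short parser. The following is a minimal sketch, not an official PAI utility. It assumes that the JSON fields appear unquoted in each line, as in the samples in this topic, so a standard CSV reader would split on the commas inside the JSON. The function name parse_entry is hypothetical.

```python
import json


def parse_entry(line):
    """Split one entry into (entry_id, raw_data, labeling_result)."""
    # The entry ID is everything before the first comma.
    id_text, rest = line.split(",", 1)
    rest = rest.lstrip()
    decoder = json.JSONDecoder()
    # raw_decode parses one JSON value and reports where it ended, so commas
    # inside the JSON objects are not mistaken for field separators.
    raw_data, end = decoder.raw_decode(rest)
    label_text = rest[end:].lstrip()
    if not label_text.startswith(","):
        raise ValueError("expected a comma after the raw data field")
    # Everything after that comma is the labeling result.
    labeling_result = json.loads(label_text[1:])
    return int(id_text), raw_data, labeling_result


entry_id, raw, result = parse_entry('1,{"url":"http://a.jpg"},{"option":"Passport"}')
print(entry_id, raw["url"], result["option"])  # 1 http://a.jpg Passport
```

If your exported file wraps the JSON fields in CSV quotes instead, the csv module from the standard library is likely a better fit than this sketch.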

Single-label image classification

# The entry ID, raw data, and labeling result.
1,{"url":"http://a.jpg"},{"option":"Passport"}
2,{"url":"http://b.jpg"},{"option":"Passport"}

The labeling result contains the following information:

{
    "option":"Passport"    # The label of the image.
}
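
Training pipelines usually need integer class IDs instead of label strings. The following sketch shows one way to derive them from single-label results. The function name build_label_index, the choice of sorted order, and the sample list are assumptions for illustration only.

```python
def build_label_index(results):
    """Map each distinct label to an integer index, in sorted order."""
    labels = sorted({result["option"] for result in results})
    return {label: index for index, label in enumerate(labels)}


# Illustrative labeling results, already parsed from JSON.
results = [{"option": "Passport"}, {"option": "ID card"}, {"option": "Passport"}]
label_index = build_label_index(results)
class_ids = [label_index[result["option"]] for result in results]
print(label_index)  # {'ID card': 0, 'Passport': 1}
print(class_ids)    # [1, 0, 1]
```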

Multi-label image classification

# The entry ID, raw data, and labeling result.
1,{"url":"http://a.jpg"},{"option":["Passport", "ID card"]}
2,{"url":"http://b.jpg"},{"option":["Passport", "Exit-Entry Permit for Traveling to and from Hong Kong and Macau"]}

The labeling result contains the following information:

{
    "option":["Passport", "ID card"]    # The labels of the image.
}
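
A multi-label result is often encoded as a multi-hot vector before training. The following sketch assumes a label-to-index mapping that you define yourself. The mapping shown here and the function name to_multi_hot are hypothetical.

```python
def to_multi_hot(result, label_index):
    """Return a 0/1 vector with a 1 at the index of every label in the result."""
    vector = [0] * len(label_index)
    for label in result["option"]:
        vector[label_index[label]] = 1
    return vector


# Hypothetical mapping built from the labels used in the samples above.
label_index = {
    "Passport": 0,
    "ID card": 1,
    "Exit-Entry Permit for Traveling to and from Hong Kong and Macau": 2,
}
print(to_multi_hot({"option": ["Passport", "ID card"]}, label_index))  # [1, 1, 0]
```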

Object detection

# The entry ID, raw data, and labeling result.
1,{"url": "http://b.jpg"},[{"text": "{\"class*\": \"Category 1\"}", "coord": ["306.73", "517.59", "324.42", "282.07", "347.69", "282.07", "333.73", "519.45"]}, {"text": "{\"class*\": \"Category 2\"}", "coord": ["342.11", "723.32", "349.56", "608.81", "366.31", "606.95", "360.73", "730.76"]}]
2,{"url": "http://a.jpg"},[{"text": "{\"class*\": \"Category 1\"}", "coord": ["338.35", "8.53", "700.16", "8.53", "700.16", "50.35", "338.35", "50.35"]}, {"text": "{\"class*\": \"Category 2\"}", "coord": ["26.88", "64.00", "218.03", "64.00", "218.03", "99.84", "26.88", "99.84"]}]

The labeling result contains the following information:

[        # The list of objects.
    {
        "text":"{\"class*\": \"Category 1\"}",    # The JSON string that indicates the category to which the object belongs.
        "coord":[    # The coordinates that identify the location of the bounding box drawn around the object.
            "338.35",
            "8.53",
            "700.16",
            "8.53",
            "700.16",
            "50.35",
            "338.35",
            "50.35"
        ]
    },
    {
        "text":"{\"class*\": \"Category 2\"}",
        "coord":[
            "26.88",
            "64.00",
            "218.03",
            "64.00",
            "218.03",
            "99.84",
            "26.88",
            "99.84"
        ]
    }
]
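
Each object carries its category as a JSON string nested inside the text field, and its coordinates as strings. The following sketch decodes one object. It assumes that coord lists the vertices as alternating x and y values, which matches the rectangles in the samples but is not stated explicitly in this topic. The function name parse_object is hypothetical.

```python
import json


def parse_object(obj):
    """Return (category, polygon points, axis-aligned bounding box)."""
    # The text field is itself a JSON string, so it needs a second decode.
    category = json.loads(obj["text"])["class*"]
    values = [float(value) for value in obj["coord"]]
    # Assumption: values alternate x1, y1, x2, y2, ...
    points = list(zip(values[0::2], values[1::2]))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    return category, points, bbox


obj = {
    "text": json.dumps({"class*": "Category 1"}),
    "coord": ["338.35", "8.53", "700.16", "8.53",
              "700.16", "50.35", "338.35", "50.35"],
}
category, points, bbox = parse_object(obj)
print(category)  # Category 1
print(bbox)      # (338.35, 8.53, 700.16, 50.35)
```

The bounding box is computed from the minimum and maximum coordinates, so it also covers quadrilaterals that are not axis-aligned, such as the ones in the first sample entry.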

Image segmentation

Download a sample CSV file.

# The entry ID, raw data, and labeling result.
1,{"url":"http://a.jpg"},{"ossUrl":"http://ossgw.alicdn.com/a.png"}

The labeling result contains the following information:

{
    "ossUrl":"http://ossgw.alicdn.com/a.png"
    # The Object Storage Service (OSS) URL of the mask generated for the image. The mask is in PNG format. An image has red, blue, and green channels for storing information. The red channel usually stores category information of the image.
    # The ID of the channel starts from 0. Valid values: 0 to 3. The value of 0 indicates the background.
}
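
After the mask is downloaded and decoded, the category of each pixel can be read from its red channel. The following sketch works on pixels that are already decoded into (R, G, B) tuples, for example by an image library such as Pillow. It assumes that the red value of a pixel is its category ID and that 0 is the background, which is one reading of the description above. The function name and the sample pixels are hypothetical.

```python
from collections import Counter


def count_categories(pixels):
    """Count pixels per category ID, reading the ID from the red channel."""
    return Counter(pixel[0] for pixel in pixels)


# Hypothetical 2 x 2 mask, flattened to a list of (R, G, B) tuples.
pixels = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (2, 0, 0)]
counts = count_categories(pixels)
# Assumption: category ID 0 is the background.
foreground = sum(n for category_id, n in counts.items() if category_id != 0)
print(dict(counts))  # {0: 1, 1: 2, 2: 1}
print(foreground)    # 3
```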

Text recognition

# The entry ID, raw data, and labeling result.
1,{"url": "http://b.jpg"},{"text": "Text 1"}
2,{"url": "http://a.jpg"},{"text": "Text 2"}

The labeling result contains the following information:

{
    "text":"Text 1"    # The recognized text.
}

Text detection

# The entry ID, raw data, and labeling result.
1,{"url": "http://b.jpg"},[[{"text": "{\"direction\": \"Bottom right\", \"class*\": \"Category 1\"}", "coord": ["306.73", "517.59", "324.42", "282.07", "347.69", "282.07", "333.73", "519.45"]}, {"text": "{\"direction\": \"Bottom right\", \"class*\": \"Category 2\"}", "coord": ["342.11", "723.32", "349.56", "608.81", "366.31", "606.95", "360.73", "730.76"]}], {"option": "Bottom right"}]
2,{"url": "http://a.jpg"},[[{"text": "{\"direction\": \"Bottom down\", \"class*\": \"Category 1\"}", "coord": ["338.35", "8.53", "700.16", "8.53", "700.16", "50.35", "338.35", "50.35"]}, {"text": "{\"direction\": \"Bottom down\", \"class*\": \"Category 2\"}", "coord": ["26.88", "64.00", "218.03", "64.00", "218.03", "99.84", "26.88", "99.84"]}], {"option": "Bottom down"}]

The labeling result contains the following information:

[        # The list of text lines.
    [
        {
            "text":"{\"direction\": \"Bottom down\", \"class*\": \"Category 1\"}",
                         # The JSON string that indicates the information about the text line. The direction field indicates the orientation of the text line, whereas the class* field indicates the category of the text.
            "coord":[ # The coordinates that identify the location of the bounding box drawn around the text line.
                "338.35",
                "8.53",
                "700.16",
                "8.53",
                "700.16",
                "50.35",
                "338.35",
                "50.35"
            ]
        },
        {
            "text":"{\"direction\": \"Bottom down\", \"class*\": \"Category 2\"}",
            "coord":[
                "26.88",
                "64.00",
                "218.03",
                "64.00",
                "218.03",
                "99.84",
                "26.88",
                "99.84"
            ]
        }
    ],
    {
        "option":"Bottom down"    # The orientation of the image.
    }
]

The image orientation specifies the direction in which the bottom of the image points: down, up, left, or right.
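
A text detection result is a two-element list: the text lines, followed by an object that holds the image orientation. The following sketch unpacks that structure. The function name parse_text_detection and the output field names are hypothetical, and the sample is shortened to one text line.

```python
import json


def parse_text_detection(result):
    """Return (list of decoded text lines, image orientation)."""
    lines, orientation = result
    parsed = []
    for line in lines:
        # The text field is a JSON string with the direction and class* keys.
        info = json.loads(line["text"])
        parsed.append({
            "category": info["class*"],
            "direction": info["direction"],
            "coord": [float(value) for value in line["coord"]],
        })
    return parsed, orientation["option"]


result = [
    [{
        "text": json.dumps({"direction": "Bottom down", "class*": "Category 1"}),
        "coord": ["338.35", "8.53", "700.16", "8.53",
                  "700.16", "50.35", "338.35", "50.35"],
    }],
    {"option": "Bottom down"},
]
text_lines, image_orientation = parse_text_detection(result)
print(text_lines[0]["category"], image_orientation)  # Category 1 Bottom down
```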

End-to-end text recognition

# The entry ID, raw data, and labeling result.
1,{"url": "http://b.jpg"},[[{"text": "{\"text\": \"Text 1\", \"direction\": \"Bottom right\", \"class*\": \"Category 1\"}", "coord": ["306.73", "517.59", "324.42", "282.07", "347.69", "282.07", "333.73", "519.45"]}, {"text": "{\"text\": \"Text 2\", \"direction\": \"Bottom right\", \"class*\": \"Category 2\"}", "coord": ["342.11", "723.32", "349.56", "608.81", "366.31", "606.95", "360.73", "730.76"]}], {"option": "Bottom right"}]
2,{"url": "http://a.jpg"},[[{"text": "{\"text\": \"Text 3\", \"direction\": \"Bottom down\", \"class*\": \"Category 1\"}", "coord": ["338.35", "8.53", "700.16", "8.53", "700.16", "50.35", "338.35", "50.35"]}, {"text": "{\"text\": \"Text 4\", \"direction\": \"Bottom down\", \"class*\": \"Category 2\"}", "coord": ["26.88", "64.00", "218.03", "64.00", "218.03", "99.84", "26.88", "99.84"]}], {"option": "Bottom down"}]

The labeling result contains the following information:

[        # The list of text lines.
    [
        {
            "text":"{\"text\": \"Text 3\", \"direction\": \"Bottom down\", \"class*\": \"Category 1\"}",
                         # The JSON string that indicates the information about the text line. The text field contains the content of the text line, the direction field indicates the orientation of the text line, and the class* field indicates the category of the text.
            "coord":[ # The coordinates that identify the location of the bounding box drawn around the text line.
                "338.35",
                "8.53",
                "700.16",
                "8.53",
                "700.16",
                "50.35",
                "338.35",
                "50.35"
            ]
        },
        {
            "text":"{\"text\": \"Text 4\", \"direction\": \"Bottom down\", \"class*\": \"Category 2\"}",
            "coord":[
                "26.88",
                "64.00",
                "218.03",
                "64.00",
                "218.03",
                "99.84",
                "26.88",
                "99.84"
            ]
        }
    ],
    {
        "option":"Bottom down"    # The orientation of the image.
    }
]

The image orientation specifies the direction in which the bottom of the image points: down, up, left, or right.
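
Compared with text detection, the nested JSON string of each text line carries an additional text key that holds the content of the line. The following sketch collects the content and category of every line. The function name extract_transcripts is hypothetical, and the coord lists are omitted from the sample for brevity.

```python
import json


def extract_transcripts(result):
    """Return (content, category) pairs for every text line."""
    lines, _orientation = result
    pairs = []
    for line in lines:
        info = json.loads(line["text"])
        pairs.append((info["text"], info["class*"]))
    return pairs


result = [
    [
        {"text": json.dumps({"text": "Text 3", "direction": "Bottom down",
                             "class*": "Category 1"})},
        {"text": json.dumps({"text": "Text 4", "direction": "Bottom down",
                             "class*": "Category 2"})},
    ],
    {"option": "Bottom down"},
]
print(extract_transcripts(result))
# [('Text 3', 'Category 1'), ('Text 4', 'Category 2')]
```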