This topic describes the /green/image/scan operation, which you can call to submit synchronous optical character recognition (OCR) tasks that detect text in images and return the results in real time.
Operation description
Operation: /green/image/scan
You can call this operation to submit synchronous OCR tasks. For more information about how to construct an HTTP request, see Request structure. You can also use an SDK that encapsulates the HTTP request. For more information, see SDK overview.
- Billing method:
You are charged for calling this operation. For more information about the billing method, see Content Moderation Pricing.
- Response timeout:
A synchronous moderation request must complete within 6 seconds. If moderation is not complete within 6 seconds, a timeout error is returned. We recommend that you set the timeout period to 6 seconds when you call synchronous moderation operations. In most cases, we recommend synchronous requests because they are easier to call. If you do not need moderation results in real time, you can send asynchronous moderation requests instead.
- Return results:
In general, moderation results are returned within 1 second after you send a synchronous moderation request. The response can take longer when the system is processing a large number of requests, when the images are large, or when the images contain a large number of words, because OCR takes longer as the number of words in an image increases. If the images to be moderated contain a large number of words, we recommend that you send asynchronous moderation requests.
- Limits on images:
  - The images must use HTTP or HTTPS URLs.
  - The images must be in the PNG, JPG, JPEG, BMP, GIF, or WEBP format.
  - An image can be up to 20 MB in size. This limit applies to both synchronous and asynchronous moderation operations.
  - The duration for downloading an image is limited to 3 seconds. If an image fails to be downloaded within 3 seconds, a timeout error is returned.
  - We recommend that you submit images of at least 256 × 256 pixels to ensure moderation accuracy.
  - The response time of an image moderation operation depends on how long the images take to download. Make sure that you use a stable and reliable storage service to store the images to be moderated. We recommend that you use Object Storage Service (OSS) or Content Delivery Network (CDN).
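The image limits above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the API: the function name `check_image` is hypothetical, the image format is guessed from the file extension in the URL path, and 20 MB is assumed to mean 20 × 1024 × 1024 bytes.

```python
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "gif", "webp"}
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumption: 20 MB counted in binary units
RECOMMENDED_MIN_SIDE = 256  # recommended minimum width and height, in pixels


def check_image(url, size_bytes=None, width=None, height=None):
    """Return a list of detected problems; an empty list means none were found."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must use HTTP or HTTPS")
    # Assumption: the format can be guessed from the extension in the URL path.
    ext = parsed.path.rsplit(".", 1)[-1].lower() if "." in parsed.path else ""
    if ext not in ALLOWED_EXTENSIONS:
        problems.append("unsupported or unknown image format: %r" % ext)
    if size_bytes is not None and size_bytes > MAX_IMAGE_BYTES:
        problems.append("image exceeds 20 MB")
    if width is not None and height is not None:
        if width < RECOMMENDED_MIN_SIDE or height < RECOMMENDED_MIN_SIDE:
            problems.append("image is smaller than the recommended 256 x 256 pixels")
    return problems
```

The size and dimension arguments are optional because they are often unknown until the image has been fetched. The service remains the authority on whether an image is accepted.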
QPS limit
You can call this operation up to 50 times per second per Alibaba Cloud account. If you exceed this limit, your requests are throttled, which may affect your business.
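To stay under the QPS limit, a caller can throttle requests on the client side. The sketch below shows one possible approach, a sliding-window limiter. The class name `SlidingWindowLimiter` is hypothetical, and the server may count requests differently, so treat the default of 50 calls per second as a starting point.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls=50, period=1.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()

    def try_acquire(self):
        """Return True and record the call if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each request and wait briefly or queue the task when it returns False.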
Request parameters
Parameter | Type | Required | Example | Description |
---|---|---|---|---|
bizType | String | No | default | The business scenario. You can create a business scenario in the Content Moderation console. For more information, see Customize policies for machine-assisted moderation. |
scenes | StringArray | Yes | ["ocr"] | The moderation scenario. Set the value to ocr. |
tasks | JSONArray | Yes | | The list of OCR tasks. Each element in the JSON array is an OCR task structure and corresponds to an image. The JSON array can contain a maximum of 100 elements. In other words, you can submit a maximum of 100 OCR tasks at a time. To submit 100 OCR tasks at a time, you must raise the relevant concurrency limit to a number greater than 100. For more information about the structure of each element, see task. |
task
Parameter | Type | Required | Example | Description |
---|---|---|---|---|
dataId | String | No | test_data_xxxx | The ID of the image to be moderated. Make sure that each ID is unique in a request. |
url | String | Yes | https://example.com/test_image_xxxx.png | The URL of the image to be moderated. |
interval | Integer | No | 2 | The interval between two frames that are consecutively captured. This parameter is dedicated for GIF or long image moderation. By default, only the first frame of a GIF image or a long image is moderated. You can use the interval parameter to specify the interval between two frames that the system consecutively captures. This helps reduce moderation costs. Note: The interval and maxFrames parameters must be used in pairs. For example, the interval parameter is set to 2 and the maxFrames parameter is set to 100 for moderating a GIF image or a long image. In this example, one out of every two frames is moderated and a maximum of 100 frames are moderated. The fee is calculated based on the actual number of moderated frames. |
maxFrames | Integer | No | 100 | The maximum number of frames to be captured. This parameter is dedicated for GIF or long image moderation. Default value: 1. If the value of the |
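The interaction between interval and maxFrames can be illustrated with a small calculation. This is a sketch under stated assumptions: capture is assumed to start at the first frame, the default interval is assumed to be 1, and any adjustment that the service makes to the interval is not modeled. The function name `sampled_frames` is hypothetical.

```python
def sampled_frames(total_frames, interval=1, max_frames=1):
    """Return the 0-based indexes of the frames assumed to be captured."""
    if total_frames < 0 or interval < 1 or max_frames < 1:
        raise ValueError("total_frames must be >= 0; interval and max_frames must be >= 1")
    # Take every `interval`-th frame starting at frame 0, capped at `max_frames`.
    return list(range(0, total_frames, interval)[:max_frames])
```

With the defaults, only the first frame is selected, which matches the documented default behavior. The length of the returned list gives a rough estimate of the number of billed frames.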
Response parameters
Parameter | Type | Example | Description |
---|---|---|---|
code | Integer | 200 | The returned HTTP status code. |
msg | String | OK | The message that is returned for the request. |
dataId | String | test_data_xxxx | The ID of the moderation object. Note: If you set the dataId parameter in the moderation request, the dataId parameter is returned in the response. |
taskId | String | img5A@k7a@B4q@6K@d9nfKgOs-1sWeLu | The ID of the OCR task. |
url | String | https://example.com/test_image_xxxx.png | The URL of the moderation object. |
results | Array | | The return results. If HTTP status code 200 is returned after a successful call, the array in the return results contains one or more elements. Each element is a structure. For more information about the structure of each element, see result. |
result
Parameter | Type | Example | Description |
---|---|---|---|
scene | String | ocr | The moderation scenario. The value is fixed to ocr. |
label | String | ocr | The category of the OCR results. Valid values: |
suggestion | String | review | The machine-assisted moderation result of the moderated image. Valid values: |
rate | Float | 99.91 | The probability that the moderated image falls into the detected category. You can ignore this parameter in the OCR scenario. |
ocrLocations | Array | | The information about the single text entry in the moderated static image, which includes the text, text size, and text location. For more information about the structure, see ocrLocation. Note: If no text is detected in the moderated image, this parameter is not returned. |
ocrData | Array | ["hello, this is a test text."] | The combination of all text in the moderated static image. In general, the text combination is stored as the first element of the array. Note: If no text is detected in the moderated image, this parameter is not returned. |
frames | Array | xxx | The frames that are captured from the moderated GIF image and the text that is detected in each frame. Note: If no more than one frame is captured, this parameter is not returned. |
ocrLocation
Parameter | Type | Example | Description |
---|---|---|---|
text | String | hello | The single text entry that is detected in the moderated image. |
x | Float | 41 | The distance between the upper-left corner of the text area and the y-axis, with the upper-left corner of the image being the coordinate origin. Unit: pixels. |
y | Float | 84 | The distance between the upper-left corner of the text area and the x-axis, with the upper-left corner of the image being the coordinate origin. Unit: pixels. |
w | Float | 83 | The width of the text area. Unit: pixels. |
h | Float | 26 | The height of the text area. Unit: pixels. |
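Because x and y locate the upper-left corner of the text area and w and h give its size, the opposite corner follows by addition. The helper below is a minimal sketch and its name, `to_box`, is hypothetical.

```python
def to_box(loc):
    """Convert an ocrLocation dict into (left, top, right, bottom) pixel coordinates."""
    left, top = loc["x"], loc["y"]
    # The lower-right corner is the upper-left corner shifted by the width and height.
    return (left, top, left + loc["w"], top + loc["h"])


box = to_box({"text": "hello", "x": 41, "y": 84, "w": 83, "h": 26})
# box == (41, 84, 124, 110)
```

The tuple order (left, upper, right, lower) is the same order that Pillow's `Image.crop` expects, which can be convenient when cutting the text area out of the image.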
Examples
Sample request
{
  "scenes": [
    "ocr"
  ],
  "tasks": [
    {
      "dataId": "test_data_xxxx",
      "url": "https://example.com/test_image_xxxx.png"
    }
  ]
}
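A request body like the preceding one can be assembled programmatically. The following is a minimal sketch that covers only the JSON body. Signing and sending the request are out of scope here (see Request structure). The function name `build_request_body` and the `img-<index>` naming scheme for dataId are hypothetical.

```python
import json

MAX_TASKS = 100  # a request can contain at most 100 OCR tasks


def build_request_body(image_urls, biz_type=None):
    """Build the JSON body of a synchronous OCR request."""
    if not image_urls:
        raise ValueError("at least one image URL is required")
    if len(image_urls) > MAX_TASKS:
        raise ValueError("a request can contain at most %d tasks" % MAX_TASKS)
    tasks = []
    for index, url in enumerate(image_urls):
        # dataId is optional; using the index keeps it unique within the request.
        tasks.append({"dataId": "img-%d" % index, "url": url})
    body = {"scenes": ["ocr"], "tasks": tasks}
    if biz_type is not None:
        body["bizType"] = biz_type
    return json.dumps(body)
```

For example, `build_request_body(["https://example.com/test_image_xxxx.png"])` returns a JSON string with one task and the scenes parameter set to ["ocr"].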
Sample success response
{
  "code": 200,
  "data": [
    {
      "code": 200,
      "dataId": "test_data_xxxx",
      "extras": {},
      "msg": "OK",
      "results": [
        {
          "label": "ocr",
          "ocrData": [
            "hello, this is a test text."
          ],
          "ocrLocations": [
            {
              "h": 26,
              "text": "hello",
              "w": 83,
              "x": 41,
              "y": 84
            },
            {
              "h": 25,
              "text": " this is a test text.",
              "w": 95,
              "x": 78,
              "y": 114
            }
          ],
          "rate": 99.91,
          "scene": "ocr",
          "suggestion": "review"
        }
      ],
      "taskId": "img5A@k7a@B4q@6K@d9nfKgOs-1sWeLu",
      "url": "https://example.com/test_image_xxxx.png"
    }
  ],
  "msg": "OK",
  "requestId": "C4AB08A9-AD75-4410-859B-0B9EF6DFC3C4"
}
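A response of this shape can be reduced to the detected text per image. The sketch below assumes that the response has already been parsed into a Python dict, and it follows the structure of the preceding sample. The function name `extract_text` is hypothetical, and real responses may carry fields or error shapes that are not handled here.

```python
def extract_text(response):
    """Map each task's dataId (or taskId) to the combined text found in its image."""
    if response.get("code") != 200:
        raise RuntimeError(
            "request failed: %s %s" % (response.get("code"), response.get("msg"))
        )
    texts = {}
    for item in response.get("data", []):
        key = item.get("dataId") or item.get("taskId")
        if item.get("code") != 200:
            texts[key] = None  # this task failed; the caller decides whether to retry
            continue
        parts = []
        for result in item.get("results", []):
            # ocrData is not returned when no text is detected.
            parts.extend(result.get("ocrData", []))
        texts[key] = "\n".join(parts)
    return texts
```

A value of None marks a task that failed, and an empty string marks an image in which no text was detected.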