Realtime Compute for Apache Flink: PDF_TO_IMAGES

Last Updated: Apr 08, 2026

Splits a PDF file into per-page images and returns each page as a row.

Syntax

PDF_TO_IMAGES(content [, image_format] [, dpi] [, start_page] [, pages])

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| content | VARBINARY | Yes | The binary content of the PDF file. Use FETCH_CONTENT to retrieve content from a remote file. |
| image_format | STRING | No | The output image format. Supported values: 'jpg', 'png'. Default: 'jpg'. |
| dpi | INT | No | The rendering resolution in dots per inch (DPI), which controls image sharpness. Default: 200. |
| start_page | INT | No | The first page to process. Page numbers are 0-indexed. Default: 0. |
| pages | INT | No | The number of pages to process, starting from start_page. Must be used with start_page and cannot be used alone. The function processes pages in the range [start_page, start_page + pages). Default: all pages from start_page to the end of the document. |
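
As an illustration of the start_page and pages parameters, the following sketch renders only a slice of a document. It assumes that optional arguments are positional, so image_format and dpi are passed explicitly in order to reach start_page and pages; the URL is a placeholder. With start_page = 2 and pages = 3, the range [2, 5) described above should yield rows for page_no 2, 3, and 4, provided the document has at least five pages.

```sql
-- Sketch: render three pages starting at page index 2 as PNG at 200 DPI.
-- Assumption: optional arguments are positional, so 'png' and 200 are
-- supplied explicitly before start_page (2) and pages (3).
SELECT
    p.page_no AS page_no,
    p.mime_type AS mime_type
FROM (
    SELECT FETCH_CONTENT(pdf_url) AS pdf_content
    FROM (
        VALUES ('https://example.com/sample.pdf')  -- placeholder URL
    ) T (pdf_url)
) AS t1,
LATERAL TABLE(PDF_TO_IMAGES(t1.pdf_content, 'png', 200, 2, 3)) AS p(mime_type, page_no, image_content);
```

Limiting the range this way avoids rendering pages that the query would otherwise discard.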

Return parameters

The function returns one row per page, with the following columns:

| Parameter | Type | Description |
|---|---|---|
| mime_type | STRING | The MIME type of the output image, such as image/jpeg. |
| page_no | INT | The PDF page number, 0-indexed. |
| image_content | VARBINARY | The binary content of the page image. |
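
To show how the three returned columns might be consumed together, the following sketch calls the function with only the required content argument, so the documented defaults apply (JPEG, 200 DPI, all pages), and selects the image bytes alongside the page metadata. The URL is a placeholder, and what a downstream sink does with image_content is outside the scope of this sketch.

```sql
-- Sketch: one output row per page, carrying the page index, MIME type,
-- and rendered image bytes. Only the required argument is passed, so the
-- defaults listed in the Parameters section are assumed to apply.
SELECT
    p.page_no AS page_no,
    p.mime_type AS mime_type,
    p.image_content AS image_content
FROM (
    SELECT FETCH_CONTENT(pdf_url) AS pdf_content
    FROM (
        VALUES ('https://example.com/sample.pdf')  -- placeholder URL
    ) T (pdf_url)
) AS t1,
LATERAL TABLE(PDF_TO_IMAGES(t1.pdf_content)) AS p(mime_type, page_no, image_content);
```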

Example

The following query fetches a PDF from a URL and converts each page to a JPEG image at 150 DPI. The LATERAL TABLE syntax calls PDF_TO_IMAGES as a table-valued function and joins its output rows with the input.

SELECT
    p.mime_type AS mime_type,
    p.page_no AS page_no
FROM (
    SELECT FETCH_CONTENT(pdf_url) AS pdf_content
    FROM (
        VALUES ('https://example.com/sample.pdf')
    ) T (pdf_url)
) AS t1,
LATERAL TABLE(PDF_TO_IMAGES(t1.pdf_content, 'jpg', 150)) AS p(mime_type, page_no, image_content);

Sample output:

| mime_type (STRING) | page_no (INT) |
|---|---|
| image/jpeg | 0 |
| image/jpeg | 1 |
| image/jpeg | 2 |
| image/jpeg | 3 |