The Paimon Blob format stores large binary objects, such as images and videos. Unlike formats that keep binary data inline with the other columns, it writes blob data to separate files whose layout is optimized for random access.
Supported versions
This feature requires engine version esr-4.7.0 or later.
Paimon Blob tables
To create a Paimon Blob table, you must define its schema and properties. The following template shows a standard creation statement and explains the key parameters.
CREATE TABLE blob_tbl (
  fileName STRING,
  picture BINARY
)
USING paimon
TBLPROPERTIES (
  'row-tracking.enabled' = 'true',
  'data-evolution.enabled' = 'true',
  'blob-field' = 'picture'
);
Key parameters:
- 'blob-field' = 'picture': Declares the picture field as the Blob field. This field must be of the BINARY type.
- 'row-tracking.enabled' = 'true': Enables row-level tracking, which is required for MERGE INTO update and delete operations.
- 'data-evolution.enabled' = 'true': Allows schema evolution.
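
How the schema and the three properties fit together can be sketched as a small Python helper that assembles the statement and rejects a Blob field that is not of the BINARY type. The function name build_blob_table_ddl and its arguments are hypothetical and exist only for this illustration; they are not part of Paimon or Spark. The BINARY check simply mirrors the rule stated in the parameter list above.

```python
def build_blob_table_ddl(table, columns, blob_field):
    """Return a CREATE TABLE statement for a Paimon Blob table.

    Hypothetical helper for illustration only. `columns` maps
    column names to Spark SQL type names.
    """
    if blob_field not in columns:
        raise ValueError(f"blob field {blob_field!r} is not a column of {table}")
    if columns[blob_field].upper() != "BINARY":
        raise ValueError(f"blob field {blob_field!r} must be of type BINARY")

    # One indented "name TYPE" line per column, in insertion order.
    column_lines = ",\n".join(f"  {name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table} (\n{column_lines}\n)\n"
        "USING paimon\n"
        "TBLPROPERTIES (\n"
        "  'row-tracking.enabled' = 'true',\n"
        "  'data-evolution.enabled' = 'true',\n"
        f"  'blob-field' = '{blob_field}'\n"
        ")"
    )


ddl = build_blob_table_ddl(
    "blob_tbl", {"fileName": "STRING", "picture": "BINARY"}, "picture"
)
print(ddl)
```

In a notebook, the resulting string could be passed to spark.sql(ddl). Validating the field type before the statement is submitted is a design choice of this sketch, not a documented requirement of the engine.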
Usage example
This example demonstrates how to read image files from Object Storage Service (OSS) and write them to a Paimon Blob table.
Prepare sample images
Download the following sample images to use in this example.
Upload sample images
Upload the images to your bucket by using the Object Storage Service (OSS) console. For details, see Simple Upload.
In this example, upload the two sample images, cat.png and dog.jpg, to the pictures/ directory in your bucket.
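
The directory you upload to has to match the path that the notebook reads from later. As a rough sketch, the following Python snippet derives the object keys for the two sample images and the corresponding oss:// directory URI. The helper names oss_object_key and oss_load_uri are hypothetical, and the snippet only builds strings; it does not call any OSS API.

```python
import posixpath


def oss_object_key(prefix, local_path):
    """Object key for a local file uploaded under `prefix` (hypothetical helper)."""
    # Normalize Windows separators so basename works on any local path.
    file_name = posixpath.basename(local_path.replace("\\", "/"))
    return posixpath.join(prefix.strip("/"), file_name)


def oss_load_uri(bucket, prefix):
    """URI of the directory that holds the uploaded images."""
    return f"oss://{bucket}/{prefix.strip('/')}/"


bucket = "<yourBucketName>"  # placeholder, replace with your bucket name
keys = [oss_object_key("pictures/", p) for p in ["cat.png", "dog.jpg"]]
print(keys)
print(oss_load_uri(bucket, "pictures/"))
```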
Develop and run
- On the EMR Serverless Spark page, click Development in the left-side navigation pane.
  This opens the development page. The main panel contains the Development and Data Directory tabs. The directory tree contains the Development and Git Directory nodes.
- Create a notebook.
  - On the Development tab, click the icon.
  - In the dialog box that appears, enter a name, select Interactive Development > Notebook as the type, and then click OK.
- In the upper-right corner, select a running notebook session instance.
  You can also select Create Notebook Session from the drop-down list to create a notebook session instance. For more information about notebook sessions, see Manage notebook sessions.
- Copy the following code into a Python cell in the new notebook. Replace <yourBucketName> with your bucket name and <yourPicturePath> with the path to your images, such as pictures.

from PIL import Image
import io
from IPython.display import display
from pyspark.sql.functions import input_file_name, col, regexp_extract

# 1. Recursively read image files from OSS.
df = (
    spark.read.format("binaryFile")
    .option("recursiveFileLookup", "true")
    .load("oss://<yourBucketName>/<yourPicturePath>/")
)

# 2. Extract the file name and image binary data.
df_with_id = (
    df.select(
        col("content").alias("picture"),
        regexp_extract(input_file_name(), r".*/(.+)$", 1).alias("fileName")
    )
    .select("fileName", "picture")
)

# 3. Create a temporary view.
df_with_id.createOrReplaceTempView("temp_images")

# 4. Preview the image metadata.
print("Preview of image metadata (first few rows):")
spark.sql("SELECT fileName, length(picture) AS size_bytes FROM temp_images LIMIT 5").show(truncate=False)

# 5. Create the blob table.
spark.sql("DROP TABLE IF EXISTS blob_tbl")
spark.sql("""
CREATE TABLE blob_tbl (
  fileName STRING,
  picture BINARY
)
USING paimon
TBLPROPERTIES (
  'row-tracking.enabled' = 'true',
  'data-evolution.enabled' = 'true',
  'blob-field' = 'picture'
)
""")

# 6. Write data to the table.
print("Writing images to 'blob_tbl'...")
spark.sql("INSERT INTO blob_tbl SELECT * FROM temp_images")

# 7. Read and display the first two images.
print("\nFetching and displaying the first 2 images from the table...\n")
result_df = spark.sql("SELECT fileName, picture FROM blob_tbl LIMIT 2")
rows = result_df.collect()

if not rows:
    print("No images found in the table.")
else:
    for i, row in enumerate(rows, start=1):
        print(f"[{i}] Displaying: {row.fileName}")
        try:
            img = Image.open(io.BytesIO(row.picture))
            display(img)
        except Exception as e:
            print(f" ⚠️ Failed to load image: {e}")

- Click Execute All Cells and view the results below.

Preview of image metadata (first few rows):
+--------+----------+
|fileName|size_bytes|
+--------+----------+
|cat.png |354076    |
|dog.jpg |59207     |
+--------+----------+

Writing images to 'blob_tbl'...

Fetching and displaying the first 2 images from the table...

[1] Displaying: cat.png
[2] Displaying: dog.jpg
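
The fileName values in the output come from the regular expression in step 2 of the notebook code, which keeps everything after the last slash of the file path. A minimal way to check that pattern outside Spark is sketched below with Python's re module. Python and Spark use different regex engines, so this only illustrates the pattern on simple paths; the bucket name example-bucket and the function extract_file_name are made up for the example.

```python
import re

# Same pattern as in step 2 of the notebook code.
FILE_NAME_PATTERN = re.compile(r".*/(.+)$")


def extract_file_name(path):
    """Return the text after the last '/', or '' when the pattern does not match."""
    match = FILE_NAME_PATTERN.search(path)
    return match.group(1) if match else ""


# Hypothetical bucket name, used only for illustration.
print(extract_file_name("oss://example-bucket/pictures/cat.png"))
print(extract_file_name("oss://example-bucket/pictures/nested/dog.jpg"))
```

Because the leading .* is greedy, the capture group starts after the last slash, so files found in nested directories by the recursive lookup still yield only their base name.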
Related documentation
For more information about Paimon Blob tables, see Blob Storage.