Load Vector & Raster Data into GanosBase at Scale via OSS - PolarDB - Alibaba Cloud - PolarDB

GanosBase supports both vector and raster data — the two primary spatial data types in GIS applications. This topic explains how to import each type from Object Storage Service (OSS) into GanosBase using SQL-native tools, and helps you choose the right method for your use case.

Why not use PostGIS import tools

PostGIS, the spatial extension for PostgreSQL Community Edition, provides shp2pgsql and raster2pgsql for importing vector and raster data. These tools are not compatible with PolarDB for PostgreSQL for two reasons:

They require full control over the server's local disk. PolarDB uses a shared storage architecture with no local disk and does not expose host-level information.
PostGIS Raster and GanosBase Raster use different internal storage structures. For more information, see Regional Surface Rainfall Analysis Based on Ganos GanosBase Raster Engine.

GanosBase addresses this by providing function-level import tools that work directly with OSS. All import operations are standard SQL — no command-line access to the host required.

Choose an import method

Use this table to select the right method before diving into the steps.

Data type	Method	Storage	Best for
Vector	QGIS (or other GIS desktop tools)	In-database	One-time imports with a visual workflow
Vector	Foreign Data Wrapper (FDW)	In-database or query-only	Programmatic imports; unified access to OSS and cluster tables
Raster	`ST_ImportFrom`	In-database (pixel data stored in 256x256-pixel chunks)	Persistent storage; data remains available even if the original OSS file is deleted
Raster	`ST_CreateRast`	Reference only (metadata stored; pixel data stays in OSS)	Large files where you want to avoid duplicating data in the database
Any	Python batch script	Depends on the function used	Bulk imports from OSS

Import vector data

Import using QGIS

PolarDB for PostgreSQL is fully compatible with PostgreSQL Community Edition, so any GIS desktop tool that supports PostgreSQL connections works with GanosBase. QGIS is a common choice.

Create a PostgreSQL data source connection in QGIS pointing to your PolarDB cluster, then use the tool's built-in data import capability to load vector data.

Import using FDW (recommended)

Foreign Data Wrapper (FDW) is a PostgreSQL extension for accessing external data, including files in OSS and tables in other databases within the same cluster. GanosBase FDW automatically maps geometric spatial data types to the Geometry type, letting you query foreign tables and local cluster tables in a unified way.

The following example imports poly.shp from OSS into GanosBase.

Install the required extensions.

CREATE EXTENSION ganos_spatialref;
CREATE EXTENSION ganos_geometry;
CREATE EXTENSION ganos_fdw;

SELECT ST_RegForeignTables('oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/poly.shp');

Verify the registration by querying the information_schema.foreign_tables view.

SELECT foreign_table_name FROM information_schema.foreign_tables
  ORDER BY foreign_table_name ASC;

The steps above register poly.shp as a foreign table and let you query it through data mapping. No data is written to the database. To copy the data into a local table, run:

CREATE TABLE poly_db AS SELECT * FROM poly;

Import raster data

GanosBase provides two functions for raster import: ST_ImportFrom and ST_CreateRast. Both create a Raster object that stores image metadata, but they handle pixel data differently.

Function	Pixel data location	If the OSS source file is deleted
`ST_ImportFrom`	Copied into the database in 256x256-pixel chunks (configurable via `chunkTableName`)	Data remains available
`ST_CreateRast`	Stays in OSS; only a logical reference is stored in the database	Image is no longer viewable

Use ST_ImportFrom when you need the data to be self-contained in the database. Use ST_CreateRast when you want to keep pixel data in OSS and can guarantee the OSS source remains intact.

Import TIFF data

TIFF is a widely used raster format. The following example loads a TIFF file from OSS.

Create a table with a raster column to store the data.

CREATE TABLE raster_table
(
  id     integer,
  format text,
  rast   raster
);

Import the TIFF file using ST_ImportFrom or ST_CreateRast.

-- ST_ImportFrom: copies pixel data into the database as chunks named 'chunk_table'
INSERT INTO raster_table
SELECT 1, 'TIFF', ST_ImportFrom('chunk_table', 'oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/file');

-- ST_CreateRast: stores only metadata; pixel data stays in OSS
INSERT INTO raster_table
SELECT 2, 'TIFF', ST_CreateRast('oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/file');

Query the imported raster objects using GanosBase raster UDFs (user-defined functions).

SELECT
  id,
  ST_Georeference(rast),
  st_extent(rast),
  ST_NumBands(rast),
  ST_SRID(rast),
  st_value(rast, 0, 100, 100),
  st_value(rast, 1, 200, 200)
FROM raster_table;

Sample result:

Import HDF5 and NetCDF data

HDF5 and NetCDF are standard raster formats used in earth observation, scientific computing, and geoscience (meteorology and oceanography). Both ST_ImportFrom and ST_CreateRast support them with the same syntax as TIFF.

-- Import a NetCDF file using ST_ImportFrom
INSERT INTO raster_table
SELECT 3, 'NC', ST_ImportFrom('chunk_table', 'oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/file');

-- Import an HDF5 file using ST_CreateRast
INSERT INTO raster_table
SELECT 4, 'NC', ST_CreateRast('oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/file');

Files with subdatasets

HDF5 and NetCDF files often contain multiple subdatasets. Append the subdataset name to the file path when importing:

NetCDF: 'oss://...file.nc:sub_name'
HDF5: 'oss://...file.hdf5://path/sub_name'

-- NetCDF with a subdataset
INSERT INTO raster_table
SELECT 5, 'NC', ST_ImportFrom('chunk_table', 'oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/file.nc:sub_name');

-- HDF5 with a subdataset
INSERT INTO raster_table
SELECT 5, 'HDF5', ST_ImportFrom('chunk_table', 'oss://<access_id>:<secret_key>@[<Endpoint>]/<bucket>/path_to/file.hdf5://path/sub_name');

Files with more than three dimensions

GanosBase stores raster data in (w, h, b) dimensions, where w is chunk width, h is chunk height, and b is the number of bands. For HDF5 and NetCDF files with more than three dimensions, calculate b as the product of all dimension sizes except the longitude and latitude axes, then pass it in the chunkdim parameter.

For example, the following NetCDF file has four dimensions: lon, lat, time, and isobaric.

Excluding lon and lat, the product of the remaining dimension sizes is 120. Pass that value as the band count in chunkdim:

INSERT INTO nc_table
VALUES (1, ST_ImportFrom('nc_rbt', '/Users/xiaofei/Data/raster/nc_demo.nc', '{"chunkdim":"(256,256,120)"}'));

Import data in batches

For bulk imports from OSS, use a script. The following Python example iterates over HDF5 files in an OSS bucket and inserts each one into the database using ST_ImportFrom. It uses the oss2 library to list objects and psycopg2 to execute SQL.

Initialize the OSS Python SDK before running this script. For setup instructions, see Initialization. Store your credentials as environment variables rather than hardcoding them in the script.

import os
import oss2
import psycopg2
import logging

# Read credentials from environment variables
access_key_id     = os.environ['OSS_ACCESS_KEY_ID']
access_key_secret = os.environ['OSS_ACCESS_KEY_SECRET']
oss_endpoint      = os.environ['OSS_ENDPOINT']       # e.g. oss-cn-hangzhou.aliyuncs.com
bucket_name       = os.environ['OSS_BUCKET_NAME']    # e.g. my-geo-bucket

# Connect to OSS
auth   = oss2.Auth(access_key_id, access_key_secret)
bucket = oss2.Bucket(auth, oss_endpoint, bucket_name)

# Connect to the database
con = psycopg2.connect(
    database=os.environ['DB_NAME'],
    user=os.environ['DB_USER'],
    password=os.environ['DB_PASSWORD'],
    host=os.environ['DB_HOST'],
    port=os.environ['DB_PORT'],
    options="-c search_path=aster,public"  # target schema
)
cur = con.cursor()

# Collect HDF5 file names from the OSS bucket
with open("upload.txt", mode='w', encoding='utf-8') as ff:
    for obj in oss2.ObjectIterator(bucket, prefix="your/prefix/"):
        filename = obj.key.split('/')[-1]
        if ".hdf5" in filename:
            ff.write(filename + '\n')

# Read the file list
with open('upload.txt', 'r') as f:
    file_list = [line for line in f.read().split('\n') if line]

# Import each file
raster_table_name = "hdf5_raster_table"
for filename in file_list:
    oss_path = f"oss://{access_key_id}:{access_key_secret}@{oss_endpoint}/{bucket_name}/{filename}"
    insert_sql = f"""
        INSERT INTO {raster_table_name}(filename, band_index, raster_obj)
        VALUES (%s, %s, ST_ImportFrom('chunk_table', %s))
    """
    try:
        cur.execute(insert_sql, (filename, 1, oss_path))
        con.commit()
        logging.info(f"{filename} imported successfully")
    except (Exception, psycopg2.Error) as e:
        logging.error(f"{filename} failed: {e}")
        con.rollback()

Set the following environment variables before running the script:

Environment variable	Description	Example
`OSS_ACCESS_KEY_ID`	AccessKey ID	`LTAI5tXxx`
`OSS_ACCESS_KEY_SECRET`	AccessKey Secret	`xXxXxXxXx`
`OSS_ENDPOINT`	OSS endpoint	`oss-cn-hangzhou.aliyuncs.com`
`OSS_BUCKET_NAME`	OSS bucket name	`my-geo-bucket`
`DB_NAME`	Database name	`mydb`
`DB_USER`	Database username	`myuser`
`DB_PASSWORD`	Database password	`mypassword`
`DB_HOST`	Database host	`pc-xxx.polardbpg.rds.aliyuncs.com`
`DB_PORT`	Database port	`5432`