OSS FDW (foreign data wrapper) lets AnalyticDB for PostgreSQL read and query data files stored in Object Storage Service (OSS) directly, without moving data into the database. Built on PostgreSQL FDW, it supports cross-account access and a wide range of file formats.
With OSS foreign tables, you can:
Import OSS data into row-oriented or column-oriented AnalyticDB for PostgreSQL tables for accelerated analysis.
Query and analyze large amounts of OSS data in place.
Join OSS foreign tables with AnalyticDB for PostgreSQL tables in a single query.
Supported formats
| Format | Compression |
|---|---|
| CSV | Uncompressed, GZIP, standard Snappy |
| TEXT | Uncompressed, GZIP, standard Snappy |
| JSON | Uncompressed, GZIP |
| JSON Lines | Uncompressed, GZIP |
| ORC | — |
| Parquet | — |
| Avro | — |
Hadoop Snappy-compressed objects are not supported. For data type mappings between ORC, Parquet, and Avro and AnalyticDB for PostgreSQL, see Data type mappings for OSS foreign tables.
Limitations
The AnalyticDB for PostgreSQL instance and the OSS bucket must be in the same region.
The
filetypeparameter applies only to CSV, TEXT, JSON, and JSON Lines objects. Snappy compression is not supported for JSON and JSON Lines.The
log_errorsandsegment_reject_limitparameters apply only to CSV and TEXT objects.Only internal (VPC) endpoints are supported for the OSS server
endpointparameter.
Prerequisites
Before you begin, make sure you have:
An AnalyticDB for PostgreSQL instance and an OSS bucket in the same region
The OSS bucket name, an object path or directory, and the internal endpoint. See Get OSS bucket information below.
An AccessKey ID and AccessKey secret with access to the OSS bucket. See Create an AccessKey pair.
A sample data file. Download example.csv to follow the examples in this topic.
Get OSS bucket information
Log on to the OSS console.
In the left-side navigation pane, click Buckets.
On the Buckets page, click the name of your bucket. Note the bucket name.
On the Object Management page, note the object path.
In the left-side navigation pane, click Overview.
In the Port section, find the Access from ECS over the VPC (internal network) endpoint. Note the endpoint and bucket domain name.
Use the internal (VPC) endpoint whenever possible to avoid public network egress charges and latency.
How it works
Setting up OSS FDW involves three DDL steps:
Create an OSS server — register the OSS endpoint and bucket.
Create a user mapping — bind your AnalyticDB for PostgreSQL database user to the OSS AccessKey credentials.
Create an OSS foreign table — define the table schema and map it to an OSS object path or directory.
After these steps, query the foreign table using standard SQL, just like a regular AnalyticDB for PostgreSQL table.
Step 1: Create an OSS server
Run CREATE SERVER to register the OSS endpoint. For the full PostgreSQL reference, see CREATE SERVER.
Syntax
CREATE SERVER server_name
FOREIGN DATA WRAPPER fdw_name
[ OPTIONS ( option 'value' [, ... ] ) ]Parameters
| Parameter | Required | Description |
|---|---|---|
server_name | Yes | Name for the OSS server. |
fdw_name | Yes | Must be oss_fdw. |
OPTIONS parameters:
| Parameter | Required | Default | Description |
|---|---|---|---|
endpoint | Yes | — | Internal OSS endpoint. Only internal endpoints are supported. See Regions and endpoints. |
bucket | No | — | OSS bucket name. Must be set on the server, the foreign table, or both. If set on both, the foreign table value takes effect. |
speed_limit | No | 1024 bytes | Minimum transfer rate before a timeout is triggered. Requires speed_time. By default, a timeout occurs if less than 1,024 bytes is transferred within 90 consecutive seconds. |
speed_time | No | 90 seconds | Timeout window for speed_limit. Requires speed_limit. |
connect_timeout | No | 10 seconds | Connection timeout. |
dns_cache_timeout | No | 60 seconds | DNS resolution cache timeout. |
Example
CREATE SERVER oss_serv
FOREIGN DATA WRAPPER oss_fdw
OPTIONS (
endpoint 'oss-cn-********.aliyuncs.com',
bucket 'adb-pg'
);To modify an existing server, use ALTER SERVER. For the full reference, see ALTER SERVER.
-- Modify a parameter
ALTER SERVER oss_serv OPTIONS (SET endpoint 'oss-cn-********.aliyuncs.com');
-- Add a parameter
ALTER SERVER oss_serv OPTIONS (ADD connect_timeout '20');
-- Remove a parameter
ALTER SERVER oss_serv OPTIONS (DROP connect_timeout);To delete a server, use DROP SERVER. See DROP SERVER.
Step 2: Create a user mapping
Run CREATE USER MAPPING to bind a database user to the OSS AccessKey credentials. For the full reference, see CREATE USER MAPPING.
Syntax
CREATE USER MAPPING FOR { username | USER | CURRENT_USER | PUBLIC }
SERVER server_name
[ OPTIONS ( option 'value' [, ... ] ) ]Parameters
| Parameter | Description |
|---|---|
username | A specific AnalyticDB for PostgreSQL database username. |
USER / CURRENT_USER | The current database user. |
PUBLIC | Creates a public mapping for all database users, including users created in the future. |
server_name | The OSS server name from step 1. |
OPTIONS parameters:
| Parameter | Required | Description |
|---|---|---|
id | Yes | AccessKey ID for the OSS bucket. See Create an AccessKey pair. |
key | Yes | AccessKey secret for the OSS bucket. |
For cross-account data access, use the AccessKey ID and AccessKey secret of the Alibaba Cloud account that owns the OSS bucket.
Example
CREATE USER MAPPING FOR PUBLIC
SERVER oss_serv
OPTIONS (
id 'LTAI****************',
key 'yourAccessKeySecret'
);To delete a user mapping, use DROP USER MAPPING. See DROP USER MAPPING.
Step 3: Create an OSS foreign table
Run CREATE FOREIGN TABLE to define the table schema and map it to OSS objects. For the full reference, see CREATE FOREIGN TABLE.
Syntax
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ COLLATE collation ] [ column_constraint [ ... ] ]
[, ... ]
] )
SERVER server_name
[ OPTIONS ( option 'value' [, ... ] ) ]Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | Name of the OSS foreign table. |
column_name | Yes | Column name. |
data_type | Yes | Column data type. |
OPTIONS parameters:
| Parameter | Required | Default | Description |
|---|---|---|---|
filepath | Yes (one of three) | — | Full OSS object path. Selects only the specified object. |
prefix | Yes (one of three) | — | Object path prefix. Selects all objects whose path starts with the prefix. Regex is not supported. |
dir | Yes (one of three) | — | OSS directory path. Must end with /. Selects all objects in the directory, excluding subdirectories. |
bucket | No | — | OSS bucket name. If set on both the server and the table, the table value takes effect. |
format | Yes | — | Object format. Valid values: csv, text, orc, avro, parquet, json, jsonline. |
filetype | No | plain | Compression type. Valid values: plain (no compression), gzip, snappy (standard Snappy only). Applies to CSV, TEXT, JSON, and JSON Lines only. |
log_errors | No | false | Whether to log error rows instead of failing. CSV and TEXT only. |
segment_reject_limit | No | — | Error threshold before scanning stops. A % suffix means percentage; no suffix means row count. For example, 10% stops scanning when more than 10% of rows have errors. CSV and TEXT only. |
header | No | false | Whether the source object has a header row. CSV only. |
delimiter | No | , for CSV; tab for TEXT | Field delimiter. Single-byte character only. CSV and TEXT only. |
quote | No | " | Quotation mark character. Single-byte character only. CSV only. |
escape | No | " | Escape character for the quote character. Single-byte character only. CSV only. |
null | No | \N for CSV; empty string for TEXT | String representation of NULL. CSV and TEXT only. |
encoding | No | Client encoding | Character encoding of the data file. CSV and TEXT only. |
force_not_null | No | false | If true, empty string field values are never treated as NULL. CSV and TEXT only. |
force_null | No | false | If true, an empty string enclosed in quotation marks is treated as NULL. CSV and TEXT only. |
How `prefix` matching works:
If prefix is test/filename, the following objects are selected:
test/filenametest/filenamexxxtest/filename/aatest/filenameyyy/aatest/filenameyyy/bb/aa
If prefix is test/filename/, only test/filename/aa is selected.
Example
CREATE FOREIGN TABLE ossexample (
date text,
time text,
open float,
high float,
low float,
volume int
) SERVER oss_serv OPTIONS (dir 'dir_oss_adb/', format 'csv');Verify the foreign table
After creating the foreign table, confirm it maps to the expected OSS objects using either method:
-- Method 1: Check the query plan and matched objects
EXPLAIN VERBOSE SELECT * FROM ossexample;
-- Method 2: List matched OSS objects
SELECT * FROM get_oss_table_meta('ossexample');To delete a foreign table, use DROP FOREIGN TABLE. See DROP FOREIGN TABLE.
Query OSS data
Query OSS foreign tables using standard SQL, the same way you query native AnalyticDB for PostgreSQL tables.
Filter by column value:
SELECT * FROM ossexample WHERE volume = 5;Aggregate with GROUP BY:
SELECT low, sum(volume)
FROM ossexample
GROUP BY low
ORDER BY low
LIMIT 5;Count matching rows:
SELECT count(*) FROM ossexample WHERE volume = 5;Join OSS foreign tables with AnalyticDB for PostgreSQL tables
Create a native AnalyticDB for PostgreSQL table and insert data:
CREATE TABLE example (id int, volume int); INSERT INTO example VALUES(1,1), (2,3), (4,5);Join the OSS foreign table with the native table:
SELECT example.volume, min(high), max(low) FROM ossexample, example WHERE ossexample.volume = example.volume GROUP BY example.volume ORDER BY example.volume;
Fault tolerance
OSS FDW supports fault-tolerant scanning through the log_errors and segment_reject_limit parameters. When enabled, rows with parsing errors are logged instead of causing the entire scan to fail.
Create a fault-tolerant foreign table:
CREATE FOREIGN TABLE oss_error_sales (id int, value float8, x text)
SERVER oss_serv
OPTIONS (
log_errors 'true', -- Log error rows instead of failing
segment_reject_limit '10', -- Stop scanning if more than 10 rows have errors
dir 'error_sales/',
format 'csv',
encoding 'utf8'
);Query error logs:
SELECT * FROM gp_read_error_log('oss_error_sales');Clear error logs:
SELECT gp_truncate_error_log('oss_error_sales');FAQ
Does deleting data from an OSS foreign table delete the data in OSS?
No. Data stored in OSS cannot be deleted by deleting data from an OSS foreign table.