When your query filters on a partition column, AnalyticDB for PostgreSQL reads only the matching OSS directories instead of scanning the entire bucket. This partition pruning can cut I/O dramatically for large datasets.
OSS foreign tables support only list partitions.
Prerequisites
Before you begin, make sure you have:
-
An OSS server created in AnalyticDB for PostgreSQL. See the Create an OSS server section in *Use OSS foreign tables for data lake analysis*
-
A user mapping to the OSS server. See the Create a user mapping to the OSS server section in *Use OSS foreign tables for data lake analysis*
How partition pruning works
Partition pruning relies on a key=value directory structure in OSS. The engine maps each partition column filter to a directory path and reads only the objects under that path.
The required OSS object path format is:
oss://bucket/partcol1=partval1/partcol2=partval2/
For example, a table partitioned by year and then by month expects this layout:
oss://my-bucket/
├── year=2022/
│ ├── month=07/
│ │ ├── data1.csv
│ │ └── data2.csv
│ └── month=08/
│ └── data3.csv
└── year=2023/
└── month=01/
└── data4.csv
A query with WHERE year = 2022 AND month = 07 reads only the objects under year=2022/month=07/ and skips all other directories.
Create a partitioned OSS foreign table
Use CREATE FOREIGN TABLE with PARTITION BY LIST to define a partitioned OSS foreign table. The syntax follows the same conventions as standard partitioned tables in AnalyticDB for PostgreSQL. For the full CREATE FOREIGN TABLE syntax, see Create an OSS foreign table.
Choose one of two patterns depending on whether all partitions share the same subpartition structure.
Pattern 1: With a subpartition template
Use a subpartition template when all partitions share the same subpartitions. The system automatically creates the defined subpartitions for any new partition you add later.
CREATE FOREIGN TABLE ossfdw_parttable(
key text,
value bigint,
pt text, -- partition key
region text -- subpartition key
)
SERVER oss_serv
OPTIONS (dir 'PartationDataDirInOss/', format 'jsonline')
PARTITION BY LIST (pt)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE (
SUBPARTITION hangzhou VALUES ('hangzhou'),
SUBPARTITION shanghai VALUES ('shanghai')
)
( PARTITION "20170601" VALUES ('20170601'),
PARTITION "20170602" VALUES ('20170602'));
Pattern 2: Without a subpartition template
Define subpartitions explicitly for each partition when partitions have different subpartition structures.
CREATE FOREIGN TABLE ossfdw_parttable1(
key text,
value bigint,
pt text, -- partition key
region text -- subpartition key
)
SERVER oss_serv
OPTIONS (dir 'PartationDataDirInOss/', format 'jsonline')
PARTITION BY LIST (pt)
SUBPARTITION BY LIST (region)
(
VALUES('20181218')
(
VALUES('hangzhou'),
VALUES('shanghai')
),
VALUES('20181219')
(
VALUES('nantong'),
VALUES('anhui')
)
);
Manage partitions and subpartitions
Use ALTER TABLE to add or remove partitions and subpartitions after the table is created.
Add partitions and subpartitions
Add a partition to a table with a subpartition template
The system generates subpartitions automatically based on the template.
ALTER TABLE ossfdw_parttable ADD PARTITION VALUES ('20170603');
The following figure shows the updated table structure.
Add a partition to a table without a subpartition template
Specify subpartitions explicitly because no template is defined.
ALTER TABLE ossfdw_parttable1 ADD PARTITION VALUES ('20181220')
(
VALUES('hefei'),
VALUES('guangzhou')
);
Add a subpartition to an existing partition
ALTER TABLE ossfdw_parttable ALTER PARTITION FOR ('20170603') ADD PARTITION VALUES('nanjing');
The following figure shows the updated table structure.
Drop partitions and subpartitions
Drop a partition
ALTER TABLE ossfdw_parttable DROP PARTITION FOR ('20170601');
Drop a subpartition
ALTER TABLE ossfdw_parttable ALTER PARTITION FOR ('20170602') DROP PARTITION FOR ('hangzhou');
Drop a partitioned OSS foreign table
DROP FOREIGN TABLE ossfdw_parttable;
Use case: query log data shipped from Simple Log Service
Partitioned OSS foreign tables work well for querying log data that Simple Log Service ships to OSS. Configure the OSS LogShipper to write objects in a key=value directory format so that AnalyticDB for PostgreSQL can prune partitions at query time.
-
Ship log data from Simple Log Service to OSS. In the OSS LogShipper panel, we recommend that you set Shard Format to
date=%Y%m/userlogin. This produces the following OSS directory structure:oss://testBucketName/adbpgossfdw ├── date=202002 │ ├── userlogin_158561762910654****_647504382.csv │ └── userlogin_158561784923220****_647507440.csv └── date=202003 └── userlogin_158561794424704****_647508762.csv -
Create a partitioned OSS foreign table that maps to the
datepartition column.CREATE FOREIGN TABLE userlogin ( uid integer, name character varying, source integer, logindate timestamp without time zone, "date" int ) SERVER oss_serv OPTIONS ( dir 'adbpgossfdw/', format 'text' ) PARTITION BY LIST ("date") ( VALUES ('202002'), VALUES ('202003') ); -
Query and analyze the execution plan to verify that partition pruning is working. For example, assume that you want to query the total number of user logons in February 2022.
EXPLAIN SELECT uid, count(uid) FROM userlogin WHERE "date" = 202002 GROUP BY uid;The output confirms pruning is active. The
Foreign Scan on userlogin_1_prt_1node and theOss Urlline show that only thedate=202002/directory is scanned:QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice2; segments: 3) (cost=5135.10..5145.10 rows=1000 width=12) -> HashAggregate (cost=5135.10..5145.10 rows=334 width=12) Group Key: userlogin_1_prt_1.uid -> Redistribute Motion 3:3 (slice1; segments: 3) (cost=5100.10..5120.10 rows=334 width=12) Hash Key: userlogin_1_prt_1.uid -> HashAggregate (cost=5100.10..5100.10 rows=334 width=12) Group Key: userlogin_1_prt_1.uid -> Append (cost=0.00..100.10 rows=333334 width=4) -> Foreign Scan on userlogin_1_prt_1 (cost=0.00..100.10 rows=333334 width=4) Filter: (date = 202002) Oss Url: endpoint=oss-cn-hangzhou-zmf-internal.aliyuncs.com bucket=adbpg-regress dir=adbpgossfdw/date=202002/ filetype=plain|text Oss Parallel (Max 4) Get: total 0 file(s) with 0 bytes byte(s). Optimizer: Postgres query optimizer (13 rows)The
dir=adbpgossfdw/date=202002/value confirms that only thedate=202002directory is scanned. Thedate=202003directory is skipped.