OSS foreigntables are designed based on the PostgreSQL Foreign Data Wrapper (FDW) framework to access OSS data for analysis.

OSS foreign tables allow you to perform the following operations:

  • Import OSS data to local row-oriented and column-oriented tables for accelerated analysis.
  • Directly query and analyze huge amounts of OSS data.
  • Join OSS foreign tables and local tables for data analysis

OSS foreign tables support gzip-compressed files in the ORC and CSV formats and can be partitioned based on one or more columns for partition filtering.

OSS data may come from business applications, log archives of Alibaba Cloud Log Service, and ETL operations of Data Lake Analysis.

Use an OSS foreign table

The procedure of using an OSS foreign table can be simplified as follows:

For example, you can use the ossutil Overview to view the following information about a TPC-H lineitem table in an OSS bucket.

[adbpgadmin@localhost]$ ossutil ls oss://adbpg-tpch/data/tpch_data_10x/lineitem.tbl
LastModifiedTime                   Size(B)  StorageClass   ETAG                                  ObjectName
2020-03-12 09:29:48 +0800 CST    144144997      Standard   1F426F2FFC70A0262D2D69183BC3A0BD-57   oss://adbpg-tpch/data/tpch_data_10x/lineitem.tbl.1
2020-03-12 09:29:58 +0800 CST    145177420      Standard   CFE2CFF1C8059547DC9F1711E77F74DD-57   oss://adbpg-tpch/data/tpch_data_10x/lineitem.tbl.10
2020-03-10 21:23:24 +0800 CST    145355168      Standard   35C6227D1C29F1236A92A4D5D7922625-57   oss://adbpg-tpch/data/tpch_data_10x/lineitem.tbl.11
... ...

In the preceding information,

  • adbpg-tpch is the name of the OSS bucket.
  • data/tpch_data_10x/... is the path of the file relative to the OSS bucket.

The following section describes how to create and use an OSS foreign table in detail.

1. Create an OSS server

To create an OSS server, you must specify <oss endpoint> to access the OSS server.

1.1 Example

CREATE SERVER oss_serv              -- The name of the OSS server.
    FOREIGN DATA WRAPPER oss_fdw
    OPTIONS (
        endpoint '<Oss endpoint>',  -- The endpoint of the OSS server.
        bucket '<Oss bucket>'       -- The bucket where the data file is located.
  );

1.2 Syntax

-- Create an OSS server.
CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
    FOREIGN DATA WRAPPER fdw_name
    [ OPTIONS ( option 'value' [, ... ] ) ]

-- Delete an OSS server.
DROP SERVER [ IF EXISTS ] name [ CASCADE | RESTRICT ]

For more information about options in the OPTIONS clause, see 1.3 Parameter options. For more information about how to create an OSS server, visit CREATE SERVER.

1.3 Parameter options

Option Type Unit Required Default value Description
endpoint String Yes

The endpoint corresponding to the OSS region.

Note:

If you access your AnalyticDB for PostgreSQL instance from an ECS instance, we recommend that you use an internal endpoint containing the "internal" keyword to avoid incurring public traffic.

bucket String No

Specifies the bucket where the data file is located. You must use OSS to create the bucket before data import.

Note:

You must set bucket for either an OSS server or an OSS table. If you set bucket for both, the bucket value for the OSS table overwrites that for the OSS server.

Note:

  • The following error-tolerance parameters can be used when you access OSS. You can retain the default values.
  • If you retain the default values for the following parameters, a timeout is triggered after the transmission rate remains lower than 1 KB/s for 1,500 consecutive seconds. For more information, see Error handling.
speed_limit Numeric Bytes/second No 1024 Specifies the minimum tolerable transmission rate.
speed_time Numeric Seconds No 1500 Specifiesthe maximum tolerable duration for speed_limit.
connect_timeout Numeric Seconds No 10 Specifies the connection timeout period.
dns_cache_timeout Numeric Seconds No 60 Specifies the timeout period for DNS resolution.

2. Create a user mapping to an OSS server

After creating an OSS server, you need to create a user mapping from your AnalyticDB for PostgreSQL instance to the OSS server.

2.1 Example

CREATE USER MAPPING FOR PUBLIC -- Create a user mapping to the OSS server for all users. For more information, see 2.2 Syntax.
    SERVER oss_serv                         -- Specify the OSS server that you want to access.
    OPTIONS ( 
      id '<oss access id>',         -- Specify the AccessKey ID of the OSS account.
      key '<oss access key>'        -- Specify the AccessKey secret of the OSS account.
    );

2.2 Syntax

-- Create a user mapping.
CREATE USER MAPPING FOR { username | USER | CURRENT_USER | PUBLIC }
    SERVER servername
    [ OPTIONS ( option 'value' [, ... ] ) ]

-- Delete a user mapping.
DROP USER MAPPING [ IF EXISTS ] FOR { user_name | USER | CURRENT_USER | PUBLIC }
    SERVER server_name

Description:

  • username The name of an existing user that is mapped to foreign server.
  • CURRENT_USERand USERmatch the name of the current user.
  • PUBLIC all roles, including those that might be created later.

For more information about options in the OPTIONS clause, see2.3 Parameter options. For more information about how to create a user mapping, visit CREATE USER MAPPING.

2.3 Parameter options

Option Required Default value Description
id Yes The AccessKey ID of the OSS account.
key Yes The AccessKey secret of the OSS account.

3. Create an OSS foreign table

After an OSS bucket and the account that is used to access the OSS bucket are specified, you can define an OSS foreign table. OSS foreign tables support data files in multiple formats for different business scenarios.

  • OSS foreign tables allow you to access uncompressed files in the CSV and TEXT formats.
  • OSS foreign tables allow you to access gzip-compressed files in the CSV and TEXT formats.
  • OSS foreign tables allow you to access binary files in the ORC format. For more information, see the "Data type mappings between ORC files and AnalyticDB for PostgreSQL files" section in this topic.

3.1 Example

-- Create an uncompressed OSS foreign table in the TEXT format.
CREATE FOREIGN TABLE oss_lineitem (                     -- The name of the OSS foreign table.
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey bigint,
    l_linenumber bigint,
    l_quantity double precision,
    l_extendedprice double precision,
    l_discount double precision,
    l_tax double precision,
    l_returnflag CHAR(1),
    l_linestatus CHAR(1),
    l_shipdate DATE,
    l_commitdate DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode CHAR(10),
    l_comment VARCHAR(44)
) server oss_serv                                       -- Specify the OSS server.
    options (
        prefix 'data/tpch_data_10x/lineitem.tbl',       -- Specify the prefix with which to match OSS files.
        format 'text',                                  -- Parse files in the TEXT format.
        delimiter '|'                                   -- Specify the column delimiter.
    );

-- Create a gzip-compressed OSS foreign table in the CSV format.
CREATE FOREIGN TABLE oss_lineitem_csv_gz (              -- The name of the OSS foreign table.
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey bigint,
    l_linenumber bigint,
    l_quantity double precision,
    l_extendedprice double precision,
    l_discount double precision,
    l_tax double precision,
    l_returnflag CHAR(1),
    l_linestatus CHAR(1),
    l_shipdate DATE,
    l_commitdate DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode CHAR(10),
    l_comment VARCHAR(44)
) server oss_serv                                       -- Specify the OSS server.
    options (
        prefix 'data/tpch_data_10x/lineitem.tbl.gz',    --Specify the prefix with which to match OSS files.
        format 'csv',                                   -- Parse files in the CSV format.
        delimiter '|',                                  -- Specify the column delimiter.
        filetype 'gzip'                                 -- Specify the gzip-compressed file.
    );

-- Create an OSS foreign table in the ORC format.
CREATE FOREIGN TABLE oss_lineitem_orc (                 -- The name of the OSS foreign table.
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey bigint,
    l_linenumber bigint,
    l_quantity double precision,
    l_extendedprice double precision,
    l_discount double precision,
    l_tax double precision,
    l_returnflag CHAR(1),
    l_linestatus CHAR(1),
    l_shipdate DATE,
    l_commitdate DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode CHAR(10),
    l_comment VARCHAR(44)
) server oss_serv                                       -- Specify the OSS server.
    options (
        prefix 'data/tpch_orc_data_10x/lineitem.orc',   -- Specify the prefix with which to match OSS files.
        format 'orc'                                    -- Parse files in the ORC format.
    );

Note:

After you create an OSS foreign table, you can use one of the following statements to check whether OSS objects that match the table meet expectations:

  • Statement 1: explain verbose select * from <OSS foreign table>;
  • Statement 2: select * from get_oss_table_meta('<OSS foreign table>');

3.2 Syntax

-- Create an OSS foreign table.
CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name ( [
    column_name data_type [ OPTIONS ( option 'value' [, ... ] ) ] [ COLLATE collation ] [ column_constraint [ ... ] ]
      [, ... ]
] )
    SERVER server_name
  [ OPTIONS ( option 'value' [, ... ] ) ]

-- Delete an OSS foreign table.
DROP FOREIGN TABLE [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]

For more information about options in the OPTIONS clause, see3.3 Parameter options. For more information about how to create a foreign table, visit CREATE FOREIGN TABLE.

3.3 Parameter options

3.3.1 Common options

Option Type Unit Required Default value Description
filepath String

Yes. Select one of the three options.

Note: The three options all specify the paths relative to the OSS bucket.

Matches a single file.
prefix String Matches multiple files by prefix.
dir String Matches a list of files and subfolders. The dir value must end with a forward slash (/).
bucket String No You must set bucket for either an OSS server or an OSS table. If you set bucket for both, the bucket value for the OSS table overwrites that for the OSS server.
format String Yes

Specifies the file format. Valid values:

  • csv
  • text
  • orc

3.3.2 Options for CSV and TEXT files

Note: The following options are valid only for CSV and TEXT files and invalid for ORC files.

Option Type Unit Required Default value Description
filetype String No plain

Valid values:

  • plain: only reads the raw binary data.
  • gzip: reads the raw binary data and decompresses the GZIP package.
log_errors Boolean No false

Specifies whether to record errors in log files.

For more information, see 3.4 Fault tolerance mechanism.

segment_reject_limit Numeric No

Specifies the maximum allowable number of error lines before the execution is aborted. If the value contains a percent sign (%), it indicates the percentage of error lines. Otherwise, the value indicates the number of error lines.

Example:

  • segment_reject_limit = '10 indicates that execution is aborted when the number of error lines on a compute node exceeds 10.
  • segment_reject_limit = '10 indicates that execution is aborted when the number of error lines on a compute node is more than 10% of the processed lines.

For more information, see 3.4 Fault tolerance mechanism.

The following table describes the formatting options. For more information, visit COPY
header Boolean No false

Specifies whether the source file contains the header column.

  • This option is only valid for CSV files.
delimiter String No Default delimiter for TEXT files: tab. Default delimiter for CSV files: comma (,).

The column delimiter

  • Only single-byte characters are allowed.
quote String No Double quotation mark (").

The column quotation

  • This option is only valid for CSV files.
  • Only single-byte characters are allowed.
escape String No The value is the same as the quote value by default.

Specifies the character that appears before the data character that matches the quote value.

  • Only single-byte characters are allowed.
  • This option is only valid for CSV files.
null String Default delimiter for TEXT files: \N. Default delimiter for CSV files: spaces that are not enclosed in quotes. Specifies the null string for a file.
encoding String No If you do not specify the option, the encoding type on the client is used by default. Specifies the encoding format of local data files.
force_not_null Boolean No false If the option is set to true, the values of specified columns are matched against the null string.
force_null Boolean No false
  • If this option is set to true, the column values that match the null string are returned as NULL even if the values are quoted.
  • Without this option, only unquoted values matching the null string are returned as NULL.

3.4 Fault tolerance mechanism

When creating an OSS foreign table, you can set the log_errors and segment_reject_limit options to avoid unexpected exits due to error lines of raw data during a scan of OSS foreign tables. In the preceding information,

  • "log_errors" indicates whether to record information of the error lines.
  • "segment_reject_limit" indicates the fault tolerance ratio, namely the percentage of error lines in all parsed lines.

Note: OSS foreign tables in the ORC format do not support fault tolerance.

  1. Create a FDW-based OSS foreign table that supports fault tolerance.
    create foreign table oss_error_sales (id int, value float8, x text)
        server oss_serv
        options (log_errors 'true',         -- Record the information of the error lines.
                 segment_reject_limit '10', -- The number of error lines can be up to 10. Otherwise, the system returns an error and exits.
                 dir 'error_sales/',        -- Specify the OSS directory that the foreign table matches.
                 format 'csv',              -- Parse files in the CSV format.
                 encoding 'utf8');          -- Specify the encoding format.
  2. Scan the foreign table.

    Three lines of error records are added to an OSS file to demonstrate the effect of fault tolerance.

    postgres=# select * from oss_error_sales ;
    NOTICE:  found 3 data formatting errors (3 or more input rows), rejected related input data
     id |     value     |     x
    ----+---------------+-----------
      1 |  0.1102213212 | abcdefg
      1 |  0.1102213212 | abcdefg
      2 | 0.92983182312 | mmsmda123
      3 |     0.1123123 | abbs
      1 |  0.1102213212 | abcdefg
      2 | 0.92983182312 | mmsmda123
      3 |     0.1123123 | abbs
      1 |  0.1102213212 | abcdefg
      1 |  0.1102213212 | abcdefg
      2 | 0.92983182312 | mmsmda123
      3 |     0.1123123 | abbs
      3 |     0.1123123 | abdsa
      1 |  0.1102213212 | abcdefg
      2 | 0.92983182312 | mmsmda123
      3 |     0.1123123 | abbs
      1 |  0.1102213212 | abcdefg
      2 | 0.92983182312 | mmsmda123
      3 |     0.1123123 | abbs
    (18 rows)
  3. View the log of error lines.
    postgres=# select * from gp_read_error_log('oss_error_sales');
               cmdtime            |     relname     |        filename         | linenum | bytenum |                          errmsg                           | rawdata | rawbytes
    ------------------------------+-----------------+-------------------------+---------+---------+-----------------------------------------------------------+---------+----------
     2020-04-22 19:37:35.21125+08 | oss_error_sales | error_sales/sales.2.csv |       2 |         | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad |         | \x
     2020-04-22 19:37:35.21125+08 | oss_error_sales | error_sales/sales.2.csv |       3 |         | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad |         | \x
     2020-04-22 19:37:35.21125+08 | oss_error_sales | error_sales/sales.3.csv |       2 |         | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad |         | \x
    (3 rows)
  4. Delete the log of error lines.
    postgres=# select gp_truncate_error_log('oss_error_sales');
     gp_truncate_error_log
    -----------------------
     t
    (1 row)

4. Use an OSS foreign table

4.1 Import OSS data to local tables

Perform the following steps to import data:

  1. Distribute data evenly among multiple OSS files.

    Note:

    • All compute nodes of an AnalyticDB for PostgreSQL instance read data in parallel from the data files stored in OSS by using a round-robin algorithm.
    • We recommend that you use the same data encoding formats for data files and databases to simplify the encoding process and maximize efficiency. The default database encoding format is UTF-8.
    • Multiple CSV or TEXT files can be read in parallel. The default number of files to read in parallel is 4.
    • To maximize reading efficiency, we recommend that you set the number of files as an integer multiple of the number of data nodes. The number of data nodes is the product of the number of compute nodes and the number of cores per compute node.
  2. Create an OSS foreign table for each database in your AnalyticDB for PostgreSQL instance.
  3. Execute the following statement to import data in parallel:
-- INSERT statement
INSERT INTO <Local destination table> SELECT * FROM <OSS foreign table>;

-- CREATE TABLE AS statement
CREATE TABLE <Local destination table> AS SELECT * FROM <OSS foreign table>;
  • Example 1 - Use the INSERT statement to import the oss_lineitem_orc data to a local AOCS table
-- Create a local AOCS table.
CREATE TABLE aocs_lineitem (
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey bigint,
    l_linenumber bigint,
    l_quantity double precision,
    l_extendedprice double precision,
    l_discount double precision,
    l_tax double precision,
    l_returnflag CHAR(1),
    l_linestatus CHAR(1),
    l_shipdate DATE,
    l_commitdate DATE,
    l_receiptdate DATE,
    l_shipinstruct CHAR(25),
    l_shipmode CHAR(10),
    l_comment VARCHAR(44)
) WITH (APPENDONLY=TRUE, ORIENTATION=COLUMN, COMPRESSTYPE=ZSTD, COMPRESSLEVEL=5);

-- Import the oss_lineitem_orc data to the local AOCS table.
INSERT INTO aocs_lineitem SELECT * FROM oss_lineitem_orc;
  • Example 2 - Use the CREATE TABLE AS statement to import the oss_lineitem_orc data to a local heap table
create table heap_lineitem as select * from oss_lineitem_orc distributed by (l_orderkey);

4.2 Query and analyze OSS data

You can query an OSS foreign table in the same way as a local table. Common query scenarios are as follows:

  • Filtering by key-value pair
select * from oss_lineitem_orc where l_orderkey = 14062498;
  • Aggregation query
select count(*) from oss_lineitem_orc where l_orderkey = 14062498;
  • Filtering and grouping followed by the LIMIT clause
select l_partkey, sum(l_suppkey)
  from oss_lineitem_orc
 group by l_partkey
 order by l_partkey
 limit 10;

4.3 Join OSS foreign tables and local tables for data analytics

  • Example - Join local AOCS table aocs_lineitem and OSS foreign tables for a TPC-H Q3 query
select l_orderkey,
       sum(l_extendedprice * (1 - l_discount)) as revenue,
       o_orderdate,
       o_shippriority 
  from oss_customer,                                    -- OSS foreign table
       oss_orders,                                      -- OSS foreign table
       aocs_lineitem                                    -- Local AOCS table
 where c_mktsegment = 'furniture'
   and c_custkey = o_custkey
   and l_orderkey = o_orderkey
   and o_orderdate < date '1995-03-29'
   and l_shipdate > date '1995-03-29'
 group by l_orderkey, o_orderdate, o_shippriority
 order by revenue desc, o_orderdate
 limit 10;

5. Use a partitioned OSS foreign table

AnalyticDB for PostgreSQL has added support for partitions on a basis of OSS foreign tables. Using partition columns in a WHERE condition can significantly reduce the amount of data obtained from remote servers.

The partition function of FDW-based OSS foreign tables in AnalyticDB for PostgreSQL has certain requirements for file organization. The partition data of a partitioned OSS foreign table must be located within the oss://bucket/partcol1=partval1/partcol2=partval2/ directory. In the preceding directory, partcol1 and partcol2 indicate partition columns. partval1 and partval2 indicate the corresponding values of the partition columns.

5.1 Example

The following SQL statements create the userlogin foreign table that has two partitions. The paths of the two partitions in OSS are oss://bucket/userlogin/month=201812/ and oss://bucket/userlogin/month=201901/.

CREATE FOREIGN TABLE userlogin (
        uid integer,
        name character varying,
        source integer,
        logindate timestamp without time zone,
        month int
) SERVER oss_serv OPTIONS (
    dir 'userlogin/',
    format 'text'
)
PARTITION BY LIST (month)
(
        VALUES ('201812'), 
        VALUES ('201901')       
)

5.2 Syntax

FDW-based OSS foreign tables in AnalyticDB for PostgreSQL and standard partitioned tables use the same syntax to perform partitioning. For more information about the partitioning syntax of standard partitioned tables, see Table partitioning. FDW-based OSS foreign tables in AnalyticDB for PostgreSQL support only list partitions.

5.3 Usage scenario

Partitioned foreign tables are commonly used to access logs delivered by Log Service. For more information, see Use FDW-based OSS foreign tables to access logs delivered by Log Service. You can use a similar method to organize directories when writing data from business apps to OSS, and define partitions for foreign tables.

Best practices

Collect statistics of foreign tables

Data of OSS foreign tables is stored in OSS. By default, the statistics of foreign tables is not collected. When the statistics does not exist or is stale, a query optimizer may generate inefficient query plans for complicated queries such as JOIN operations. You can use the ANALYZE statement to update statistics. The execution process is as follows:

-- Execute the EXPLAIN statement to view the query plan before the ANALYZE statement is executed.
EXPLAIN <Table name>;

-- Use the ANALYZE statement to collect statistics of an OSS foreign table.
ANALYZE <Table name>;

-- Use the EXPLAIN statement to view the query plan after the ANALYZE statement is executed.
EXPLAIN <Table name>;

Perform the mapping between the number of OSS foreign tables and the number of compute nodes

  • All compute nodes of an AnalyticDB for PostgreSQL instance read data in parallel from the data files stored in OSS by using a round-robin algorithm.
  • We recommend that you use the same data encoding formats for data files and databases to simplify the encoding process and maximize efficiency. The default database encoding format is UTF-8.
  • Multiple CSV or TEXT files can be read in parallel. The default number of files to read in parallel is 4.
  • To maximize reading efficiency, we recommend that you set the number of files as an integer multiple of the number of data nodes. The number of data nodes is the product of the number of compute nodes and the number of cores per compute node.

View query plans

Execute the following statement to view the query plan of an OSS foreign table:

EXPLAIN SELECT COUNT(*) FROM oss_lineitem_orc WHERE l_orderkey > 14062498;

Note:

  • You can execute the EXPLAIN VERBOSE statement to view more information.

View OSS file information

You can execute the following statement to obtain file information of a specified OSS foreign table:

SELECT * FROM get_oss_table_meta('<OSS FOREIGN TABLE>');

Data type mappings between ORC files and AnalyticDB for PostgreSQL files

The ORC files support 18 data types, among whichMap, Struct, and Union have no mappings in AnalyticDB for PostgreSQL. ORC files of the LIST type can only be converted into one-dimensional arrays in AnalyticDB for PostgreSQL.

The following table lists the mappings between ORC files and AnalyticDB for PostgreSQL files.

AnalyticDB for PostgreSQL data type ORC data type
BOOL BOOLEAN
INT2 SHORT
INT4 INT
INT8 LONG
FLOAT4 FLOAT
FLOAT8 DOUBLE
NUMERIC DECIMAL
CHAR CHAR
TEXT STRING
BYTEA BINARY
TIMESTAMP TIMESTAMP
DATE DATE
INT2[] LIST(SHORT)
INT4[] LIST(INT)
INT8[] LIST(LONG)
FLOAT4[] LIST(FLOAT)
FLOAT8[] LIST(DOUBLE)

Use FDW-based OSS foreign tables to access logs delivered by Log Service

Partitions of FDW-based OSS foreign tables in AnalyticDB for PostgreSQL are more widely used together with Log Service. Log Service is an end-to-end logging service developed by Alibaba Cloud and is widely used in big data scenarios. For more information about Log Service, see What is Log Service?

If you want to create a partitioned OSS foreign table based on the data delivered from Log Service to OSS, you must set Shard Format in the OSS LogShipper dialog box as shown in the following figure. For more information about the parameters, see Ship logs to OSS.

If the parameters are configured as shown in the preceding figure, logs generated within the same month are stored in a single OSS directory. The configuration generates the following directory layout in OSS:

oss://oss-fdw-test/adbpgossfdw
├── date=202002
│   ├── userlogin_1585617629106546791_647504382.csv
│   └── userlogin_1585617849232201154_647507440.csv
└── date=202003
    └── userlogin_1585617944247047796_647508762.csv

Then, create the following partitioned OSS foreign table based on the columns contained in the files delivered by Log Service:

CREATE FOREIGN TABLE userlogin (
        uid integer,
        name character varying,
        source integer,
        logindate timestamp without time zone,
        "date" int
) SERVER oss_serv OPTIONS (
    dir 'adbpgossfdw/',
    format 'text'
)
PARTITION BY LIST ("date")
(
        VALUES ('202002'), 
        VALUES ('202003')       
)

Finally, implement required analysis logic based on the partitioned foreign table. For example, you can analyze all user logons in February 2020.

adbpg=# explain select uid, count(uid) from userlogin where "date" = 202002 group by uid;
                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=5135.10..5145.10 rows=1000 width=12)
   ->  HashAggregate  (cost=5135.10..5145.10 rows=334 width=12)
         Group Key: userlogin_1_prt_1.uid
         ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=5100.10..5120.10 rows=334 width=12)
               Hash Key: userlogin_1_prt_1.uid
               ->  HashAggregate  (cost=5100.10..5100.10 rows=334 width=12)
                     Group Key: userlogin_1_prt_1.uid
                     ->  Append  (cost=0.00..100.10 rows=333334 width=4)
                           ->  Foreign Scan on userlogin_1_prt_1  (cost=0.00..100.10 rows=333334 width=4)
                                 Filter: (date = 202002)
                                 Oss Url: endpoint=oss-cn-hangzhou-zmf-internal.aliyuncs.com bucket=adbpg-regress dir=adbpgossfdw/date=202002/ filetype=plain|text
                                 Oss Parallel (Max 4) Get: total 0 file(s) with 0 bytes byte(s).
 Optimizer: Postgres query optimizer
(13 rows)

In the preceding statements, the FDW-based OSS foreign table reads only data generated in February 2020, rather than data generated in March 2020 to reduce the amount of data to read and maximize query efficiency.

Troubleshooting

During a scan of an OSS foreign table, the following error message may be displayed:

"oss server returned error: StatusCode=..., ErrorCode=..., ErrorMessage="...", RequestId=..."

Where:

  • StatusCode: the HTTP status code corresponding to the error.
  • ErrorCode: the error code returned by OSS.
  • ErrorMessage: the error message returned by OSS.
  • RequestId: the UUID that identifies the request. If you cannot solve a problem, provide the RequestId to seek assistance from OSS development engineers.

For more information about error types, seeOSS error response.

Differences between OSS foreign tables and OSS external tables

  • AnalyticDB for PostgreSQL allows you to use OSS external tables for data import and export. However, OSS external tables cannot meet requirements for analysis of large amounts of data.
  • OSS foreign tables are developed based on the PostgreSQL Foreign Data Wrapper (FDW) framework. OSS foreign tables support gzip-compressed files in the CSV and TEXT formats and can be partitioned based on one or more columns. OSS foreign tables allow you to collect the statistics of foreign table so that the optimizer can generate an optimal query plan.
  • Foreign tables are superior to external tables in terms of performance, functionality, and stability. Therefore, the Greenplum community plans to replace external tables with foreign tables.

References