Alibaba Cloud allows you to use the oss_fdw plug-in to load data in Object Storage Service (OSS) to PolarDB for PostgreSQL databases and write data in PolarDB for PostgreSQL databases to OSS.

oss_fdw parameters

The oss_fdw plug-in uses a method similar to other Foreign Data Wrapper (FDW) interfaces to encapsulate external data stored in OSS. You can use oss_fdw to read data stored in OSS. This process is similar to reading data tables. oss_fdw provides unique parameters to connect and parse file data in OSS.

Note
  • oss_fdw can read and write files of the following types in OSS: TEXT and CSV files as well as GZIP-compressed TEXT and CSV files.
  • The value of each parameter must be enclosed in double quotation marks (") and cannot contain unnecessary spaces.

CREATE SERVER parameters

  • ossendpoint: the endpoint used to access OSS through the internal network, also known as the host.
  • id oss: the ID of the account.
  • key oss: the AccessKey pair of the account.
  • bucket: the OSS bucket. You must create an OSS account before you specify this parameter.

The following fault tolerance parameters can be used for data import and export. If network condition is poor, you can adjust these parameters to ensure successful import and export.

  • oss_connect_timeout: indicates the connection timeout period. Default value: 10. Unit: seconds.
  • oss_dns_cache_timeout: indicates the DNS timeout period. Default value: 60. Unit: seconds.
  • oss_speed_limit: indicates the minimum data transmission rate. Default value: 1024. Unit: Kbit/s.
  • oss_speed_time: the maximum waiting period during which the data transmission rate is lower than its minimum value. Default value: 15. Unit: seconds.

If the default values of oss_speed_limit and oss_speed_time are used, a timeout error occurs when the transmission rate is smaller than 1,024 Kbit/s for 15 consecutive seconds.

CREATE FOREIGN TABLE parameters

  • filepath: a file name that contains a path in OSS.
    • A file name contains a path but not a bucket name.
    • This parameter matches multiple files in the corresponding path in OSS. You can load multiple files to a database.
    • You can import files named in the format of filepath or filepath.x to the database. The values of x must be consecutive numbers starting from 1. For example, among the files named filepath, filepath.1, filepath.2, filepath.3, and filepath.5, the first four files are matched and imported. The filepath.5 file is not imported.
  • dir: the virtual file directory in OSS.
    • dir must end with a forward slash (/).
    • All files (excluding subfolders and files in subfolders) in the virtual file directory specified by dir will be matched and imported to a database.
  • prefix: the prefix of the path name corresponding to the data file. The prefix does not support regular expressions. Only one parameter among prefix, filepath, and dir can be specified at a time because they are mutually exclusive.
  • format: the file format, which can only be csv.
  • encoding: the file data encoding format. It supports common PostgreSQL encoding formats, such as UTF-8.
  • parse_errors: the fault-tolerant parsing mode. If an error occurs during the parsing process, the entire row of data is ignored.
  • delimiter: the column delimiter.
  • quote: the quote character for files.
  • escape: the escape character for files.
  • null: sets the column matching a specified string to null. For example, null 'test' is used to set the value of the 'test' column to null.
  • force_not_null: sets the value of a column to a non-null value. For example, force_not_null 'id' is used to set the rule that the values in the empty ID column cells are empty strings rather than null values.
  • compressiontype: specifies the format of the files to be read or written in OSS.
    • none: the uncompressed text files. This is the default value.
    • gzip: The files are compressed in the GZIP format.
  • compressionlevel: specifies the degree to which data files written to OSS are compressed. Valid values: 1 to 9. Default value: 6.
Note
  • You must specify filepath and dir in the OPTIONS parameter.
  • You must specify either filepath or dir.
  • The export mode only supports virtual folders, that is, only dir is supported.

Export mode parameters for CREATE FOREIGN TABLE

  • oss_flush_block_size: the buffer size for the data written to OSS at a time. Default value: 32 MB. Valid values: 1 MB to 128 MB.
  • oss_file_max_size: the maximum size of a data file allowed to be written to OSS. If a data file reaches the maximum size, the remaining data is written to another data file. Default value: 1024 MB. Valid values: 8 MB to 4000 MB.
  • num_parallel_worker: the maximum number of threads that are allowed to run in parallel to compress the data written to OSS. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')

  • Obtains the name and size of the OSS file that an external table matches.
  • The unit of file size is Byte.
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
 oss_test/test.gz.1  | 739698350
 oss_test/test.gz.2  | 739413041
 oss_test/test.gz.3  | 739562048
(3 rows)

Auxiliary features

oss_fdw.rds_read_one_file: In read mode, it is used to specify a file to match the external table. If the file is specified, the external table only matches this file during data import.

Example: set oss_fdw.rds_read_one_file = 'oss_test/example16.csv.1';

set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

oss_fdw example

# Create a plug-in
create extension oss_fdw; 
# Create a server 
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
     (host 'oss-cn-hangzhou.aliyuncs.com', id 'xxx', key 'xxx', bucket 'mybucket');
# Create an OSS external table.
CREATE FOREIGN TABLE ossexample 
    (date text, time text, open float,
     high float, low float, volume int); 
     SERVER ossserver 
     OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
         format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table to load data to
create table example
        (date text, time text, open float,
         high float, low float, volume int);
# Load data from ossexample to example.
insert into example select * from ossexample;
# Result
# oss_fdw estimates the file size in OSS and formulates a query plan.
explain insert into example select * from ossexample;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on example  (cost=0.00..1.60 rows=6 width=92)
   ->  Foreign Scan on ossexample  (cost=0.00..1.60 rows=6 width=92)
         Foreign OssFile: osstest/example.csv.0
         Foreign OssFile Size: 728
(4 rows)
# Write the data in the example table to OSS.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
                           QUERY PLAN
-----------------------------------------------------------------
 Insert on ossexample  (cost=0.00..16.60 rows=660 width=92)
   ->  Seq Scan on example  (cost=0.00..16.60 rows=660 width=92)
(2 rows)

oss_fdw usage considerations

  • oss_fdw is an external table plug-in developed based on the PostgreSQL FOREIGN TABLE framework.
  • The data import efficiency is subject to the PolarDB for PostgreSQL cluster resources (CPU, I/O, memory, and network) and the OSS performance.
  • To guarantee a high data import performance, ensure that the PolarDB for PostgreSQL database is in the same region as the OSS bucket. For more information, see OSS endpoints.
  • We recommend that you use the public endpoint if this error message appears when you read SQL statements from an external table: ERROR: oss endpoint userendpoint not in aliyun white list. For more information, see Public endpoints in Alibaba Cloud zones. If the problem persists, submit a ticket.

Exception handling

When an import or export error occurs, the log displays the following error information:

  • code: the HTTP status code of the request that has failed.
  • error_code: the error code returned by OSS.
  • error_msg: the error message returned by OSS.
  • req_id: the universally unique identifier (UUID) that identifies the request. If you require assistance in solving a problem, you can submit a ticket containing the req_id of the failed request to OSS developers.

For more information about error types, see the following references. Timeout errors can be handled using oss_ext parameters.

AccessKey pair encryption

If the AccessKey ID and AccessKey secret in the CREATE SERVER command are not encrypted, other users can obtain your AccessKey pair in plaintext by executing the select * from pg_foreign_server statement. You can use symmetric encryption to encrypt your AccessKey pair. Use different keys for different instances to further protect your information. However, to prevent incompatibility with earlier versions, do not add data types as you do in Greenplum.

Encrypted information:

postgres=# select * from pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}

The encrypted value starts with the MD5 string. The remainder of the total length divided by 8 is 3. Therefore, if you export data and import it again, the data is not encrypted again. However, you cannot create an AccessKey ID or AccessKey secret that starts with the MD5 string.