Alibaba Cloud allows you to use the oss_fdw plug-in to load data in OSS to PolarDB-P databases and write data in PolarDB-P databases to OSS.

oss_fdw parameters

The oss_fdw plug-in uses a method similar to other Foreign Data Wrapper (FDW) interfaces to encapsulate external data stored in OSS. You can use oss_fdw to read data stored in OSS. This process is similar to reading data tables. oss_fdw provides unique parameters to connect and parse file data in OSS.

Note
  • oss_fdw can read and write files of the following types in OSS: TEXT and CSV files as well as gzip-compressed TEXT and CSV files.
  • The value of each parameter must be enclosed in double quotation marks (") and cannot contain any unnecessary spaces.

CREATE SERVER parameters

  • ossendpoint: the endpoint used to access OSS through the internal network, also known as the host.
  • id oss: the ID of your OSS account.
  • key oss: the key of your OSS account.
  • bucket: the OSS bucket. You must create an OSS account before specifying this parameter.

The following fault tolerance parameters can be used for data import and export. If network connectivity is poor, you can adjust these parameters to guarantee successful import and export.

  • oss_connect_timeout: indicates the connection timeout period. Default value: 10. Unit: seconds.
  • oss_dns_cache_timeout: indicates the DNS timeout period. Default value: 60. Unit: seconds.
  • oss_speed_limit: indicates the minimum data transmission rate. Default value: 1. Unit: Kbit/s.
  • oss_speed_time: indicates the maximum period when the data transmission rate is lower than the minimum value. Default value: 15. Unit: seconds.

If the default values of oss_speed_limit and oss_speed_time are used, a timeout error occurs when the transmission rate is smaller than 1 Kbit/s for 15 consecutive seconds.

CREATE FOREIGN TABLE parameters

  • filepath: a file name that contains a path in OSS.

    • A file name contains a path but not a bucket name.

    • This parameter matches multiple files in the corresponding path in OSS. You can load multiple files to a database.

    • You can import files named in the format of filepath or filepath.x to a database. The values of x must be consecutive numbers starting from 1.

      For example, among the files named filepath, filepath.1, filepath.2, filepath.3, and filepath.5, the first four files are matched and imported. The filepath.5 file is not imported.

  • dir: the virtual file directory in OSS.

    • dir must end with a forward slash (/).

    • All files (excluding subfolders and files in subfolders) in the virtual file directory specified by dir will be matched and imported to a database.

  • prefix: the prefix of the path name corresponding to the data file. The prefix does not support regular expressions. Only one parameter among prefix, filepath, and dir can be specified at a time because they are mutually exclusive.
  • format: the file format, which can only be csv.

  • encoding: the file data encoding format. It supports common PostgreSQL encoding formats, such as UTF-8.

  • parse_errors: the fault-tolerant parsing mode. If an error occurs during the parsing process, the entire row of data is ignored.

  • delimiter: the column delimiter.

  • quote: the quote character for files.

  • escape: the escape character for files.

  • null: sets the column matching a specified string to null. For example, null 'test' is used to set the value of the 'test' column to null.

  • force_not_null: sets the value of a column to a non-null value. For example, force_not_null 'id' is used to set the value of the 'id' column to empty strings.

  • compressiontype: specifies the format of the files to be read or written in OSS.

    • none: uncompressed text files. This is the default value.
    • gzip: The files to be read must be gzip compressed.
  • compressionlevel: specifies the compression level of the compression format written to OSS. Valid values: 1 to 9. Default value: 6.

Note
  • You must specify filepath and dir in the OPTIONS parameter.
  • You must specify either filepath or dir.
  • The export mode only supports virtual folders, that is, only dir is supported.

Export mode parameters for CREATE FOREIGN TABLE

  • oss_flush_block_size: the buffer size for the data written to OSS at a time. Default value: 32 MB. Valid values: 1 MB to 128 MB.

  • oss_file_max_size: the maximum file size for the data written to OSS (subsequent data is written in another file when the maximum file size is exceeded). Default value: 1024 MB. Valid values: 8 MB to 4000 MB.

  • num_parallel_worker: the number of parallel compression threads in which the OSS data is written. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')

  • Obtains the name and size of the OSS file that an external table matches.

  • The unit of file size is Byte.

select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
 oss_test/test.gz.1  | 739698350
 oss_test/test.gz.2  | 739413041
 oss_test/test.gz.3  | 739562048
(3 rows)

Auxiliary features

oss_fdw.rds_read_one_file: In read mode, it is used to specify a file to match the external table. If the file is specified, the external table only matches this file during data import.

Example: set oss_fdw.rds_read_one_file = 'oss_test/example16.csv.1';

set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

oss_fdw example

# Create a plug-in
create extension oss_fdw; 
# Create a server 
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
     (host 'oss-cn-hangzhou.aliyuncs.com', id 'xxx', key 'xxx', bucket 'mybucket');
# Create an OSS external table
CREATE FOREIGN TABLE ossexample 
    (date text, time text, open float,
     high float, low float, volume int) 
     SERVER ossserver 
     OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
         format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table to load data to
create table example
        (date text, time text, open float,
         high float, low float, volume int)
# Load data from ossexample to example.
insert into example select * from ossexample;
# Result
# oss_fdw estimates the file size in OSS and formulates a query plan correctly.
explain insert into example select * from ossexample;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on example (cost=0.00..1.60 rows=6 width=92)
   ->  Foreign Scan on ossexample (cost=0.00..1.60 rows=6 width=92)
         Foreign OssFile: osstest/example.csv.0
         Foreign OssFile Size: 728
(4 rows)
# Write the data in the example table to OSS.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
                           QUERY PLAN
-----------------------------------------------------------------
 Insert on ossexample (cost=0.00..16.60 rows=660 width=92)
   ->  Seq Scan on example (cost=0.00..16.60 rows=660 width=92)
(2 rows)

oss_fdw usage considerations

  • oss_fdw is an external table plug-in developed based on the PostgreSQL FOREIGN TABLE framework.

  • The data import efficiency is subject to the PolarDB-P cluster resources (CPU, I/O, memory, and MET) and OSS.

  • To guarantee data import performance, make sure that PolarDB-P is in the same region as OSS. For more information, see Endpoints.

  • If the error "oss endpoint userendpoint not in aliyun white list" is reported during reading of SQL statements for external tables, use the endpoints listed in Regions and endpoints. If the problem persists, submit a ticket.

Error handling

When an import or export error occurs, the log displays the following error information:

  • code: the HTTP status code of the request that has failed.

  • error_code: the error code returned by OSS.

  • error_msg: the error message returned by OSS.

  • req_id: the UUID that identifies the request. If you cannot solve the problem, you can seek help from OSS development engineers by providing the req_id.

For more information about error types, see the following references. Timeout errors can be handled using oss_ext parameters.

ID and key encryption

If id and key parameters for CREATE SERVER are not encrypted, executing the select * from pg_foreign_server statement will display the information in plaintext. Your ID and key will be exposed. You can use symmetric encryption to hide the ID and key. Use different keys for different instances to further protect your information. However, to avoid incompatibility with earlier versions, do not add data types as you do in Greenplum.

Encrypted information:

postgres=# select * from pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}

The encrypted information is preceded by the MD5 hash value. The remainder of the total length divided by 8 is 3. Therefore, encryption is not performed again when the exported data is imported. But you cannot create the key and ID preceded by an MD5 hash value.