Alibaba Cloud allows you to use the oss_fdw plug-in to load data from Object Storage Service (OSS) to PolarDB databases and write data in PolarDB databases to OSS.

oss_fdw parameters

The oss_fdw plug-in uses the same method that is used by other Foreign Data Wrapper (FDW) interfaces to encapsulate the foreign data stored in OSS. You can use oss_fdw to read the data in OSS. This process is similar to reading data tables. oss_fdw provides unique parameters to connect to and parse the file data in OSS.

Note
  • oss_fdw can read and write the following types of files in OSS: TEXT and CSV files, including TEXT and CSV files compressed by using gzip.
  • The value of each parameter must be enclosed in double quotation marks (") and cannot contain unnecessary spaces.

CREATE SERVER parameters

  • ossendpoint: the endpoint used to access OSS through the internal network. This parameter is also known as the host.
  • id oss: the ID of the account.
  • key oss: the AccessKey pair of the account.
  • bucket: the OSS bucket where the data you want to access is stored. You must create an OSS account before you specify this parameter.

The following fault tolerance parameters can be used for data import and export. If the network condition is poor, you can adjust these parameters to ensure successful import and export.

  • oss_connect_timeout: indicates the connection timeout period. Default value: 10. Unit: seconds.
  • oss_dns_cache_timeout: indicates the DNS timeout period. Default value: 60. Unit: seconds.
  • oss_speed_limit: indicates the minimum data transmission rate. Default value: 1024. Unit: byte/s.
  • oss_speed_time: the maximum waiting period during which the data transmission rate is lower than the minimum value. The default value is 15 seconds.

If the default values of oss_speed_limit and oss_speed_time are used, a timeout error occurs if the transmission rate is lower than 1,024 byte/s for 15 consecutive seconds.

CREATE FOREIGN TABLE parameters

  • filepath: an OSS file name that contains the file path.
    • The file name contains the file path but not the bucket name.
    • This parameter matches multiple files in the specified OSS path. You can load multiple files to a database.
    • You can import files whose names follow the filepath or filepath.x format to a database. The values of x must be consecutive numbers starting from 1. For example, among the files named filepath, filepath.1, filepath.2, filepath.3, and filepath.5, the first four files are matched and imported, whereas the filepath.5 file is not imported.
  • dir: a virtual file directory in OSS.
    • The specified directory must end with a forward slash (/).
    • All files (excluding subfolders and files in subfolders) in the virtual file directory specified by dir are matched and imported to a database.
  • prefix: the prefix of the path name corresponding to the data file. Regular expressions are not supported. The prefix, filepath, and dir parameters are mutually exclusive. You can specify only one of them at a time.
  • format: the file format, which can only be csv.
  • encoding: the encoding format. It supports common PostgreSQL encoding formats, such as UTF-8.
  • parse_errors: The fault-tolerant parsing mode is used. If an error occurs in a row during the parsing process, the errors on the entire row are ignored.
  • delimiter: the delimiter of columns.
  • quote: the quote character for files.
  • escape: the escape character for files.
  • null: sets the column value that matches a specified string to null. For example, null 'test' is used to locate the column for which the value is 'test' and set the value to null.
  • force_not_null: disallows the values of the specified columns to be null. For example, force_not_null 'id' is used to set the rule that the values in the empty ID column cells are empty strings rather than null values.
  • compressiontype: specifies the format of the files to be read or written in OSS.
    • none: The files are uncompressed. This is the default value.
    • gzip: The files are compressed by using gzip.
  • compressionlevel: specifies the degree to which data files written to OSS are compressed. Valid values: 1 to 9. Default value: 6.
Note
  • You must specify filepath and dir in the OPTIONS parameter.
  • You must specify one and only one of filepath and dir.
  • The export mode supports only the dir parameter and does not support the filepath parameter.

Export mode parameters for CREATE FOREIGN TABLE

  • oss_flush_block_size: the buffer size of the data written to OSS at a time. Default value: 32 MB. Valid values: 1 MB to 128 MB.

  • oss_file_max_size: the maximum size of a data file to be written to OSS. If a data file reaches the maximum size, the remaining data is written to another data file. Default value: 1024. Valid values: 8 to 4000. Unit: MB.

  • num_parallel_worker: the maximum number of threads that are allowed to run in parallel to compress the data written to OSS. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')

  • This function obtains the name and size of the OSS file that a foreign table matches.

  • The file size is measured in bytes.

select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
 oss_test/test.gz.1  | 739698350
 oss_test/test.gz.2  | 739413041
 oss_test/test.gz.3  | 739562048
(3 rows)

Auxiliary features

oss_fdw.rds_read_one_file: In read mode, this feature is used to specify a file to match the foreign table. The foreign table matches only the specified file during data import.

Example: set oss_fdw.rds_read_one_file = 'oss_test/example16.csv.1';

set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

oss_fdw examples

# Create a plug-in.
create extension oss_fdw; 
# Create a server. 
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
     (host 'oss-cn-hangzhou.aliyuncs.com', id 'xxx', key 'xxx', bucket 'mybucket');
# Create an OSS foreign table.
CREATE FOREIGN TABLE ossexample 
    (date text, time text, open float,
     high float, low float, volume int) 
     SERVER ossserver 
     OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
         format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table to which you want to load data.
create table example
        (date text, time text, open float,
         high float, low float, volume int);
# Load data from the ossexample table to the example table.
insert into example select * from ossexample;
# Result
# oss_fdw estimates the file size in OSS and formulates a query plan.
explain insert into example select * from ossexample;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on example  (cost=0.00..1.60 rows=6 width=92)
   ->  Foreign Scan on ossexample  (cost=0.00..1.60 rows=6 width=92)
         Foreign OssFile: osstest/example.csv.0
         Foreign OssFile Size: 728
(4 rows)
# Write data from the example table to OSS.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
                           QUERY PLAN
-----------------------------------------------------------------
 Insert on ossexample  (cost=0.00..16.60 rows=660 width=92)
   ->  Seq Scan on example  (cost=0.00..16.60 rows=660 width=92)
(2 rows)

oss_fdw usage notes

  • oss_fdw is a foreign table plug-in developed based on the PostgreSQL FOREIGN TABLE framework.

  • The data import performance is affected by the PolarDB cluster resources (CPU, I/O, memory, and network resources) and the OSS performance.

  • To ensure the performance of data import, ensure that the PolarDB cluster is in the same region as the OSS bucket. For more information, see OSS endpoints.

  • If the error ERROR: oss endpoint userendpoint not in aliyun white list is reported when SQL statements are read from the foreign table, we recommend that you use the public endpoint. For more information, see Public endpoints in Alibaba Cloud zones. If the problem persists, contact us.

Error handling

When an import or export error occurs, the log displays the following error information:

  • code: the HTTP status code of the failed request.

  • error_code: the error code returned by OSS.

  • error_msg: the error message returned by OSS.

  • req_id: the universally unique identifier (UUID) of the request. If you need assistance in solving a problem, you can submit a ticket that contains the req_id of the failed request to OSS engineers.

For more information about the errors, see the following references. Timeout errors can be handled by using oss_ext parameters.

Encryption of AccessKey ID and AccessKey secret

If the id and key parameters for CREATE SERVER are not encrypted, other users can obtain your AccessKey pair in plaintext by executing the select * from pg_foreign_server statement. You can use symmetric encryption to hide your AccessKey ID and AccessKey secret. Use different AccessKey pairs for different instances to further protect your information. However, to prevent incompatibility with earlier versions, do not add data types as you do in Greenplum.

Encrypted information:

postgres=# select * from pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}

The encrypted value starts with the MD5 string. The remainder of the total length divided by 8 is 3. Therefore, the data is not encrypted again after the exported data is imported. However, you cannot create an AccessKey ID or AccessKey secret that starts with the MD5 string.