This topic describes how to read and write external data files by using the oss_fdw plugin. In Alibaba Cloud, you can use this plugin to load data from OSS to an RDS PostgreSQL or RDS PPAS instance. You can also write data from an RDS PostgreSQL or RDS PPAS instance to OSS.

Prerequisites

The RDS instance runs PPAS 10.

oss_fdw example

# Create a plugin for apsaradb RDS for PPAS
select rds_manage_extension('create','oss_fdw');
# Create a server 
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
     (host 'oss-cn-hangzhou.aliyuncs.com', id 'xxx', key 'xxx', bucket 'mybucket');
# Create an oss external table.
CREATE FOREIGN TABLE ossexample 
    (date text, time text, open float,
     high float, low float, volume int) 
     SERVER ossserver 
     OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
         format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table for which data is loaded.
create table example
        (date text, time text, open float,
         high float, low float, volume int)
# Load data from ossexample to example.
insert into example select * from ossexample;

# oss_fdw estimates the file size in oss and formulates a query plan correctly.
explain insert into example select * from ossexample;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on example (cost=0.00..1.60 rows=6 width=92)
   ->  Foreign Scan on ossexample (cost=0.00..1.60 rows=6 width=92)
         Foreign OssFile: osstest/example.csv.0
         Foreign OssFile Size: 728
(4 rows)
# Write the data in the example table to OSS.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
                           QUERY PLAN
-----------------------------------------------------------------
 Insert on ossexample (cost=0.00..16.60 rows=660 width=92)
   ->  Seq Scan on example (cost=0.00..16.60 rows=660 width=92)
(2 rows)

For a description of the parameters, see the following section.

oss_fdw parameters

The oss_fdw plug-in uses a method similar to other Foreign Data Wrapper (FDW) interfaces to encapsulate external data stored in OSS. You can use oss_fdw to read data stored in OSS. This process is similar to reading data tables. oss_fdw provides unique parameters to connect and parse file data in OSS.

Note
  • Currently, oss_fdw can read and write files in text, csv, or gzip format.
  • The value of each parameter must be enclosed in double quotation marks (") and cannot contain any unnecessary spaces.

CREATE SERVER parameters

Parameter Description
ossendpoint Is the address used to access OSS from the intranet, also known as the host.
id oss The id of the account.
key oss The account key.
bucket Bucket, you need to create an OSS account before configuring this parameter.

The following fault tolerance parameters can be used for data import and export. If network connectivity is poor, you can adjust these parameters to ensure successful import and export.

Parameter Description
oss_connect_timeout The connection timeout period. Unit: seconds. Default value: 10.
oss_dns_cache_timeout The DNS timeout period. Unit: seconds. Default value: 60.
oss_speed_limit The minimum allowed rate. Default value: 1024, that is, 1 kB.
oss_speed_time The maximum time when the minimum transmission rate is tolerated. Default value: 15. Unit: seconds.
Note If the default values of oss_speed_limit and oss_speed_time are used, a timeout error occurs when the transmission rate is smaller than 1 Kbit/s for 15 consecutive seconds.

CREATE FOREIGN TABLE parameters

Parameter Description
filepath The name of the file that contains a path in OSS.
  • A file name contains a path but not a bucket name.
  • This parameter matches multiple files in the corresponding path in OSS. You can load multiple files to a database.
  • You can import files named in the format of filepath or filepath.x to a database. The values of x must be consecutive numbers starting from 1.

    For example, among the files named filepath, filepath.1, filepath.2, filepath.3, and filepath.5, the first four files are matched and imported. The filepath.5 file is not imported.

dir The virtual file directory in OSS.
  • dir must end with (/).
  • All files (excluding subfolders and files in subfolders) in the virtual file directory specified by dir will be matched and imported to a database.
Prefix The prefix of the path name corresponding to the data file. The prefix does not support regular expressions. Only one parameter among prefix, filepath, and dir can be specified at a time because they are mutually exclusive.
format Specifies the file format. Currently, only the csv format is supported.
encoding The encoding format of data in the file. It supports common PostgreSQL encoding formats, such as UTF-8.
parse_errors The fault-tolerant parsing mode ignores the errors that occur during the parsing process.
delimiter Specifies the column delimiter.
quote Specifies the reference character for the file.
escape Specifies the escape character for the file.
null Specifies that a column matching a specified string is null. For example, null 'test' indicates that the string whose column value is 'test' is null.
force_not_null The column values are not null. For example, force_not_null 'id' is used to set the value of the 'id' column to empty strings.
compressiontype The formats of the files to be read and written in OSS.
  • none: uncompressed text files. This is the default value.
  • gzip: The files to be read must be gzip compressed.
compressionlevel The compression level of compression format written to OSS. Valid values: 1 to 9. Default value: 6.
Note
  • You must specify filepath and dir in the OPTIONS parameter.
  • You must specify either filepath or dir.
  • The export mode only supports virtual folders, that is, only dir is supported.

Export mode parameters for CREATE FOREIGN TABLE

  • oss_flush_block_size: the buffer size for the data written to OSS at a time. Default value: 32 MB. Valid values: 1 MB to 128 MB.
  • oss_file_max_size: the maximum file size for the data written to OSS (subsequent data is written in another file when the maximum file size is exceeded). Default value: 1024 MB. Valid values: 8 MB to 4000 MB.
  • num_parallel_worker: the number of parallel compression threads in which the OSS data is written. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')

  • Obtains the name and size of the OSS file that an external table matches.
  • The unit of file size is Byte.
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
 oss_test/test.gz.1  | 739698350
 oss_test/test.gz.2  | 739413041
 oss_test/test.gz.3  | 739562048
(3 rows)

Auxiliary features

oss_fdw.rds_read_one_file: In read mode, it is used to specify a file to match the external table. If the file is specified, the external table only matches this file during data import.

Example: set oss_fdw.rds_read_one_file = 'oss_test/example16.csv.1';

set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

oss_fdw notes

  • oss_fdw is an external table plug-in developed based on the PostgreSQL FOREIGN TABLE framework.
  • The data import performance is related to the CPU IO MEM MET of PPAS instances and OSS.
  • To guarantee that data is correctly imported, you need to make sure that the Region of the DTS server is the same as that of the OSS server. For more information, see OSS endpoint information.
  • If the SQL of the external table is read, ERROR: oss endpoint userendpoint not in aliyun white list, we recommend that you use the endpoint shared by Alibaba Cloud in each zone. For more information, see Regions and endpoints. If the problem persists, submit a ticket.

Error handling

When an import or export error occurs, the log displays the following error information:

  • code: the HTTP status code of the request that has failed.

  • error_code: the error code returned by OSS.

  • error_msg: the error message returned by OSS.

  • req_id: the UUID that identifies the request. If you cannot solve the problem, you can seek help from OSS development engineers by providing the req_id.

For more information about error types, see the following topics. Timeout errors can be handled by using oss_ext parameters.

ID and key encryption

If id and key parameters for CREATE SERVER are not encrypted, executing the select * from pg_foreign_server statement will display the information in plaintext. Your ID and key will be exposed. The symmetric encryption can be performed on the id and key to hide the id and key. (different instances use different keys to maximize the protection of user information.) however, you cannot use methods like GP to add a data type.

The encrypted information is as follows:

postgres=# select * from pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}

The encrypted information is preceded by MD5 (total length: len,len%8==3). Therefore, encryption is not performed again when the exported data is imported. But you cannot create the key and id preceded by MD5.