Alibaba Cloud allows you to use the oss_fdw plug-in to load data from an OSS bucket to a database on an ApsaraDB RDS for PostgreSQL or PPAS instance and write data from the database to the OSS bucket.

Prerequisites

The RDS instance runs one of the following PostgreSQL versions:

  • PostgreSQL 9.4
  • PostgreSQL 10

Example

# Create an oss_fdw plug-in for the RDS for PostgreSQL database.
create extension oss_fdw;  --- For an RDS for PPAS database, execute the select rds_manage_extension('create','oss_fdw'); statement.
# Create a server. 
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
     (host 'oss-cn-hangzhou.aliyuncs.com' , id 'xxx', key 'xxx',bucket 'mybucket');
# Create a foreign table for OSS.
CREATE FOREIGN TABLE ossexample 
    (date text, time text, open float,
     high float, low float, volume int) 
     SERVER ossserver 
     OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
         format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table to store loaded data.
create table example
        (date text, time text, open float,
         high float, low float, volume int);
# Load data from the ossexample table to the example table.
insert into example select * from ossexample;

# oss_fdw estimates the data size in the OSS bucket and configures a query plan.
explain insert into example select * from ossexample;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on example  (cost=0.00..1.60 rows=6 width=92)
   ->  Foreign Scan on ossexample  (cost=0.00..1.60 rows=6 width=92)
         Foreign OssFile: osstest/example.csv.0
         Foreign OssFile Size: 728
(4 rows)
# Write data from the example table to the ossexample table.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
                           QUERY PLAN
-----------------------------------------------------------------
 Insert on ossexample  (cost=0.00..16.60 rows=660 width=92)
   ->  Seq Scan on example  (cost=0.00..16.60 rows=660 width=92)
(2 rows)

For information about the parameters, see the following sections.

oss_fdw parameters

The oss_fdw plug-in uses a method similar to other Foreign Data Wrappers (FDWs) to encapsulate external data stored in OSS buckets. You can use oss_fdw to read data from OSS buckets. This process is similar to reading data from tables. oss_fdw provides unique parameters to connect to OSS buckets and parse file data.

Note
  • oss_fdw can read and write the following types of files in OSS buckets: TEXT and CSV files, including TEXT and CSV files compressed by using gzip.
  • The value of each parameter must be enclosed in double quotation marks (") and cannot contain any unnecessary spaces.

CREATE SERVER parameters

Parameter Description
ossendpoint The internal OSS endpoint, which is used as the host address.
id oss The account ID.
key oss The account key.
bucket The OSS bucket. You must create an OSS account before you configure this parameter.

The following fault tolerance parameters can be used for data import and export. If the network connectivity is poor, you can adjust these parameters to ensure successful import and export.

Parameter Description
oss_connect_timeout The timeout period of connection. Unit: seconds. Default value: 10.
oss_dns_cache_timeout The timeout period of DNS cache. Unit: seconds. Default value: 60.
oss_speed_limit The minimum transmission rate. Unit: bit/s. Default value: 1024 (1 Kbit/s).
oss_speed_time The maximum time period to tolerate the minimum transmission rate. Unit: seconds. Default value: 15.
Note If the default values of oss_speed_limit and oss_speed_time are used, a timeout occurs when the transmission rate is slower than 1 Kbit/s for 15 consecutive seconds.

CREATE FOREIGN TABLE parameters

Parameter Description
filepath The file name that contains the path in the OSS bucket.
  • The file name contains the path and does not contain the bucket name.
  • This parameter matches multiple files in the specified path in the OSS bucket, so you can load multiple files to the database at a time.
  • You can import files named in the format of filepath or filepath.x to the database. The values of x must be consecutive numbers starting from 1.

    For example, among the files named filepath, filepath.1, filepath.2, filepath.3, and filepath.5, the first four files are matched and imported. The filepath.5 file is not imported.

dir The virtual file directory in the OSS bucket.
  • The value must end with a forward slash (/).
  • All files (excluding subfolders and files in the subfolders) in the specified virtual file directory are matched and imported to the database.
prefix The path name prefix of the data file. The prefix does not support regular expressions. You can only configure one of the prefix, filepath, and dir parameters.
format The file format. Set the value to csv.
encoding The data encoding format. Common encoding formats in PostgreSQL are supported, such as UTF-8.
parse_errors The fault-tolerant parsing mode. If an error occurs in the parsing process, the entire row of data is ignored.
delimiter The column delimiter.
quote The quote character for the file.
escape The escape character for the file.
null This parameter sets the value of the column matching a specified string to null. For example, null 'test' sets the value of the test column to null.
force_not_null This parameter sets the value of the specified column to a non-null value. For example, if the value of the id column is empty, force_not_null 'id' sets the value of the id column to an empty string, instead of null.
compressiontype The format of files to be read and written in the OSS bucket.
  • none: the uncompressed text files. This is the default value.
  • gzip: the gzip-compressed files.
compressionlevel The compression level of files written to the OSS bucket. Valid values: 1 to 9. Default value: 6.
Note
  • The filepath and dir parameters are configured in the OPTIONS field.
  • You must specify either the filepath or dir parameter.
  • The export only supports the dir parameter and does not support the filepath parameter.

CREATE FOREIGN TABLE parameters in the export

  • oss_flush_block_size: the buffer size for data written to the OSS bucket at a time. Default value: 32 MB. Valid values: 1 MB to 128 MB.
  • oss_file_max_size: the maximum file size for data written to the OSS bucket. Extra data is written to another file. Default value: 1024 MB. Valid values: 8 MB to 4000 MB.
  • num_parallel_worker: the number of parallel compression threads in which data is written to the OSS bucket. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT ‘public’)

  • This function obtains the names and sizes of files that match a foreign table in the OSS bucket.
  • The unit of the file size is bytes.
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
 oss_test/test.gz.1  | 739698350
 oss_test/test.gz.2  | 739413041
 oss_test/test.gz.3  | 739562048
(3 rows)

Auxiliary features

oss_fdw.rds_read_one_file: specifies the file that matches the foreign table when data is read. The foreign table only matches the specified file during data import.

Example: set oss_fdw.rds_read_one_file = ‘oss_test/example16.csv.1’;

set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

oss_fdw usage notes

  • oss_fdw is a foreign table plug-in developed based on the PostgreSQL FOREIGN TABLE framework.
  • Data import efficiency varies based on the OSS configuration and PostgreSQL cluster resources, such as CPU, I/O, and memory.
  • Make sure that the RDS for PostgreSQL instance resides in the same region as the OSS bucket. This ensures data import efficiency. For more information, see OSS domain names.
  • If the error ERROR: oss endpoint userendpoint not in aliyun white list is reported when SQL statements are read from the foreign table, you can use the public OSS endpoint of the required region. For more information, see Regions and endpoints. If the problem persists, submit a ticket.

Error information

When an error occurs during the import or export, the following error information is recorded in logs:

  • code: the HTTP status code of the failed request.

  • error_code: the error code returned by OSS.

  • error_msg: the error message returned by OSS.

  • req_id: the UUID that identifies the request. If you require assistance in solving a problem, you can submit a ticket containing req_id of the failed request to OSS developers.

For more information about the errors, see the following references. Timeout errors can be handled based on oss_ext parameters.

ID and key encryption

If the id and key parameters in CREATE SERVER are not encrypted, other users can obtain your ID and key in plaintext by executing the select * from pg_foreign_server statement. You can use symmetric encryption to hide the ID and key and use different keys for different instances to protect your data. However, to avoid incompatibility with instances of earlier versions, do not add data types as you do in Greenplum.

The encrypted ID and key are displayed as follows:

postgres=# select * from pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}

The encrypted string starts with MD5. The total length divided by 8 gets a remainder of 3. Encryption is not performed again when the exported data is imported. You cannot create a key and ID that starts with MD5.