This topic describes how to import data from an Object Storage Service (OSS) bucket into an ApsaraDB RDS for PostgreSQL instance by using the oss_fdw plug-in. This topic also describes how to export data from an ApsaraDB RDS for PostgreSQL instance to an OSS bucket by using the oss_fdw plug-in. The oss_fdw plug-in is also supported for RDS instances that run the PPAS database engine.

Prerequisites

Your RDS instance runs one of the following database engine versions:

  • PostgreSQL 13
  • PostgreSQL 12
  • PostgreSQL 11
  • PostgreSQL 10
  • PostgreSQL 9.4

Examples

# Create an oss_fdw plug-in for your RDS instance.
create extension oss_fdw;  ---If your RDS instance runs the PPAS database engine, execute the following statement: select rds_manage_extension('create','oss_fdw');
# Create an OSS server. 
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS 
     (host 'oss-cn-hangzhou.aliyuncs.com' , id 'xxx', key 'xxx',bucket 'mybucket');
# Create a foreign OSS table named ossexample.
CREATE FOREIGN TABLE ossexample 
    (date text, time text, open float,
     high float, low float, volume int) 
     SERVER ossserver 
     OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
         format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table named example on your RDS instance. The table is used to store the data that is imported into your RDS instance.
create table example
        (date text, time text, open float,
         high float, low float, volume int);
# Import data from the ossexample table into the example table.
insert into example select * from ossexample;

# Use the oss_fdw plug-in to estimate the size of the ossexample table and formulate a query plan.
explain insert into example select * from ossexample;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Insert on example  (cost=0.00..1.60 rows=6 width=92)
   ->  Foreign Scan on ossexample  (cost=0.00..1.60 rows=6 width=92)
         Foreign OssFile: osstest/example.csv.0
         Foreign OssFile Size: 728
(4 rows)
# Export data from the example table to the ossexample table.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
                           QUERY PLAN
-----------------------------------------------------------------
 Insert on ossexample  (cost=0.00..16.60 rows=660 width=92)
   ->  Seq Scan on example  (cost=0.00..16.60 rows=660 width=92)
(2 rows)

For more information about the parameters in the preceding examples, see the following sections.

Parameters supported by the oss_fdw plug-in

Similar to other foreign data wrappers (FDWs), the oss_fdw plug-in encapsulates foreign data that is stored in OSS buckets. You can use the oss_fdw plug-in to read data from OSS buckets. This process is similar to the process of reading data tables. The oss_fdw plug-in provides some unique parameters that are used to access a specified OSS bucket and parse the OSS objects in the OSS bucket.

Note
  • The oss_fdw plug-in can read and write data to OSS objects in the following formats: TEXT and CSV. These OSS objects include the TEXT objects and CSV objects that are compressed by using gzip.
  • The value of each parameter that is used by the oss_fdw plug-in must be enclosed in a pair of double quotation marks (""). In addition, the value of each parameter cannot contain unnecessary spaces.

Parameters in the CREATE SERVER statement

Parameter Description
ossendpoint The internal endpoint of the OSS bucket. The endpoint is also referred to as the host address.
id oss The AccessKey ID of the account that is used to access OSS.
key oss The AccessKey secret of the account that is used to access OSS.
bucket The OSS bucket that stores the objects whose data you want to read or write. Before you set this parameter, you must create an account that is used to access OSS.

The following table describes the fault tolerance parameters that are provided by OSS. If network connectivity is poor, you can adjust the values of these parameters to ensure successful import and export.

Parameter Description
oss_connect_timeout The period of time after which the connection to OSS times out. Unit: seconds. Default value: 10.
oss_dns_cache_timeout The period of time after which the cached Domain Name System (DNS) record times out. Unit: seconds. Default value: 60.
oss_speed_limit The minimum transmission rate that can be tolerated. Unit: bit/s. Default value: 1024. The default value is equal to 1 Kbit/s.
oss_speed_time The maximum period of time for which the minimum transmission rate can be tolerated. Unit: seconds. Default value: 15.
Note You can retain the default values of the oss_speed_limit and oss_speed_time parameters. In this case, if the transmission rate remains less than 1 Kbit/s for 15 consecutive seconds, a time-out occurs.

Parameters in the CREATE FOREIGN TABLE statement

Parameter Description
filepath The object name that is used to match objects stored in the OSS bucket. The object name must contain an OSS path.
  • The object name does not contain an OSS bucket name.
  • The object name matches multiple objects that are stored in the OSS path. This allows you to import data from multiple objects into your RDS instance.
  • Only the data from the objects that are named in the following formats can be imported into your RDS instance: filepath and filepath.x. The values of the x variable must be consecutive integers that start from 1.

    For example, the OSS path stores five objects: filepath, filepath.1, filepath.2, filepath.3, and filepath.5. In this case, filepath, filepath.1, filepath.2, and filepath.3 are matched and imported, but filepath.5 cannot be matched or imported.

dir The folder that is used to match objects stored in the OSS bucket.
  • The folder must end with a forward slash (/).
  • The data of all the objects in the folder are matched and imported into your RDS instance. However, these objects do not include the subfolders and the objects stored in the subfolders.
prefix The prefix of the OSS path in the OSS bucket. This parameter does not support regular expressions. You can configure only one of the prefix, filepath, and dir parameters.
format The format that is supported for the objects stored in the OSS bucket. Only the CSV format is supported.
encoding The format that is used to encode data in the objects stored in the OSS bucket. Common encoding formats in PostgreSQL are supported. These supported formats include UTF-8.
parse_errors The mode that is used to tolerate faults during the parsing process. If an error occurs during the parsing process, the entire row that encounters the error is ignored.
delimiter The delimiter that is used to separate columns in the objects stored in the OSS bucket.
quote The quote character that is supported for the objects stored in the OSS bucket.
escape The escape character that is supported for the objects stored in the OSS bucket.
null Populates an empty column by using null values. For example, you specify the null 'test' setting. In this case, if the test column is empty, it is populated by using null values.
force_not_null Populates an empty column by using empty strings rather than null values. For example, you specify the force_not_null 'id' setting. In this case, if the ID column is empty, it is populated by using empty strings rather than null values.
compressiontype The compression format that is used to read and write data to the objects stored in the OSS bucket.
  • none: The data is not compressed. This is the default value.
  • gzip: The data is compressed in the GZIP format.
compressionlevel The compression level that is used to write data to the objects stored in the OSS bucket. Valid values: 1 to 9. Default value: 6.
Note
  • The filepath and dir parameters are specified in the OPTIONS parameter.
  • You must specify the filepath parameter or the dir parameter. Do not specify both parameters.
  • If you export data from your RDS instance to the OSS bucket, you can specify only the dir parameter. You cannot specify the filepath parameter.

Parameters in the CREATE FOREIGN TABLE statement

  • oss_flush_block_size: the buffer size of the data that can be written to the OSS bucket at a time. Valid values: 1 to 128. Unit: MB. Default value: 32.
  • oss_file_max_size: the maximum amount of data that can be written to an object in the OSS bucket. If the amount of data that needs to be written reaches the maximum value, the data that remains is written to a new object. Valid values: 8 to 4000. Unit: MB. Default value: 1024.
  • num_parallel_worker: the maximum number of threads that can run in parallel to compress the data written to the OSS bucket. Valid values: 1 to 8. Default value: 3.

Auxiliary functions

FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')

  • This function is used to obtain the name and size of the OSS object that the specified foreign table matches.
  • The size is measured in bytes.
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
 oss_test/test.gz.1  | 739698350
 oss_test/test.gz.2  | 739413041
 oss_test/test.gz.3  | 739562048
(3 rows)

Auxiliary parameters

oss_fdw.rds_read_one_file: specifies the OSS object that a foreign table matches. This parameter is supported only when you import data from the OSS bucket into your RDS instance. If you specify this parameter, only the OSS object that the specified foreign table matches is imported.

Example: set oss_fdw.rds_read_one_file = 'oss_test/example16.csv.1';

set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
              name              |   size    
--------------------------------+-----------
  oss_test/test.gz.2  | 739413041
(1 rows)

Usage notes

  • The oss_fdw plug-in is developed based on the PostgreSQL FOREIGN TABLE framework to manage foreign tables.
  • Data import performance varies based on the available PostgreSQL and OSS resources. The PostgreSQL resources are CPU, I/O, and memory.
  • To import data at high performance, make sure that your RDS instance and the OSS bucket reside in the same region. For more information, see OSS domain names.
  • If the error "ERROR: oss endpoint userendpoint not in aliyun white list" is reported when SQL statements are read from the foreign table, we recommend that you use the public OSS endpoint that is provided for the specified region. For more information, see Regions and endpoints. If the error persists, submit a ticket.

Troubleshooting

If an import or export error occurs, the following error information is logged:

  • code: the HTTP status code of the request that has failed.

  • error_code: the error code that is returned by OSS.

  • error_msg: the error message that is returned by OSS.

  • req_id: the universally unique identifier (UUID) of the request that has failed. If you require assistance from OSS developers, you can submit a ticket that contains the req_id parameter of the failed request.

For more information about various errors, see the following documentation. You can handle time-out errors by reconfiguring the parameters related to the oss_ext plug-in.

Encryption of AccessKey ID and AccessKey secret

If you do not encrypt the values of the id and key parameters in the CREATE SERVER statement, other users can obtain your AccessKey pair in plaintext by executing the select * from pg_foreign_server statement. You can use symmetric encryption to encrypt the values of the id and key parameters. Use different keys for different RDS instances. This further protects your AccessKey pair. However, you cannot add data types as you can in Greenplum. This prevents incompatibility with earlier versions.

The following snippet provides the encrypted values of the id and key parameters:

postgres=# select * from pg_foreign_server ;
  srvname  | srvowner | srvfdw | srvtype | srvversion | srvacl |                                                                              srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
 ossserver |       10 |  16390 |         |            |        | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}

Each encrypted value starts with an MD5 string. The total length divided by 8 is 3. After these encrypted values are exported, they will not be encrypted again. Take note that you cannot create an AccessKey ID or AccessKey secret that starts with an MD5 string.