This topic describes how to read and write external data files by using the oss_fdw plugin. In Alibaba Cloud, you can use this plugin to load data from OSS to an RDS PostgreSQL or RDS PPAS instance. You can also write data from an RDS PostgreSQL or RDS PPAS instance to OSS.
Prerequisites
The RDS instance runs PPAS 10.
oss_fdw example
# Create a plugin for apsaradb RDS for PPAS
select rds_manage_extension('create','oss_fdw');
# Create a server
CREATE SERVER ossserver FOREIGN DATA WRAPPER oss_fdw OPTIONS
(host 'oss-cn-hangzhou.aliyuncs.com', id 'xxx', key 'xxx', bucket 'mybucket');
# Create an oss external table.
CREATE FOREIGN TABLE ossexample
(date text, time text, open float,
high float, low float, volume int)
SERVER ossserver
OPTIONS ( filepath 'osstest/example.csv', delimiter ',' ,
format 'csv', encoding 'utf8', PARSE_ERRORS '100');
# Create a table for which data is loaded.
create table example
(date text, time text, open float,
high float, low float, volume int)
# Load data from ossexample to example.
insert into example select * from ossexample;
# oss_fdw estimates the file size in oss and formulates a query plan correctly.
explain insert into example select * from ossexample;
QUERY PLAN
---------------------------------------------------------------------
Insert on example (cost=0.00..1.60 rows=6 width=92)
-> Foreign Scan on ossexample (cost=0.00..1.60 rows=6 width=92)
Foreign OssFile: osstest/example.csv.0
Foreign OssFile Size: 728
(4 rows)
# Write the data in the example table to OSS.
insert into ossexample select * from example;
explain insert into ossexample select * from example;
QUERY PLAN
-----------------------------------------------------------------
Insert on ossexample (cost=0.00..16.60 rows=660 width=92)
-> Seq Scan on example (cost=0.00..16.60 rows=660 width=92)
(2 rows)
For a description of the parameters, see the following section.
oss_fdw parameters
The oss_fdw plug-in uses a method similar to other Foreign Data Wrapper (FDW) interfaces to encapsulate external data stored in OSS. You can use oss_fdw to read data stored in OSS. This process is similar to reading data tables. oss_fdw provides unique parameters to connect and parse file data in OSS.
- Currently, oss_fdw can read and write files in text, csv, or gzip format.
- The value of each parameter must be enclosed in double quotation marks (") and cannot contain any unnecessary spaces.
CREATE SERVER parameters
Parameter | Description |
---|---|
ossendpoint | Is the address used to access OSS from the intranet, also known as the host. |
id oss | The id of the account. |
key oss | The account key. |
bucket | Bucket, you need to create an OSS account before configuring this parameter. |
The following fault tolerance parameters can be used for data import and export. If network connectivity is poor, you can adjust these parameters to ensure successful import and export.
Parameter | Description |
---|---|
oss_connect_timeout | The connection timeout period. Unit: seconds. Default value: 10. |
oss_dns_cache_timeout | The DNS timeout period. Unit: seconds. Default value: 60. |
oss_speed_limit | The minimum allowed rate. Default value: 1024, that is, 1 kB. |
oss_speed_time | The maximum time when the minimum transmission rate is tolerated. Default value: 15. Unit: seconds. |
CREATE FOREIGN TABLE parameters
Parameter | Description |
---|---|
filepath | The name of the file that contains a path in OSS.
|
dir | The virtual file directory in OSS.
|
Prefix | The prefix of the path name corresponding to the data file. The prefix does not support regular expressions. Only one parameter among prefix, filepath, and dir can be specified at a time because they are mutually exclusive. |
format | Specifies the file format. Currently, only the csv format is supported. |
encoding | The encoding format of data in the file. It supports common PostgreSQL encoding formats, such as UTF-8. |
parse_errors | The fault-tolerant parsing mode ignores the errors that occur during the parsing process. |
delimiter | Specifies the column delimiter. |
quote | Specifies the reference character for the file. |
escape | Specifies the escape character for the file. |
null | Specifies that a column matching a specified string is null. For example, null 'test' indicates that the string whose column value is 'test' is null. |
force_not_null | The column values are not null. For example, force_not_null 'id' is used to set the value of the 'id' column to empty strings. |
compressiontype | The formats of the files to be read and written in OSS.
|
compressionlevel | The compression level of compression format written to OSS. Valid values: 1 to 9. Default value: 6. |
- You must specify filepath and dir in the OPTIONS parameter.
- You must specify either filepath or dir.
- The export mode only supports virtual folders, that is, only dir is supported.
Export mode parameters for CREATE FOREIGN TABLE
- oss_flush_block_size: the buffer size for the data written to OSS at a time. Default value: 32 MB. Valid values: 1 MB to 128 MB.
- oss_file_max_size: the maximum file size for the data written to OSS (subsequent data is written in another file when the maximum file size is exceeded). Default value: 1024 MB. Valid values: 8 MB to 4000 MB.
- num_parallel_worker: the number of parallel compression threads in which the OSS data is written. Valid values: 1 to 8. Default value: 3.
Auxiliary functions
FUNCTION oss_fdw_list_file (relname text, schema text DEFAULT 'public')
- Obtains the name and size of the OSS file that an external table matches.
- The unit of file size is Byte.
select * from oss_fdw_list_file('t_oss');
name | size
--------------------------------+-----------
oss_test/test.gz.1 | 739698350
oss_test/test.gz.2 | 739413041
oss_test/test.gz.3 | 739562048
(3 rows)
Auxiliary features
oss_fdw.rds_read_one_file: In read mode, it is used to specify a file to match the external table. If the file is specified, the external table only matches this file during data import.
Example: set oss_fdw.rds_read_one_file = 'oss_test/example16.csv.1';
set oss_fdw.rds_read_one_file = 'oss_test/test.gz.2';
select * from oss_fdw_list_file('t_oss');
name | size
--------------------------------+-----------
oss_test/test.gz.2 | 739413041
(1 rows)
oss_fdw notes
- oss_fdw is an external table plug-in developed based on the PostgreSQL FOREIGN TABLE framework.
- The data import performance is related to the CPU IO MEM MET of PPAS instances and OSS.
- To guarantee that data is correctly imported, you need to make sure that the Region of the DTS server is the same as that of the OSS server. For more information, see OSS endpoint information.
- If the SQL of the external table is read,
ERROR: oss endpoint userendpoint not in aliyun white list
, we recommend that you use the endpoint shared by Alibaba Cloud in each zone. For more information, see Regions and endpoints. If the problem persists, submit a ticket.
Error handling
When an import or export error occurs, the log displays the following error information:
-
code: the HTTP status code of the request that has failed.
-
error_code: the error code returned by OSS.
-
error_msg: the error message returned by OSS.
-
req_id: the UUID that identifies the request. If you cannot solve the problem, you can seek help from OSS development engineers by providing the req_id.
For more information about error types, see the following topics. Timeout errors can be handled by using oss_ext parameters.
ID and key encryption
If id and key parameters for CREATE SERVER are not encrypted, executing the select * from pg_foreign_server
statement will display the information in plaintext. Your ID and key will be exposed.
The symmetric encryption can be performed on the id and key to hide the id and key.
(different instances use different keys to maximize the protection of user information.)
however, you cannot use methods like GP to add a data type.
The encrypted information is as follows:
postgres=# select * from pg_foreign_server ;
srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions
-----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------------------------
----------------------------------
ossserver | 10 | 16390 | | | | {host=oss-cn-hangzhou-zmf.aliyuncs.com,id=MD5xxxxxxxx,key=MD5xxxxxxxx,bucket=067862}
The encrypted information is preceded by MD5 (total length: len,len%8==3). Therefore, encryption is not performed again when the exported data is imported. But you cannot create the key and id preceded by MD5.