AnalyticDB for PostgreSQL supports the gpossext feature, which allows you to use an Object Storage Service (OSS) external table to import data from OSS to AnalyticDB for PostgreSQL in parallel.
Features
gpossext can read from and write to TEXT and CSV files, even when they are compressed in GZIP packages.
The following figure shows the gpossext architecture.

TEXT and CSV format description
The following parameters specify the formats of files read from and written to OSS. You can specify the parameters in the external table DDL parameters.
\n
: the string used as a line delimiter or line break for TEXT and CSV files.- DELIMITER: the string used to delimit columns.
- If you specify the DELIMITER parameter, you must also specify the QUOTE parameter.
- Recommended column delimiters include commas (,), vertical bars (|), and special characters such as \t.
- QUOTE: encloses user data that contains special characters by column.
- The pair of characters specified by the QUOTE parameter is used to distinguish user data from control characters.
- To improve efficiency, we recommend that you do not enclose data such as integers in QUOTE characters.
- QUOTE characters cannot be the same as those specified by DELIMITER. The default value of QUOTE is a pair of double quotation marks ("").
- User data that contains QUOTE characters must also contain ESCAPE characters to differentiate user data from machine code.
- ESCAPE: the escape character.
- Place an escape character before a special character that needs to be escaped to indicate that it is not a special character.
- The default value of ESCAPE is the same as that of QUOTE.
- You can also use other characters as escape characters, such as backslashes (\) used by MySQL by default.
Control character | TEXT | CSV |
---|---|---|
DELIMITER | \t (Tab) | , (Comma) |
QUOTE | " (Double quotation mark) | " (Double quotation mark) |
ESCAPE | N/A | " (Double quotation mark) |
NULL | \N (Backslash-N) | Empty string without quotation marks |
Precautions
- The syntax to create and use external tables is the same as that of Greenplum Database, except for the syntax of location-related parameters.
- The performance of data import and export depends on the OSS performance and resources
of AnalyticDB for PostgreSQL instances, such as CPU, I/O, memory, and network resources. To maximize import and
export performance, we recommend that you use column store and compression when you
create a table. For example, you can specify the following clause:
“WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib, COMPRESSLEVEL=5, BLOCKSIZE=1048576)
. For more information, see CREATE TABLE. - To maximize import and export performance, you must make sure that the OSS bucket and the AnalyticDB for PostgreSQL instance reside in the same region.
Procedure
Examples
This topic uses a table named example to describe how to import data from OSS.
Execute the following query plan. The result shows that the compute nodes import data from OSS in parallel. The redistribution motion node hashes the data and distributes the data to corresponding compute nodes. The compute nodes that receive the data insert the data to a database.
EXPLAIN INSERT INTO example SELECT * FROM ossexample;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Insert (slice0; segments: 4) (rows=250000 width=92)
-> Redistribute Motion 4:4 (slice1; segments: 4) (cost=0.00..11000.00 rows=250000 width=92)
Hash Key: ossexample.date
-> External Scan on ossexample (cost=0.00..11000.00 rows=250000 width=92)
(4 rows)
SDK troubleshooting
When an error occurs during the import or export process, the error log contains the following information:
- code: the HTTP status code of the request that has failed.
- error_code: the error code that is returned by OSS.
- error_msg: the error message that is returned by OSS.
- req_id: the universally unique identifier (UUID) of the request that has failed. If you require assistance in solving a problem, you can submit a ticket that contains the req_id value of the failed request.
For more information, see Error responses. You can handle timeout-related errors by using parameters related to oss_ext.