COPY imports data from Object Storage Service (OSS) into an AnalyticDB for PostgreSQL table. UNLOAD exports query results from an AnalyticDB for PostgreSQL table to OSS. Both statements operate through OSS foreign tables. For background information, see Use OSS foreign tables to access OSS data.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL instance
An OSS bucket accessible with your AccessKey credentials
The
oss_fdwextension installed on the instance
COPY
Use the COPY statement to import data from an OSS bucket path or manifest file into an AnalyticDB for PostgreSQL table.
Syntax
COPY <table_name>
[ <column_list> ]
FROM <data_source>
ACCESS_KEY_ID '<access_key_id>'
SECRET_ACCESS_KEY '<secret_access_key>'
[ [ FORMAT ] [ AS ] <data_format> ]
[ MANIFEST ]
[ option '<value>' [ ... ] ]Parameters
| Parameter | Required | Description |
|---|---|---|
table_name | Yes | The target AnalyticDB for PostgreSQL table. The table must already exist in the instance. |
column_list | No | The columns to write data to. If omitted, data is written to all columns. |
data_source | Yes | The OSS path to read from. Format: oss://<bucket_name>/path_prefix. |
access_key_id | Yes | The AccessKey ID of an Alibaba Cloud account or Resource Access Management (RAM) user with OSS access. Use a RAM user with minimal required permissions rather than your root account credentials. For instructions, see Obtain an AccessKey pair. |
secret_access_key | Yes | The AccessKey secret corresponding to the AccessKey ID. For instructions, see Obtain an AccessKey pair. |
[ FORMAT ] [ AS ] <data_format> | No | The format of the source data. Defaults to CSV. Supported values: BINARY, CSV, JSON, JSONLINE, ORC, PARQUET, TEXT. FORMAT AS CSV, FORMAT CSV, and CSV are equivalent. |
MANIFEST | No | Treat data_source as a manifest file rather than a data path. See MANIFEST file format below. |
option 'value' | No | Additional options in key value format. See the options table below. |
Options
| Option | Type | Required | Description |
|---|---|---|---|
ENDPOINT | STRING | Yes | The OSS endpoint. For a list of endpoints by region, see Regions and endpoints. |
FDW | STRING | Yes | The name of the oss_fdw extension. Required to create a temporary OSS server for the COPY statement. |
FORMAT, FILETYPE, DELIMITER, ESCAPE, and others | — | — | Options for creating the temporary OSS foreign table. For details, see Overview of OSS foreign tables. |
MANIFEST file format
When MANIFEST is specified, data_source must point to a JSON file with the following structure:
{
"entries": [
{"url": "oss://adbpg-regress/local_t/_seg2_0.csv", "mandatory": true},
{"url": "oss://adbpg-regress/local_t/_seg1_0.csv", "mandatory": true},
{"url": "oss://adbpg-regress/local_t/_seg0_0.csv", "mandatory": true},
{"url": "oss://adbpg-regress-2/local_t/_seg1_0.csv", "mandatory": true},
{"url": "oss://adbpg-regress-2/local_t/_seg2_0.csv", "mandatory": true},
{"url": "oss://adbpg-regress-2/local_t/_seg0_0.csv", "mandatory": true}
]
}| Field | Description |
|---|---|
entries | An array of OSS objects. Objects can span different buckets and paths, but must all be accessible with the same AccessKey ID and secret. |
url | The full path of an OSS object. |
mandatory | If true, the import fails when the object is not found. |
Fault tolerance
When importing data, some rows may fail to parse. Configure the following options to handle errors without aborting the entire import:
| Option | Description |
|---|---|
log_errors 'true' | Record malformed rows to an error log instead of failing immediately. |
segment_reject_limit '<n>' | Stop the import and return an error if the number of rejected rows reaches n (for example, '10') or the rejection rate reaches n% (for example, '10%'). Use a count threshold for predictable error budgets; use a percentage threshold for variable-sized datasets where absolute counts are harder to reason about. |
After the import, query the error log:
SELECT * FROM gp_read_error_log('<table_name>');Error logs consume storage space. Delete them when no longer needed:
SELECT gp_truncate_error_log('<table_name>');Examples
Import selected columns from CSV
Create the target table.
CREATE TABLE local_t2 (a int, b float8, c text);Import data into columns
aandc. Columnbreceives NULL.COPY local_t2 (a, c) FROM 'oss://adbpg-regress/local_t/' ACCESS_KEY_ID 'LTAI****************' SECRET_ACCESS_KEY 'TNPP*************************' FORMAT AS CSV ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';Verify the import.
SELECT * FROM local_t2 LIMIT 10;Expected output:
a | b | c ----+---+---------------------------------- 12 | | a24cba6ebdc5e0c485cd88ef60b72fea 15 | | c4d3028f5205fab98e5f43c7945db4ba 20 | | 769884311db01f400e21a903a3f1cb50 ... (10 rows)Check whether the data in columns
aandcof thelocal_t2table matches the data in thelocal_ttable.SELECT sum(hashtext(t.a::text)) AS col_a_hash, sum(hashtext(t.c::text)) AS col_c_hash FROM local_t2 t;Expected output:
col_a_hash | col_c_hash -------------+------------- 23725368368 | 13447976580 (1 row)SELECT sum(hashtext(t.a::text)) AS col_a_hash, sum(hashtext(t.c::text)) AS col_c_hash FROM local_t t;Expected output:
col_a_hash | col_c_hash -------------+------------- 23725368368 | 13447976580 (1 row)Import ORC or Parquet files using the same syntax with a different
FORMAT:-- ORC COPY tt FROM 'oss://adbpg-regress/q_oss_orc_list/' ACCESS_KEY_ID 'LTAI****************' SECRET_ACCESS_KEY 'TNPP*************************' FORMAT AS ORC ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw'; -- Parquet COPY tp FROM 'oss://adbpg-regress/test_parquet/' ACCESS_KEY_ID 'LTAI****************' SECRET_ACCESS_KEY 'TNPP*************************' FORMAT AS PARQUET ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';
Import from a manifest file
Create the target table.
CREATE TABLE local_manifest (a int, c text);Create a manifest file that references objects from multiple buckets.
{ "entries": [ {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1450404435_seg2_0.csv", "mandatory": true}, {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1856683967_seg1_0.csv", "mandatory": true}, {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1880804901_seg0_0.csv", "mandatory": true}, {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1008521914_seg1_0.csv", "mandatory": true}, {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1234881553_seg2_0.csv", "mandatory": true}, {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1711667760_seg0_0.csv", "mandatory": true} ] }Import using the manifest file path.
COPY local_manifest FROM 'oss://adbpg-regress-2/unload_manifest/t_manifest' ACCESS_KEY_ID 'LTAI****************' SECRET_ACCESS_KEY 'TNPP*************************' FORMAT AS CSV MANIFEST ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';
Handle import errors with fault tolerance
Create the target table.
CREATE TABLE sales (id integer, value float8, x text) DISTRIBUTED BY (id);Import with error logging enabled. The import continues even if rows fail, but stops when 10 or more errors are found.
COPY sales FROM 'oss://adbpg-const/error_sales/' ACCESS_KEY_ID 'LTAI****************' SECRET_ACCESS_KEY 'TNPP*************************' FORMAT AS CSV log_errors 'true' segment_reject_limit '10' endpoint 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';Expected output:
NOTICE: found 3 data formatting errors (3 or more input rows), rejected related input data COPY FOREIGN TABLEQuery the error details.
SELECT * FROM gp_read_error_log('sales');Expected output:
cmdtime | relname | filename | linenum | bytenum | errmsg | rawdata | rawbytes ----------------------------+------------------------------------------------+-------------------------+---------+---------+-----------------------------------------------------------+---------+---------- 2021-02-08 14:24:04.225238 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.2.csv | 2 | | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad | | \x 2021-02-08 14:24:04.225238 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.2.csv | 3 | | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad | | \x 2021-02-08 14:24:04.225269 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.3.csv | 2 | | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad | | \x (3 rows)
UNLOAD
Use the UNLOAD statement to export query results from an AnalyticDB for PostgreSQL table to OSS.
Usage notes
When exporting to CSV, enclose the following options in double quotes and write them in lowercase: delimiter, quote, null, header, escape, and encoding. Without quotes, these option names may be interpreted as SQL keywords, causing syntax errors.
UNLOAD ('SELECT * FROM test')
TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI****************'
SECRET_ACCESS_KEY 'TNPP*************************'
FORMAT csv
"delimiter" '|'
"quote" '"'
"null" ''
"header" 'true'
"escape" 'E'
"encoding" 'utf-8'
FDW 'oss_fdw'
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com';Syntax
UNLOAD ('<select_statement>')
TO <destination_url>
ACCESS_KEY_ID '<access_key_id>'
SECRET_ACCESS_KEY '<secret_access_key>'
[ [ FORMAT ] [ AS ] <data_format> ]
[ MANIFEST [ '<manifest_url>' ] ]
[ PARALLEL [ { ON | TRUE } | { OFF | FALSE } ] ]
[ option '<value>' [ ... ] ]Parameters
| Parameter | Required | Description |
|---|---|---|
select_statement | Yes | A SELECT statement whose results are written to OSS. |
destination_url | Yes | The OSS path to write to. Format: oss://<bucket_name>/path_prefix. |
access_key_id | Yes | The AccessKey ID of an Alibaba Cloud account or RAM user with OSS access. Use a RAM user with minimal required permissions rather than your root account credentials. For instructions, see Obtain an AccessKey pair. |
secret_access_key | Yes | The AccessKey secret corresponding to the AccessKey ID. For instructions, see Obtain an AccessKey pair. |
[ FORMAT ] [ AS ] <data_format> | No | The format of the exported data. Defaults to CSV. Supported values: CSV, ORC, TEXT. |
MANIFEST [ '<manifest_url>' ] | No | Generate a manifest file listing all exported objects. If manifest_url is specified, the manifest is written to that path (which must end with the manifest suffix) and can be in a different bucket from the data files. If omitted, the manifest path prefix matches the data files. If the manifest file already exists, set ALLOWOVERWRITE 'true' to overwrite it. |
PARALLEL | No | Controls parallel export. Default: ON. When ON, each compute node writes a separate output file — the number of output files equals the number of compute nodes in your instance. When OFF or FALSE, data is exported serially and saved to a single file if the total size is 8 GB or less. |
option 'value' | No | Additional options in key value format. See the options table below. |
Options
| Option | Type | Required | Description |
|---|---|---|---|
ENDPOINT | STRING | Yes | The OSS endpoint. For a list of endpoints by region, see Regions and endpoints. |
FDW | STRING | Yes | The name of the oss_fdw extension. Required to create a temporary OSS server for the UNLOAD statement. |
FORMAT, FILETYPE, DELIMITER, ESCAPE, and others | — | — | Options for creating the temporary OSS foreign table. For details, see Overview of OSS foreign tables. |
Examples
Export to CSV
Create the source table and insert test data.
CREATE TABLE local_t (a int, b float8, c text); INSERT INTO local_t SELECT r, random() * 1000, md5(random()::text) FROM generate_series(1, 1000) r;Verify the source data.
SELECT * FROM local_t LIMIT 5;Expected output:
a | b | c ----+------------------+---------------------------------- 5 | 550.81393988803 | 8009fa725372e996786849213a695ce0 6 | 95.8335199393332 | ce7952c6728cdffdee06cc5b502d6457 9 | 421.379795763642 | d3260ccbf6b9c03f3658d96bb7678b4d 10 | 362.347379792482 | 2bbbf89d23a2f83b089b589f55b5c4fc 11 | 800.203878898174 | a52994c5573e6b36d8a1c357bf800ce5 (5 rows)Export columns
aandcto OSS in CSV format.UNLOAD ('SELECT a, c FROM local_t') TO 'oss://adbpg-regress/local_t/' ACCESS_KEY_ID 'LTAI****************' SECRET_ACCESS_KEY 'TNPP*************************' FORMAT AS CSV ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com' FDW 'oss_fdw';Expected output:
NOTICE: OSS output prefix: "local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618". UNLOADConfirm the files were written to OSS.
ossutil --config hangzhou-zmf.config ls oss://adbpg-regress/local_t/Expected output (one file per compute node):
LastModifiedTime Size(B) StorageClass ETAG ObjectName 2020-09-07 16:48:01 +0800 CST 12023 Standard 9F38B5407142C044C1F3555F00000000 oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg0_0.csv 2020-09-07 16:48:01 +0800 CST 12469 Standard 807BA680A0DED49BC1F3555F00000000 oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg1_0.csv 2020-09-07 16:48:01 +0800 CST 12401 Standard 3524F68F628CEB64C1F3555F00000000 oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg2_0.csv Object Number is: 3Inspect the CSV content.
head -n 10 adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg2_0.csvExpected output:
7,1225341d0d367a69b1b345536b21ef73 19,424a7a5c36066842f4de8c8a8341fc89 27,c214432e9928e4a6f7bef7bd815424c0 29,ade5d636e2b5d2a606a02e79255da4bd 37,85660e60ede47b68493f6295620db568
Export with a manifest file
All three examples below use the same source table (local_t) created in the previous section.
Scenario 1: Generate a manifest file alongside the data files
UNLOAD ('SELECT * FROM local_t')
TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI****************'
SECRET_ACCESS_KEY 'TNPP*************************'
FORMAT AS CSV
MANIFEST
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
FDW 'oss_fdw';List the exported files:
ossutil ls -s oss://adbpg-regress/local_t/Expected output — three data files and one manifest file share the same path prefix:
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_162488956_seg1_0.csv
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_163756258_seg0_0.csv
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_1741120517_seg2_0.csv
oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_manifest
Object Number is: 4View the manifest content:
ossutil cat oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_manifest{
"entries": [
{"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_162488956_seg1_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_163756258_seg0_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_1741120517_seg2_0.csv"}
]
}Scenario 2: Write the manifest file to a different bucket
ALLOWOVERWRITE 'true' overwrites the existing manifest file only. Data files are not overwritten and must be deleted manually if needed.UNLOAD ('SELECT * FROM local_t')
TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI****************'
SECRET_ACCESS_KEY 'TNPP*************************'
FORMAT AS CSV
MANIFEST 'oss://adbpg-regress-2/unload_manifest/t_manifest'
ALLOWOVERWRITE 'true'
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
FDW 'oss_fdw';Verify that data files are in the original bucket and the manifest is in the separate bucket:
# Data files
ossutil ls -s oss://adbpg-regress/local_t/oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1736161168_seg0_0.csv
oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1925769064_seg2_0.csv
oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_644328153_seg1_0.csv
Object Number is: 3# Manifest file in a different bucket
ossutil cat oss://adbpg-regress-2/unload_manifest/t_manifest{
"entries": [
{"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1736161168_seg0_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1925769064_seg2_0.csv"},
{"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_644328153_seg1_0.csv"}
]
}FAQ
Why did the export produce multiple CSV files?
UNLOAD writes one output file per compute node by default (PARALLEL ON). If your instance has four compute nodes, you get four files. To export to a single file, set PARALLEL OFF — but only if the total data size is 8 GB or less.