All Products
Search
Document Center

AnalyticDB for PostgreSQL:Use the COPY or UNLOAD statement to import or export data between OSS foreign tables and AnalyticDB for PostgreSQL tables

Last Updated:Aug 01, 2023

This topic describes how to use the COPY statement to import data from Object Storage Service (OSS) foreign tables to AnalyticDB for PostgreSQL tables, and use the UNLOAD statement to export data from AnalyticDB for PostgreSQL tables to OSS foreign tables.

The COPY or UNLOAD statement is used to import or export data based on OSS foreign tables. For more information, see Use OSS foreign tables to access OSS data.

COPY

Syntax

COPY <table_name> 
[ <column_list> ]
FROM <data_source>
ACCESS_KEY_ID '<access_key_id>'
SECRET_ACCESS_KEY '<secret_access_key>'
[ [ FORMAT ] [ AS ] <data_format> ]
[ MANIFEST ]
[ option '<value>' [ ... ] ]

Parameters

Parameter

Required

Description

table_name

Yes

The name of the AnalyticDB for PostgreSQL table in which the imported data is stored. The AnalyticDB for PostgreSQL table must exist in the AnalyticDB for PostgreSQL instance.

column_list

No

The list of columns to which you want to write data. If you do not specify this parameter, data is written to all columns.

data_source

Yes

The URL of the OSS bucket from which data is obtained. Example: oss://<bucket_name>/path_prefix.

access_key_id

Yes

The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user that has permissions to access OSS.

For information about how to obtain your AccessKey ID, see Obtain an AccessKey pair.

secret_access_key

Yes

The AccessKey secret of an Alibaba Cloud account or a RAM user that has permissions to access OSS.

For information about how to obtain your AccessKey secret, see Obtain an AccessKey pair.

[ FORMAT ] [ AS ] <data_format>

No

The file format in which the imported data is stored. If you do not specify this parameter, FORMAT AS CSV is used.

You can set data_format to BINARY, CSV, JSON, JSONLINE, ORC, PARQUET, or TEXT. In the parameter name, FORMAT and AS can be omitted. For example, FORMAT AS CSV or FORMAT CSV is equivalent to CSV.

MANIFEST

No

The data source is a manifest file. The manifest file must be in the JSON format and consist of the following elements:

  • entries: an array of the OSS objects in the manifest file. The OSS objects can be in different buckets or paths. The OSS objects must be accessible by using the same AccessKey ID or AccessKey secret. Example:

    {  "entries": [  
        {"url": "oss://adbpg-regress/local_t/_seg2_0.csv", "mandatory":true},
        {"url": "oss://adbpg-regress/local_t/_seg1_0.csv", "mandatory":true},
        {"url": "oss://adbpg-regress/local_t/_seg0_0.csv", "mandatory":true}, 
        {"url": "oss://adbpg-regress-2/local_t/_seg1_0.csv", "mandatory":true}, 
        {"url": "oss://adbpg-regress-2/local_t/_seg2_0.csv", "mandatory":true}, 
        {"url": "oss://adbpg-regress-2/local_t/_seg0_0.csv", "mandatory":true} 
       ]
    }
  • url: the full path of an OSS object.

  • mandatory: specifies whether to report an error when an OSS object is not found.

[ option '<value>' [ ... ] ]

No

A list of options. Specify each option in the key value format. The following table describes the available options.

Options

Option

Type

Required

Description

ENDPOINT

STRING

Yes

The OSS endpoint. For information about how to obtain the OSS endpoint, see Regions and endpoints.

FDW

STRING

Yes

The name of the oss_fdw extension. The oss_fdw extension is required when you create a temporary OSS server for the COPY statement.

Other options that are used to create an OSS foreign table, such as FORMAT, FILETYPE, DELIMITER, and ESCAPE

N/A

N/A

The options that are used to create a temporary OSS foreign table. For more information, see Overview of OSS foreign tables.

Examples

Example 1

  1. Create an AnalyticDB for PostgreSQL table.

     CREATE TABLE local_t2 (a int, b float8, c text);
  2. Use the COPY statement to import data to columns a and c. Column b is assigned NULL.

    COPY local_t2 (a, c)
    FROM 'oss://adbpg-regress/local_t/'
    ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
    SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
    FORMAT AS CSV
    ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
    FDW 'oss_fdw';
  3. Query data of the table.

    SELECT * FROM local_t2 LIMIT 10;

    Sample result:

     a  | b |                c
    ----+---+----------------------------------
     12 |   | a24cba6ebdc5e0c485cd88ef60b72fea
     15 |   | c4d3028f5205fab98e5f43c7945db4ba
     20 |   | 769884311db01f400e21a903a3f1cb50
     26 |   | 7d12c981d262e0067ea1a04368f32f2a
     30 |   | 4e64bda52d54d263d16f42771b1d0225
     35 |   | b70c976d4c04568bd497b42a7d2e451d
     40 |   | d07ce2948b8618b47c351b6e222182f6
     46 |   | c2234393f878f5557776b7e778299564
     47 |   | cde904b2331fa274cd8d9266aa858342
     50 |   | 1235b900fb644bb36440a274314e4b6b
    (10 rows)
  4. Check whether the data in columns a and c of the local_t2 table is the same as that of the local_t table.

    • SELECT sum(hashtext(t.a::text)) AS col_a_hash, sum(hashtext(t.c::text)) AS col_c_hash FROM local_t2 t;

      Sample result:

       col_a_hash  | col_c_hash
      -------------+-------------
       23725368368 | 13447976580
      (1 row)
    • SELECT sum(hashtext(t.a::text)) AS col_a_hash, sum(hashtext(t.c::text)) AS col_c_hash FROM local_t t;

      Sample result:

       col_a_hash  | col_c_hash
      -------------+-------------
       23725368368 | 13447976580
      (1 row)
  5. Save the data in a format other than CSV.

    • Save the data in the ORC format.

      COPY tt
      FROM 'oss://adbpg-regress/q_oss_orc_list/'
      ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
      SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
      FORMAT AS ORC
      ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
      FDW 'oss_fdw';
    • Save the data in the PARQUET format.

      COPY tp
      FROM  'oss://adbpg-regress/test_parquet/'
      ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
      SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
      FORMAT AS PARQUET
      ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
      FDW 'oss_fdw';

Example 2

  1. Create an AnalyticDB for PostgreSQL table.

     CREATE TABLE local_manifest (a int, c text);
  2. Create a manifest file in which the OSS objects can be in different buckets.

    {
       "entries": [
          {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1450404435_seg2_0.csv", "mandatory":true},
          {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1856683967_seg1_0.csv", "mandatory":true},
          {"url": "oss://adbpg-regress/local_t/_20210114103840_83f407434beccbd4eb2a0ce45ef39568_1880804901_seg0_0.csv", "mandatory":true},
          {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1008521914_seg1_0.csv", "mandatory":true},
          {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1234881553_seg2_0.csv", "mandatory":true},
          {"url": "oss://adbpg-regress-2/local_t/_20210114103849_67100080728ef95228e662bc02cb99d1_1711667760_seg0_0.csv", "mandatory":true}
       ]
    }
  3. Use the COPY statement to import the data from the manifest file to the AnalyticDB for PostgreSQL table.

    COPY local_manifest
    FROM 'oss://adbpg-regress-2/unload_manifest/t_manifest'
    ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
    SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
    FORMAT AS CSV
    MANIFEST          -- The data source is a manifest file. 
    ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
    FDW 'oss_fdw';

Example 3

When you use the COPY statement to import data from OSS, error lines may be returned. In this case, you can configure the following options to implement fault tolerance:

  • log_errors: specifies whether to record information of the error lines in log files.

  • segment_reject_limit: segment_reject_limit '10' specifies if the number of error lines is greater than or equal to 10, the system returns an error and exits. segment_reject_limit '10%' specifies if the proportion of error lines is greater than or equal to 10%, the system returns an error and exits.

  1. Create an AnalyticDB for PostgreSQL table.

    CREATE TABLE sales(id integer, value float8, x text) DISTRIBUTED BY (id);
  2. Use the COPY statement to import data from an OSS object that has three error lines.

    COPY sales
    FROM 'oss://adbpg-const/error_sales/'
    ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
    SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
    FORMAT AS csv
    log_errors 'true'
    segment_reject_limit '10'
    endpoint 'oss-cn-hangzhou-internal.aliyuncs.com'
    FDW 'oss_fdw';
    NOTICE:  found 3 data formatting errors (3 or more input rows), rejected related input data
    COPY FOREIGN TABLE
  3. Execute the following statement to query the error line details:

    SELECT * FROM gp_read_error_log('<Name of the destination table of the COPY statement>');

    In the following example, the error line details of the sales table are queried:

    SELECT * FROM gp_read_error_log('sales');

    Sample result:

                cmdtime            |                    relname                     |        filename         | linenum | bytenum |                          errmsg                           | rawdata | rawbytes
    -------------------------------+------------------------------------------------+-------------------------+---------+---------+-----------------------------------------------------------+---------+----------
     2021-02-08 14:24:04.225238+08 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.2.csv |       2 |         | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad |         | \x
     2021-02-08 14:24:04.225238+08 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.2.csv |       3 |         | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad |         | \x
     2021-02-08 14:24:04.225269+08 | adbpgforeigntabletmp_20210208142403_1936866966 | error_sales/sales.3.csv |       2 |         | invalid byte sequence for encoding "UTF8": 0xed 0xab 0xad |         | \x
    (3 rows)
    Note

    The additionally saved error line logs occupy storage space. You can use the following syntax to delete the error line logs: SELECT gp_truncate_error_log('<table_name>').

UNLOAD

Usage notes

When you export data to a CSV file, you must enclose options in quotation marks (") and write the options in lowercase letters. If you do not follow this requirement, some options may be processed as keywords. This may result in syntax errors. You must specify the following options in a particular way: delimiter, quote, null, header, escape, and encoding. Example:

UNLOAD ('SELECT * FROM test') 
TO 'oss://adbpg-regress/local_t/'
ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
FORMAT csv
"delimiter" '|'
"quote" '"'
"null" ''
"header" 'true'
"escape" 'E'
"encoding" 'utf-8'
FDW 'oss_fdw'
ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com';

Syntax

UNLOAD ('<select_statement>')
TO <destination_url>
ACCESS_KEY_ID '<access_key_id>'
SECRET_ACCESS_KEY '<secret_access_key>'
[ [ FORMAT ] [ AS ] <data_format> ]
[ MANIFEST [ '<manifest_url>' ] ]
[ PARALLEL [ { ON | TRUE } | { OFF | FALSE } ] ]
[ option '<value>' [ ... ] ]

Parameters

Parameter

Required

Description

select_statement

Yes

The SELECT statement. The query result data is written to OSS.

destination_url

Yes

The URL of the OSS bucket from which data is obtained. Example: oss://<bucket_name>/path_prefix.

access_key_id

Yes

The AccessKey ID of an Alibaba Cloud account or a Resource Access Management (RAM) user that has permissions to access OSS.

For information about how to obtain your AccessKey ID, see Obtain an AccessKey pair.

secret_access_key

Yes

The AccessKey secret of an Alibaba Cloud account or a RAM user that has permissions to access OSS.

For information about how to obtain your AccessKey secret, see Obtain an AccessKey pair.

[ FORMAT ] [ AS ] <data_format>

No

The file format in which the exported data is stored. If you do not specify this parameter, FORMAT AS CSV is used.

You can set data_format to CSV, ORC, or TEXT. In the parameter name, FORMAT and AS can be omitted. For example, FORMAT AS CSV or FORMAT CSV is equivalent to CSV.

MANIFEST

No

Generate a manifest file when data is exported. If you specify manifest_url, the full path of the manifest file is different from that of the data files that are located in a different bucket. The path must end with the manifest suffix. If you do not specify manifest_url, the path prefix of the manifest file is the same as that of the data files.

Note

If the file that is specified by manifest_url already exists, you must set the allowoverwrite option to true in [ option [ value ] [ ...] ] to overwrite the manifest file.

PARALLEL

No

Specifies whether to export data from multiple compute nodes in parallel. By default, data is exported from multiple compute nodes in parallel. A separate export file is generated for each compute node. If you set this parameter to OFF or FALSE, the data is not exported in parallel. If the size of the exported data does not exceed 8 GB, the exported data is saved to a single file.

[ option '<value>' [ ... ] ]

No

A list of options. Specify each option in the key value format. The following table describes the available options.

Options

Option

Type

Required

Description

ENDPOINT

STRING

Yes

The OSS endpoint. For information about how to obtain the OSS endpoint, see Regions and endpoints.

FDW

STRING

Yes

The name of the oss_fdw extension. The oss_fdw extension is required when you create a temporary OSS server for the COPY statement.

Other options that are used to create an OSS foreign table, such as FORMAT, FILETYPE, DELIMITER, and ESCAPE

N/A

N/A

The options that are used to create a temporary OSS foreign table. For more information, see Overview of OSS foreign tables.

Examples

Example 1

  1. Create an AnalyticDB for PostgreSQL table and insert test data into the table.

  2. CREATE TABLE local_t (a int, b float8, c text);
    INSERT INTO local_t SELECT r, random() * 1000, md5(random()::text) FROM generate_series(1,1000)r;
  3. Query data of the AnalyticDB for PostgreSQL table.

    SELECT * FROM local_t LIMIT 5;

    Sample result:

  4.  a  |        b         |                c
    ----+------------------+----------------------------------
      5 |  550.81393988803 | 8009fa725372e996786849213a695ce0
      6 | 95.8335199393332 | ce7952c6728cdffdee06cc5b502d6457
      9 | 421.379795763642 | d3260ccbf6b9c03f3658d96bb7678b4d
     10 | 362.347379792482 | 2bbbf89d23a2f83b089b589f55b5c4fc
     11 | 800.203878898174 | a52994c5573e6b36d8a1c357bf800ce5
    (5 rows)
  5. Use the UNLOAD statement to export the data from the specified columns of the AnalyticDB for PostgreSQL table to OSS and save the data in the CSV format.

  6. UNLOAD ('select a, c from local_t') TO 'oss://adbpg-regress/local_t/'
    ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
    SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
    FORMAT AS CSV
    ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
    FDW 'oss_fdw';
    NOTICE:  OSS output prefix: "local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618".
    UNLOAD
  7. Check whether the CSV file is written to the specified path.

    $ ossutil --config hangzhou-zmf.config ls oss://adbpg-regress/local_t/

    Sample result:

  8. LastModifiedTime                   Size(B)  StorageClass   ETAG                                  ObjectName
    2020-09-07 16:48:01 +0800 CST        12023      Standard   9F38B5407142C044C1F3555F00000000      oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg0_0.csv
    2020-09-07 16:48:01 +0800 CST        12469      Standard   807BA680A0DED49BC1F3555F00000000      oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg1_0.csv
    2020-09-07 16:48:01 +0800 CST        12401      Standard   3524F68F628CEB64C1F3555F00000000      oss://adbpg-regress/local_t/adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg2_0.csv
    Object Number is: 3
    
    0.153414(s) elapsed
  9. Check whether the CSV file contains only the data in columns a and c of the local_t table.

    $ head -n 10 adbpgforeigntabletmp_20200907164801_1354519958_20200907164801_652261618_seg2_0.csv

    Sample result:

  10. 7,1225341d0d367a69b1b345536b21ef73
    19,424a7a5c36066842f4de8c8a8341fc89
    27,c214432e9928e4a6f7bef7bd815424c0
    29,ade5d636e2b5d2a606a02e79255da4bd
    37,85660e60ede47b68493f6295620db568
    77,e1be448ba2b08f0a2ca05b7ed812abfd
    80,5e85d597a3b0f2f9736a728724a0f9e0
    92,dc23f76f0b1446504b8f1c2274521d2f
    94,50304822488d55a500e3a71bcf40890f
    97,e970fde8cd0df9c6b610925a488f6042

Example 2

  1. Use the UNLOAD statement to export data and generate a manifest file. The path prefix of the manifest file is the same as that of the data files.

  2. UNLOAD ('select * from local_t') TO 'oss://adbpg-regress/local_t/'
    ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
    SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
    FORMAT AS CSV
    MANIFEST    -- Generate a manifest file when the UNLOAD statement is used to export data. The path prefix of the manifest file is the same as that of the data files. 
    ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
    FDW 'oss_fdw';
    NOTICE:  OSS output prefix: "local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c".
    UNLOAD
  3. View the list of exported files.

    ossutil ls -s oss://adbpg-regress/local_t/

    The list includes several data files and a manifest file. Sample result:

  4. oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_162488956_seg1_0.csv
    oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_163756258_seg0_0.csv
    oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_1741120517_seg2_0.csv
    oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_manifest
    Object Number is: 4
    
    0.136180(s) elapsed
  5. View the content of the manifest file.

    ossutil cat oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_manifest

    Sample result:

  6. {
       "entries": [
          {"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_162488956_seg1_0.csv"},
          {"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_163756258_seg0_0.csv"},
          {"url": "oss://adbpg-regress/local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c_1741120517_seg2_0.csv"}
       ]
    }
  7. Use the UNLOAD statement to export data and generate a specified manifest file. The path of the manifest file may be different from that of the data files.

  8. Note

    If you set the ALLOWOVERWRITE option to true, the existing manifest file is overwritten. However, the data files are not overwritten. The data files can be manually deleted.

    UNLOAD ('select * from local_t') TO 'oss://adbpg-regress/local_t/'
    ACCESS_KEY_ID 'LTAI5tDfyHVmxCf66Un****'
    SECRET_ACCESS_KEY 'TNPPxOFY4xx7CZGjVgSsVBtIs****'
    FORMAT AS CSV
    MANIFEST 'oss://adbpg-regress-2/unload_manifest/t_manifest' -- Generate a specified manifest file when the UNLOAD statement is used to export data. 
    ALLOWOVERWRITE 'true' -- Overwrite the existing manifest file. 
    ENDPOINT 'oss-cn-hangzhou-internal.aliyuncs.com'
    FDW 'oss_fdw';
    NOTICE:  OSS output prefix: "local_t/_20210114100329_3e9b07726306d88b3193dc95c10a5c5c".
    UNLOAD
  9. View the list of exported files.

    ossutil ls -s oss://adbpg-regress/local_t/

    Only data files are included in the list. Sample result:

  10. oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1736161168_seg0_0.csv
    oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1925769064_seg2_0.csv
    oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_644328153_seg1_0.csv
    Object Number is: 3
    
    0.118540(s) elapsed
  11. View the content of the manifest file that is stored in a different bucket from the data files.

    ossutil cat oss://adbpg-regress-2/unload_manifest/t_manifest

    Sample result:

  12. {
       "entries": [
          {"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1736161168_seg0_0.csv"},
          {"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_1925769064_seg2_0.csv"},
          {"url": "oss://adbpg-regress/local_t/_20210114100956_4d3395a9501f6e22da724a2b6df1b6d3_644328153_seg1_0.csv"}
       ]
    }

FAQ

Q: Multiple CSV files were generated when I exported data. Why?

A: When you use the UNLOAD statement to export data from AnalyticDB for PostgreSQL to OSS, one CSV file is generated for each compute node. For example, if your instance has four compute nodes, four CSV files are generated.