This topic describes how to migrate data from an Amazon Redshift instance to an AnalyticDB for PostgreSQL instance.

Prerequisites

  • An Amazon Redshift instance is created.
  • The Amazon S3 service used to export Amazon Redshift data is activated.
  • Alibaba Cloud Object Storage Service (OSS) is activated. For more information, see What is OSS?
  • An AnalyticDB for PostgreSQL instance is created. For more information about how to select instance specifications, see the "Specification selection" section of this topic.

Overall procedure

Figure: Migrate data from Amazon Redshift to AnalyticDB for PostgreSQL

For more information about the migration procedure, see the "Migration procedure" section of this topic.

Specification selection

This section describes how to select specifications for your AnalyticDB for PostgreSQL instance based on the specifications of the source Amazon Redshift instance.

An Amazon Redshift instance consists of a leader node and multiple compute nodes.

  • Leader node: corresponds to the coordinator node of an AnalyticDB for PostgreSQL instance. The leader node communicates with clients, parses queries, generates execution plans, and performs database operations.
  • Compute node: corresponds to a compute group of an AnalyticDB for PostgreSQL instance. Each Amazon Redshift compute node consists of multiple node slices, and each node slice corresponds to a compute node of the AnalyticDB for PostgreSQL instance, which stores data and performs query computation.

When you create an AnalyticDB for PostgreSQL instance, you can select node specifications based on specifications of each node slice in the source Amazon Redshift instance.

Example:

An Amazon Redshift instance consists of four compute nodes, each with node specifications of 2 cores, 16 GB memory, and 1 TB storage. Each compute node consists of two node slices, so the instance has eight node slices in total.

When you create the AnalyticDB for PostgreSQL instance, you can set the number of compute nodes to 8 (one per node slice), the node specifications to 2C16GB, and the storage of a single compute node to 1,000 GB.

Note
  • For more information about how to create an AnalyticDB for PostgreSQL instance, see Create an instance.
  • We recommend that you set the storage type to Enhanced SSD (ESSD), which provides better I/O performance than ultra disks.
  • Pay-as-you-go instances can be scaled after they are created.

Migration procedure

  1. Export data from an Amazon Redshift instance to an Amazon S3 bucket. You can execute the UNLOAD statement to export data. For more information, see UNLOAD.
    UNLOAD ('select-statement')
    TO 's3://object-path/name-prefix'
    authorization
    [ option [ ... ] ]
    
    where option is
    { [ FORMAT [ AS ] ] CSV | PARQUET
    | PARTITION BY ( column_name [, ... ] ) [ INCLUDE ]
    | MANIFEST [ VERBOSE ] 
    | HEADER           
    | DELIMITER [ AS ] 'delimiter-char' 
    | FIXEDWIDTH [ AS ] 'fixedwidth-spec'   
    | ENCRYPTED [ AUTO ]
    | BZIP2  
    | GZIP 
    | ZSTD
    | ADDQUOTES 
    | NULL [ AS ] 'null-string'
    | ESCAPE
    | ALLOWOVERWRITE
    | CLEANPATH
    | PARALLEL [ { ON | TRUE } | { OFF | FALSE } ]
    | MAXFILESIZE [AS] max-size [ MB | GB ] 
    | REGION [AS] 'aws-region' }
    Note
    • We recommend that you specify FORMAT AS PARQUET or FORMAT AS CSV to set the format of the exported data.
    • We recommend that you specify PARALLEL ON to export data in parallel. Parallel export generates multiple files and improves export efficiency.
    • We recommend that you specify the MAXFILESIZE parameter to limit the size of each file so that more files are generated. Set the number of generated files to an integer multiple of the number of compute nodes in the AnalyticDB for PostgreSQL instance. This improves efficiency when the data is later imported in parallel from OSS external tables into the AnalyticDB for PostgreSQL instance.
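    The following statement is a minimal sketch of an UNLOAD command that applies these recommendations. The table name sales, the S3 path, and the IAM role ARN are placeholders that you must replace with your own values.

    UNLOAD ('SELECT * FROM sales')
    TO 's3://s3-export-bucket/sales_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS PARQUET
    PARALLEL ON
    MAXFILESIZE 100 MB;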
  2. Synchronize data from Amazon S3 to OSS.
    1. Create a bucket in OSS. For more information, see Create buckets.
      Note We recommend that you create an OSS bucket within the same region as the AnalyticDB for PostgreSQL instance to facilitate subsequent data import.
    2. Download and install ossimport in standalone mode. For more information, see Architectures and configurations.

      The ossimport software in standalone mode has the following file structure:

      ossimport
      ├── bin
      │   └── ossimport2.jar   # The JAR package that contains the Master, Worker, TaskTracker, and Console modules.
      ├── conf
      │   ├── local_job.cfg    # The job configuration file.
      │   └── sys.properties   # The configuration file that contains system parameters.
      ├── console.bat          # The Windows command-line utility that is used to run tasks step by step.
      ├── console.sh           # The Linux command-line utility that is used to run tasks step by step.
      ├── import.bat           # The Windows script that runs the entire import based on the conf/local_job.cfg configuration file, including the start, migration, verification, and retry operations.
      ├── import.sh            # The Linux script that runs the entire import based on the conf/local_job.cfg configuration file, including the start, migration, verification, and retry operations.
      ├── logs                 # The directory that contains logs.
      └── README.md            # The file that describes ossimport. We recommend that you read this file before you use ossimport.
    3. Configure ossimport in standalone mode. The following sample code provides an example on how to edit the conf/local_job.cfg file. For more information about ossimport configurations, see Architectures and configurations.
      srcType=s3
      srcAccessKey="your AWS Access Key ID"
      srcSecretKey="your AWS Access Key Secret"
      srcDomain=s3.ap-southeast-2.amazonaws.com
      srcBucket=s3-export-bucket
      destAccessKey="your Alibaba Cloud Access Key ID"
      destSecretKey="your Alibaba Cloud Access Key Secret"
      destDomain=http://oss-ap-southeast-2-internal.aliyuncs.com
      destBucket=oss-export-bucket
      destPrefix=
      isSkipExistFile=true
      Note You need to modify only the parameters shown in the preceding sample code.
    4. Run ossimport to synchronize data to OSS. For more information about how to use ossimport in standalone mode, see Standalone deployment.
  3. Import data from OSS to the AnalyticDB for PostgreSQL instance.
    1. Convert data definition language (DDL) statements by modifying information of database objects such as schemas, tables, functions, and views. For more information, see the "Convert DDL statements" section of this topic.
    2. Execute the COPY FROM OSS statement to import data to the AnalyticDB for PostgreSQL instance. For more information, see Use the COPY or UNLOAD statement to import or export data between OSS foreign tables and local tables.
      COPY table_name [ ( column_name [, ...] ) ]
          FROM 'data_source_url'
          ACCESS_KEY_ID 'access_key_id'
          SECRET_ACCESS_KEY 'secret_access_key'
          [ [ FORMAT ] [ AS ] data_format ]
          [ MANIFEST ]
          [ [ option value]  ...  ]
      
      where data_source_url must be in the following format:
      
          oss://{bucket_name}/{path_prefix}
      • Example 1: Execute the COPY statement to import PARQUET files from OSS.
        COPY tp
        FROM  'oss://adbpg-regress/test_parquet/'
        ACCESS_KEY_ID 'id'
        SECRET_ACCESS_KEY 'key'
        FORMAT AS PARQUET
        ENDPOINT 'oss-****.aliyuncs.com'
        FDW 'oss_fdw';
      • Example 2: Execute the COPY statement to import columns a and c of CSV files from OSS. The value of column b is NULL.
        COPY local_t2 (a, c)
        FROM 'oss://adbpg-regress/local_t/'
        ACCESS_KEY_ID 'id'
        SECRET_ACCESS_KEY 'key'
        FORMAT AS CSV
        ENDPOINT 'oss-****.aliyuncs.com'
        FDW 'oss_fdw';

Convert DDL statements

The DDL statements of Amazon Redshift are slightly different from those of AnalyticDB for PostgreSQL and must be converted before you migrate data.

  • CREATE SCHEMA

    Create a schema based on the CREATE SCHEMA statement of AnalyticDB for PostgreSQL. Example:

    CREATE SCHEMA schema1 AUTHORIZATION xxxpoc;
    GRANT ALL ON SCHEMA schema1 TO xxxpoc;
    GRANT ALL ON SCHEMA schema1 TO public;
    COMMENT ON SCHEMA schema1 IS 'for xxx migration poc test';
    
    CREATE SCHEMA oss_external_table AUTHORIZATION xxxpoc;
  • CREATE FUNCTION

    Some SQL functions of Amazon Redshift are not supported by AnalyticDB for PostgreSQL and must be customized or rewritten. Examples:

    • Replace the CONVERT_TIMEZONE(a,b,c) function with the following function:
      timezone(b, timezone(a,c))
    • Replace the GETDATE() function with the following function (see the rewrite sketch after this list):
      current_timestamp(0)::timestamp
    • Replace or optimize user-defined functions (UDFs). The following sample code shows a UDF of Amazon Redshift:
      CREATE OR REPLACE FUNCTION public.f_jdate(dt timestamp without time zone)
      RETURNS character varying AS
      '      from datetime import timedelta, datetime
             if dt.hour < 4:
                    d = timedelta(days=-1)
                    dt = dt + d
             return str(dt.date())'
      LANGUAGE plpythonu IMMUTABLE;
      COMMIT;

      Replace the preceding function with the following function:

      to_char(a - interval '4 hour', 'yyyy-mm-dd')
    • Replace other Amazon Redshift functions.

      Functions and Operators lists the standard PostgreSQL functions. You can modify or rewrite the SQL functions of Amazon Redshift that are not supported by AnalyticDB for PostgreSQL.
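    The following queries are a minimal rewrite sketch for the CONVERT_TIMEZONE and GETDATE replacements described above. The table name t, the column ts, and the time zone names are hypothetical placeholders.

      -- Amazon Redshift
      SELECT CONVERT_TIMEZONE('UTC', 'Asia/Shanghai', ts), GETDATE() FROM t;

      -- AnalyticDB for PostgreSQL
      SELECT timezone('Asia/Shanghai', timezone('UTC', ts)), current_timestamp(0)::timestamp FROM t;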

  • CREATE TABLE

    The name of an Amazon Redshift table can be up to 127 characters in length, whereas the name of an AnalyticDB for PostgreSQL table can be up to 63 characters in length. If the names of database objects such as tables, functions, and views exceed 63 characters, you must shorten them to 63 characters or fewer.

    • Modify compression encodings.

      Replace ENCODE XXX in a CREATE TABLE statement of Amazon Redshift with the following clause:

      with (COMPRESSTYPE={ZLIB|ZSTD|RLE_TYPE|NONE})

      Some compression encodings of Amazon Redshift are not supported by AnalyticDB for PostgreSQL. For more information, see Compression encodings. The following list describes the unsupported compression encodings:

      • BYTEDICT
      • DELTA
      • DELTA32K
      • LZO
      • MOSTLY8
      • MOSTLY16
      • MOSTLY32
      • RAW (no compression)
      • RUNLENGTH
      • TEXT255
      • TEXT32K
    • Modify distribution keys.

      Amazon Redshift supports three distribution keys. For more information, see Distribution styles. You must modify distribution keys based on the following rules:

      • EVEN distribution: Replace DISTSTYLE EVEN with DISTRIBUTED RANDOMLY.
      • KEY distribution: Replace DISTKEY with DISTRIBUTED BY (column, [ ... ] ).
      • ALL distribution: Replace DISTSTYLE ALL with DISTRIBUTED REPLICATED.
    • Modify sort keys.

      Replace the [ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) sort key clause of Amazon Redshift with the following clause:

      order by (column, [ ... ])

    Examples:

    • Example 1

      The following example shows a CREATE TABLE statement of Amazon Redshift:

      CREATE TABLE schema1.table1
      (
          filed1 VARCHAR(100) ENCODE lzo,
          filed2 INTEGER DISTKEY,
          filed3 INTEGER,
          filed4 BIGINT ENCODE lzo,
          filed5 INTEGER
      )
      INTERLEAVED SORTKEY
      (
          filed1,
          filed2
      );

      Convert the preceding statement to the following CREATE TABLE statement of AnalyticDB for PostgreSQL:

      CREATE TABLE schema1.table1
      (
          filed1 VARCHAR(100),
          filed2 INTEGER,
          filed3 INTEGER,
          filed4 BIGINT,
          filed5 INTEGER
      )
      WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
      DISTRIBUTED BY (filed2)
      ORDER BY (filed1, filed2);
      
      -- Sort
      SORT schema1.table1;
      MULTISORT schema1.table1;
    • Example 2

      The following example shows a CREATE TABLE statement of Amazon Redshift, which includes the ENCODE and SORTKEY options:

      CREATE TABLE schema2.table2
      (
          filed1 VARCHAR(50) ENCODE lzo,
          filed2 VARCHAR(50) ENCODE lzo,
          filed3 VARCHAR(20) ENCODE lzo
      )
      DISTSTYLE EVEN
      INTERLEAVED SORTKEY
      (
          filed1
      );

      Convert the preceding statement to the following CREATE TABLE statement of AnalyticDB for PostgreSQL:

      CREATE TABLE schema2.table2
      (
          filed1 VARCHAR(50),
          filed2 VARCHAR(50),
          filed3 VARCHAR(20)
      )
      WITH(APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
      DISTRIBUTED RANDOMLY
      ORDER BY (filed1);
      
      -- Sort
      SORT schema2.table2;
      MULTISORT schema2.table2;
      Note For more information about sort keys, see Use sort keys in column-oriented tables.
  • CREATE VIEW

    Convert a CREATE VIEW statement of Amazon Redshift to an SQL statement that complies with the AnalyticDB for PostgreSQL syntax. The conversion rules are similar to those for CREATE TABLE statements. A minimal conversion sketch is provided after the following note.

    Note The WITH NO SCHEMA BINDING clause is not supported.
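    The following statements are a minimal conversion sketch. The view name v_table1 and the query are hypothetical placeholders; the conversion removes the WITH NO SCHEMA BINDING clause because it is not supported.

    -- Amazon Redshift
    CREATE VIEW schema1.v_table1 AS
    SELECT filed1, filed3 FROM schema1.table1
    WITH NO SCHEMA BINDING;

    -- AnalyticDB for PostgreSQL
    CREATE VIEW schema1.v_table1 AS
    SELECT filed1, filed3 FROM schema1.table1;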