This topic describes how to migrate data from an Amazon Redshift instance to an AnalyticDB for PostgreSQL instance.

Prerequisites

  • An Amazon Redshift instance is created.
  • The Amazon S3 service that is used to export Amazon Redshift data is activated.
  • Alibaba Cloud Object Storage Service (OSS) is activated. For more information, see What is OSS?
  • An OSS bucket is created. For more information, see Create buckets.
    Note We recommend that you create an OSS bucket within the same region as the AnalyticDB for PostgreSQL instance to facilitate subsequent data import.
  • An AnalyticDB for PostgreSQL instance is created. For more information about how to select instance specifications, see the "Specification selection" section of this topic.

Specification selection

This section describes how to select specifications for your AnalyticDB for PostgreSQL instance based on the specifications of the source Amazon Redshift instance.

An Amazon Redshift instance consists of a leader node and multiple compute nodes.

  • Leader node: corresponds to the coordinator node of an AnalyticDB for PostgreSQL instance. The leader node communicates with clients, parses queries, develops query execution plans, and performs database operations.
  • Compute node: corresponds to the compute nodes of an AnalyticDB for PostgreSQL instance. Each compute node in an Amazon Redshift instance is partitioned into node slices. Each node slice corresponds to a compute node of an AnalyticDB for PostgreSQL instance and performs the computation for data storage and queries.

When you create an AnalyticDB for PostgreSQL instance, you can select node specifications of the AnalyticDB for PostgreSQL instance based on specifications of each node slice in the source Amazon Redshift instance.

Example

An Amazon Redshift instance consists of four compute nodes, each with node specifications of 2 cores, 16 GB memory, and 1 TB storage. Each compute node consists of two node slices.

When you create an AnalyticDB for PostgreSQL instance, you can set the number of compute nodes to 8, node specifications to 2 cores and 16 GB memory, and single-node storage to 1,000 GB.

Note
  • For more information about how to create an AnalyticDB for PostgreSQL instance, see Create an instance.
  • We recommend that you set the storage type to Enhanced SSD (ESSD), which provides better I/O performance than ultra disks.

Procedure

Step 1: Export data from an Amazon Redshift instance to an Amazon S3 bucket

You can execute UNLOAD statements to export data. For more information, see UNLOAD.

Syntax:

UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
[ option [ ... ] ]

where option is
{ [ FORMAT [ AS ] ] CSV | PARQUET
| PARTITION BY ( column_name [, ... ] ) [ INCLUDE ]
| MANIFEST [ VERBOSE ] 
| HEADER           
| DELIMITER [ AS ] 'delimiter-char' 
| FIXEDWIDTH [ AS ] 'fixedwidth-spec'   
| ENCRYPTED [ AUTO ]
| BZIP2  
| GZIP 
| ZSTD
| ADDQUOTES 
| NULL [ AS ] 'null-string'
| ESCAPE
| ALLOWOVERWRITE
| CLEANPATH
| PARALLEL [ { ON | TRUE } | { OFF | FALSE } ]
| MAXFILESIZE [AS] max-size [ MB | GB ] 
| REGION [AS] 'Amazon-region' }
Note
  • We recommend that you specify FORMAT AS PARQUET or FORMAT AS CSV to set the format of the exported data.
  • We recommend that you specify PARALLEL ON to export data in parallel. This generates more files and therefore improves data export efficiency.
  • We recommend that you specify the MAXFILESIZE parameter to limit the size of each file so that more files are generated. We recommend that the number of generated files be an integer multiple of the number of compute nodes in the AnalyticDB for PostgreSQL instance. This improves import efficiency when the data is later imported from OSS foreign tables to the AnalyticDB for PostgreSQL instance in parallel.
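
For example, the following statement is a minimal sketch of such an export. The table name, S3 path, IAM role, and MAXFILESIZE value are placeholders and must be replaced with values that match your environment.

UNLOAD ('SELECT * FROM schema1.table1')
TO 's3://examplebucket/unload/table1_'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-unload-role'
FORMAT AS CSV
PARALLEL ON
MAXFILESIZE 100 MB;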

Step 2: Synchronize data from Amazon S3 to OSS

You can use Alibaba Cloud Data Online Migration or the ossimport tool to synchronize data from Amazon S3 to OSS.

Data Online Migration

  1. Log on to the Data Transport console.
  2. Create the source and destination data addresses for data synchronization.
    1. In the left-side navigation pane, choose Data Online Migration > Data Address.
    2. Click Create Data Address to create a source data address.
    3. In the Create Data Address panel, configure the parameters and click OK. The following list describes the parameters.
      Data Type (required): Select AWS S3.
      Data Name (required): Enter a name. The name must be 3 to 63 characters in length. The name cannot contain special characters, except for hyphens (-) and underscores (_).
      Endpoint (required): Enter a region where the source data address is located. For more information, see Regions and endpoints.
      Bucket (required): The name of an AWS Simple Storage Service (AWS S3) bucket.

      The bucket name cannot contain invalid characters, such as spaces, line breaks, and tab characters.

      Prefix (required):
      • Migrate All Data: All data in the bucket is migrated.

        When you migrate all data, you do not need to specify a prefix.

      • Migrate Partial Data: Only the files in the directory that is specified by the prefix are migrated. The prefix must end with a forward slash (/). For example, you can specify data/to/oss/ as a prefix.
      Access Key Id and Secret Access Key (required): Specify the AccessKey pair that is used to migrate data. Delete the key pair after the migration job is completed.
    4. Click Create Data Address again to create a destination data address.
    5. In the Create Data Address panel, configure the parameters and click OK. The following list describes the parameters.
      Data Type (required): Select OSS.
      Data Name (required): Enter a name. The name must be 3 to 63 characters in length. The name cannot contain special characters, except for hyphens (-) and underscores (_).
      Data Region (required): Select the region where the destination data address is located.
      Enable Transfer Acceleration (optional): For Data Online Migration to use the transfer acceleration feature provided by Object Storage Service (OSS), you must enable the transfer acceleration feature for buckets. The transfer acceleration feature takes effect within 30 minutes after it is enabled. After you enable the feature, wait for 30 minutes before you create a migration job.
      Important If you use the transfer acceleration feature to access your OSS buckets, you are charged additional fees for transfer acceleration. For more information, see Enable transfer acceleration.
      OSS Endpoint (required):
      Select an endpoint based on the region where your data is stored. For more information, see Regions and endpoints.
      • http://oss-cn-endpoint.aliyuncs.com: indicates that you use an HTTP-based endpoint to access OSS over the Internet.
      • https://oss-cn-endpoint.aliyuncs.com: indicates that you use an HTTPS-based endpoint to access OSS over the Internet.
      Note If you want to migrate data from third-party cloud services to OSS, you must access OSS from a public endpoint over the Internet.
      Access Key Id and Access Key Secret (required): Specify the AccessKey pair that is used to migrate data. For more information, see Prepare for a migration job.
      OSS Bucket (required): Select a bucket to store the migrated data.
      OSS Prefix (optional): An OSS prefix cannot start with a forward slash (/) but must end with a forward slash (/). For example, you can specify data/to/oss/ as the OSS prefix. If you want to migrate data to the root directory of the specified bucket, do not specify this parameter.
      Important If the name of a source file starts with a forward slash (/), you must specify an OSS prefix when you configure the destination data address. If no OSS prefix is specified, the migration job fails. For example, if the name of a file to be migrated is /test/test.png, you must specify an OSS prefix, such as oss/. After the migration job is completed, the file name changes from /test/test.png to oss//test/test.png.
  3. Create an online migration job.
    1. In the left-side navigation pane, choose Data Online Migration > Migration Jobs.
    2. Click Create Job.
    3. In the Create Job panel, read the terms of the migration service and select I agree to the above terms and activate the Data Transport service. Then, click Next.
    4. In the Fee Reminder message, click Yes, Go Ahead.
    5. In the Create Job panel, configure the parameters and click Next. The following list describes the parameters.
      Job Name (required): Enter a job name. The name must be 3 to 63 characters in length and can contain only lowercase letters, digits, and hyphens (-). It cannot start or end with a hyphen (-).
      Source Data Address (required): Select the source data address that you created.
      Destination Data Address (required): Select the destination data address that you created.
      Specified Directory (optional): Specify the directories in which the files and subdirectories are migrated or not migrated. Valid values:
      • Do not filter: migrates all data from the source data address.
      • Exclude: does not migrate the files or subdirectories in the specified directories.

        For example, you want to migrate all the directories in the root_dir/ directory except root_dir/bad_sub1/ and root_dir/bad_sub2/. In this case, you can select this method and specify bad_sub1/ and bad_sub2/.

      • Contain: migrates only the files and subdirectories in the specified directories.

        For example, you want to migrate only root_dir/good_sub1/ and root_dir/good_sub2/ in the root_dir/ directory. In this case, you can select this method and specify good_sub1/ and good_sub2/.

      Note
      • A directory name cannot start with a forward slash (/) or a backslash (\), and cannot contain double forward slashes (//), double periods (..), or double quotation marks ("). The character string that consists of all the specified directory names cannot exceed 10 KB in length.
      • A directory name must end with a forward slash (/). For example, you can specify docs/ as a directory name.
      • You can specify up to 20 directories that you want to include or exclude.
      Migration Type (required): Select the data migration mode. Valid values:
      • Full: The system migrates all data from the source data address to the destination data address. You must configure the Start Time Point of File parameter. The system migrates all data whose last modification time is later than the specified time. After all data is migrated, the migration job ends.

        If you change the data at the source data address after the migration job is complete, you can submit another job to migrate all data. In this case, the system migrates only the changed data.

      • Incremental: The system runs a migration job based on the values that you specified for the Migration Interval and Migration Times parameters.
        • You must configure the Start Time Point of File parameter. During the first migration, the system migrates all data whose last modification time is later than the specified time. After the first migration is complete, incremental migration is performed based on the value that you specified for the Migration Interval parameter. For each incremental migration, the system migrates only the data that is created or modified after the previous migration starts and before the current migration starts.
        • If you set the Migration Times parameter to N, full migration is performed once and then incremental migration is performed (N - 1) times.

          For example, the current time is 08:00, March 10, 2019 and you set the Migration Interval parameter to 1, the Migration Times parameter to 5, and the Start Time Point of File parameter to 08:00, March 5, 2019. When the first migration starts, the system migrates all files whose last modification time is between 08:00, March 5, 2019 and 08:00, March 10, 2019. In this example, the first migration requires one hour to complete and the second migration starts at 10:00, March 10, 2019. The duration from 08:00 to 10:00 consists of one hour for the first migration process and one hour for the migration interval. During the second migration, the system migrates the files whose last modification time is between 08:00, March 10, 2019 and 10:00, March 10, 2019. The migration job consists of one full migration and four incremental migrations.

      Important Before full or incremental migration starts, the system compares the files at the source data address with the files at the destination data address. If a source file has the same name as a destination file, the destination file is overwritten when one of the following conditions is met:
      • The Content-Type values of the source file and the destination file are different.
      • The last modification time of the source file is later than that of the destination file.
      • The size of the source file is different from that of the destination file.
      Start Time Point of File (required): Specify a time to filter the data to be migrated. Data is migrated only if it is created or modified after the specified time. Valid values:
      • All: migrates all files.
      • Assign: migrates only the files that are created or modified after the specified time.

        For example, if you set the time to 08:00:00, November 1, 2018, only the files that are created or modified after 08:00:00, November 1, 2018 are migrated.

      Migration Interval (required if you set the Migration Type parameter to Incremental): The default value is 1, and the maximum value is 24. Unit: hours.
      Migration Times (required if you set the Migration Type parameter to Incremental): The default value is 1, and the maximum value is 30.
      File Overwrite Method (required): The method that is used to process a source file whose name is the same as that of a destination file. Valid values:
      • LastModified: If a source file has the same name as a destination file, the system compares the last modification time that is indicated by the LastModified property of the files.
        • If the last modification time of the source file is later than that of the destination file, the source file is migrated and the destination file is overwritten.
        • If the last modification time of the source file is earlier than that of the destination file, the source file is not migrated and the destination file is retained.
        • If the two files have the same last modification time, the system checks the values of the Size and Content-Type properties of the files.

          If the two files have the same Size and Content-Type values, the source file is not migrated. If the two files have different values of at least one property, the source file is migrated and the destination file is overwritten.

      • Condition: If a source file has the same name as a destination file, the system compares the values of the LastModified, Size, and Content-Type properties of the two files.
        • If all property values of the two files are the same, the source file is not migrated.
        • If the two files have different values of at least one property, the source file is migrated and the destination file is overwritten.
      • All: If a source file has the same name as a destination file, the system performs no comparison and directly overwrites the destination file with the source file.
      • No: If a source file has the same name as a destination file, the system performs no comparison and directly skips the source file during the migration.
    6. In the Performance step, navigate to the Data Prediction section and configure the Data Size and File Count parameters.
      Note To ensure a successful migration job, we recommend that you estimate the volume of data that you want to migrate as accurately as possible. For more information, see Prepare for a migration job.
    7. Optional: In the Performance step, navigate to the Flow Control section, specify the Time Range and Max Flow(MB/s) parameters, and then click Add.
      Note To ensure business continuity, we recommend that you configure the Time Range and Max Flow parameters based on the traffic fluctuations.
    8. Click Create and wait until the migration job is completed.

ossimport

  1. Download and install ossimport in standalone mode. For more information, see Overview.

    The ossimport software in standalone mode has the following file structure:

    ossimport
    ├── bin
    │   └── ossimport2.jar  # The JAR package that contains the Master, Worker, TaskTracker, and Console modules.
    ├── conf
    │   ├── local_job.cfg   # The job configuration file.
    │   └── sys.properties  # The configuration file that contains system parameters.
    ├── console.bat         # The Windows command line utility that is used to run tasks in a distributed manner.
    ├── console.sh          # The Linux command line utility that is used to run tasks in a distributed manner.
    ├── import.bat          # The script that automatically imports files based on the conf/local_job.cfg configuration file in Windows. The configuration file contains parameters that specify data migration operations such as start, migration, verification, and retry.
    ├── import.sh           # The script that automatically imports files based on the conf/local_job.cfg configuration file in Linux. The configuration file contains parameters that specify data migration operations such as start, migration, verification, and retry.
    ├── logs                # The directory that contains logs.
    └── README.md           # The file that provides a description for ossimport. We recommend that you read this file before you use ossimport.
  2. Configure ossimport in standalone mode.

    You need to modify only the following parameters in the conf/local_job.cfg configuration file:

    srcType=s3
    srcAccessKey=<Your AWS access key>
    srcSecretKey=<Your AWS secret key>
    srcDomain=<The domain name that corresponds to the region where the Amazon S3 bucket is deployed>
    srcBucket=<The name of the Amazon S3 bucket>
    destAccessKey=<Your Alibaba Cloud AccessKey ID>
    destSecretKey=<Your Alibaba Cloud AccessKey secret>
    destDomain=<The endpoint that corresponds to the region where the OSS bucket is deployed>
    destBucket=<The name of the OSS bucket>
    destPrefix=
    isSkipExistFile=true

    For more information, see Overview.

  3. Run ossimport to synchronize data to OSS. For more information about how to use ossimport in standalone mode, see Standalone deployment.

Step 3: Create a destination table

Create a destination table in the AnalyticDB for PostgreSQL instance to store the data that is migrated from Amazon Redshift. The destination table must have the same schema as the source table. For more information, see CREATE TABLE.

You must also convert the data definition language (DDL) statements of database objects such as schemas, tables, functions, and views to the AnalyticDB for PostgreSQL syntax. For more information, see the "Convert DDL statements" section of this topic.

Step 4: Import data from OSS to the AnalyticDB for PostgreSQL instance

You can use the COPY statement or an OSS foreign table to import data from OSS to the AnalyticDB for PostgreSQL instance. A sketch of the OSS foreign table approach is provided below.
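
The following statements are a minimal sketch of the OSS foreign table approach. They assume a CSV export, the gpossext external table protocol, and the destination table from Example 1 in the "Convert DDL statements" section; the OSS endpoint, bucket, prefix, and AccessKey pair are placeholders. The exact foreign table syntax depends on your instance version. For more information, see the AnalyticDB for PostgreSQL documentation about importing data from OSS.

-- Create a readable OSS external table that maps to the exported CSV files.
CREATE READABLE EXTERNAL TABLE oss_external_table.table1
(
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
LOCATION('oss://<OSS endpoint> prefix=unload/table1_ id=<AccessKey ID> key=<AccessKey secret> bucket=<OSS bucket>')
FORMAT 'csv';

-- Import the data into the destination table in parallel.
INSERT INTO schema1.table1 SELECT * FROM oss_external_table.table1;

Alternatively, you can load the exported files from a client by using the COPY statement, for example through the \copy meta-command of psql. The foreign table approach is typically faster because it imports data in parallel.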

Convert DDL statements

The DDL statements of Amazon Redshift are slightly different from the DDL statements of AnalyticDB for PostgreSQL and must be converted before you migrate data.

CREATE SCHEMA

Execute the CREATE SCHEMA statement of AnalyticDB for PostgreSQL to create a schema. Example:

CREATE SCHEMA schema1 AUTHORIZATION xxxpoc;
GRANT ALL ON SCHEMA schema1 TO xxxpoc;
GRANT ALL ON SCHEMA schema1 TO public;
COMMENT ON SCHEMA schema1 IS 'for xxx migration poc test';

CREATE SCHEMA oss_external_table AUTHORIZATION xxxpoc;

CREATE FUNCTION

Some SQL functions of Amazon Redshift are not supported by AnalyticDB for PostgreSQL and must be customized or rewritten. Examples:

  • Replace the CONVERT_TIMEZONE(a,b,c) function with the following function:
    timezone(b, timezone(a,c))
  • Replace the GETDATE() function with the following function:
    current_timestamp(0)::timestamp
  • Replace or optimize user-defined functions (UDFs). The following sample code shows a UDF of Amazon Redshift:
    CREATE OR REPLACE FUNCTION public.f_jdate(dt timestamp without time zone)
    RETURNS character varying AS
    '      from datetime import timedelta, datetime
           if dt.hour < 4:
                  d = timedelta(days=-1)
                  dt = dt + d
           return str(dt.date())'
    LANGUAGE plpythonu IMMUTABLE;
    COMMIT;

    Replace the preceding function with the following expression (a sketch of the rewritten function is provided after this list):

    to_char(a - interval '4 hour', 'yyyy-mm-dd')
  • Replace other Amazon Redshift functions.

    For more information about standard PostgreSQL functions, see Functions and Operators. You can modify or rewrite other SQL functions of Amazon Redshift that are not supported by AnalyticDB for PostgreSQL.
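
For reference, the f_jdate UDF shown above can be rewritten as a SQL function in AnalyticDB for PostgreSQL by using the replacement expression. The following sketch is an illustration only; verify that it matches the behavior you expect for edge cases such as NULL input.

CREATE OR REPLACE FUNCTION public.f_jdate(dt timestamp without time zone)
RETURNS character varying AS $$
    -- Shift the timestamp back by 4 hours so that the date rolls over at 04:00,
    -- which matches the logic of the original Python UDF.
    SELECT to_char(dt - interval '4 hour', 'yyyy-mm-dd');
$$ LANGUAGE sql IMMUTABLE;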

CREATE TABLE

  • Modify the name of a table.

    The name of an Amazon Redshift table can be up to 127 characters in length, while the name of an AnalyticDB for PostgreSQL table can be up to 63 characters in length. If the names of database objects such as tables, functions, and views have more than 63 characters, the additional characters must be removed.

  • Modify the compression algorithm.

    Replace ENCODE XXX in a CREATE TABLE statement of Amazon Redshift with the following clause:

    WITH (COMPRESSTYPE={ZLIB|ZSTD|RLE_TYPE|NONE})

    For more information about the compression algorithms supported by AnalyticDB for PostgreSQL, see Data compression.

  • Modify distribution keys.

    Amazon Redshift supports three distribution keys. For more information, see Distribution styles. You must modify distribution keys based on the following rules:

    • EVEN distribution: Use DISTRIBUTED RANDOMLY.
    • KEY distribution: Use DISTRIBUTED BY (column, [ ... ] ).
    • ALL distribution: Use DISTRIBUTED REPLICATED.
  • Modify sort keys.

    Replace the [ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) sort key clause of Amazon Redshift with the following clause:

    order by (column, [ ... ])

Examples:

  • Example 1:

    The following example shows a CREATE TABLE statement of Amazon Redshift:

    CREATE TABLE schema1.table1
    (
        filed1 VARCHAR(100) ENCODE lzo,
        filed2 INTEGER DISTKEY,
        filed3 INTEGER,
        filed4 BIGINT ENCODE lzo,
        filed5 INTEGER
    )
    INTERLEAVED SORTKEY
    (
        filed1,
        filed2
    );

    Convert the preceding statement to the following CREATE TABLE statement of AnalyticDB for PostgreSQL:

    CREATE TABLE schema1.table1
    (
        filed1 VARCHAR(100) ,
        filed2 INTEGER,
        filed3 INTEGER,
        filed4 BIGINT,
        filed5 INTEGER
    )
    WITH(APPENDONLY=true,ORIENTATION=column,COMPRESSTYPE=zstd)
    DISTRIBUTED BY (filed2)
    ORDER BY (filed1, filed2);
    
    -- Sort
    SORT schema1.table1;
    MULTISORT schema1.table1;
  • Example 2:

    The following example shows a CREATE TABLE statement of Amazon Redshift, which includes the ENCODE and SORTKEY options:

    CREATE TABLE schema2.table2
    (
        filed1 VARCHAR(50) ENCODE lzo,
        filed2 VARCHAR(50) ENCODE lzo,
        filed3 VARCHAR(20) ENCODE lzo
    )
    DISTSTYLE EVEN
    INTERLEAVED SORTKEY
    (
        filed1
    );

    Convert the preceding statement to the following CREATE TABLE statement of AnalyticDB for PostgreSQL:

    CREATE TABLE schema2.table2
    (
        filed1 VARCHAR(50),
        filed2 VARCHAR(50),
        filed3 VARCHAR(20)
    )
    WITH(APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
    DISTRIBUTED randomly
    ORDER BY (filed1);
    
    -- Sort
    SORT schema2.table2;
    MULTISORT schema2.table2;

CREATE VIEW

Convert a CREATE VIEW statement of Amazon Redshift to an SQL statement that complies with the AnalyticDB for PostgreSQL syntax. The conversion rules are similar to the conversion rules of CREATE TABLE statements.

Note The WITH NO SCHEMA BINDING clause is not supported.