Alibaba Cloud provides a data migration tool that supports DDL conversion and helps you configure automatic data migration from Amazon Redshift to AnalyticDB for PostgreSQL.
Prerequisites
An Amazon Redshift cluster is created.
Amazon S3 is activated and a temporary Identity and Access Management (IAM) role is created for the service.
The IAM role is granted the Amazon S3 and Amazon Redshift permissions.
The following information is added to the trust relationship of the IAM role, and the changes take effect in approximately 10 minutes:
{ "Effect": "Allow", "Principal": { "Service": "redshift.amazonaws.com" }, "Action": "sts:AssumeRole" }The IAM role is associated with Amazon Redshift.
An AnalyticDB for PostgreSQL instance and an Object Storage Service (OSS) bucket are created in the same region.
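The trust relationship statement shown in the prerequisites must be embedded in a complete trust policy document, and the IAM role must then be attached to the Amazon Redshift cluster. The following is a minimal sketch that assumes the AWS CLI is installed and configured; the policy file name, role name, cluster identifier, and account ID are placeholders, not values from this topic.
# Sketch only: write the complete trust policy that embeds the statement shown above.
cat > redshift-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
# Apply the trust policy to the IAM role (role name is a placeholder).
aws iam update-assume-role-policy --role-name my-migration-role --policy-document file://redshift-trust-policy.json
# Associate the IAM role with the Amazon Redshift cluster (identifiers are placeholders).
aws redshift modify-cluster-iam-roles --cluster-identifier my-redshift-cluster --add-iam-roles arn:aws-cn:iam::123456789012:role/my-migration-role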
Specification selection
This section describes how to select specifications for an AnalyticDB for PostgreSQL instance based on the specifications of the source Amazon Redshift cluster.
An Amazon Redshift cluster consists of a leader node and multiple compute nodes.
Leader node: corresponds to the coordinator node of an AnalyticDB for PostgreSQL instance. The leader node communicates with clients, analyzes queries, develops query execution plans, and performs database operations.
Compute node: corresponds to a group of compute nodes in an AnalyticDB for PostgreSQL instance. Each compute node in an Amazon Redshift cluster is partitioned into node slices. Each node slice corresponds to a compute node in an AnalyticDB for PostgreSQL instance and performs the computing for data storage and queries.
When you create an AnalyticDB for PostgreSQL instance, you can select the node specifications of the instance based on specifications of each node slice in the source Amazon Redshift cluster.
Example
An Amazon Redshift cluster consists of four compute nodes. Each compute node consists of two node slices. The specifications of each node slice are 2 cores, 16 GB memory, and 1 TB storage.
When you create an AnalyticDB for PostgreSQL instance, you can set the number of compute nodes to 8, node specifications to 2 cores and 16 GB memory, and single-node storage to 1,000 GB.
For information about how to create an AnalyticDB for PostgreSQL instance, see Create an instance.
We recommend that you set the storage type to Enhanced SSD (ESSD) because ESSDs provide better I/O performance than ultra disks.
Procedure
Step 1: Export and convert DDL statements
You can use the adbpg-redshift-migration.jar tool to export DDL statements from Amazon Redshift and convert them into a format compatible with AnalyticDB for PostgreSQL. Run the following sample command to export and convert DDL statements. Replace the parameter values with actual ones.
java -jar adbpg-redshift-migration.jar -d redshift -h r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn -p 5439 -U awsuser -w Lrz**** --transform
Before you run the command, configure security group rules for the virtual private cloud (VPC) in which the Amazon Redshift cluster resides so that the host on which the migration tool runs can access the service port of the Amazon Redshift cluster.
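For example, if the migration tool runs on a host with a known IP address, you can add an inbound rule to the security group of the Amazon Redshift cluster. The following is a sketch that assumes the AWS CLI is configured; the security group ID and source IP address are placeholders.
# Sketch only: allow the migration host to reach the Amazon Redshift service port (placeholder values).
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 5439 --cidr 203.0.113.10/32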
Parameters
-d: the database type for downloading or converting DDL statements. Set the value to redshift.
-h: the endpoint of the Amazon Redshift cluster, which can be obtained from the JDBC URL parameter in the Amazon Redshift console. For example, if the JDBC URL parameter value is jdbc:redshift://r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn:5439/dev, the endpoint is r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn.
-p: the port number of Amazon Redshift. Set the value to 5439.
-U: the superuser name of the Amazon Redshift cluster. In most cases, the awsuser user that is specified during cluster creation is used.
-w: the password of the superuser of the Amazon Redshift cluster.
--transform: performs DDL download and conversion. The downloaded DDL statements are stored in a file named ddl_dump_20250227174322.txt, where the numeric string represents the download timestamp. The converted DDL statements are stored in a file named ddl_transform_20250227202402.txt, where the numeric string represents the conversion timestamp.
After you create a database, a schema, and a user on the AnalyticDB for PostgreSQL instance to store Amazon Redshift data, check whether the DDL statements in ddl_transform_20250227202402.txt are ready for use. If so, execute the statements on the AnalyticDB for PostgreSQL instance to create the corresponding table schemas.
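For example, you can create the database, user, and schema and execute the converted DDL file by using psql. The following is a minimal sketch; the instance endpoint, initial account, password, and database, schema, and user names are placeholders.
# Sketch only: create the target database, user, and schema, and then execute the converted DDL (placeholder values).
psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -d postgres -c "CREATE DATABASE migration_db;"
psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -d migration_db -c "CREATE USER migration_user WITH PASSWORD '****'; CREATE SCHEMA migration_schema AUTHORIZATION migration_user;"
psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -d migration_db -f ddl_transform_20250227202402.txt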
Step 2: Configure automatic data upload from Amazon Redshift to Amazon S3
Run the following sample command to upload data from Amazon Redshift to Amazon S3. Replace the parameter values with actual ones. After the data upload is complete, two files are generated: succeed_transfer_tables_20250228144558.txt (contains the uploaded tables) and failed_transfer_tables_20250228144558.txt (contains the failed tables).
java -jar adbpg-redshift-migration.jar -d redshift -h host-info -p 5439 -U dbuser -w Lrz**** -o s3 -b dbuser-bj --iam-account account-info --iam-role role-info --root-dir rootdir --format parquet --use-iam --china-region --jobs 8 -f tablelist.txt --upload
Parameters
-o: the cloud storage service that is used to store Amazon Redshift data. Set the value to s3.
-b: the name of the Amazon S3 bucket.
--iam-account: the account ID of the IAM role. In most cases, the value is a 12-digit number that can be obtained from the Amazon Resource Name (ARN) of the IAM role. For example, the account ID in arn:aws-cn:iam::123456789012:role/lrz2 is 123456789012.
--iam-role: the name of the IAM role. For example, the role name in arn:aws-cn:iam::123456789012:role/lrz2 is lrz2.
--root-dir: the root directory of Amazon S3.
--format: the data export format, which can be CSV, PARQUET, or JSON. Row delimiters in CSV files cannot be modified. We recommend that you use the PARQUET format, which is the default data format for the adbpg-redshift-migration tool and enables fast and secure data export.
--use-iam: uses the IAM role to upload data.
--china-region: optional. Specify this parameter only if you upload data in regions inside China.
--jobs: the number of parallel jobs that are used to upload data.
-f: the name of the file that contains the names of the tables to be uploaded. Each line in the file includes only one table name in the database.schema.table format. You can use --full instead to upload all tables in the Amazon Redshift cluster. (An example of the file format is provided after this list.)
--upload: performs the upload operation.
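The following is a hypothetical example of the table list file; the database, schema, and table names are placeholders.
# Hypothetical tablelist.txt: one table per line in the database.schema.table format (placeholder names).
cat > tablelist.txt <<'EOF'
dev.public.orders
dev.public.customers
dev.sales.order_items
EOF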
Step 3: Migrate data from Amazon S3 to OSS
After table data is uploaded to Amazon S3, use Alibaba Cloud Data Online Migration to migrate the data from Amazon S3 to OSS.
Step 4: Configure automatic data import from OSS to AnalyticDB for PostgreSQL
After data is migrated to OSS, run the following sample command to import the data from OSS to AnalyticDB for PostgreSQL. Replace the parameter values with actual ones.
Before you run the following command, submit a ticket to apply for superuser permissions.
java -jar adbpg-redshift-migration.jar -d adbpg -h gp-2ze312*******o-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -w Lrz**** -o oss -e oss-cn-beijing-internal.aliyuncs.com -i "LTAI5tP*******rCrdj" -k "4k3ntmvK*******M2mqjQcuOy" -b dbuser-bj --root-dir rootdir --format parquet --jobs 8 -f tablelist.txt --download
Parameters
-d: the database type for downloading or converting DDL statements. Set the value to adbpg.
-h: the endpoint of the AnalyticDB for PostgreSQL instance. You can obtain the public endpoint of the instance in the AnalyticDB for PostgreSQL console.
-p: the port number of AnalyticDB for PostgreSQL. Set the value to 5432.
-U: the name of the initial account of the AnalyticDB for PostgreSQL instance.
-w: the password of the initial account of the AnalyticDB for PostgreSQL instance.
-o: the cloud storage service that is used to store data. Set the value to oss.
-e: the endpoint of the OSS bucket that stores data.
-i: the AccessKey ID that is used to access OSS.
-k: the AccessKey secret that is used to access OSS.
-b: the name of the OSS bucket.
--root-dir: the root directory of OSS.
-f: the name of the file that contains the names of the tables to be downloaded. Each line in the file includes only one table name in the database.schema.table format. In most cases, you can use the file generated during upload. Example: succeed_transfer_tables_20250228144558.txt.
--new-name-file: the name of the file that contains the corresponding AnalyticDB for PostgreSQL table names. If the table names in AnalyticDB for PostgreSQL are different from those in Amazon Redshift, you must specify this parameter. Each line in the file includes only one table name in the database.schema.table format. The table names in this file have a one-to-one correspondence with those in the file specified by the -f parameter. (A sketch of the correspondence is provided after this list.)
--download: performs the download operation.
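The following sketch shows how the file specified by --new-name-file corresponds line by line to the file specified by the -f parameter; all table names are placeholders.
# Hypothetical example: line N of newname.txt renames the table on line N of the -f file (placeholder names).
# If the -f file contains:
#   dev.public.orders
#   dev.public.customers
# a matching newname.txt could be:
cat > newname.txt <<'EOF'
migration_db.migration_schema.orders
migration_db.migration_schema.customers
EOF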
References
For information about manual data migration, see Manually migrate data from an Amazon Redshift cluster to an AnalyticDB for PostgreSQL instance.