Migrating a data warehouse from Amazon Redshift to AnalyticDB for PostgreSQL involves DDL conversion, data export, cross-cloud transfer, and data import. The adbpg-redshift-migration.jar tool automates these steps, handling DDL conversion and data movement so you can complete the migration without manual scripting.
The migration consists of four steps:
- Export and convert DDL statements from Redshift to AnalyticDB for PostgreSQL format.
- Upload table data from Redshift to Amazon S3.
- Transfer data from Amazon S3 to Object Storage Service (OSS).
- Import data from OSS into AnalyticDB for PostgreSQL.
Prerequisites
Before you begin, ensure that you have:
- An Amazon Redshift cluster
- Amazon S3 activated and a temporary Identity and Access Management (IAM) role created for the service
- The IAM role granted Amazon S3 and Amazon Redshift permissions
- The following trust relationship configuration added to the IAM role (takes effect in approximately 10 minutes):
  {
      "Effect": "Allow",
      "Principal": {
          "Service": "redshift.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
  }
- The IAM role associated with your Amazon Redshift cluster
- An AnalyticDB for PostgreSQL instance and an OSS bucket created in the same region
Size the AnalyticDB for PostgreSQL instance
Map your Redshift cluster specifications to AnalyticDB for PostgreSQL node specifications before creating the instance.
An Amazon Redshift cluster consists of a leader node and multiple compute nodes. Each compute node is partitioned into node slices. Each node slice maps to one compute node in AnalyticDB for PostgreSQL.
Example: A Redshift cluster with 4 compute nodes, each with 2 node slices (2 cores, 16 GB memory, 1 TB storage per slice), maps to an AnalyticDB for PostgreSQL instance with:
- Compute nodes: 8 (4 nodes x 2 slices)
- Node specifications: 2 cores, 16 GB memory
- Single-node storage: 1,000 GB
Set the storage type to Enhanced SSD (ESSD) for better I/O performance. For instructions on creating an instance, see Create an instance.
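The sizing mapping above is simple arithmetic. As a quick sketch using the example figures from this section (replace them with your own cluster's values):

```shell
# Example values from the sizing example above; replace with your cluster's figures.
REDSHIFT_NODES=4      # Redshift compute nodes
SLICES_PER_NODE=2     # node slices per compute node

# Each node slice maps to one AnalyticDB for PostgreSQL compute node.
ADBPG_NODES=$((REDSHIFT_NODES * SLICES_PER_NODE))
echo "AnalyticDB for PostgreSQL compute nodes: ${ADBPG_NODES}"
```

The node specifications (cores, memory) and single-node storage carry over per slice unchanged.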
Procedure
Step 1: Export and convert DDL statements
Download the adbpg-redshift-migration.jar tool. Before you run it, configure security group rules for the virtual private cloud (VPC) where the Redshift cluster resides so that the migration tool's IP address can reach the Redshift service port. Then run the following command to export DDL statements from Redshift and convert them to AnalyticDB for PostgreSQL format.
java -jar adbpg-redshift-migration.jar -d redshift -h r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn -p 5439 -U awsuser -w Lrz**** --transform
Replace the parameter values with your actual values.
| Parameter | Description |
|---|---|
| -d | Database type. Set to redshift. |
| -h | Endpoint of the Redshift cluster. Get this from the JDBC URL in the Redshift console. For example, if the JDBC URL is jdbc:redshift://r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn:5439/dev, the endpoint is r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn. |
| -p | Port number. Set to 5439. |
| -U | Superuser name of the Redshift cluster. This is typically awsuser, the user specified during cluster creation. |
| -w | Password of the Redshift superuser. |
| --transform | Performs both DDL download and conversion. The downloaded DDL is saved to ddl_dump_{timestamp}.txt. The converted DDL is saved to ddl_transform_{timestamp}.txt. |
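The -h and -p values can be derived from the JDBC URL with plain shell string manipulation. A small sketch using the example URL from the table above:

```shell
# Example JDBC URL from the Redshift console (replace with your own).
JDBC_URL="jdbc:redshift://r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn:5439/dev"

HOST_PORT="${JDBC_URL#jdbc:redshift://}"  # strip the scheme prefix
HOST_PORT="${HOST_PORT%%/*}"              # strip the trailing /database part
echo "endpoint: ${HOST_PORT%:*}"          # value for -h
echo "port: ${HOST_PORT##*:}"             # value for -p
```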
Create a database, a schema, and a user on the AnalyticDB for PostgreSQL instance to store the Amazon Redshift data. Then review ddl_transform_{timestamp}.txt and verify that the converted DDL statements are compatible with AnalyticDB for PostgreSQL. If so, execute the DDL statements to create the corresponding table schemas.
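Because the tool timestamps its output, running it more than once leaves several ddl_transform_*.txt files. The timestamped names sort lexicographically, so the newest file can be selected before applying it. A sketch, where the touch lines stand in for real tool output and the psql endpoint, account, and database names are placeholders:

```shell
# Stand-ins for files the tool would produce (remove these lines in a real run).
touch ddl_transform_20240101120000.txt ddl_transform_20240102120000.txt

# Timestamped names sort lexicographically, so the last one is the newest.
LATEST=$(ls -1 ddl_transform_*.txt | sort | tail -n 1)
echo "applying ${LATEST}"

# Placeholder endpoint and credentials; uncomment to apply the converted DDL.
# psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -d target_db -f "${LATEST}"
```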
Step 2: Upload data from Redshift to Amazon S3
Run the following command to export table data from Redshift and upload it to Amazon S3. After the upload completes, two result files are generated: succeed_transfer_tables_{timestamp}.txt (successfully uploaded tables) and failed_transfer_tables_{timestamp}.txt (failed tables).
java -jar adbpg-redshift-migration.jar -d redshift -h host-info -p 5439 -U dbuser -w Lrz**** -o s3 -b dbuser-bj --iam-account account-info --iam-role role-info --root-dir rootdir --format parquet --use-iam --china-region --jobs 8 -f tablelist.txt --upload
Replace the parameter values with your actual values.
| Parameter | Description |
|---|---|
| -o | Target cloud storage. Set to s3. |
| -b | Amazon S3 bucket name. |
| --iam-account | Account ID of the IAM role. This is a 12-digit number from the Amazon Resource Name (ARN) of the IAM role. For example, 123456789012 in arn:aws-cn:iam::123456789012:role/lrz2. |
| --iam-role | Name of the IAM role. For example, lrz2 in arn:aws-cn:iam::123456789012:role/lrz2. |
| --root-dir | Root directory in Amazon S3. |
| --format | Data export format: parquet (default, recommended), csv, or json. Parquet enables fast and secure data export. Row delimiters in CSV files cannot be customized. |
| --use-iam | Uses the IAM role for authentication when uploading. |
| --china-region | (Optional) Specify this parameter only when uploading data from regions in the Chinese mainland. |
| --jobs | Number of parallel upload jobs. |
| -f | File containing the names of tables to upload. Each line must contain one table name in database.schema.table format. To upload all tables, use --full instead. |
| --upload | Triggers the upload operation. |
Step 3: Transfer data from Amazon S3 to OSS
Use Alibaba Cloud Data Online Migration to transfer the data from Amazon S3 to OSS.
Step 4: Import data from OSS to AnalyticDB for PostgreSQL
Before running this command, submit a ticket to apply for superuser permissions on your AnalyticDB for PostgreSQL instance.
Run the following command to import the data from OSS into AnalyticDB for PostgreSQL. Use the succeed_transfer_tables_{timestamp}.txt file generated in Step 2 as the input for the -f parameter.
java -jar adbpg-redshift-migration.jar -d adbpg -h gp-2ze312*******o-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -w Lrz**** -o oss -e oss-cn-beijing-internal.aliyuncs.com -i "LTAI5tP*******rCrdj" -k "4k3ntmvK*******M2mqjQcuOy" -b dbuser-bj --root-dir rootdir --format parquet --jobs 8 -f tablelist.txt --download
Replace the parameter values with your actual values.
| Parameter | Description |
|---|---|
| -d | Database type. Set to adbpg. |
| -h | Endpoint of the AnalyticDB for PostgreSQL instance. Get the public endpoint from the AnalyticDB for PostgreSQL console. |
| -p | Port number. Set to 5432. |
| -U | Name of the initial account of the AnalyticDB for PostgreSQL instance. |
| -w | Password of the initial account. |
| -o | Source cloud storage. Set to oss. |
| -e | Endpoint of the OSS bucket. |
| -i | AccessKey ID used to access OSS. |
| -k | AccessKey secret used to access OSS. |
| -b | Name of the OSS bucket. |
| --root-dir | Root directory of OSS. |
| --format | Data format to import. Use the same format specified in Step 2 (default: parquet). |
| --jobs | Number of parallel import jobs. |
| -f | File containing the names of tables to import. Each line must contain one table name in database.schema.table format. In most cases, you can use the succeed_transfer_tables_{timestamp}.txt file generated during upload in Step 2. |
| --new-name-file | (Optional) File containing the corresponding AnalyticDB for PostgreSQL table names. Required only if the table names differ from those in Redshift. Each line must contain one table name in database.schema.table format. Table names in this file correspond one-to-one with those in the -f file. |
| --download | Triggers the download (import) operation. |
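When the target table names differ from the Redshift names, --new-name-file pairs line-for-line with the -f file. A sketch with hypothetical table and database names that shows the resulting mapping:

```shell
# Hypothetical source tables (one per line, database.schema.table).
cat > tablelist.txt <<'EOF'
dev.public.orders
dev.public.customers
EOF

# Hypothetical target names: line N here renames line N of tablelist.txt.
cat > new_names.txt <<'EOF'
adbpg_db.public.orders
adbpg_db.public.customers
EOF

# Show the resulting source -> target mapping; the two files must have equal length.
paste -d ' ' tablelist.txt new_names.txt
```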
Next steps
For manual migration without the automated tool, see Manually migrate data from an Amazon Redshift cluster to an AnalyticDB for PostgreSQL instance.