Migrating a data warehouse from Amazon Redshift to AnalyticDB for PostgreSQL involves DDL conversion, data export, cross-cloud transfer, and data import. The adbpg-redshift-migration.jar tool automates these steps, handling DDL conversion and data movement so you can complete the migration without manual scripting.
The migration consists of four steps:
- Export and convert DDL statements from Redshift to AnalyticDB for PostgreSQL format.
- Upload table data from Redshift to Amazon S3.
- Transfer data from Amazon S3 to Object Storage Service (OSS).
- Import data from OSS into AnalyticDB for PostgreSQL.
Prerequisites
Before you begin, ensure that you have:
- An Amazon Redshift cluster
- Amazon S3 activated and a temporary Identity and Access Management (IAM) role created for the service
- The IAM role granted Amazon S3 and Amazon Redshift permissions
- The following trust relationship configuration added to the IAM role (takes effect in approximately 10 minutes):
  {
      "Effect": "Allow",
      "Principal": {
          "Service": "redshift.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
  }
- The IAM role associated with your Amazon Redshift cluster
- An AnalyticDB for PostgreSQL instance and an OSS bucket created in the same region
Size the AnalyticDB for PostgreSQL instance
Map your Redshift cluster specifications to AnalyticDB for PostgreSQL node specifications before creating the instance.
An Amazon Redshift cluster consists of a leader node and multiple compute nodes. Each compute node is partitioned into node slices. Each node slice maps to one compute node in AnalyticDB for PostgreSQL.
Example: A Redshift cluster with 4 compute nodes, each with 2 node slices (2 cores, 16 GB memory, 1 TB storage per slice), maps to an AnalyticDB for PostgreSQL instance with:
- Compute nodes: 8 (4 nodes x 2 slices)
- Node specifications: 2 cores, 16 GB memory
- Single-node storage: 1,000 GB
Set the storage type to Enhanced SSD (ESSD) for better I/O performance. For instructions on creating an instance, see Create an instance.
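The sizing mapping above is simple arithmetic. As a quick sketch using the example figures from this section (replace them with your own cluster's values):

```shell
# Example values from the sizing example above; replace with your cluster's figures.
REDSHIFT_NODES=4      # Redshift compute nodes
SLICES_PER_NODE=2     # node slices per compute node

# Each node slice maps to one AnalyticDB for PostgreSQL compute node.
ADBPG_NODES=$((REDSHIFT_NODES * SLICES_PER_NODE))
echo "AnalyticDB for PostgreSQL compute nodes: ${ADBPG_NODES}"
```

The node specifications (cores, memory) and single-node storage carry over per slice unchanged.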
Procedure
Step 1: Export and convert DDL statements
Download the adbpg-redshift-migration.jar tool. Before you run it, configure security group rules for the virtual private cloud (VPC) where the Redshift cluster resides so that the migration tool's IP address can reach the Redshift service port. Then run the following command to export DDL statements from Redshift and convert them to AnalyticDB for PostgreSQL format.
java -jar adbpg-redshift-migration.jar -d redshift -h r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn -p 5439 -U awsuser -w Lrz**** --transform
Replace the parameter values with your actual values.
| Parameter | Description |
|---|---|
| -d | Database type. Set to redshift. |
| -h | Endpoint of the Redshift cluster. Get this from the JDBC URL in the Redshift console. For example, if the JDBC URL is jdbc:redshift://r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn:5439/dev, the endpoint is r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn. |
| -p | Port number. Set to 5439. |
| -U | Superuser name of the Redshift cluster. This is typically awsuser, the user specified during cluster creation. |
| -w | Password of the Redshift superuser. |
| --transform | Performs both DDL download and conversion. The downloaded DDL is saved to ddl_dump_{timestamp}.txt. The converted DDL is saved to ddl_transform_{timestamp}.txt. |
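The -h and -p values can be derived from the JDBC URL with plain shell string manipulation. A small sketch using the example URL from the table above:

```shell
# Example JDBC URL from the Redshift console (replace with your own).
JDBC_URL="jdbc:redshift://r-red.cjbgr.cn-north-1.redshift.amazonaws.com.cn:5439/dev"

HOST_PORT="${JDBC_URL#jdbc:redshift://}"  # strip the scheme prefix
HOST_PORT="${HOST_PORT%%/*}"              # strip the trailing /database part
echo "endpoint: ${HOST_PORT%:*}"          # value for -h
echo "port: ${HOST_PORT##*:}"             # value for -p
```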
Create a database, a schema, and a user on the AnalyticDB for PostgreSQL instance to store the Amazon Redshift data. Then review ddl_transform_{timestamp}.txt and verify that the converted DDL statements are compatible with AnalyticDB for PostgreSQL. If so, execute the DDL statements to create the corresponding table schemas.
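Because the tool timestamps its output, running it more than once leaves several ddl_transform_*.txt files. The timestamped names sort lexicographically, so the newest file can be selected before applying it. A sketch, where the touch lines stand in for real tool output and the psql endpoint, account, and database names are placeholders:

```shell
# Stand-ins for files the tool would produce (remove these lines in a real run).
touch ddl_transform_20240101120000.txt ddl_transform_20240102120000.txt

# Timestamped names sort lexicographically, so the last one is the newest.
LATEST=$(ls -1 ddl_transform_*.txt | sort | tail -n 1)
echo "applying ${LATEST}"

# Placeholder endpoint and credentials; uncomment to apply the converted DDL.
# psql -h gp-xxxx-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -d target_db -f "${LATEST}"
```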
Step 2: Upload data from Redshift to Amazon S3
Run the following command to export table data from Redshift and upload it to Amazon S3. After the upload completes, two result files are generated: succeed_transfer_tables_{timestamp}.txt (successfully uploaded tables) and failed_transfer_tables_{timestamp}.txt (failed tables).
java -jar adbpg-redshift-migration.jar -d redshift -h host-info -p 5439 -U dbuser -w Lrz**** -o s3 -b dbuser-bj --iam-account account-info --iam-role role-info --root-dir rootdir --format parquet --use-iam --china-region --jobs 8 -f tablelist.txt --upload
Replace the parameter values with your actual values.
| Parameter | Description |
|---|---|
| -o | Target cloud storage. Set to s3. |
| -b | Amazon S3 bucket name. |
| --iam-account | Account ID of the IAM role. This is a 12-digit number from the Amazon Resource Name (ARN) of the IAM role. For example, 123456789012 in arn:aws-cn:iam::123456789012:role/lrz2. |
| --iam-role | Name of the IAM role. For example, lrz2 in arn:aws-cn:iam::123456789012:role/lrz2. |
| --root-dir | Root directory in Amazon S3. |
| --format | Data export format: parquet (default, recommended), csv, or json. Parquet enables fast and secure data export. Row delimiters in CSV files cannot be customized. |
| --use-iam | Uses the IAM role for authentication when uploading. |
| --china-region | (Optional) Specify this parameter only when uploading data from regions in the Chinese mainland. |
| --jobs | Number of parallel upload jobs. |
| -f | File containing the names of tables to upload. Each line must contain one table name in database.schema.table format. To upload all tables, use --full instead. |
| --upload | Triggers the upload operation. |
Step 3: Transfer data from Amazon S3 to OSS
Use Alibaba Cloud Data Online Migration to transfer the data from Amazon S3 to OSS.
Step 4: Import data from OSS to AnalyticDB for PostgreSQL
Before running this command, submit a ticket to apply for superuser permissions on your AnalyticDB for PostgreSQL instance.
Run the following command to import the data from OSS into AnalyticDB for PostgreSQL. Use the succeed_transfer_tables_{timestamp}.txt file generated in Step 2 as the input for the -f parameter.
java -jar adbpg-redshift-migration.jar -d adbpg -h gp-2ze312*******o-master.gpdb.rds.aliyuncs.com -p 5432 -U dbuser -w Lrz**** -o oss -e oss-cn-beijing-internal.aliyuncs.com -i "LTAI5tP*******rCrdj" -k "4k3ntmvK*******M2mqjQcuOy" -b dbuser-bj --root-dir rootdir --format parquet --jobs 8 -f tablelist.txt --download
Replace the parameter values with your actual values.
| Parameter | Description |
|---|---|
| -d | Database type. Set to adbpg. |
| -h | Endpoint of the AnalyticDB for PostgreSQL instance. Get the public endpoint from the AnalyticDB for PostgreSQL console. |
| -p | Port number. Set to 5432. |
| -U | Name of the initial account of the AnalyticDB for PostgreSQL instance. |
| -w | Password of the initial account. |
| -o | Source cloud storage. Set to oss. |
| -e | Endpoint of the OSS bucket. |
| -i | AccessKey ID used to access OSS. |
| -k | AccessKey secret used to access OSS. |
| -b | Name of the OSS bucket. |
| --root-dir | Root directory of OSS. |
| --format | Data format to import. Use the same format specified in Step 2 (default: parquet). |
| --jobs | Number of parallel import jobs. |
| -f | File containing the names of tables to import. Each line must contain one table name in database.schema.table format. In most cases, you can use the succeed_transfer_tables_{timestamp}.txt file generated during upload in Step 2. |
| --new-name-file | (Optional) File containing the corresponding AnalyticDB for PostgreSQL table names. Required only if the table names differ from those in Redshift. Each line must contain one table name in database.schema.table format. Table names in this file correspond one-to-one with those in the -f file. |
| --download | Triggers the download (import) operation. |
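When the target table names differ from the Redshift names, --new-name-file pairs line-for-line with the -f file. A sketch with hypothetical table and database names that shows the resulting mapping:

```shell
# Hypothetical source tables (one per line, database.schema.table).
cat > tablelist.txt <<'EOF'
dev.public.orders
dev.public.customers
EOF

# Hypothetical target names: line N here renames line N of tablelist.txt.
cat > new_names.txt <<'EOF'
adbpg_db.public.orders
adbpg_db.public.customers
EOF

# Show the resulting source -> target mapping; the two files must have equal length.
paste -d ' ' tablelist.txt new_names.txt
```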
Next steps
For manual migration without the automated tool, see Manually migrate data from an Amazon Redshift cluster to an AnalyticDB for PostgreSQL instance.