Migrate data from Amazon Redshift to AnalyticDB for PostgreSQL in four steps: export data from Redshift to Amazon S3, sync it to Alibaba Cloud Object Storage Service (OSS), create the destination schema, and import the data.
Prerequisites
Before you begin, make sure you have:
An Amazon Redshift cluster
Amazon S3 activated for exporting Redshift data
OSS activated. See What is OSS?
An OSS bucket created. See Create a bucket
An AnalyticDB for PostgreSQL instance. See Specification selection to size it correctly
To facilitate subsequent data import, we recommend that you create an OSS bucket in the same region as the AnalyticDB for PostgreSQL instance.
Architecture overview
Before you start, understand the key structural differences between Amazon Redshift and AnalyticDB for PostgreSQL. These differences drive the DDL conversions required in Step 3.
| Concept | Amazon Redshift | AnalyticDB for PostgreSQL |
|---|---|---|
| Coordinator | Leader node | Coordinator node |
| Compute unit | Node slice (a partition within a compute node) | Compute node |
| Distribution key | DISTKEY / DISTSTYLE | DISTRIBUTED BY / DISTRIBUTED RANDOMLY / DISTRIBUTED REPLICATED |
| Sort key | [COMPOUND\|INTERLEAVED] SORTKEY | ORDER BY |
| Column compression | ENCODE XXX | WITH (COMPRESSTYPE={ZLIB\|ZSTD\|RLE_TYPE\|NONE}) |
| Max object name length | 127 characters | 63 characters |
AnalyticDB for PostgreSQL is PostgreSQL-compatible, so standard PostgreSQL functions and syntax work directly. Redshift-specific extensions — DISTKEY, SORTKEY, ENCODE, and proprietary functions — must be converted before migration.
Specification selection
Size your AnalyticDB for PostgreSQL instance by mapping each Redshift node slice to one AnalyticDB for PostgreSQL compute node.
Set the compute node specifications (CPU, memory, storage) to match those of one Redshift node slice.
Example: A Redshift cluster with 4 compute nodes, each with 2 slices (each slice: 2 cores, 16 GB memory, 1 TB storage), maps to an AnalyticDB for PostgreSQL instance with 8 compute nodes at 2 cores, 16 GB memory, and 1,000 GB storage per node.
Use Enhanced SSD (ESSD) as the storage type for better I/O performance. For instructions on creating an instance, see Create an instance.
Migrate data
Step 1: Export data from Redshift to Amazon S3
Use the Redshift UNLOAD statement to export table data to an Amazon S3 bucket. See the UNLOAD reference for full syntax.
```sql
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
authorization
[ option [ ... ] ]

where option is
{ [ FORMAT [ AS ] ] CSV | PARQUET
| PARTITION BY ( column_name [, ... ] ) [ INCLUDE ]
| MANIFEST [ VERBOSE ]
| HEADER
| DELIMITER [ AS ] 'delimiter-char'
| FIXEDWIDTH [ AS ] 'fixedwidth-spec'
| ENCRYPTED [ AUTO ]
| BZIP2
| GZIP
| ZSTD
| ADDQUOTES
| NULL [ AS ] 'null-string'
| ESCAPE
| ALLOWOVERWRITE
| CLEANPATH
| PARALLEL [ { ON | TRUE } | { OFF | FALSE } ]
| MAXFILESIZE [ AS ] max-size [ MB | GB ]
| REGION [ AS ] 'Amazon-region' }
```

Follow these guidelines for best import performance:

- Format: Use `FORMAT AS PARQUET` or `FORMAT AS CSV`.
- Parallelism: Set `PARALLEL ON` to export data across multiple files simultaneously.
- File size: Set `MAXFILESIZE` to a value that produces a file count equal to an integer multiple of your AnalyticDB for PostgreSQL compute node count. This lets the parallel OSS import distribute files evenly across nodes.
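These guidelines can be combined into a single export statement. The following is a minimal sketch; the table name, bucket path, and IAM role ARN are placeholders for your own values:

```sql
-- Illustrative export: writes schema1.table1 as gzip-compressed CSV files,
-- in parallel, capped at 512 MB per file. Replace the bucket path and
-- IAM role ARN with your own values.
UNLOAD ('SELECT * FROM schema1.table1')
TO 's3://my-migration-bucket/table1_'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
FORMAT AS CSV
GZIP
PARALLEL ON
MAXFILESIZE 512 MB
ALLOWOVERWRITE;
```

With `PARALLEL ON`, each node slice writes its own files, so a cluster with 8 slices produces at least 8 output files under the `table1_` prefix.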
Step 2: Sync data from Amazon S3 to OSS
Two tools are available: Data Online Migration (recommended for large-scale or ongoing transfers) and ossimport (recommended for standalone, script-based migration).
Option 1: Data Online Migration (recommended)
Log on to the Data Transport console.
Create a source data address (Amazon S3):
In the left-side navigation pane, choose Data Online Migration > Data Address.
Click Create Data Address.
Configure the following parameters and click OK.
| Parameter | Required | Description |
|---|---|---|
| Name | Yes | 3–63 characters; lowercase letters, digits, hyphens (-), and underscores (_); UTF-8; cannot start with a hyphen or underscore |
| Type | Yes | Select AWS S3 |
| Domain Name | Yes | The Amazon S3 endpoint. See Amazon S3 endpoints |
| AccessKeyId | Yes | The IAM user's access key ID, used to authenticate read access to the S3 bucket |
| SecretAccessKey | Yes | The IAM user's secret access key |
| Bucket | Yes | The Amazon S3 bucket containing the exported data. Bucket names cannot contain spaces, line breaks, or tab characters |
| Prefix | No | Limits migration to a specific directory. Cannot start with /; must end with /. Leave blank to migrate the entire bucket |
| Tunnel | No | Required only when migrating over Express Connect circuits or VPN gateways |
| Agent | No | Required only when migrating over Express Connect circuits or VPN gateways. Up to 30 agents per tunnel |

Create a destination data address (OSS):
Click Create Data Address again.
Configure the following parameters and click OK.
| Parameter | Required | Description |
|---|---|---|
| Data Type | Yes | Select OSS |
| Data Name | Yes | 3–63 characters; no special characters except hyphens (-) and underscores (_) |
| Data Region | Yes | The region where the OSS bucket resides |
| Enable Transfer Acceleration | No | Speeds up cross-region transfers. Takes effect within 30 minutes after being enabled; wait 30 minutes before creating the migration job. Additional fees apply. See Transfer acceleration |
| OSS Endpoint | Yes | The OSS endpoint for the destination region. Use a public endpoint when migrating from a third-party cloud. See Regions and endpoints |
| Access Key Id and Access Key Secret | Yes | The RAM user's AccessKey pair with write access to the OSS bucket |
| OSS Bucket | Yes | The destination OSS bucket |
| OSS Prefix | No | Migrates data to a specific directory. Cannot start with /; must end with / (for example, data/to/oss/). Required if any source file name starts with / |

Create a migration job with the following parameters:

| Parameter | Required | Description |
|---|---|---|
| Job Name | Yes | 3–63 characters; no special characters except hyphens (-) and underscores (_) |
| Source Data Address | Yes | The source address created in step 2 |
| Destination Data Address | Yes | The destination address created in step 3 |
| Specified Directory | No | Filters which directories to migrate: Do not filter (all data), Exclude (skip specified directories), or Contain (only migrate specified directories). Up to 20 directories per filter type. Directory names must end with /, cannot start with / or \, cannot contain //, .., or ", and the total character string of all specified directory names cannot exceed 10 KB |
| Migration Type | Yes | Full: migrates all files once and ends. Incremental: repeats migration at a set interval |
| Start Time Point of File | Yes | All: migrates all files. Assign: migrates only files created or modified after the specified time |
| Migration Interval | Yes (incremental only) | How often incremental migrations run. Default: 1 hour. Maximum: 24 hours |
| Migration Times | Yes (incremental only) | Total number of migration runs (1 full + N−1 incremental). Default: 1. Maximum: 30 |
| File Overwrite Method | Yes | How to handle files with the same name at source and destination: LastModified (overwrite if source is newer, or if Size or Content-Type differ when timestamps match), Condition (overwrite if any of LastModified, Size, or Content-Type differ), All (always overwrite), No (always skip) |

Incremental migration example: The current time is 08:00 on March 10, 2019; Migration Interval is 1 hour; Migration Times is 5; Start Time Point of File is 08:00 on March 5, 2019. The first migration covers files modified between March 5 and March 10. If the first migration takes one hour to complete, the second migration starts at 10:00 on March 10, 2019 and covers files modified between 08:00 and 10:00. The job consists of one full migration and four incremental migrations.

Important: Before a full or incremental migration starts, the system compares source and destination files. A destination file is overwritten when the Content-Type values differ, the source file is newer, or the file sizes differ. For the LastModified and Condition overwrite methods, make sure source files have valid LastModified, Size, and Content-Type values. Missing metadata can cause the overwrite policy to fail silently.

In the Performance step, go to Data Prediction and set Data Size and File Count to accurately reflect the volume of data being migrated.
(Optional) In the Performance step, go to Flow Control, set Time Range and Max Flow(MB/s), and click Add to throttle bandwidth during peak hours.
Click Create and wait for the migration job to complete.
Option 2: ossimport
Download and install ossimport in standalone mode. See Overview. The standalone file structure:
```
ossimport
├── bin
│   └── ossimport2.jar    # Contains Master, Worker, TaskTracker, and Console modules
├── conf
│   ├── local_job.cfg     # Job configuration file
│   └── sys.properties    # System parameters
├── console.bat           # Windows CLI for distributed task management
├── console.sh            # Linux CLI for distributed task management
├── import.bat            # Windows: auto-import using conf/local_job.cfg
├── import.sh             # Linux: auto-import using conf/local_job.cfg
├── logs                  # Log directory
└── README.md             # Usage guide; read before starting
```

Edit `conf/local_job.cfg` and set the following parameters:

```
srcType=s3
srcAccessKey=<your-aws-access-key>
srcSecretKey=<your-aws-secret-key>
srcDomain=<domain-for-the-amazon-s3-bucket-region>
srcBucket=<your-amazon-s3-bucket-name>
destAccessKey=<your-alibaba-cloud-accesskey-id>
destSecretKey=<your-alibaba-cloud-accesskey-secret>
destDomain=<endpoint-for-the-oss-bucket-region>
destBucket=<your-oss-bucket-name>
destPrefix=
isSkipExistFile=true
```

See Overview for all configuration options.
Run ossimport to start the sync. See Standalone deployment for usage instructions.
Step 3: Convert DDL statements and create destination tables
Before creating tables in AnalyticDB for PostgreSQL, convert your Redshift DDL statements. Redshift-specific syntax is not compatible with AnalyticDB for PostgreSQL.
See DDL conversion reference for the full conversion rules and examples.
After converting the DDL, create the destination schema in AnalyticDB for PostgreSQL. See CREATE TABLE for syntax.
Step 4: Import data from OSS into AnalyticDB for PostgreSQL
Two methods are available for loading data from OSS:
COPY statement: See Use the COPY or UNLOAD statement to import or export data between OSS foreign tables and AnalyticDB for PostgreSQL tables
OSS foreign table: See Use OSS foreign tables for data lake analysis
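As an illustration of the foreign-table method, the sketch below assumes an OSS foreign table named `oss_ext.table1` has already been created over the synced files (the schema and table names are hypothetical); a plain INSERT ... SELECT then loads the rows into the destination table created in Step 3:

```sql
-- Assumes oss_ext.table1 is an OSS foreign table over the exported files
-- and schema1.table1 is the destination table from Step 3.
INSERT INTO schema1.table1
SELECT * FROM oss_ext.table1;

-- Verify the row count after the load.
SELECT count(*) FROM schema1.table1;
```

Comparing this count against the source table's row count in Redshift is a quick sanity check before cutting over.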
DDL conversion reference
The following sections cover the syntax differences between Redshift and AnalyticDB for PostgreSQL, with conversion examples for schemas, functions, tables, and views.
Key syntax differences
| Element | Amazon Redshift | AnalyticDB for PostgreSQL |
|---|---|---|
| Compression | ENCODE XXX | WITH (COMPRESSTYPE={ZLIB\|ZSTD\|RLE_TYPE\|NONE}) |
| EVEN distribution | DISTSTYLE EVEN | DISTRIBUTED RANDOMLY |
| KEY distribution | DISTKEY (column) | DISTRIBUTED BY (column, [...]) |
| ALL distribution | DISTSTYLE ALL | DISTRIBUTED REPLICATED |
| Sort key | [COMPOUND\|INTERLEAVED] SORTKEY (col, ...) | ORDER BY (col, [...]) |
| Table name length | Up to 127 characters | Up to 63 characters |
AnalyticDB for PostgreSQL table, function, and view names are limited to 63 characters. Truncate any Redshift object names that exceed this limit.
Function conversions
AnalyticDB for PostgreSQL does not support several Redshift-specific SQL functions. Replace them as follows:
| Redshift function | AnalyticDB for PostgreSQL equivalent |
|---|---|
| CONVERT_TIMEZONE(a, b, c) | timezone(b, timezone(a, c)) |
| GETDATE() | current_timestamp(0)::timestamp |
| DATEADD(part, n, date) | Rewrite using standard PostgreSQL date functions. See DATEADD() |
| DATEDIFF(part, start, end) | Rewrite using standard PostgreSQL date functions. See DATEDIFF() |
| REGEXP_COUNT(str, pattern) | Rewrite using standard PostgreSQL regex functions. See REGEXP_COUNT() |
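For example, applying the CONVERT_TIMEZONE mapping above with a = 'UTC', b = 'America/Los_Angeles', and an illustrative timestamp literal:

```sql
-- Redshift:
SELECT CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', '2013-02-01 08:00:00');

-- AnalyticDB for PostgreSQL equivalent:
SELECT timezone('America/Los_Angeles', timezone('UTC', '2013-02-01 08:00:00'::timestamp));
```

The inner `timezone('UTC', ...)` interprets the bare timestamp as UTC; the outer call converts it to the target time zone.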
For user-defined functions (UDFs), rewrite in a language supported by AnalyticDB for PostgreSQL. For example, replace this Redshift Python UDF:
```sql
CREATE OR REPLACE FUNCTION public.f_jdate(dt timestamp WITHOUT time zone)
RETURNS character varying AS
'from datetime import timedelta, datetime
if dt.hour < 4:
    d = timedelta(days=-1)
    dt = dt + d
return str(dt.date())'
LANGUAGE plpythonu IMMUTABLE;
COMMIT;
```

With this SQL expression:

```sql
to_char(a - interval '4 hour', 'yyyy-mm-dd')
```

For a complete list of standard PostgreSQL functions, see Functions and Operators.
CREATE SCHEMA
```sql
CREATE SCHEMA schema1 AUTHORIZATION xxxpoc;
GRANT ALL ON SCHEMA schema1 TO xxxpoc;
GRANT ALL ON SCHEMA schema1 TO public;
COMMENT ON SCHEMA model IS 'for xxx migration poc test';
CREATE SCHEMA oss_external_table AUTHORIZATION xxxpoc;
```

CREATE TABLE
Apply all applicable conversions from Key syntax differences when rewriting CREATE TABLE statements.
Example 1: Redshift table with DISTKEY and INTERLEAVED SORTKEY
Redshift source:
```sql
CREATE TABLE schema1.table1
(
    filed1 VARCHAR(100) ENCODE lzo,
    filed2 INTEGER DISTKEY,
    filed3 INTEGER,
    filed4 BIGINT ENCODE lzo,
    filed5 INTEGER
)
INTERLEAVED SORTKEY
(
    filed1,
    filed2
);
```

AnalyticDB for PostgreSQL equivalent:

```sql
CREATE TABLE schema1.table1
(
    filed1 VARCHAR(100),
    filed2 INTEGER,
    filed3 INTEGER,
    filed4 BIGINT,
    filed5 INTEGER
)
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
DISTRIBUTED BY (filed2)
ORDER BY (filed1, filed2);

-- Sort
SORT schema1.table1;
MULTISORT schema1.table1;
```

Example 2: Redshift table with DISTSTYLE EVEN and INTERLEAVED SORTKEY
Redshift source:
```sql
CREATE TABLE schema2.table2
(
    filed1 VARCHAR(50) ENCODE lzo,
    filed2 VARCHAR(50) ENCODE lzo,
    filed3 VARCHAR(20) ENCODE lzo
)
DISTSTYLE EVEN
INTERLEAVED SORTKEY
(
    filed1
);
```

AnalyticDB for PostgreSQL equivalent:

```sql
CREATE TABLE schema2.table2
(
    filed1 VARCHAR(50),
    filed2 VARCHAR(50),
    filed3 VARCHAR(20)
)
WITH (APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zstd)
DISTRIBUTED RANDOMLY
ORDER BY (filed1);

-- Sort
SORT schema2.table2;
MULTISORT schema2.table2;
```

For details on how sort keys affect query performance, see Use sort keys and rough set indexes to accelerate queries in column-oriented tables.
For compression algorithm options, see Data compression.
CREATE VIEW
Convert CREATE VIEW statements using the same rules as CREATE TABLE. The WITH NO SCHEMA BINDING clause is not supported — remove it from the converted statement.
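For instance, a hypothetical Redshift view definition converts as follows (view and column names are illustrative):

```sql
-- Redshift:
CREATE VIEW schema1.v_table1 AS
SELECT filed1, filed2
FROM schema1.table1
WITH NO SCHEMA BINDING;

-- AnalyticDB for PostgreSQL: identical definition with the clause removed.
CREATE VIEW schema1.v_table1 AS
SELECT filed1, filed2
FROM schema1.table1;
```

Unlike a Redshift view created with NO SCHEMA BINDING, the converted view is bound to its underlying table, so the table must exist before the view is created.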
What's next
To automate this migration process, see Configure automatic data migration from an Amazon Redshift cluster to an AnalyticDB for PostgreSQL instance.