×
Community Blog 5 Simple Steps to Migrate Data Warehouses from Redshift to Alibaba Cloud AnalyticDB for PG

5 Simple Steps to Migrate Data Warehouses from Redshift to Alibaba Cloud AnalyticDB for PG

This article describes how to migrate data between the two data warehouse platforms (from Redshift to Alibaba Cloud AnalyticDB for PG) in just five simple steps.

By Peng Xiaoqiang (Lufeng)

Alibaba Cloud AnalyticDB for PostgreSQL (ADB PG, formerly known as HybridDB for PostgreSQL) is a real-time data warehousing service based on the PostgreSQL kernel Massively Parallel Processing (MPP) architecture. This product supports complex extract-transform-load (ETL) tasks and high-performance online queries. It is closely integrated with the Alibaba Cloud ecosystem.

AWS Redshift is also an MPP data warehouse server based on the PostgreSQL kernel engine and is widely used as a data warehouse in AWS. ADB PG is highly compatible with Redshift in terms of architecture and syntax. This article describes how to migrate data between these two data warehouse platforms.

1

Product Architecture Comparison

The latest Alibaba Cloud ADB PG 6.0 is built on PostgreSQL 9.4, whereas Redshift is based on PostgreSQL 8.2. Compared to ADB PG, Redshift provides a richer set of features and is fully compatible with PostgreSQL ecosystem tools, including PostGIS, MADlib, and other extended analysis tools. Redshift only supports column-store tables, not PostgreSQL native row-store tables. Comparatively, ADB PG retains support for PostgreSQL row-store tables to implement high-throughput data update operations. It also supports column-store tables for OLAP large table aggregation operations.

Comparison of ADB PG and Redshift

Item ADB PG Redshift
PostgreSQL version PG 9.4 PG 8.2
SQL syntax Compatible with PG and partially compatible with Oracle syntax Compatible with PG
Transactions Supported Supported
Row-store Supported Not supported
Column-store Supported Supported
Table partitions Supported Supported
Cloud storage Supports online access to OSS data Supports online access to S3 data
Multimodal analysis PostGIS, MADLib, and vector search

Key Syntax Comparison and Migration

Both, Alibaba Cloud ADB PG and AWS Redshift are based on the standalone PostgreSQL kernel engine, so their syntax is highly compatible. Some of their syntax descriptions are slightly different, as detailed below.

Differences in Syntax for DDL Table Creation

Syntax Redshift ADB PG
DISTKEY(col) DISTRIBUTED BY(col)
Random table distribution DISTSTYLE EVEN DISTRIBUTED RANDOMLY
Table replica distribution DISTSTYLE ALL DISTRIBUTED REPLICATED
Data encoding and compression AZ64 /BYTEDICT /DELTA /LZO /RAW /RUNLENGTH /ZSTD (COMPRESSTYPE={ZStD/ZLIB/QUICKLZ/RLE_TYPE/NONE})
Column-store sort key SORTKEY (col) "with(APPENDONLY=true, ORIENTATION=column)sortkey(volume)"
System functions PG8.2 and some custom functions PG9.4 and some custom functions

Syntax Guide

DDL Conversion Example 1

Table creation statement for Redshift, including the distribution key DISTKEY and sort column:

CREATE TABLE schema1.table1(
    filed1 VARCHAR(100) ENCODE lzo,
    filed2 INTEGER DISTKEY,
    filed3 INTEGER,
    filed4 BIGINT ENCODE lzo,
    filed5 INTEGER,)
INTERLEAVED SORTKEY (
    filed1,
    filed2);

Table creation statement for ADB PG:

CREATE TABLE schema1.table1
(
    filed1 VARCHAR(100) ,
    filed3 INTEGER,
    filed5 INTEGER
)
WITH(APPENDONLY=true,ORIENTATION=column,COMPRESSTYPE=zlib)
DISTRIBUTED BY (filed2)
SORTKEY
(
    filed1,
    filed2
)

DDL Conversion Example 2

Table creation statement for Redshift, including the ENCODE and SORTKEY options:

CREATE TABLE schema2.table2
(
    filed1 VARCHAR(50) ENCODE lzo,
    filed2 VARCHAR(50) ENCODE lzo,
    filed3 VARCHAR(20) ENCODE lzo,
)
DISTSTYLE EVEN
INTERLEAVED SORTKEY
(
    filed1
); 

Table creation statement for ADB PG:

CREATE TABLE schema2.table2(
    filed1 VARCHAR(50),
    filed2 VARCHAR(50),
    filed3 VARCHAR(20))
WITH(APPENDONLY=true, ORIENTATION=column, COMPRESSTYPE=zlib)
DISTRIBUTED randomly
SORTKEY
(
    filed1
); 

Data Migration

Both Redshift and ADB PG support high-speed parallel data import and export from cloud storage.

Migrating data from Redshift to ADB PG involves the following steps:

  • Prepare resources related to Amazon Redshift, Amazon S3 (Amazon Simple Storage Service), ADB PG, and Alibaba Cloud OSS before performing operations.
  • Import Redshift data to Amazon S3.
  • Use OSSImport to import CSV data files from Amazon S3 to OSS.
  • On the destination ADB PG instance, create an object that corresponds to the source Redshift, including Schema, Table, View, and Function.
  • Use an OSS external table to import data to ADB PG.

The migration process is as follows:

2

Please visit our official documentation page to learn more about migrating data from Amazon Redshift to ApsaraDB AnalyticDB for PostgreSQL.

0 0 0
Share on

ApsaraDB

63 posts | 5 followers

You may also like

Comments