AnalyticDB for PostgreSQL is largely compatible with Teradata syntax, but some statements and functions require conversion. This guide covers the syntax differences, data type mappings, and function equivalents you need to plan a migration from Teradata to AnalyticDB for PostgreSQL.
Migration requirements
A successful migration from Teradata to AnalyticDB for PostgreSQL meets the following goals:
- Completely migrate the underlying data platform of the data warehouse.
- Smoothly migrate the applications deployed on the data warehouse system.
- Migrate business platforms and data transparently, so that business users are unaware of the change.
- Ensure good performance of the data warehouse after migration.
- Set a reasonable migration schedule and plan.
- Fully reuse the original system architecture, ETL processes, data structures, and management tools.
Migration scope
A complete Teradata migration involves four areas:
- Historical data: Export data as text files using the specified delimiter and character encoding, store the files on Elastic Compute Service (ECS) instance local disks or in Object Storage Service (OSS) buckets on the same network as the AnalyticDB for PostgreSQL instance, and load them through OSS external tables or over the gpfdist protocol. Export DDL scripts from Teradata and batch-convert them to AnalyticDB for PostgreSQL syntax.
- Extract-Transform-Load (ETL) processes: Convert ETL statements to AnalyticDB for PostgreSQL Data Manipulation Language (DML) syntax using script-based conversion tools. Replace Teradata functions with their AnalyticDB for PostgreSQL equivalents based on the function mapping table in this document. Reconfigure and restart ETL jobs after the historical data migration completes.
- API connections: AnalyticDB for PostgreSQL supports the JDBC and ODBC protocols, so Business Intelligence (BI) frontend tools connect the same way as before. Update the connection address to the new instance to complete the API migration.
- Management tools: Deploy backup and recovery tools for the AnalyticDB for PostgreSQL instance and schedule regular recovery drills.
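As a sketch of the historical-data loading step, the following readable external table pulls pipe-delimited text files served by gpfdist; the host, port, file name, column list, and target table are placeholders, not part of the conversion tools:

```sql
-- Hypothetical example: load pipe-delimited text exported from Teradata.
-- Start gpfdist on the ECS instance that holds the files, for example:
--   gpfdist -d /data/exports -p 8081
CREATE READABLE EXTERNAL TABLE ext_test_table (
    first_column  DATE,
    second_column INTEGER
)
LOCATION ('gpfdist://192.168.0.10:8081/test_table.txt')  -- placeholder host/port/file
FORMAT 'TEXT' (DELIMITER '|' NULL '');

-- Load into the target table created from the converted DDL.
INSERT INTO test_table_stage SELECT * FROM ext_test_table;
```

The external table is only a pointer to the files; the `INSERT ... SELECT` performs the actual parallel load.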
Data type mapping
Most core data types map directly between Teradata and AnalyticDB for PostgreSQL. The script-based Data Definition Language (DDL) conversion tools handle batch conversion automatically.
| Teradata | AnalyticDB for PostgreSQL | Notes |
|---|---|---|
| CHAR | CHAR | Direct mapping |
| VARCHAR | VARCHAR | Direct mapping |
| LONG VARCHAR | VARCHAR(64000) | Fixed maximum length |
| VARBYTE(size) | BYTEA | Binary data type |
| BYTEINT | SMALLINT | BYTEINT is a 1-byte integer; PostgreSQL has no 1-byte integer type, so use the smallest integer type |
| SMALLINT | SMALLINT | Direct mapping |
| INTEGER | INTEGER | Direct mapping |
| DECIMAL(size,dec) | DECIMAL(size,dec) | Direct mapping |
| NUMERIC(precision,dec) | NUMERIC(precision,dec) | Direct mapping |
| FLOAT | FLOAT | Direct mapping |
| REAL | REAL | Direct mapping |
| DOUBLE PRECISION | DOUBLE PRECISION | Direct mapping |
| DATE | DATE | Direct mapping |
| TIME | TIME | Direct mapping |
| TIMESTAMP | TIMESTAMP | Direct mapping |
Table creation statements
The following examples show how a Teradata DDL statement maps to AnalyticDB for PostgreSQL. Key differences are annotated in the converted statement.
Teradata:
CREATE MULTISET TABLE test_table,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
first_column DATE FORMAT 'YYYYMMDD' TITLE 'COLUMN 1' NOT NULL,
second_column INTEGER TITLE 'COLUMN 2' NOT NULL ,
third_column CHAR(6) CHARACTER SET LATIN CASESPECIFIC TITLE 'COLUMN 3' NOT NULL ,
fourth_column CHAR(20) CHARACTER SET LATIN CASESPECIFIC TITLE 'COLUMN 4' NOT NULL,
fifth_column CHAR(1) CHARACTER SET LATIN CASESPECIFIC TITLE 'COLUMN 5' NOT NULL,
sixth_column CHAR(24) CHARACTER SET LATIN CASESPECIFIC TITLE 'COLUMN 6' NOT NULL,
seventh_column VARCHAR(18) CHARACTER SET LATIN CASESPECIFIC TITLE 'COLUMN 7' NOT NULL,
eighth_column DECIMAL(18,0) TITLE 'COLUMN 8' NOT NULL ,
nineth_column DECIMAL(18,6) TITLE 'COLUMN 9' NOT NULL )
PRIMARY INDEX ( first_column ,fourth_column )
PARTITION BY RANGE_N(first_column BETWEEN DATE '1999-01-01' AND DATE '2050-12-31' EACH INTERVAL '1' DAY );
CREATE INDEX test_index (first_column, fourth_column) ON test_table;
AnalyticDB for PostgreSQL:
-- MULTISET TABLE, NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL,
-- CHECKSUM, DEFAULT MERGEBLOCKRATIO: Teradata-specific clauses removed
-- FORMAT 'YYYYMMDD', TITLE, CHARACTER SET, CASESPECIFIC: column attributes removed
CREATE TABLE test_table
(
first_column DATE NOT NULL, -- FORMAT 'YYYYMMDD' removed; use TO_CHAR(first_column, 'YYYYMMDD') for display formatting
second_column INTEGER NOT NULL ,
third_column CHAR(6) NOT NULL ,
fourth_column CHAR(20) NOT NULL,
fifth_column CHAR(1) NOT NULL,
sixth_column CHAR(24) NOT NULL,
seventh_column VARCHAR(18) NOT NULL,
eighth_column DECIMAL(18,0) NOT NULL ,
nineth_column DECIMAL(18,6) NOT NULL )
DISTRIBUTED BY ( first_column ,fourth_column ) -- PRIMARY INDEX -> DISTRIBUTED BY
PARTITION BY RANGE(first_column) -- RANGE_N syntax converted to standard RANGE
(START (DATE '1999-01-01') INCLUSIVE
END (DATE '2050-12-31') INCLUSIVE
EVERY (INTERVAL '1 DAY' ) );
CREATE INDEX test_index ON test_table(first_column, fourth_column); -- column list moves after the table name
DDL differences summary
| Area | Teradata | AnalyticDB for PostgreSQL |
|---|---|---|
| Distribution key | PRIMARY INDEX (col1, col2) | DISTRIBUTED BY (col1, col2) |
| Partitioning | PARTITION BY RANGE_N(...) | PARTITION BY RANGE(...) with START/END/EVERY |
| Index creation | CREATE INDEX idx (cols) ON table | CREATE INDEX idx ON table(cols) |
| Column title | TITLE 'label' | COMMENT ON COLUMN table.col IS 'label'; |
| Column encoding | CHARACTER SET LATIN CASESPECIFIC | SET client_encoding = latin1; (session-level) |
| Table attributes | MULTISET, NO FALLBACK, CHECKSUM, etc. | Not supported; remove these clauses |
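For example, the TITLE labels stripped from the converted DDL above can be preserved as column comments, using the column names from the earlier example:

```sql
-- Preserve Teradata TITLE labels as column comments.
COMMENT ON COLUMN test_table.first_column IS 'COLUMN 1';
COMMENT ON COLUMN test_table.fourth_column IS 'COLUMN 4';
```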
Data import and export formats
Both Teradata and AnalyticDB for PostgreSQL support TXT and CSV file formats. The key difference is the separator:
- Teradata: two-character separators
- AnalyticDB for PostgreSQL: single-character separators

Convert the separators in your data files before loading.
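A minimal sketch of the separator conversion, assuming a hypothetical two-character Teradata separator `|~` that is rewritten to a single `|` before loading:

```python
def convert_separator(line: str, old_sep: str = "|~", new_sep: str = "|") -> str:
    """Replace a two-character Teradata field separator with a single character."""
    return line.replace(old_sep, new_sep)

# Example: one exported row using the hypothetical two-character separator.
row = "20240101|~42|~ABCDEF"
print(convert_separator(row))  # -> 20240101|42|ABCDEF
```

In practice you would stream each exported file through this conversion (or an equivalent `sed` pass) and pick a single-character separator that cannot occur inside the data.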
SQL statement differences
AnalyticDB for PostgreSQL supports most Teradata SQL syntax. The following statements require conversion.
CAST
Teradata CAST statements support format specifiers; AnalyticDB for PostgreSQL does not.
| Statement | Teradata | AnalyticDB for PostgreSQL |
|---|---|---|
| Integer cast | CAST(x AS INT FORMAT '999999') | Write a custom function for replacement |
| Date cast | CAST(x AS DATE FORMAT 'YYYYMMDD') | CAST(x AS DATE); the FORMAT clause is not needed because the YYYY-MM-DD format is supported natively |
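One way to write the custom replacement for the integer FORMAT cast is a small SQL function built on TO_CHAR; the function name and default picture here are assumptions, not part of the conversion tools:

```sql
-- Hypothetical replacement for CAST(x AS INT FORMAT '999999'):
-- TO_CHAR with a numeric picture produces the formatted string.
CREATE OR REPLACE FUNCTION format_int(x INTEGER, fmt TEXT DEFAULT '999999')
RETURNS TEXT AS
$$
  SELECT TRIM(TO_CHAR(x, fmt));
$$
LANGUAGE SQL;

-- SELECT format_int(123);  -- returns '123'
```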
QUALIFY
Teradata's QUALIFY keyword filters window function results inline. AnalyticDB for PostgreSQL does not support QUALIFY; rewrite as a subquery.
Teradata:
SELECT itemid, sumprice, RANK() OVER (ORDER BY sumprice DESC)
FROM (SELECT a1.item_id, SUM(a1.sale)
FROM sales AS a1
GROUP BY a1.item_id) AS t1 (itemid, sumprice)
QUALIFY RANK() OVER (ORDER BY sumprice DESC) <= 100;
AnalyticDB for PostgreSQL:
-- Wrap the window function in a subquery, then filter in the outer WHERE clause
SELECT itemid, sumprice, rank FROM
(SELECT itemid, sumprice, RANK() OVER (ORDER BY sumprice DESC) AS rank
FROM (SELECT a1.item_id, SUM(a1.sale)
FROM sales AS a1
GROUP BY a1.item_id) AS t1 (itemid, sumprice)
) AS a
WHERE rank <= 100;
MACRO
Teradata macros execute a group of SQL statements as a named unit. AnalyticDB for PostgreSQL does not support macros; use a FUNCTION instead.
Teradata:
CREATE MACRO Get_Emp_Salary(EmployeeNo INTEGER) AS (
SELECT
EmployeeNo,
NetPay
FROM
Salary
WHERE EmployeeNo = :EmployeeNo;
);
AnalyticDB for PostgreSQL:
CREATE OR REPLACE FUNCTION Get_Emp_Salary(
p_employee_no INTEGER, -- renamed: PostgreSQL rejects duplicate parameter names
OUT EmployeeNo INTEGER,
OUT NetPay FLOAT
) RETURNS SETOF RECORD AS
$$
SELECT EmployeeNo, NetPay
FROM Salary
WHERE EmployeeNo = $1
$$
LANGUAGE SQL;
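Calling the converted function then looks like an ordinary table-function query (the Salary table and its data are assumed):

```sql
-- Returns the EmployeeNo and NetPay rows for the given employee,
-- just as executing the Teradata macro did.
SELECT * FROM Get_Emp_Salary(101);
```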
Function mapping
The following table maps Teradata functions to their AnalyticDB for PostgreSQL equivalents. For functions with no direct equivalent, the table provides the SQL expression to use.
| Teradata | AnalyticDB for PostgreSQL | Description |
|---|---|---|
| ZEROIFNULL(x) | COALESCE(x, 0) | Converts NULL to 0 for cumulative calculations |
| NULLIFZERO(x) | NULLIF(x, 0) | Replaces 0 values with NULL for cumulative data |
| INDEX(str, substr) | POSITION(substr IN str) | Returns the position of a substring in a string |
| ADD_MONTHS(date, n) | date + n * INTERVAL '1 month' | Adds or subtracts a specified number of months to or from the input date |
| FORMAT | TO_CHAR / TO_DATE | Formats data as a string or date |
| CSUM(col, order_col) | Window function | Returns the cumulative sum of a value expression for each row in a partition |
| MAVG(col, n, order_col) | Window function | Computes the moving average of a specified column over a defined number of rows (query width) |
| MSUM(col, n, order_col) | Window function | Computes the moving sum of a specified column over a defined query width |
| MDIFF(col, n, order_col) | Window function | Computes the moving difference of a specified column over a defined query width |
| QUALIFY | Subquery with WHERE on the window function result | Filters window function results; see the QUALIFY section above |
| CHAR(str) / CHARACTERS(str) | LENGTH(str) | Returns the number of characters in a string |
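As a sketch, the ordered analytical functions in the table map onto standard window functions; the table and column names below are hypothetical:

```sql
-- Hypothetical table: sales(item_id, sale_date, amount).
SELECT
    item_id,
    sale_date,
    -- CSUM(amount, sale_date): running total
    SUM(amount) OVER (ORDER BY sale_date) AS csum,
    -- MAVG(amount, 3, sale_date): moving average over a 3-row width
    AVG(amount) OVER (ORDER BY sale_date ROWS 2 PRECEDING) AS mavg,
    -- MSUM(amount, 3, sale_date): moving sum over a 3-row width
    SUM(amount) OVER (ORDER BY sale_date ROWS 2 PRECEDING) AS msum,
    -- MDIFF(amount, 3, sale_date): difference from the value 3 rows earlier
    amount - LAG(amount, 3) OVER (ORDER BY sale_date) AS mdiff
FROM sales;
```

`ROWS 2 PRECEDING` is shorthand for `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`, which matches a query width of 3 rows.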