The AnalyticDB for PostgreSQL data source lets you read data from and write data to AnalyticDB for PostgreSQL in DataWorks offline synchronization tasks.
What this connector supports:
- Read from tables and views (offline synchronization)
- Write data in insert, copy, or upsert mode
- Concurrent reads using a split key for improved throughput
- Incremental reads using a where filter or custom SQL
Supported versions
Versions up to and including 7.0 are supported.
Supported field types
Offline read
AnalyticDB for PostgreSQL Reader supports most, but not all, AnalyticDB for PostgreSQL data types. Before you proceed, verify that your data types are supported. Types that require explicit conversion are noted in the table.
| Category | AnalyticDB for PostgreSQL data type | Notes |
|---|---|---|
| Integer types | BIGINT, BIGSERIAL, INTEGER, SMALLINT, SERIAL | — |
| Floating-point types | DOUBLE PRECISION, MONEY, NUMERIC, REAL | MONEY requires explicit type conversion, using cast syntax such as a_inet::varchar. |
| String types | VARCHAR, CHAR, TEXT, BIT, INET, GEOMETRY | BIT and INET require explicit type conversion, using cast syntax such as a_inet::varchar. |
| Date and time types | DATE, TIME, TIMESTAMP | — |
| Boolean type | BOOL | — |
| Binary types | BYTEA | — |
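For the types that need explicit conversion, you can cast the columns in a custom query instead of listing them in column. A minimal sketch, assuming a table public.person with an INET column a_inet and a BIT column a_bit (both column names are illustrative, not from the source):

```sql
-- Cast BIT and INET columns to varchar so the reader can handle them.
-- a_inet and a_bit are hypothetical column names used for illustration.
SELECT id, name, a_inet::varchar, a_bit::varchar
FROM public.person;
```

A query like this can be supplied through the reader's querySql parameter, which is described later in this topic.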
Offline write
AnalyticDB for PostgreSQL Writer supports most, but not all, AnalyticDB for PostgreSQL data types. Before you proceed, verify that your data types are supported. Types that require explicit conversion are noted in the table.
| Category | AnalyticDB for PostgreSQL data type | Notes |
|---|---|---|
| LONG | BIGINT, BIGSERIAL, INTEGER, SMALLINT, SERIAL | — |
| DOUBLE | DOUBLE PRECISION, MONEY, NUMERIC, REAL | MONEY requires explicit type conversion, using cast syntax such as a_inet::varchar. |
| STRING | VARCHAR, CHAR, TEXT, BIT, INET, GEOMETRY | BIT and INET require explicit type conversion, using cast syntax such as a_inet::varchar. |
| DATE | DATE, TIME, TIMESTAMP | — |
| BOOLEAN | BOOL | — |
| BYTES | BYTEA | — |
Limits
Offline synchronization tasks support reading data from views.
Prerequisites
Before you begin, ensure that you have:
- Added the AnalyticDB for PostgreSQL data source to DataWorks. For instructions, see Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Configure a data synchronization task
Choose a configuration guide based on your task type.
Offline sync task for a single table
- Use the codeless UI or code editor to configure the task.
- For all parameters and a script demo for the code editor, see Script demo and parameter descriptions.
Offline sync task for an entire database
Script demo and parameter descriptions
Reader script demo
```json
{
  "type": "job",
  "steps": [
    {
      "parameter": {
        "datasource": "test_004",
        "column": [
          "id",
          "name",
          "sex",
          "salary",
          "age"
        ],
        "where": "id=1001",
        "splitPk": "id",
        "table": "public.person"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": ""
    },
    "speed": {
      "concurrent": 6,
      "throttle": true,
      "mbps": "12"
    }
  }
}
```
Reader parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The name of the data source. The value must match the name of the data source added in DataWorks. | Yes | None |
| table | The name of the table to read data from. | Yes | None |
| column | The columns to synchronize, specified as a JSON array. To synchronize all columns, use ["*"]. Supports column pruning, column reordering, constants, and functions. Example: ["id", "table", "1", "'mingya.wmy'", "to_char(a+1)", "2.3", "true"]. | Yes | None |
| splitPk | The split key for concurrent reads. Data Integration partitions data by this column and runs concurrent tasks to improve performance. Use the table's primary key for even distribution. Supports integer columns only; strings, floating-point numbers, and dates are not supported. If left blank, data is read in a single channel. | No | None |
| where | A filter condition used to construct the SQL WHERE clause. Use this for incremental synchronization, for example: id>2 and sex=1. If left blank, all rows are read. | No | None |
| querySql | A custom SQL query that overrides column, table, and where. Use this for complex scenarios such as multi-table joins, for example: select a,b from table_a join table_b on table_a.id = table_b.id. Available in the code editor only. | No | None |
| fetchSize | The number of records fetched per batch from the database. A larger value reduces network round trips and improves throughput, but values greater than 2048 may cause an out-of-memory (OOM) error in the Data Integration process. | No | 512 |
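As a concrete illustration of querySql, the reader step from the demo above could be rewritten to use a custom query instead of the column, table, and where parameters. This is a sketch of the reader step only; the query text is an example, not from the source:

```json
{
  "parameter": {
    "datasource": "test_004",
    "querySql": "select id, name, sex, salary, age from public.person where id > 1000",
    "fetchSize": 512
  },
  "name": "Reader",
  "category": "reader"
}
```

Because querySql overrides column, table, and where, those parameters are omitted here. Note that splitPk does not apply when querySql is used, so the read runs in a single channel.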
Writer script demo
```json
{
  "type": "job",
  "steps": [
    {
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "parameter": {
        "postSql": [],
        "datasource": "test_004",
        "column": [
          "id",
          "name",
          "sex",
          "salary",
          "age"
        ],
        "table": "public.person",
        "preSql": []
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": ""
    },
    "speed": {
      "throttle": true,
      "concurrent": 6,
      "mbps": "12"
    }
  }
}
```
Writer parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The name of the data source. The value must match the name of the data source added in DataWorks. | Yes | None |
| table | The name of the table to write data to. | Yes | None |
| writeMode | The write mode. See the write mode comparison table below. | No | insert |
| conflictMode | The conflict resolution policy when writeMode is upsert and a primary key or unique index conflict occurs. Valid values: replace (new data overwrites existing data) and ignore (new data is discarded). Configurable in the code editor only. | No | replace |
| column | The target columns to write data to, specified as a JSON array. To write to all columns in order, use ["*"]. Example: ["id", "name", "age"]. | Yes | None |
| preSql | SQL statements to run before the synchronization task starts, such as truncating the target table. The codeless UI supports one statement; the code editor supports multiple. | No | None |
| postSql | SQL statements to run after the synchronization task completes, such as adding a timestamp column. The codeless UI supports one statement; the code editor supports multiple. | No | None |
| batchSize | The number of records written per batch. A larger value reduces network round trips and improves throughput, but values that are too large may cause an OOM error in the Data Integration process. | No | 1024 |
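Putting these parameters together, a writer step that upserts rows and clears a staging table first might look like the following sketch. The staging table name in preSql is illustrative, not from the source:

```json
{
  "parameter": {
    "datasource": "test_004",
    "table": "public.person",
    "column": ["id", "name", "sex", "salary", "age"],
    "writeMode": "upsert",
    "conflictMode": "replace",
    "preSql": ["truncate table public.person_staging"],
    "postSql": [],
    "batchSize": 1024
  },
  "name": "Writer",
  "category": "writer"
}
```

With writeMode set to upsert and conflictMode set to replace, rows that hit a primary key or unique index conflict overwrite the existing data instead of failing the task.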
Write mode comparison
| Write mode | How it works | Use when |
|---|---|---|
| insert | Runs INSERT INTO ... VALUES ... to write rows. | No conflicts are expected; recommended for most cases. |
| copy | Uses the PostgreSQL COPY FROM command to bulk-load data. | Performance is a concern and you need higher write throughput. |
| upsert | Handles primary key or unique index conflicts based on conflictMode. | Data may already exist and you need to update or ignore duplicates. |
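Conceptually, the two conflictMode values behave like PostgreSQL's INSERT ... ON CONFLICT clauses. This is a sketch of the equivalent semantics, not the exact SQL the connector issues; the table and columns are taken from the demos above:

```sql
-- conflictMode = replace: on a primary key conflict, the new data
-- overwrites the existing row.
INSERT INTO public.person (id, name) VALUES (1001, 'alice')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;

-- conflictMode = ignore: on a primary key conflict, the new data
-- is discarded and the existing row is kept.
INSERT INTO public.person (id, name) VALUES (1001, 'alice')
ON CONFLICT (id) DO NOTHING;
```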