Configure data synchronization to read from and write to GBase8a

The GBase8a data source lets you read data from and write data to GBase8a. This topic describes the data synchronization capabilities for GBase8a in DataWorks.

GBase8a Reader and GBase8a Writer support:

Reading from multiple tables in a single synchronization task
Filtering rows with WHERE conditions for incremental synchronization
Partitioning large tables by primary key for parallel reads
Writing data with pre- and post-execution SQL hooks

Limitations

GBase8a Reader and GBase8a Writer support Serverless resource groups (recommended) and exclusive resource groups for Data Integration.
When an INSERT INTO statement encounters a primary key or unique index conflict, the conflicting rows are not written.
Data can be written only to a destination table in the primary database.
The task requires at least the INSERT INTO permission. Additional permissions may be required for statements specified in preSql and postSql.
GBase8a Writer does not support the writeMode parameter.

Prerequisites

Add a GBase8a data source to DataWorks before developing a synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.

Set up a synchronization task

Configure an offline synchronization task for a single table using either the codeless UI or the code editor:

Codeless UI: Configure in codeless UI
Code editor: Configure in code editor

For code editor parameter descriptions and script examples, see Appendix: Script examples and parameter descriptions.

Appendix: Script examples and parameter descriptions

The following scripts and parameter tables cover the settings specific to GBase8a Reader and GBase8a Writer. For the unified script format required by the code editor, see Configure a task in the code editor.

Reader script example

{
    "type": "job",
    "steps": [
        {
            "stepType": "gbase8a",
            "parameter": {
                "datasource": "",
                "username": "",
                "password": "",
                "where": "",
                "column": [
                    "id",
                    "name"
                ],
                "splitPk": "id",
                "connection": [
                    {
                        "table": [
                            "table"
                        ],
                        "datasource": ""
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    }
}

Reader parameters

Parameter	Description	Required	Default
`table`	The tables from which data is synchronized. Specify as a JSON array. Multiple tables can be read in parallel, but all tables must have the same schema. GBase8a Reader does not verify schema consistency across tables. The `table` parameter must be nested inside the `connection` configuration block.	Yes	None
`column`	The columns to synchronize. Specify as a JSON array. Use `["*"]` to select all columns. Supports column pruning (select specific columns), column reordering (export in a different order from the schema), constant values (e.g., `'123'`), and function columns (e.g., `date('now')`). Cannot be blank.	Yes	None
`datasource`	The name of the GBase8a data source added in DataWorks.	No	None
`splitPk`	The column used to partition data for parallel reads. Use an integer primary key so that data is distributed evenly and hotspots are avoided. Only integer columns are supported. If you specify a string, floating-point, or date column, the setting is ignored and the data is read through a single channel. Leave blank to disable partitioning.	No	Blank
`where`	A filter condition appended to the SQL query. GBase8a Reader builds a query from `column`, `table`, and `where` to extract data. Use `where` for incremental synchronization. For example, set it to `gmt_create>$bizdate` to sync the current day's data. If left blank, a full data synchronization is performed.	No	None
`querySql`	A custom SQL query for cases where `where` alone cannot express the required filter logic. When `querySql` is set, GBase8a Reader ignores the `table`, `column`, `where`, and `splitPk` parameters.	No	None
`fetchSize`	The number of records fetched from the database per batch. A larger value reduces network round trips and improves read throughput. Note: values greater than 2,048 may cause an out-of-memory (OOM) error during synchronization.	No	1,024
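
The way `column`, `table`, `where`, and `splitPk` combine can be pictured with a short Python sketch. This is an illustration only, not the GBase8a Reader implementation: the function names (`build_query`, `split_ranges`, `build_split_queries`) are hypothetical, the minimum and maximum key values are assumed to be known already, and rows whose split key is NULL are not handled.

```python
from typing import List, Tuple


def build_query(columns: List[str], table: str, where: str = "") -> str:
    """Assemble a SELECT statement from column, table, and an optional where filter."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where.strip():
        sql += f" WHERE {where.strip()}"
    return sql


def split_ranges(min_pk: int, max_pk: int, parts: int) -> List[Tuple[int, int]]:
    """Split the closed interval [min_pk, max_pk] into at most `parts` contiguous ranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if min_pk > max_pk:
        return []
    total = max_pk - min_pk + 1
    parts = min(parts, total)  # never create empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    start = min_pk
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_split_queries(
    columns: List[str],
    table: str,
    where: str,
    split_pk: str,
    min_pk: int,
    max_pk: int,
    parts: int,
) -> List[str]:
    """Build one query per key range; each query could be read on its own channel."""
    queries = []
    for lo, hi in split_ranges(min_pk, max_pk, parts):
        range_filter = f"{split_pk} >= {lo} AND {split_pk} <= {hi}"
        if where.strip():
            combined = f"({where.strip()}) AND {range_filter}"
        else:
            combined = range_filter
        queries.append(build_query(columns, table, combined))
    return queries
```

The sketch also shows why an integer key is preferred: the key range can be divided arithmetically into slices of nearly equal size, which is not possible in the same way for strings or dates.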

Writer script example

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "gbase8a",
            "parameter": {
                "datasource": "Data source name",
                "username": "",
                "password": "",
                "column": [
                    "id",
                    "name"
                ],
                "connection": [
                    {
                        "table": [
                            "Gbase8a_table"
                        ],
                        "datasource": ""
                    }
                ],
                "preSql": [
                    "delete from @table where db_id = -1"
                ],
                "postSql": [
                    "update @table set db_modify_time = now() where db_id = 1"
                ]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
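
Before submitting a script like the one above, a few of the constraints described in this topic can be checked locally. The following Python sketch is a hypothetical helper (`check_job` is not part of DataWorks) and covers only rules stated in this topic: `writeMode` is not supported by GBase8a Writer, `table` must be nested inside `connection`, `column` cannot be blank, and each hop should refer to an existing step name.

```python
import json
from typing import List


def check_job(script: str) -> List[str]:
    """Return a list of problems found in a job script; an empty list means none were found."""
    job = json.loads(script)
    problems = []
    steps = job.get("steps", [])
    names = {step.get("name") for step in steps}

    # Every hop should connect two steps that actually exist.
    for hop in job.get("order", {}).get("hops", []):
        for key in ("from", "to"):
            if hop.get(key) not in names:
                problems.append(f"hop {key!r} refers to unknown step {hop.get(key)!r}")

    for step in steps:
        if step.get("stepType") != "gbase8a" or step.get("category") != "writer":
            continue
        params = step.get("parameter", {})
        label = step.get("name")
        if "writeMode" in params:
            problems.append(f"step {label!r}: writeMode is not supported")
        if not any(conn.get("table") for conn in params.get("connection", [])):
            problems.append(f"step {label!r}: table must be nested inside connection")
        if not params.get("column"):
            problems.append(f"step {label!r}: column cannot be blank")
    return problems
```

A call such as `check_job(open("job.json").read())` (the file name is an assumption) returns a list of messages that can be reviewed before the task is run. The check is not exhaustive and does not replace validation by DataWorks.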

Writer parameters

Parameter	Description	Required	Default
`datasource`	The name of the data source added in DataWorks. Must match the name of the added data source exactly.	Yes	None
`table`	The destination table for data writes. Specify as a JSON array. The `table` parameter must be nested inside the `connection` configuration block.	Yes	None
`column`	The destination columns to write to. Specify as a JSON array, for example `["id", "name", "age"]`. Cannot be blank.	Yes	None
`preSql`	A SQL statement to run before the data write. Use `@table` as a placeholder for the destination table name. The system substitutes the actual table name at runtime.	No	None
`postSql`	A SQL statement to run after the data write completes.	No	None
`batchSize`	The number of records submitted per batch. A larger value reduces network round trips and improves write throughput. Excessively large values may cause an out-of-memory (OOM) error during synchronization.	No	1,024
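
The `@table` placeholder and the `batchSize` setting can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the GBase8a Writer implementation: `render_sql` and `batches` are hypothetical names, the substitution is modeled as a plain string replacement, and records are modeled as tuples.

```python
from typing import Iterable, Iterator, List, Sequence


def render_sql(statements: Sequence[str], table: str) -> List[str]:
    """Replace the @table placeholder in each statement with the destination table name."""
    return [statement.replace("@table", table) for statement in statements]


def batches(records: Iterable[tuple], batch_size: int = 1024) -> Iterator[List[tuple]]:
    """Group records into lists of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[tuple] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch
```

With the default of 1,024, a run of 2,500 records would be submitted as three batches. A larger `batch_size` means fewer submissions but more records held in memory at once, which is the trade-off behind the OOM warning for `batchSize`.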