The GBase8a data source lets you read data from and write data to GBase8a. This topic describes the data synchronization capabilities for GBase8a in DataWorks.
GBase8a Reader and GBase8a Writer support:
- Reading from multiple tables in a single synchronization task
- Filtering rows with WHERE conditions for incremental synchronization
- Partitioning large tables by primary key for parallel reads
- Writing data with pre- and post-execution SQL hooks
Limitations
- GBase8a Reader and GBase8a Writer support Serverless resource groups (recommended) and exclusive resource groups for Data Integration.
- When an INSERT INTO statement encounters a primary key or unique index conflict, the conflicting rows are not written.
- Data can be written only to a destination table in the primary database.
- The task requires at least the INSERT INTO permission. Additional permissions may be required for statements specified in preSql and postSql.
- GBase8a Writer does not support the writeMode parameter.
Prerequisites
Add a GBase8a data source to DataWorks before developing a synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Set up a synchronization task
Configure an offline synchronization task for a single table using either the codeless UI or the code editor:
- Codeless UI: Configure in codeless UI
- Code editor: Configure in code editor
For code editor parameter descriptions and script examples, see Appendix: Script examples and parameter descriptions.
Appendix: Script examples and parameter descriptions
The following scripts and parameter tables cover the settings specific to GBase8a Reader and GBase8a Writer. For the unified script format required by the code editor, see Configure a task in the code editor.
Reader script example
```json
{
  "type": "job",
  "steps": [
    {
      "stepType": "gbase8a",
      "parameter": {
        "datasource": "",
        "username": "",
        "password": "",
        "where": "",
        "column": [
          "id",
          "name"
        ],
        "splitPk": "id",
        "connection": [
          {
            "table": [
              "table"
            ],
            "datasource": ""
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {
        "print": false,
        "fieldDelimiter": ","
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  }
}
```
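Before submitting a reader script, it can help to check two structural rules that the parameter descriptions in this topic state: `column` cannot be blank, and `table` must be nested inside `connection`. The Python sketch below is an illustration only. `check_reader_parameter` is a hypothetical helper, not part of DataWorks, and it does not handle scripts that rely on `querySql`.

```python
import json


def check_reader_parameter(parameter):
    """Return a list of problems found in a reader "parameter" block.

    Hypothetical helper: it checks only rules stated in this topic and does
    not reproduce the validation that DataWorks itself performs.
    """
    problems = []
    # column cannot be blank.
    if not parameter.get("column"):
        problems.append("column cannot be blank")
    # table belongs inside connection, not at the top level of parameter.
    if "table" in parameter:
        problems.append("table must be nested inside connection")
    connections = parameter.get("connection") or []
    if not any(conn.get("table") for conn in connections):
        problems.append("no table found inside connection")
    return problems


# A trimmed copy of the reader step, used here only as sample input.
step = json.loads("""
{
  "stepType": "gbase8a",
  "parameter": {
    "column": ["id", "name"],
    "splitPk": "id",
    "connection": [{"table": ["table"], "datasource": ""}]
  }
}
""")
print(check_reader_parameter(step["parameter"]))  # prints []
```

An empty list means the sketch found nothing to report. It does not mean that the script will run successfully.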
Reader parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| table | The tables from which data is synchronized. Specify as a JSON array. Multiple tables can be read in parallel, but all tables must have the same schema. GBase8a Reader does not verify schema consistency across tables. The table parameter must be nested inside the connection configuration block. | Yes | None |
| column | The columns to synchronize. Specify as a JSON array. Use ["*"] to select all columns. Supports column pruning (select specific columns), column reordering (export in a different order from the schema), constant values (e.g., '123'), and function columns (e.g., date('now')). Cannot be blank. | Yes | None |
| datasource | The name of the GBase8a data source added in DataWorks. | No | None |
| splitPk | The column used to partition data for parallel reads. Use an integer primary key for even data distribution and to avoid data hotspots. Supports integer types only. Strings, floating-point numbers, and dates are not supported and cause the setting to be ignored, falling back to single-channel read. Leave blank to disable partitioning. | No | Blank |
| where | A filter condition appended to the SQL query. GBase8a Reader builds a query from column, table, and where to extract data. Use where for incremental synchronization. For example, set it to gmt_create>$bizdate to sync the current day's data. If left blank, a full data synchronization is performed. | No | None |
| querySql | A custom SQL query that overrides table, column, where, and splitPk. Use this when where alone cannot express the required filter logic. When querySql is set, GBase8a Reader ignores the table, column, where, and splitPk parameters. | No | None |
| fetchSize | The number of records fetched from the database per batch. A larger value reduces network round trips and improves read throughput. Note: Values greater than 2048 may cause an out-of-memory (OOM) error during synchronization. | No | 1,024 |
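The descriptions of where and splitPk can be made concrete with a small Python sketch. It shows one plausible way that `column`, `table`, and `where` combine into a SELECT statement, and how an integer key range could be divided into conditions for parallel reads. Both `build_query` and `split_ranges` are hypothetical names, and the even-range split is an assumption made for illustration. This topic does not document the algorithm that GBase8a Reader actually uses.

```python
def build_query(columns, table, where=""):
    """Compose a SELECT statement from column, table, and where."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where.strip():
        sql += f" WHERE {where}"
    return sql


def split_ranges(split_pk, min_value, max_value, parts):
    """Divide the closed integer range [min_value, max_value] into at most
    `parts` contiguous conditions on split_pk (assumed splitting strategy)."""
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")
    total = max_value - min_value + 1
    parts = max(1, min(parts, total))
    step, extra = divmod(total, parts)
    conditions = []
    low = min_value
    for i in range(parts):
        # The first `extra` ranges take one additional key each.
        size = step + (1 if i < extra else 0)
        high = low + size - 1
        conditions.append(f"{split_pk} >= {low} AND {split_pk} <= {high}")
        low = high + 1
    return conditions


for condition in split_ranges("id", 1, 10, 3):
    print(build_query(["id", "name"], "table", condition))
# SELECT id, name FROM table WHERE id >= 1 AND id <= 4
# SELECT id, name FROM table WHERE id >= 5 AND id <= 7
# SELECT id, name FROM table WHERE id >= 8 AND id <= 10
```

Each generated statement covers a disjoint slice of the key range, which is why an evenly distributed integer key helps avoid hotspots. With a blank `where`, `build_query` returns a statement with no filter, which corresponds to a full synchronization.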
Writer script example
```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "gbase8a",
      "parameter": {
        "datasource": "Data source name",
        "username": "",
        "password": "",
        "column": [
          "id",
          "name"
        ],
        "connection": [
          {
            "table": [
              "Gbase8a_table"
            ],
            "datasource": ""
          }
        ],
        "preSql": [
          "delete from @table where db_id = -1"
        ],
        "postSql": [
          "update @table set db_modify_time = now() where db_id = 1"
        ]
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
```
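The preSql and postSql entries in the writer script use `@table` as a placeholder for the destination table name. The Python sketch below shows the effect of that substitution on the two statements from the script. `render_sql` is a hypothetical helper, and plain string replacement is an assumption about how the substitution behaves, not a description of the DataWorks implementation.

```python
def render_sql(statements, table):
    """Replace the @table placeholder in each statement with the table name."""
    return [statement.replace("@table", table) for statement in statements]


pre_sql = ["delete from @table where db_id = -1"]
post_sql = ["update @table set db_modify_time = now() where db_id = 1"]

print(render_sql(pre_sql, "Gbase8a_table"))
# prints ['delete from Gbase8a_table where db_id = -1']
print(render_sql(post_sql, "Gbase8a_table"))
# prints ['update Gbase8a_table set db_modify_time = now() where db_id = 1']
```

Previewing the rendered statements this way makes it easier to see which permissions beyond INSERT INTO the task account may need, such as DELETE or UPDATE in this example.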
Writer parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The name of the data source added in DataWorks. Must match the name of the added data source exactly. | Yes | None |
| table | The destination table for data writes. Specify as a JSON array. The table parameter must be nested inside the connection configuration block. | Yes | None |
| column | The destination columns to write to. Separate multiple columns with commas, for example, ["id", "name", "age"]. Cannot be blank. | Yes | None |
| preSql | A SQL statement to run before the data write. Use @table as a placeholder for the destination table name; the system substitutes the actual table name at runtime. | No | None |
| postSql | A SQL statement to run after the data write completes. | No | None |
| batchSize | The number of records submitted per batch. A larger value reduces network round trips and improves write throughput. Excessively large values may cause an out-of-memory (OOM) error during synchronization. | No | 1,024 |
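To illustrate the trade-off behind batchSize, the Python sketch below groups a stream of records into batches of at most `batch_size` items. Fewer, larger batches mean fewer round trips, but each batch is held in memory in full. `batches` is a hypothetical helper written for this illustration and does not reflect how GBase8a Writer buffers records internally.

```python
from itertools import islice


def batches(records, batch_size=1024):
    """Yield successive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Only one batch is materialized in memory at a time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in batches(range(2500), batch_size=1024)]
print(sizes)  # prints [1024, 1024, 452]
```

With 2,500 records and the default of 1,024, the sketch produces three submissions. Doubling the batch size would reduce that to two, at the cost of holding twice as many records in memory per submission.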