DataWorks Data Integration supports DM (Dameng) as both a source and a destination for offline batch synchronization. Use DM Reader to extract data from DM databases and DM Writer to load data into them.
Capabilities
| Capability | DM Reader | DM Writer |
|---|---|---|
| Offline (batch) sync | Yes | Yes |
| Read from views | Yes | — |
| Column pruning | Yes | — |
| Column reordering | Yes | — |
| Parallel read (splitPk) | Yes (integer columns only) | — |
| Incremental sync (where) | Yes | — |
| Pre SQL execution | Yes | Yes |
| Post SQL execution | — | Yes |
| Serverless resource groups | Yes | Yes |
| Exclusive resource groups for Data Integration | Yes | Yes |
Supported field types
DM Reader and DM Writer support most common relational database data types. The following table lists the DM data types that DM Reader can convert. Unsupported types cause a read error—verify your schema before configuring a sync task.
| Category | DM data types |
|---|---|
| Integer | INT, TINYINT, SMALLINT, BIGINT |
| Floating-point | REAL, FLOAT, DOUBLE, NUMBER, DECIMAL |
| String | CHAR, VARCHAR, LONGVARCHAR, TEXT |
| Date and time | DATE, DATETIME, TIMESTAMP, TIME |
| Boolean | BIT |
| Binary | BINARY, VARBINARY, BLOB |
Configure a sync task
Single-table offline sync
Configure a task using either the codeless UI or the code editor:
- Codeless UI: Configure a task in the codeless UI
- Code editor: Configure a task in the code editor
For the full script reference and parameter descriptions, see Script reference.
Entire-database offline sync
Script reference
When configuring a batch synchronization task in the code editor, use the unified script format. The following sections cover the DM-specific parameters for Reader and Writer.
Reader
Script example
{
"type": "job",
"version": "2.0",
"order": {
"hops": [
{ "from": "Reader", "to": "Writer" }
]
},
"setting": {
"errorLimit": { "record": "0" },
"speed": {
"throttle": true,
"concurrent": 1,
"mbps": "12"
}
},
"steps": [
{
"category": "reader",
"name": "Reader",
"stepType": "dm",
"parameter": {
"datasource": "dm_datasource",
"table": "table",
"column": ["*"],
"preSql": ["delete from XXX;"],
"fetchSize": 2048
}
},
{
"category": "writer",
"name": "Writer",
"stepType": "stream",
"parameter": {}
}
]
}
Reader parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| datasource | Yes | — | Name of the DM data source. See Configure a DM data source. |
| table | Yes | — | Table to read data from. |
| column | Yes | — | Columns to sync, as a JSON array. Use ["*"] for all columns. Supports column pruning, column reordering, and constants (integer, string, null, function expression, floating-point, Boolean). |
| splitPk | No | Empty | Column used to split data for parallel reads. Set this to the primary key for even distribution and to avoid data hot spots. Only integer columns are supported; floating-point, string, date, and other types cause an error. If not set, the table is read with a single channel. |
| where | No | — | SQL filter condition appended to the query, for example gmt_create>$bizdate for incremental sync, or limit 10 for testing. If not set, all rows are read. |
| querySql | No | — | Custom SQL query, for example select a,b from table_a join table_b on table_a.id = table_b.id. When set, the column, table, and where parameters are ignored. |
| fetchSize | No | 1,024 | Number of rows fetched per batch. Higher values reduce network round-trips and improve throughput. Values above 2,048 may cause an out-of-memory (OOM) error. |
| preSql | No | — | SQL statement executed before the sync task starts. Only one statement is supported. |
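As a sketch of how these Reader parameters combine, the following step enables parallel reads on an integer primary key and filters for incremental data. The table name orders, the columns id, name, and gmt_create, and the data source name dm_datasource are placeholders; substitute your own schema.

```json
{
  "category": "reader",
  "name": "Reader",
  "stepType": "dm",
  "parameter": {
    "datasource": "dm_datasource",
    "table": "orders",
    "column": ["id", "name", "gmt_create"],
    "splitPk": "id",
    "where": "gmt_create > '$bizdate'",
    "fetchSize": 1024
  }
}
```

Because splitPk is set, the task can split the table across the concurrent channels configured in the speed setting; omitting it falls back to a single channel.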
Writer
Script example
{
"type": "job",
"version": "2.0",
"order": {
"hops": [
{ "from": "Reader", "to": "Writer" }
]
},
"setting": {
"errorLimit": { "record": "" },
"speed": {
"throttle": true,
"concurrent": 2,
"mbps": "12"
}
},
"steps": [
{
"category": "reader",
"name": "Reader",
"stepType": "oracle",
"parameter": {
"datasource": "aaa",
"column": ["PROD_ID", "name"],
"where": "",
"splitPk": "",
"encoding": "UTF-8",
"table": "PENGXI.SALES"
}
},
{
"category": "writer",
"name": "Writer",
"stepType": "dm",
"parameter": {
"datasource": "dm_datasource",
"table": "table",
"column": ["id", "name"],
"preSql": ["delete from XXX;"]
}
}
]
}
Writer parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| datasource | Yes | — | Name of the DM data source. See Configure a DM data source. |
| table | Yes | — | Destination table. If the table's schema differs from the username in the data source configuration, use the schema.table format. |
| column | Yes | — | Destination columns to write to, as a JSON array. List the columns explicitly; do not rely on a default. |
| preSql | No | — | SQL statement executed before the sync task starts, for example to purge old data. Only one statement is supported; multiple statements disable transaction support. |
| postSql | No | — | SQL statement executed after the sync task completes, for example to add a timestamp. Only one statement is supported; multiple statements disable transaction support. |
| batchSize | No | 1,024 | Number of rows written per batch. Higher values reduce network interactions and improve throughput. Values that are too large may cause an OOM error. |
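Putting the Writer parameters together, a step that targets a table under a different schema and cleans the partition before loading might look like the following sketch. SALES.ORDERS, the ds partition column, and the SALES.SYNC_LOG table are hypothetical names used only for illustration.

```json
{
  "category": "writer",
  "name": "Writer",
  "stepType": "dm",
  "parameter": {
    "datasource": "dm_datasource",
    "table": "SALES.ORDERS",
    "column": ["id", "name", "gmt_modified"],
    "preSql": ["delete from SALES.ORDERS where ds = '$bizdate';"],
    "postSql": ["insert into SALES.SYNC_LOG (ds) values ('$bizdate');"],
    "batchSize": 1024
  }
}
```

Note that preSql and postSql each hold a single statement here; adding more statements to either array disables transaction support, as described in the table above.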