
DataWorks:SQL Server

Last Updated: Mar 27, 2026

The SQL Server data source provides a bidirectional channel for reading from and writing to SQL Server databases in DataWorks Data Integration.

Supported capabilities

Capability Supported Notes
Offline (batch) read Yes Includes reading from views
Offline (batch) write Yes Insert mode only
Entire-database offline read Yes
Incremental sync via WHERE clause Yes Timestamp-based or auto-increment ID-based
Concurrent data extraction Yes Configured via splitPk and splitFactor
Custom SQL queries Yes Via querySql parameter

Supported versions

SQL Server Reader uses the Microsoft JDBC driver sqljdbc4, version 4.0 (package com.microsoft.sqlserver). For driver details, see the Microsoft JDBC Driver support matrix.

Version Supported
SQL Server 2016 Yes
SQL Server 2014 Yes
SQL Server 2012 Yes
PDW 2008R2 AU34 Yes
SQL Server 2008 R2 Yes
SQL Server 2008 Yes
Azure SQL Database Yes
SQL Server 2019 No
SQL Server 2017 No
Azure SQL Managed Instance No
Azure Synapse Analytics No

Supported field types

For the complete SQL Server field type reference, see the SQL Server data types documentation. The following table lists supported types using SQL Server 2016 as the reference version.

SQL Server 2016 field type SQL Server Reader SQL Server Writer
bigint Supported Supported
bit Supported Supported
decimal Supported Supported
int Supported Supported
money Supported Supported
numeric Supported Supported
smallint Supported Supported
smallmoney Supported Supported
tinyint Supported Supported
float Supported Supported
real Supported Supported
date Supported Supported
datetime2 Supported Supported
datetime Supported Supported
smalldatetime Supported Supported
time Supported Supported
datetimeoffset Not supported Not supported
char Supported Supported
text Supported Supported
varchar Supported Supported
nchar Supported Supported
ntext Supported Supported
nvarchar Supported Supported
binary Supported Supported
image Supported Supported
varbinary Supported Supported
sql_variant Supported Supported
uniqueidentifier Supported Supported
xml Supported Supported
cursor Not supported Not supported
hierarchyid Not supported Not supported
rowversion Not supported Not supported
table Not supported Not supported
Spatial Geometry Types Not supported Not supported
Spatial Geography Types Not supported Not supported

The following table shows how SQL Server data types map to Data Integration internal type categories.

Internal type SQL Server data types
Integer BIGINT, INT, SMALLINT, TINYINT
Floating-point FLOAT, DECIMAL, REAL, NUMERIC
String CHAR, NCHAR, NTEXT, NVARCHAR, TEXT, VARCHAR, NVARCHAR(MAX), VARCHAR(MAX)
Date and time DATE, DATETIME, DATETIME2, SMALLDATETIME, TIME
Boolean BIT
Binary BINARY, VARBINARY, VARBINARY(MAX), TIMESTAMP

Add a data source

Before developing a synchronization task, add the SQL Server data source to DataWorks. Follow the instructions in Data source management. Parameter descriptions are available directly in the DataWorks console when you add the data source.

Develop a synchronization task

Configure an offline synchronization task for a single table

Configure an offline read task for an entire database

For instructions, see Configure an offline synchronization task for an entire database.

Choose an incremental sync method

SQL Server Reader uses SELECT...WHERE... statements for data extraction. Two approaches are available for incremental sync:

Method When to use Example WHERE clause
Timestamp-based Your table has a field (e.g., updated_at) populated on every insert, update, or logical delete updated_at > ${bizdate}
Auto-increment ID-based Your table uses an auto-incrementing primary key and you only need to capture new rows id > ${last_max_id}

If your table has no field that distinguishes new from modified records, incremental sync is not possible; use full data synchronization instead.
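As an illustration of the timestamp-based method (the table and column names here are hypothetical), the incremental filter is expressed through the reader's where parameter:

```json
{
    "stepType": "sqlserver",
    "parameter": {
        "datasource": "sql_server_source",
        "table": "dbo.orders",
        "column": ["id", "amount", "updated_at"],
        "where": "updated_at > '${bizdate}'"
    },
    "name": "Reader",
    "category": "reader"
}
```

At run time, DataWorks replaces the ${bizdate} scheduling parameter with the business date of the instance, so each daily run reads only rows inserted or modified since the previous run.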

Usage notes

Active/standby lag

When SQL Server uses an active/standby disaster recovery architecture, the secondary database continuously restores data from the primary database by replaying transaction logs. Network latency or high write volume on the primary can cause the secondary to lag behind. Data synchronized from the secondary may not reflect the most current state of the primary.

Data consistency in concurrent mode

SQL Server is a Relational Database Management System (RDBMS) with strongly consistent read interfaces. In single-threaded mode, SQL Server Reader does not retrieve data from other in-flight transactions due to the database snapshot mechanism.

In concurrent mode, SQL Server Reader splits data by the splitPk field and starts multiple parallel tasks. These tasks run in separate transactions at different times, so the combined result is not a fully consistent snapshot. Two engineering workarounds are available:

  • Run the sync in single-threaded mode (no splitPk). This guarantees consistency but reduces throughput.

  • Quiesce data writers during the sync (for example, by locking tables or pausing replication to the secondary). This ensures a static dataset but may affect online services.
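For the first workaround, leave splitPk empty and set the concurrency to 1 in the job settings. The fragment below is a minimal sketch of the relevant values, not a complete job script:

```json
{
    "parameter": {
        "splitPk": ""
    },
    "setting": {
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    }
}
```

With a single thread and no sharding, the entire read runs in one transaction and returns a consistent snapshot, at the cost of throughput.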

Encoding

SQL Server Reader uses Java Database Connectivity (JDBC) for data extraction. JDBC handles encoding detection and conversion automatically — no encoding configuration is required.

SQL security

The querySql parameter lets you pass arbitrary SELECT statements to the database. SQL Server Reader does not validate or sanitize these statements, so restrict who can configure synchronization tasks and grant the data source account only the minimum read permissions it needs.

Appendix: Script demo and parameters

Reader script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "sqlserver",
            "parameter": {
                "datasource": "",        // Data source name
                "column": [              // Columns to read
                    "id",
                    "name"
                ],
                "where": "",             // Filter condition
                "splitPk": "",           // Column used for data sharding
                "table": ""              // Source table name
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

To use querySql for a custom query (for example, reading name from dbo.test_table in the sql_server_source data source), configure the connection block as shown below. Note that when querySql is set, the column, where, and splitPk values are ignored:

{
    "stepType": "sqlserver",
    "parameter": {
        "connection": [
            {
                "querySql": ["select name from dbo.test_table"],
                "datasource": "sql_server_source"
            }
        ],
        "datasource": "sql_server_source",
        "column": ["name"],
        "where": "",
        "splitPk": "id"
    },
    "name": "Reader",
    "category": "reader"
}

Reader parameters

Parameter Description Required Default
datasource The data source name. Must match the name of an added data source. Yes None
table The source table name. Each job supports one table. Yes None
column Columns to read, specified as a JSON array. Use ["*"] to read all columns. Supports column pruning, column reordering, and constant values. The array cannot be empty. Yes None
splitPk Column used to shard data for concurrent extraction. Use the primary key when possible — evenly distributed keys prevent data hotspots. Only integer columns are supported; specifying a string, float, or date column causes an error. No None
splitFactor Number of shards per concurrent thread. Total shards = concurrent × splitFactor. Recommended range: 1–100. Values that are too large may cause an out-of-memory (OOM) error. No 5
where Filter condition used to build the SQL statement. Leave blank to read all rows. Example: gmt_create > ${bizdate}. No None
querySql Custom SQL query that overrides table, column, and where. Use for multi-table joins or complex filters. Example: select a,b from table_a join table_b on table_a.id = table_b.id. When set, SQL Server Reader ignores column, table, and where. No None
fetchSize Number of records fetched per batch from the database. Affects network round trips and extraction performance. Values above 2,048 may cause OOM errors. No 1,024
driverVersion SQL Server JDBC driver version. Set to 12.10 to use a version that supports Active Directory Service Principal authentication. No 4.0

SQL Server Reader assembles table, column, and where into a SQL statement and sends it to the database. When querySql is set, it is sent directly.
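As a sketch of that assembly (table and column names hypothetical), the parameter values

```json
{
    "table": "dbo.test_table",
    "column": ["id", "name"],
    "where": "gmt_create > '${bizdate}'"
}
```

produce a statement equivalent to select id, name from dbo.test_table where gmt_create > '${bizdate}'.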

Writer script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "sqlserver",
            "parameter": {
                "postSql": [],           // SQL statements run after the sync task
                "datasource": "",        // Data source name
                "column": [             // Destination columns
                    "id",
                    "name"
                ],
                "table": "",            // Destination table name
                "preSql": []            // SQL statements run before the sync task
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Writer parameters

Parameter Description Required Default
datasource The data source name. Must match the name of an added data source. Yes None
table The destination table name. Yes None
column Destination columns to write, separated by commas. Example: ["id","name","age"]. Use ["*"] to write all columns in order. Yes None
preSql SQL statements executed before the sync task. The codeless UI supports one statement; the code editor supports multiple (for example, to clear old data). No None
postSql SQL statements executed after the sync task. The codeless UI supports one statement; the code editor supports multiple (for example, to add a timestamp). No None
writeMode Write mode. Only insert is supported. On primary key or unique index conflict, the row is treated as dirty data and the original row is retained. No insert
batchSize Number of records written per batch. Larger values reduce network round trips and improve throughput, but very large values may cause OOM errors. No 1,024
driverVersion SQL Server JDBC driver version. Set to 12.10 to use a version that supports Active Directory Service Principal authentication. No 4.0
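For example (table and column names hypothetical), a writer that clears the rows for the current business date before loading can be configured as follows:

```json
{
    "stepType": "sqlserver",
    "parameter": {
        "datasource": "sql_server_target",
        "table": "dbo.orders_copy",
        "column": ["id", "amount", "updated_at"],
        "preSql": ["delete from dbo.orders_copy where updated_at > '${bizdate}'"],
        "postSql": [],
        "writeMode": "insert",
        "batchSize": 1024
    },
    "name": "Writer",
    "category": "writer"
}
```

Because only insert mode is supported, a preSql delete like this is the usual way to make a daily load idempotent: rerunning the task first removes any rows written by the previous attempt.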