The SQL Server data source provides a bidirectional channel for reading from and writing to SQL Server databases in DataWorks Data Integration.
Supported capabilities
| Capability | Supported | Notes |
|---|---|---|
| Offline (batch) read | Yes | Includes reading from views |
| Offline (batch) write | Yes | Insert mode only |
| Entire-database offline read | Yes | |
| Incremental sync via WHERE clause | Yes | Timestamp-based or auto-increment ID-based |
| Concurrent data extraction | Yes | Configured via splitPk and splitFactor |
| Custom SQL queries | Yes | Via querySql parameter |
Supported versions
SQL Server Reader uses the Microsoft sqljdbc4 4.0 JDBC driver (com.microsoft.sqlserver). For driver details, see the Microsoft JDBC Driver support matrix.
| Version | Supported |
|---|---|
| SQL Server 2016 | Yes |
| SQL Server 2014 | Yes |
| SQL Server 2012 | Yes |
| PDW 2008R2 AU34 | Yes |
| SQL Server 2008 R2 | Yes |
| SQL Server 2008 | Yes |
| Azure SQL Database | Yes |
| SQL Server 2019 | No |
| SQL Server 2017 | No |
| Azure SQL Managed Instance | No |
| Azure Synapse Analytics | No |
Supported field types
For the complete SQL Server field type reference, see the SQL Server data types documentation. The following table lists supported types using SQL Server 2016 as the reference version.
| SQL Server 2016 field type | SQL Server Reader | SQL Server Writer |
|---|---|---|
| bigint | Supported | Supported |
| bit | Supported | Supported |
| decimal | Supported | Supported |
| int | Supported | Supported |
| money | Supported | Supported |
| numeric | Supported | Supported |
| smallint | Supported | Supported |
| smallmoney | Supported | Supported |
| tinyint | Supported | Supported |
| float | Supported | Supported |
| real | Supported | Supported |
| date | Supported | Supported |
| datetime2 | Supported | Supported |
| datetime | Supported | Supported |
| smalldatetime | Supported | Supported |
| time | Supported | Supported |
| datetimeoffset | Not supported | Not supported |
| char | Supported | Supported |
| text | Supported | Supported |
| varchar | Supported | Supported |
| nchar | Supported | Supported |
| ntext | Supported | Supported |
| nvarchar | Supported | Supported |
| binary | Supported | Supported |
| image | Supported | Supported |
| varbinary | Supported | Supported |
| sql_variant | Supported | Supported |
| uniqueidentifier | Supported | Supported |
| xml | Supported | Supported |
| cursor | Not supported | Not supported |
| hierarchyid | Not supported | Not supported |
| rowversion | Not supported | Not supported |
| table | Not supported | Not supported |
| Spatial Geometry Types | Not supported | Not supported |
| Spatial Geography Types | Not supported | Not supported |
The following table shows how SQL Server data types map to Data Integration internal type categories.
| Internal type | SQL Server data types |
|---|---|
| Integer | BIGINT, INT, SMALLINT, TINYINT |
| Floating-point | FLOAT, DECIMAL, REAL, NUMERIC |
| String | CHAR, NCHAR, NTEXT, NVARCHAR, TEXT, VARCHAR, NVARCHAR(MAX), VARCHAR(MAX) |
| Date and time | DATE, DATETIME, TIME |
| Boolean | BIT |
| Binary | BINARY, VARBINARY, VARBINARY(MAX), TIMESTAMP |
Add a data source
Before developing a synchronization task, add the SQL Server data source to DataWorks. Follow the instructions in Data source management. Parameter descriptions are available directly in the DataWorks console when you add the data source.
Develop a synchronization task
Configure an offline synchronization task for a single table
- To configure the task using the codeless UI or code editor, see Configure a task in the codeless UI and Configure a task in the code editor.
- For a complete parameter reference and script examples, see Appendix: Script demo and parameters.
Configure an offline read task for an entire database
For instructions, see Configure a real-time synchronization task for an entire database.
Choose an incremental sync method
SQL Server Reader uses SELECT...WHERE... statements for data extraction. Two approaches are available for incremental sync:
| Method | When to use | Example WHERE clause |
|---|---|---|
| Timestamp-based | Your table has a field (for example, updated_at) populated on every insert, update, or logical delete | updated_at > ${bizdate} |
| Auto-increment ID-based | Your table uses an auto-incrementing primary key and you only need to capture new rows | id > ${last_max_id} |
If your table has no field that distinguishes new from modified records, incremental sync is not possible — use full data synchronization instead.
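As an illustration, a timestamp-based incremental read could be expressed with the where parameter as follows. This is a sketch only: the data source name, table name, and updated_at column are placeholders, not values from this document.

```json
{
    "stepType": "sqlserver",
    "parameter": {
        "datasource": "sql_server_source",
        "table": "dbo.orders",
        "column": ["id", "name", "updated_at"],
        "where": "updated_at > '${bizdate}'",
        "splitPk": "id"
    },
    "name": "Reader",
    "category": "reader"
}
```

At run time, the scheduler replaces ${bizdate} with the business date, so each run reads only rows modified since the previous run.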
Usage notes
Active/standby lag
When SQL Server uses an active/standby disaster recovery architecture, the secondary database continuously restores data from the primary database by replaying transaction logs. Network latency or high write volume on the primary can cause the secondary to lag behind. Data synchronized from the secondary may not reflect the most current state of the primary.
Data consistency in concurrent mode
SQL Server is a Relational Database Management System (RDBMS) with strongly consistent read interfaces. In single-threaded mode, SQL Server Reader does not retrieve data from other in-flight transactions due to the database snapshot mechanism.
In concurrent mode, SQL Server Reader splits data by the splitPk field and starts multiple parallel tasks. These tasks run in separate transactions at different times, so the combined result is not a fully consistent snapshot. Two engineering workarounds are available:
- Run the sync in single-threaded mode (no splitPk). This guarantees consistency but reduces throughput.
- Quiesce data writers during the sync (for example, by locking tables or pausing replication to the secondary). This ensures a static dataset but may affect online services.
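For the first workaround, single-threaded extraction is forced by leaving splitPk empty and setting concurrency to 1. A sketch of the relevant fragments (values are illustrative):

```json
{
    "parameter": {
        "splitPk": ""
    },
    "setting": {
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    }
}
```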
Encoding
SQL Server Reader uses Java Database Connectivity (JDBC) for data extraction. JDBC handles encoding detection and conversion automatically — no encoding configuration is required.
SQL security
The querySql parameter lets you pass arbitrary SELECT statements to the database. SQL Server Reader does not perform any validation or sanitization on these statements, so restrict who can edit synchronization tasks and grant the database account only the permissions the task requires.
Appendix: Script demo and parameters
Reader script demo
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "sqlserver",
            "parameter": {
                "datasource": "",  // Data source name
                "column": [  // Columns to read
                    "id",
                    "name"
                ],
                "where": "",  // Filter condition
                "splitPk": "",  // Column used for data sharding
                "table": ""  // Source table name
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
To use querySql for a custom query (for example, reading name from dbo.test_table in the sql_server_source data source):
```json
{
    "stepType": "sqlserver",
    "parameter": {
        "connection": [
            {
                "querySql": ["select name from dbo.test_table"],
                "datasource": "sql_server_source"
            }
        ],
        "datasource": "sql_server_source",
        "column": ["name"],
        "where": "",
        "splitPk": "id"
    },
    "name": "Reader",
    "category": "reader"
}
```
Reader parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The data source name. Must match the name of an added data source. | Yes | None |
| table | The source table name. Each job supports one table. | Yes | None |
| column | Columns to read, specified as a JSON array. Use ["*"] to read all columns. Supports column pruning, column reordering, and constant values. The array cannot be empty. | Yes | None |
| splitPk | Column used to shard data for concurrent extraction. Use the primary key when possible, because evenly distributed keys prevent data hotspots. Only integer columns are supported; specifying a string, float, or date column causes an error. | No | None |
| splitFactor | Number of shards per concurrent thread. Total shards = concurrent × splitFactor. Recommended range: 1 to 100. Values that are too large may cause an out-of-memory (OOM) error. | No | 5 |
| where | Filter condition used to build the SQL statement. Leave blank to read all rows. Example: gmt_create > ${bizdate}. | No | None |
| querySql | Custom SQL query for multi-table joins or complex filters. Example: select a,b from table_a join table_b on table_a.id = table_b.id. When set, SQL Server Reader ignores column, table, and where. | No | None |
| fetchSize | Number of records fetched per batch from the database. Affects network round trips and extraction performance. Values above 2,048 may cause OOM errors. | No | 1024 |
| driverVersion | SQL Server JDBC driver version. Set to 12.10 to use a version that supports Active Directory Service Principal authentication. | No | 4.0 |
SQL Server Reader assembles table, column, and where into a SQL statement and sends it to the database. When querySql is set, it is sent directly.
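The assembly rule above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the connector's actual code; the function name is hypothetical.

```python
def build_reader_query(table, columns, where=None, query_sql=None):
    """Sketch of how SQL Server Reader is described to build its
    extraction SQL: querySql, when present, is sent as-is and the
    table/column/where parameters are ignored."""
    if query_sql:
        return query_sql
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        sql += " WHERE {}".format(where)
    return sql
```

For example, build_reader_query("dbo.test_table", ["id", "name"], "id > 100") produces SELECT id, name FROM dbo.test_table WHERE id > 100.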
Writer script demo
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "sqlserver",
            "parameter": {
                "postSql": [],  // SQL statements run after the sync task
                "datasource": "",  // Data source name
                "column": [  // Destination columns
                    "id",
                    "name"
                ],
                "table": "",  // Destination table name
                "preSql": []  // SQL statements run before the sync task
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
Writer parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| datasource | The data source name. Must match the name of an added data source. | Yes | None |
| table | The destination table name. | Yes | None |
| column | Destination columns to write, specified as a JSON array. Example: ["id","name","age"]. Use ["*"] to write all columns in order. | Yes | None |
| preSql | SQL statements executed before the sync task. The codeless UI supports one statement; the code editor supports multiple (for example, to clear old data). | No | None |
| postSql | SQL statements executed after the sync task. The codeless UI supports one statement; the code editor supports multiple (for example, to add a timestamp). | No | None |
| writeMode | Write mode. Only insert is supported. On a primary key or unique index conflict, the incoming row is treated as dirty data and the existing row is retained. | No | insert |
| batchSize | Number of records written per batch. Larger values reduce network round trips and improve throughput, but very large values may cause OOM errors. | No | 1024 |
| driverVersion | SQL Server JDBC driver version. Set to 12.10 to use a version that supports Active Directory Service Principal authentication. | No | 4.0 |
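The batchSize and writeMode semantics can be illustrated with a minimal Python sketch. The in-memory dict standing in for the destination table and the function names are illustrative assumptions, not the connector's API.

```python
def chunk(records, batch_size):
    """Split records into batches of at most batch_size rows."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def insert_batches(table, records, key, batch_size=1024):
    """Insert-only write: a row whose key already exists in `table`
    is collected as dirty data and the existing row is kept."""
    dirty = []
    for batch in chunk(records, batch_size):
        for row in batch:
            if row[key] in table:
                dirty.append(row)  # conflict: existing row retained
            else:
                table[row[key]] = row
    return dirty
```

With errorLimit.record set to 0 as in the demo script, a single dirty row returned here would fail the task.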