DataWorks data integration supports reading data from and writing data to Snowflake. This topic describes the supported field types, how to create a Snowflake data source, and the script parameters for Snowflake Reader and Snowflake Writer.
Limitations
Snowflake Reader and Snowflake Writer use the snowflake-jdbc:3.20.0 driver. For driver capabilities and known issues, see Snowflake JDBC Driver Release Notes.
Supported field types
Snowflake Reader and Snowflake Writer support most common relational database field types, including numeric and character types. The following table lists supported and unsupported types.
Field type | Offline read (Snowflake Reader) | Offline write (Snowflake Writer) |
---------- | ------------------------------- | -------------------------------- |
NUMBER | Supported | Supported |
DECIMAL | Supported | Supported |
NUMERIC | Supported | Supported |
INT | Supported | Supported |
INTEGER | Supported | Supported |
BIGINT | Supported | Supported |
SMALLINT | Supported | Supported |
TINYINT | Supported | Supported |
BYTEINT | Supported | Supported |
FLOAT | Supported | Supported |
FLOAT4 | Supported | Supported |
FLOAT8 | Supported | Supported |
DOUBLE | Supported | Supported |
DOUBLE PRECISION | Supported | Supported |
CHAR | Supported | Supported |
VARCHAR | Supported | Supported |
STRING | Supported | Supported |
TEXT | Supported | Supported |
BINARY | Supported | Supported |
VARBINARY | Supported | Supported |
BOOLEAN | Supported | Supported |
DATE | Supported | Supported |
TIME | Supported | Supported |
TIMESTAMP | Supported | Supported |
TIMESTAMP_NTZ | Supported | Supported |
TIMESTAMP_LTZ | Supported | Supported |
TIMESTAMP_TZ | Not supported | Supported |
OBJECT | Not supported | Not supported |
ARRAY | Not supported | Not supported |
VARIANT | Not supported | Not supported |
GEOGRAPHY | Not supported | Not supported |
GEOMETRY | Not supported | Not supported |
VECTOR | Not supported | Not supported |
FILE | Not supported | Not supported |
Create a data source
Before developing a data synchronization task, create a Snowflake data source in DataWorks. For the full procedure, see Data Source Management. For descriptions of each configuration parameter, refer to the tooltips on the configuration page.
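The connection configuration for a Snowflake data source typically includes a JDBC URL. As a hedged sketch, the standard Snowflake JDBC URL format looks like the following; the account identifier, warehouse, database, and schema values are placeholders you must replace with your own:

```
jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?warehouse=MY_WH&db=MY_DB&schema=PUBLIC
```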
Develop a data synchronization task
Configure an offline sync task for a single table
For the step-by-step procedure, see Configure a sync task in the codeless UI and Configure a sync task in the code editor.
For script parameters and sample scripts, see the Appendix: Script demos and parameter descriptions section.
Troubleshooting
Out-of-memory (OOM) errors
Symptom: The sync task fails with an out-of-memory (OOM) error.
Causes and fixes:
- fetchSize exceeds 2048. Reduce fetchSize to 1024 (the default) or lower.
- batchSize is set too high for the available memory. Reduce batchSize to 1024 (the default) or lower.
- splitFactor is set too high. Keep splitFactor in the range 1–100. A high value creates too many data chunks and increases memory pressure.
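As an illustration, the reader-side tuning parameters above are set inside the reader step's parameter object. The values below are a conservative starting point, not prescriptive; the table and data source names are placeholders:

```json
{
  "stepType": "snowflake",
  "parameter": {
    "datasource": "snowflake_datasource",
    "schema": "PUBLIC",
    "table": "table",
    "column": ["ID", "NAME"],
    "fetchSize": 1024,
    "splitFactor": 5
  }
}
```

batchSize is the writer-side counterpart and goes in the writer step's parameter object in the same way.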
splitPk has no effect
Symptom: Data is not split across concurrent threads despite setting splitPk.
Cause: splitPk supports only integer columns. If you specify a string, floating-point, or date column, Snowflake Reader reports an error.
Fix: Set splitPk to an integer primary key column. If no integer column is available, leave splitPk empty and increase the concurrent setting instead to improve throughput.
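For illustration, a reader parameter block that splits extraction on an integer primary key might look like the following sketch; the table name and the ID column are hypothetical:

```json
{
  "stepType": "snowflake",
  "parameter": {
    "datasource": "snowflake_datasource",
    "schema": "PUBLIC",
    "table": "orders",
    "column": ["ID", "NAME"],
    "splitPk": "ID"
  }
}
```

With splitPk set and concurrent greater than 1, the reader divides the ID range into chunks and extracts them in parallel.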
Appendix: Script demos and parameter descriptions
Configure an offline task script
Use the code editor to write the sync task script in the required format. For the general format requirements, see Configure a sync task in the code editor. The following sections provide script samples and parameter descriptions for Snowflake Reader and Snowflake Writer.
Reader script demo
{
"transform": false,
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "snowflake",
"parameter": {
"schema": "PUBLIC",
"envType": 0,
"datasource": "snowflake_datasource",
"column": [
"ID",
"NAME"
],
"where": "",
"table": "table"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": true
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {
"record": "0"
},
"speed": {
"throttle": false,
"concurrent": 2
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
Parameter | Description | Required | Default |
--------- | ----------- | -------- | ------- |
datasource | The name of the Snowflake data source. | Yes | None |
table | The name of the table to read from. | Yes | None |
schema | The schema where the source table resides. | Yes | None |
column | The columns to sync, specified as a JSON array. To read all columns, set this to ["*"]. | Yes | None |
splitPk | The integer primary key column used to split data into chunks for concurrent extraction. Set this to a primary key with evenly distributed values to avoid data skew. Only integer columns are supported: specifying a string, floating-point, or date column causes Snowflake Reader to report an error. | No | None |
splitFactor | The number of chunks per concurrent thread. Total chunks = splitFactor × concurrent. | No | None |
where | A SQL filter condition appended to the extraction query, for example to limit the sync to rows created after a given date. | No | None |
fetchSize | The number of rows retrieved per batch from Snowflake. A larger value reduces round trips and improves throughput, but values above 2048 may cause an OOM error. For large tables, start with the default and increase only if needed. | No | 1024 |
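Putting the optional parameters together, an incremental-read configuration might look like the following sketch. The CREATED_AT column is hypothetical; ${bizdate} is a DataWorks scheduling parameter that resolves to the business date at run time:

```json
{
  "stepType": "snowflake",
  "parameter": {
    "datasource": "snowflake_datasource",
    "schema": "PUBLIC",
    "table": "table",
    "column": ["ID", "NAME"],
    "where": "CREATED_AT >= '${bizdate}'",
    "fetchSize": 1024
  }
}
```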
Writer script demo
{
"transform": false,
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "stream",
"parameter": {},
"name": "Reader",
"category": "reader"
},
{
"stepType": "snowflake",
"parameter": {
"schema": "PUBLIC",
"envType": 0,
"datasource": "snowflake_datasource",
"column": [
"ID",
"NAME"
],
"table": "TABLE"
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": "0"
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": []
}
}
Writer script parameters
Parameter | Description | Required | Default |
--------- | ----------- | -------- | ------- |
datasource | The name of the Snowflake data source. | Yes | None |
table | The name of the destination table. | Yes | None |
schema | The schema where the destination table resides. | Yes | None |
column | The destination columns to write to, specified as a JSON array. For example: ["ID", "NAME"]. | Yes | None |
batchSize | The number of rows written per batch. A larger value reduces round trips and improves throughput, but values that are too large may cause an OOM error in the data integration process. | No | 1024 |
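As a sketch, a writer step with an explicit batchSize combines the required parameters from the demo above with the tuning parameter from this table; the values shown are illustrative:

```json
{
  "stepType": "snowflake",
  "parameter": {
    "datasource": "snowflake_datasource",
    "schema": "PUBLIC",
    "table": "TABLE",
    "column": ["ID", "NAME"],
    "batchSize": 1024
  }
}
```

Raising batchSize trades memory for fewer network round trips; if the sync task hits an OOM error, lowering it back toward the default is the first fix to try.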