DataWorks data integration supports reading data from and writing data to Snowflake. This topic describes the data synchronization capabilities that are available for Snowflake.
Limits
Snowflake Reader and Writer use the snowflake-jdbc:3.20.0 driver. For more information about the capabilities of the driver, see Snowflake JDBC Driver Release Notes.
Supported field types
Snowflake Reader and Snowflake Writer support most common relational database data types, such as numeric and character types. However, some data types are not supported. Verify that the data types in your tables are supported before you configure a synchronization task.
| Field type | Offline read (Snowflake Reader) | Offline write (Snowflake Writer) |
| --- | --- | --- |
| NUMBER | Supported | Supported |
| DECIMAL | Supported | Supported |
| NUMERIC | Supported | Supported |
| INT | Supported | Supported |
| INTEGER | Supported | Supported |
| BIGINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| TINYINT | Supported | Supported |
| BYTEINT | Supported | Supported |
| FLOAT | Supported | Supported |
| FLOAT4 | Supported | Supported |
| FLOAT8 | Supported | Supported |
| DOUBLE | Supported | Supported |
| DOUBLE PRECISION | Supported | Supported |
| CHAR | Supported | Supported |
| VARCHAR | Supported | Supported |
| STRING | Supported | Supported |
| TEXT | Supported | Supported |
| BINARY | Supported | Supported |
| VARBINARY | Supported | Supported |
| BOOLEAN | Supported | Supported |
| DATE | Supported | Supported |
| TIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIMESTAMP_NTZ | Supported | Supported |
| TIMESTAMP_LTZ | Supported | Supported |
| TIMESTAMP_TZ | Not supported | Supported |
| OBJECT | Not supported | Not supported |
| ARRAY | Not supported | Not supported |
| VARIANT | Not supported | Not supported |
| GEOGRAPHY | Not supported | Not supported |
| GEOMETRY | Not supported | Not supported |
| VECTOR | Not supported | Not supported |
| FILE | Not supported | Not supported |
Create a data source
Before you develop a data synchronization task, you must create a corresponding data source in DataWorks. For more information about the procedure, see Data Source Management. For detailed descriptions of the configuration parameters, see the tooltips on the configuration page.
Develop a data synchronization task
For information about the entry point and the general process for configuring a data synchronization task, see the following guides.
Configure an offline sync task for a single table
For more information about the procedure, see Configure a sync task in the codeless UI and Configure a sync task in the code editor.
For all parameters and a sample script for the code editor, see the Appendix: Sample scripts and parameter descriptions section in this topic.
Appendix: Script demos and parameter descriptions
Configure an offline task script
If you use the code editor to configure an offline task for a single table, you must write the script in the required format. For more information, see Configure a sync task in the code editor. The following sections describe the parameters for the data source in the code editor.
Reader script demo
{
"transform": false,
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "snowflake",
"parameter": {
"schema": "PUBLIC",
"envType": 0,
"datasource": "snowflake_datasource",
"column": [
"ID",
"NAME"
],
"where": "",
"table": "table"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": true
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {
"record": "0"
},
"speed": {
"throttle": false,
"concurrent": 2
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the Snowflake data source. | Yes | None |
| table | The name of the table from which you want to sync data. | Yes | None |
| schema | The schema where the Snowflake table that you want to read resides. | Yes | None |
| column | The columns that you want to sync from the source table, described as a JSON array of field names. To sync all columns, set this parameter to ["*"]. | Yes | None |
| splitFactor | The split factor, which specifies the number of chunks into which the data is split for synchronization. If you configure multiple concurrent threads, the data is split into Concurrency × splitFactor chunks. For example, if the concurrency is 5 and splitFactor is 5, the data is split into 25 chunks (5 × 5) and processed by 5 concurrent threads. Note: Set this parameter to a value from 1 to 100. An excessively large value may cause an out-of-memory (OOM) error. | No | None |
| splitPk | The primary key column that Snowflake Reader uses to shard data during extraction. If you specify this parameter, the data synchronization system runs concurrent tasks to improve efficiency. | No | None |
| where | The filter condition. Snowflake Reader combines the specified column, table, and where parameters into an SQL statement to extract data. For example, for testing purposes, you can set the where condition to limit 10. For incremental synchronization, you can set the where condition to filter on a date or time column. | No | None |
| fetchSize | The number of records to retrieve in each batch from the database server. A larger value reduces the number of network interactions between Data Integration and the server, which improves data extraction performance. Note: A fetchSize value greater than 2048 may cause an OOM error in the data synchronization process. | No | 1024 |
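The reader parameters above can be combined in a single reader step. The following sketch is for illustration only: the schema, table, and column names match the demo, and the where condition, splitPk, and splitFactor values are hypothetical examples, not required settings.

```json
{
    "stepType": "snowflake",
    "parameter": {
        "datasource": "snowflake_datasource",
        "schema": "PUBLIC",
        "table": "table",
        "column": ["ID", "NAME"],
        "where": "UPDATED_AT >= '2024-01-01'",
        "splitPk": "ID",
        "splitFactor": 5,
        "fetchSize": 1024
    },
    "name": "Reader",
    "category": "reader"
}
```

With this configuration and a concurrency of 5 in the speed settings, the data would be split into 25 chunks (5 × 5) along the ID column.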
Writer script demo
{
"transform": false,
"type": "job",
"version": "2.0",
"steps": [
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType": "snowflake",
"parameter": {
"schema": "PUBLIC",
"envType": 0,
"datasource": "snowflake_datasource",
"column": [
"ID",
"NAME"
],
"table": "TABLE"
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": "0"
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": []
}
}
Writer script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the Snowflake data source. | Yes | None |
| table | The name of the table to which you want to sync data. | Yes | None |
| schema | The schema where the Snowflake table that you want to write to resides. | Yes | None |
| column | The columns in the destination table to which you want to write data, described as a JSON array of comma-separated field names. Example: ["ID","NAME"]. | Yes | None |
| batchSize | The number of records to write in a single batch. A larger value can significantly reduce network interactions between Data Integration and Snowflake and improve overall throughput. However, an excessively large value may cause an OOM error in the Data Integration process. | No | 1024 |
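As a sketch, the batchSize parameter can be added to the writer step from the demo as follows. The schema, table, and column names match the demo; the batchSize value shown is simply the default and is included only to illustrate where the parameter goes.

```json
{
    "stepType": "snowflake",
    "parameter": {
        "datasource": "snowflake_datasource",
        "schema": "PUBLIC",
        "table": "TABLE",
        "column": ["ID", "NAME"],
        "batchSize": 1024
    },
    "name": "Writer",
    "category": "writer"
}
```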