DataWorks data integration supports reading data from and writing data to Snowflake. This topic describes the data synchronization capabilities that are available for Snowflake.
Limits
Snowflake Reader and Writer use the snowflake-jdbc:3.20.0 driver. For more information about the capabilities of the driver, see Snowflake JDBC Driver Release Notes.
Supported field types
Snowflake Reader and Snowflake Writer support most common relational database data types, such as numeric and character types. However, some data types are not supported. Verify that the data types in your tables are supported before you configure a synchronization task.
| Field type | Offline read (Snowflake Reader) | Offline write (Snowflake Writer) |
| --- | --- | --- |
| NUMBER | Supported | Supported |
| DECIMAL | Supported | Supported |
| NUMERIC | Supported | Supported |
| INT | Supported | Supported |
| INTEGER | Supported | Supported |
| BIGINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| TINYINT | Supported | Supported |
| BYTEINT | Supported | Supported |
| FLOAT | Supported | Supported |
| FLOAT4 | Supported | Supported |
| FLOAT8 | Supported | Supported |
| DOUBLE | Supported | Supported |
| DOUBLE PRECISION | Supported | Supported |
| CHAR | Supported | Supported |
| VARCHAR | Supported | Supported |
| STRING | Supported | Supported |
| TEXT | Supported | Supported |
| BINARY | Supported | Supported |
| VARBINARY | Supported | Supported |
| BOOLEAN | Supported | Supported |
| DATE | Supported | Supported |
| TIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIMESTAMP_NTZ | Supported | Supported |
| TIMESTAMP_LTZ | Supported | Supported |
| TIMESTAMP_TZ | Not supported | Supported |
| OBJECT | Not supported | Not supported |
| ARRAY | Not supported | Not supported |
| VARIANT | Not supported | Not supported |
| GEOGRAPHY | Not supported | Not supported |
| GEOMETRY | Not supported | Not supported |
| VECTOR | Not supported | Not supported |
| FILE | Not supported | Not supported |
Create a data source
Before you develop a data synchronization task, you must create a corresponding data source in DataWorks. For more information about the procedure, see Data Source Management. For detailed descriptions of the configuration parameters, see the tooltips on the configuration page.
Develop a data synchronization task
For information about the entry point and the general process for configuring a data synchronization task, see the following guides.
Configure an offline sync task for a single table
For more information about the procedure, see Configure a sync task in the codeless UI and Configure a sync task in the code editor.
For all parameters and a sample script for the code editor, see the Appendix: Sample scripts and parameter descriptions section in this topic.
Appendix: Script demos and parameter descriptions
Configure an offline task script
If you use the code editor to configure an offline task for a single table, you must write the script in the required format. For more information, see Configure a sync task in the code editor. The following sections describe the parameters for the data source in the code editor.
Reader script demo
{
"transform": false,
"type": "job",
"version": "2.0",
"steps": [
{
"stepType": "snowflake",
"parameter": {
"schema": "PUBLIC",
"envType": 0,
"datasource": "snowflake_datasource",
"column": [
"ID",
"NAME"
],
"where": "",
"table": "table"
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": true
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"errorLimit": {
"record": "0"
},
"speed": {
"throttle": false,
"concurrent": 2
}
},
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the Snowflake data source. | Yes | None |
| table | The name of the table from which you want to sync data. | Yes | None |
| schema | The schema where the Snowflake table that you want to read resides. | Yes | None |
| column | The columns that you want to sync from the source table, described as a JSON array of field names. To sync all columns, set this parameter to ["*"]. | Yes | None |
| splitFactor | The split factor, which specifies the number of chunks into which the data is split for synchronization. If you configure multiple concurrent threads, the data is split into Concurrency × splitFactor chunks. For example, if the concurrency is 5 and splitFactor is 5, the data is split into 25 chunks (5 × 5) and processed by 5 concurrent threads. Note: Set this parameter to a value from 1 to 100. An excessively large value may cause an out-of-memory (OOM) error. | No | None |
| splitPk | The primary key column that Snowflake Reader uses to shard data during extraction. If you specify this parameter, the data synchronization system runs concurrent tasks to improve efficiency. | No | None |
| where | The filter condition. Snowflake Reader combines the specified column, table, and where parameters into an SQL statement to extract data. For example, for testing purposes, you can set the where condition to limit 10. For incremental synchronization, you can set the where condition to filter on a date or time column. | No | None |
| fetchSize | The number of records to retrieve in each batch from the database server. A larger value reduces the number of network interactions between Data Integration and the server, which improves data extraction performance. Note: A fetchSize value greater than 2048 may cause an OOM error in the data synchronization process. | No | 1024 |
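The reader parameters above can be combined in a single reader step. The following sketch is for illustration only: the schema, table, and column names match the demo, and the where condition, splitPk, and splitFactor values are hypothetical examples, not required settings.

```json
{
    "stepType": "snowflake",
    "parameter": {
        "datasource": "snowflake_datasource",
        "schema": "PUBLIC",
        "table": "table",
        "column": ["ID", "NAME"],
        "where": "UPDATED_AT >= '2024-01-01'",
        "splitPk": "ID",
        "splitFactor": 5,
        "fetchSize": 1024
    },
    "name": "Reader",
    "category": "reader"
}
```

With this configuration and a concurrency of 5 in the speed settings, the data would be split into 25 chunks (5 × 5) along the ID column.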
Writer script demo
{
"transform": false,
"type": "job",
"version": "2.0",
"steps": [
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType": "snowflake",
"parameter": {
"schema": "PUBLIC",
"envType": 0,
"datasource": "snowflake_datasource",
"column": [
"ID",
"NAME"
],
"table": "TABLE"
},
"name": "Writer",
"category": "writer"
}
],
"setting": {
"executeMode": null,
"errorLimit": {
"record": "0"
},
"speed": {
"concurrent": 2,
"throttle": false
}
},
"order": {
"hops": []
}
}
Writer script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the Snowflake data source. | Yes | None |
| table | The name of the table to which you want to sync data. | Yes | None |
| schema | The schema where the Snowflake table that you want to write to resides. | Yes | None |
| column | The columns in the destination table to which you want to write data, described as a JSON array of comma-separated field names. Example: ["ID","NAME"]. | Yes | None |
| batchSize | The number of records to write in a single batch. A larger value can significantly reduce network interactions between Data Integration and Snowflake and improve overall throughput. However, an excessively large value may cause an OOM error in the Data Integration process. | No | 1024 |
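As a sketch, the batchSize parameter can be added to the writer step from the demo as follows. The schema, table, and column names match the demo; the batchSize value shown is simply the default and is included only to illustrate where the parameter goes.

```json
{
    "stepType": "snowflake",
    "parameter": {
        "datasource": "snowflake_datasource",
        "schema": "PUBLIC",
        "table": "TABLE",
        "column": ["ID", "NAME"],
        "batchSize": 1024
    },
    "name": "Writer",
    "category": "writer"
}
```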