The SAP HANA data source lets you read data from and write data to SAP HANA in DataWorks.
Supported versions
- SAP HANA Database 2.00.048.06.1623425628
- SAP HANA Database 2.00.055.00.161
Limitations
- Offline sync supports reading data from views.
- This data source supports serverless resource groups (recommended) and exclusive resource groups for Data Integration.
Supported field types
The SAP HANA Reader supports the following data types.
| Type category | Data types |
|---|---|
| Integer | INT, TINYINT, SMALLINT, MEDIUMINT, and BIGINT |
| Floating-point | FLOAT, DOUBLE, and DECIMAL |
| String | VARCHAR, CHAR, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT |
| Date and time | DATE, DATETIME, TIMESTAMP, TIME, and YEAR |
| Boolean | BIT and BOOL |
| Binary | TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, and VARBINARY |
Data types not listed above are not supported. The SAP HANA Reader treats tinyint(1) as an integer.
Add a data source
Add SAP HANA as a data source in DataWorks before developing a synchronization task. For instructions, see Data source management.
Parameter descriptions are available directly in the DataWorks console when you add the data source.
Develop a data synchronization task
Configure an offline sync task for a single table
You can configure the task in either the codeless UI or the code editor. For all script parameters and a script demo, see Appendix: Script demo and parameter descriptions.
Appendix: Script demo and parameter descriptions
The following sections describe the Reader and Writer script parameters for batch synchronization tasks configured in the code editor. For the script format requirements, see Configure a task in the code editor.
Reader script demo
All Reader examples use "stepType": "saphana" to identify the SAP HANA plugin.
Single table in a single database
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "saphana",
            "parameter": {
                "column": [
                    "id"
                ],
                "connection": [
                    {
                        "querySql": ["select a,b from join1 c join join2 d on c.id = d.id;"],
                        "datasource": "",
                        "table": [
                            "xxx"
                        ]
                    }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
Sharded tables
Sharding lets the SAP HANA Reader read data from multiple SAP HANA tables. All sharded tables must share the same schema.
```json
{
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {
            "plugin": "saphana",
            "parameter": {
                "connection": [
                    {
                        "table": [
                            "tbl1",
                            "tbl2",
                            "tbl3"
                        ],
                        "datasource": "datasourceName1"
                    },
                    {
                        "table": [
                            "tbl4",
                            "tbl5",
                            "tbl6"
                        ],
                        "datasource": "datasourceName2"
                    }
                ],
                "singleOrMulti": "multi",
                "splitPk": "db_id",
                "column": [
                    "id", "name", "age"
                ],
                "where": "1 < id and id < 100"
            }
        },
        "writer": {}
    }
}
```
Reader script parameters
| Parameter | Description | Required | Default value |
|---|---|---|---|
| username | The username for the SAP HANA database. | Yes | None |
| password | The password for the SAP HANA database. | Yes | None |
| column | The columns to synchronize. Use * to synchronize all columns. If a column name contains a forward slash (/), enclose it in double quotation marks and escape the quotes with backslashes. For example, for a column named /abc/efg, set the value to \\"/abc/efg\\". | Yes | None |
| table | The name of the table to synchronize. Specify the value as an array, even for a single table. | Yes | None |
| jdbcUrl | The Java Database Connectivity (JDBC) URL for the SAP HANA database. Example: jdbc:sap://127.0.0.1:30215?currentschema=TEST. | Yes | None |
| splitPk | An integer column used as the split key for concurrent synchronization. If no such column exists, leave this parameter empty. | No | None |
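To show how these parameters fit together, the following is a minimal sketch of a reader parameter fragment. The data source name saphana_test and the slash-named column /abc/efg are hypothetical; the escaped entry follows the column escaping rule described above, and splitPk points at the integer id column so the task can be split for concurrent reads:

```json
{
    "datasource": "saphana_test",
    "column": [
        "id",
        "\"/abc/efg\""
    ],
    "where": "1 < id and id < 100",
    "splitPk": "id"
}
```

If the table has no suitable integer column, leave splitPk empty as noted in the table above.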
Writer script demo
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "saphana",
            "parameter": {
                "postSql": [],
                "datasource": "",
                "column": [
                    "id",
                    "value"
                ],
                "batchSize": 1024,
                "table": "",
                "preSql": [
                    "delete from XXX;"
                ]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
Writer script parameters
| Parameter | Description | Required | Default value |
|---|---|---|---|
| datasource | The data source name. It must match the name configured in the code editor. | Yes | None |
| table | The destination table name. | Yes | None |
| column | The destination columns to write data to, specified as an array. Example: ["id", "name", "age"]. Use ["*"] to write to all columns in order. If a column name contains a forward slash (/), enclose it in double quotation marks and escape the quotes with backslashes. For example, for a column named /abc/efg, set the value to \\"/abc/efg\\". | Yes | None |
| preSql | An SQL statement to run before the synchronization task starts. Use this to clear old data before loading. Example: truncate table tablename. In the codeless UI, only one statement is supported. In the code editor, multiple statements are supported, but transaction atomicity is not supported. | No | None |
| postSql | An SQL statement to run after the synchronization task completes. In the codeless UI, only one statement is supported. In the code editor, multiple statements are supported. Example: ALTER TABLE tablename ADD colname TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP. | No | None |
| batchSize | The number of records committed in a single batch. A larger value reduces network round-trips and improves throughput, but setting it too high may cause out-of-memory (OOM) errors. | No | 1024 |
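As a sketch of how preSql, postSql, and batchSize combine, the writer parameter fragment below truncates the destination table before loading and commits 2048 records per batch. The data source name saphana_test and the table and column names are illustrative, not values from this product:

```json
{
    "datasource": "saphana_test",
    "table": "ods_user",
    "column": [
        "id",
        "name"
    ],
    "preSql": [
        "truncate table ods_user"
    ],
    "postSql": [],
    "batchSize": 2048
}
```

Because multiple preSql statements do not run in a single transaction, keep cleanup logic to statements that are safe to re-run if the task is retried.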