Use these built-in functions in transform module expressions for Flink CDC data ingestion jobs.
Built-in functions
Flink CDC provides built-in functions for use directly in projection and filter expressions within the transform module.
Arithmetic functions
| Function | Description |
|---|---|
numeric1 + numeric2 |
Returns the sum of numeric1 and numeric2. |
numeric1 - numeric2 |
Returns numeric1 minus numeric2. |
numeric1 * numeric2 |
Returns numeric1 multiplied by numeric2. |
numeric1 / numeric2 |
Returns numeric1 divided by numeric2. |
numeric1 % numeric2 |
Returns numeric1 modulo numeric2. |
ABS(numeric) |
Returns the absolute value of numeric. |
CEIL(numeric) |
Returns the smallest integer greater than or equal to numeric. |
FLOOR(numeric) |
Returns the largest integer less than or equal to numeric. |
ROUND(numeric, int) |
Returns a numeric value rounded to n decimal places. |
UUID() |
Generates a globally unique identifier (UUID) string, such as 3d3c68f7-f608-473f-b60c-b0c44ad4cc4e. Uses RFC 4122 type 4 to generate pseudo-random UUIDs. |
Time functions
| Function | Description |
|---|---|
LOCALTIME |
Returns the local time in the current time zone. Return type: TIME(0). |
LOCALTIMESTAMP |
Returns the local timestamp in the current time zone. Return type: TIMESTAMP(3). |
CURRENT_TIME |
Returns the local time in the current time zone. Equivalent to LOCALTIME. |
CURRENT_DATE |
Returns the local date in the current time zone. |
CURRENT_TIMESTAMP |
Returns the local timestamp in the current time zone. Return type: TIMESTAMP_LTZ(3). |
NOW() |
Returns the local timestamp in the current time zone. Equivalent to CURRENT_TIMESTAMP. |
DATE_FORMAT(timestamp, string) |
Formats the input timestamp using the specified format string. The format string is compatible with Java SimpleDateFormat. |
DATE_FORMAT_TZ(timestamp, string) |
Formats the input timestamp as a string in the specified time zone. The format string is compatible with Java SimpleDateFormat. Output format: yyyy-MM-dd HH:mm:ss. |
DATE_FORMAT_TZ(timestamp, string1[, string2]) |
Converts the input timestamp to a string formatted as string1 in the time zone string2. The format string is compatible with Java SimpleDateFormat. |
TIMESTAMPADD(timeintervalunit, interval, timepoint) |
Returns the timepoint after adding interval (in timeintervalunit units) to timepoint. Valid values for timeintervalunit: SECOND, MINUTE, HOUR, DAY, MONTH, YEAR. |
DATE_ADD(timestamp, integer, string) |
Converts timestamp to local time in the time zone specified by string, then adds the number of days specified by integer. Returns the result as a string in yyyy-MM-dd format. |
TIMESTAMPDIFF(timepointunit, timepoint1, timepoint2) |
Returns the difference between timepoint1 and timepoint2 in timepointunit units. Valid values for timepointunit: SECOND, MINUTE, HOUR, DAY, MONTH, YEAR. |
TO_DATE(string1[, string2]) |
Parses the date string string1 into a DATE type using format string string2. Default format: yyyy-MM-dd. |
TO_TIMESTAMP(string1[, string2]) |
Parses the timestamp string string1 into a timezone-free TIMESTAMP type using format string string2. Default format: yyyy-MM-ddHH:mm:ss. |
TO_TIMESTAMP_LTZ(string1[, string2][, string3]) |
Parses the timestamp string string1 into a TIMESTAMP_LTZ type in time zone string3 using format string string2. Default format: yyyy-MM-dd HH:mm:ss.SSS. Default time zone: UTC. |
TO_TIMESTAMP_LTZ(bigint[, integer]) |
Converts a timestamp (bigint) to TIMESTAMP_LTZ with the precision specified by integer. The integer must be 0 or 3. |
FROM_UNIXTIME(numeric[, string]) |
Converts numeric (seconds since 1970-01-01 00:00:00 UTC) to a timestamp string in the format specified by string. Default format: yyyy-MM-ddHH:mm:ss. |
UNIX_TIMESTAMP() |
Returns the current UNIX timestamp in seconds. |
UNIX_TIMESTAMP(string1[, string2]) |
Parses the datetime string string1 using format string string2 and returns the corresponding UNIX timestamp in seconds. |
When evaluating projection and filter expressions, each subexpression returns the same time point. For example, NOW() AS t1, NOW() AS t2, NOW() AS t3 always returns the same timestamp for t1, t2, and t3, regardless of evaluation order or timing.
String functions
| Function | Description |
|---|---|
string1 || string2 |
Returns the concatenation of string1 and string2. Note: this is the string concatenation operator, not the logical OR operator. |
CHAR_LENGTH(string) |
Returns the number of characters in string. |
UPPER(string) |
Returns string in uppercase. |
LOWER(string) |
Returns string in lowercase. |
TRIM(string1) |
Removes leading and trailing whitespace from string1. |
REGEXP_REPLACE(string1, string2, string3) |
Replaces all substrings in string1 that match the regular expression string2 with string3. For example, REGEXP_REPLACE('foobar', 'oo|ar', '__') returns f__b__. |
SUBSTRING(string FROM integer1 [FOR integer2]) |
Returns the substring of string starting at position integer1 and ending at position integer2. If FOR integer2 is omitted, the substring extends to the end of string. |
CONCAT(string1, string2, ...) |
Returns a new string formed by concatenating string1, string2, and so on. For example, CONCAT('AA', 'BB', 'CC') returns AABBCC. |
Type conversion functions
Use the CAST(<expression> AS <type>) syntax for explicit type conversion.
Use VARCHAR as the target type for strings. STRING is not a valid target type.
| Source type | Target type | Description |
|---|---|---|
| Any | VARCHAR |
Fields of any type can be converted to strings. |
NUMERIC, STRING |
BOOLEAN |
Any string literal that is not '0' or 'FALSE' is converted to 'TRUE'. |
NUMERIC, STRING |
TINYINT |
The result is truncated to the range -128 to 127. |
NUMERIC, STRING |
SMALLINT |
The result is truncated to the range -32768 to 32767. |
NUMERIC, STRING |
INTEGER |
The result is truncated to the range -2147483648 to 2147483647. |
NUMERIC, STRING |
BIGINT |
The result is truncated to the range -9223372036854775808 to 9223372036854775807. |
NUMERIC, STRING |
FLOAT |
|
NUMERIC |
DOUBLE |
|
NUMERIC |
DECIMAL |
|
STRING, TIMESTAMP_TZ, TIMESTAMP_LTZ |
TIMESTAMP |
The string must be in a valid ISO_LOCAL_DATE_TIME format. |
Comparison functions
These functions return NULL when any input parameter is NULL, unless otherwise noted.
| Function | Description |
|---|---|
value1 = value2 |
Returns TRUE if value1 equals value2. Otherwise, returns FALSE. |
value1 <> value2 |
Returns TRUE if value1 does not equal value2. Otherwise, returns FALSE. |
value1 > value2 |
Returns TRUE if value1 is greater than value2. Otherwise, returns FALSE. |
value1 >= value2 |
Returns TRUE if value1 is greater than or equal to value2. Otherwise, returns FALSE. |
value1 < value2 |
Returns TRUE if value1 is less than value2. Otherwise, returns FALSE. |
value1 <= value2 |
Returns TRUE if value1 is less than or equal to value2. Otherwise, returns FALSE. |
value IS NULL |
Returns TRUE if value is NULL. Otherwise, returns FALSE. |
value IS NOT NULL |
Returns TRUE if value is not NULL. Otherwise, returns FALSE. |
value1 BETWEEN value2 AND value3 |
Returns TRUE if value1 is between value2 and value3 (inclusive). Otherwise, returns FALSE. For example, 12 BETWEEN 10 AND 15 returns TRUE; 12 BETWEEN 15 AND 20 returns FALSE. |
value1 NOT BETWEEN value2 AND value3 |
Returns TRUE if value1 is not between value2 and value3 (inclusive). Otherwise, returns FALSE. |
string1 LIKE string2 |
Returns TRUE if string1 matches the pattern defined by string2. Otherwise, returns FALSE. |
string1 NOT LIKE string2 |
Returns TRUE if string1 does not match the pattern defined by string2. Otherwise, returns FALSE. |
value1 IN (value2 [, value3]*) |
Returns TRUE if value1 exists in the list [value2, value3, ...]. Otherwise, returns FALSE. |
value1 NOT IN (value2 [, value3]*) |
Returns TRUE if value1 does not exist in the list [value2, value3, ...]. Otherwise, returns FALSE. |
Logical functions
| Function | Description |
|---|---|
boolean1 OR boolean2 |
Returns TRUE if boolean1 or boolean2 is TRUE. |
boolean1 AND boolean2 |
Returns TRUE if boolean1 and boolean2 are both TRUE. |
NOT boolean |
Returns FALSE if boolean is TRUE. Returns TRUE if boolean is FALSE. |
boolean IS FALSE |
Returns FALSE if boolean is TRUE. Returns TRUE if boolean is FALSE. |
boolean IS NOT FALSE |
Returns TRUE if boolean is TRUE. Returns FALSE if boolean is FALSE. |
boolean IS TRUE |
Returns TRUE if boolean is TRUE. Returns FALSE if boolean is FALSE. |
boolean IS NOT TRUE |
Returns FALSE if boolean is TRUE. Returns TRUE if boolean is FALSE. |
Conditional functions
| Function | Description |
|---|---|
CASE value WHEN value1_1 [, value1_2]* THEN result1 (WHEN value2_1 [, value2_2]* THEN result2)* (ELSE result_z) END |
Checks whether value equals each value in the WHEN clauses in order. Returns the result of the first matching clause. If no clause matches, returns the ELSE value. If there is no ELSE clause, returns NULL. |
CASE WHEN condition1 THEN result1 (WHEN condition2 THEN result2)* (ELSE result_z) END |
Checks each condition in the WHEN clauses in order. Returns the result of the first true condition. If no condition is true, returns the ELSE value. If there is no ELSE clause, returns NULL. |
COALESCE(value1 [, value2]*) |
Returns the first non-NULL value in the list [value1, value2, ...]. Returns NULL if all values are NULL. |
IF(condition, true_value, false_value) |
Returns true_value if condition is true. Otherwise, returns false_value. |
Variant functions
Variant functions are available in data ingestion jobs starting with Ververica Runtime (VVR) 11.6.
| Function | Description |
|---|---|
PARSE_JSON(json_string[, allow_duplicate_keys]) |
Parses a JSON string into a VARIANT. Throws an error if the JSON string is invalid. To return NULL instead, use TRY_PARSE_JSON. When the input contains duplicate keys, the parser keeps the last field with the same key if allowDuplicateKeys is true, or throws an error if false. Default: allowDuplicateKeys = false. |
TRY_PARSE_JSON(json_string[, allow_duplicate_keys]) |
Parses a JSON string into a VARIANT. Returns NULL if the JSON string is invalid. To throw an error instead, use PARSE_JSON. When the input contains duplicate keys, the parser keeps the last field with the same key if allowDuplicateKeys is true, or throws an error if false. Default: allowDuplicateKeys = false. |
Hash functions
Hash functions are available in data ingestion jobs starting with Ververica Runtime (VVR) 11.6. All hash functions return NULL if the input string is NULL.
| Function | Description |
|---|---|
MD5(string) |
Returns the MD5 hash of string as a 32-character hexadecimal string. |
SHA1(string) |
Returns the SHA-1 hash of string as a 40-character hexadecimal string. |
SHA224(string) |
Returns the SHA-224 hash of string as a 56-character hexadecimal string. |
SHA256(string) |
Returns the SHA-256 hash of string as a 64-character hexadecimal string. |
SHA384(string) |
Returns the SHA-384 hash of string as a 96-character hexadecimal string. |
SHA512(string) |
Returns the SHA-512 hash of string as a 128-character hexadecimal string. |
SHA2(string, hashLength) |
Returns a hash using a SHA-2 family function. hashLength specifies the bit length of the result: 224, 256, 384, or 512. Returns NULL if string or hashLength is NULL. |
Access nested data types
Use the [] operator to access elements in nested fields of type ARRAY, MAP, or VARIANT. This feature is available starting with Ververica Runtime (VVR) 11.6.
Access methods
The access syntax depends on the internal format of the nested field. The examples below use a field named nest_col.
Array format (`ARRAY` or `VARIANT` array)
Use nest_col[integer] to retrieve the element at the specified index. Array indexing starts at 1.
Example: If nest_col is [{"id": 1, "name": "Project A"}, {"id": 2, "name": "Project B"}], then nest_col[1] returns {"id": 1, "name": "Project A"}.
Hash table format (`MAP` or `VARIANT` object)
Use nest_col[string] to retrieve the value for the specified key.
Example: If nest_col is {"id": 1, "name": "Project A"}, then nest_col['id'] returns 1.
Deeply nested fields
Chain the [] operator to access deeply nested elements.
Example: If nest_col is [{"id": 1, "name": "Project A"}, {"id": 2, "name": "Project B"}], then nest_col[1]['id'] returns 1.
Return types
The return type of [] depends on the type of the original field:
| Original field type | Access syntax | Return type |
|---|---|---|
ARRAY[T] |
nest_col[integer] |
T |
MAP<K, V> |
nest_col[string] |
V |
VARIANT |
nest_col[integer] or nest_col[string] |
VARIANT |