Realtime Compute for Apache Flink:Raw

Last Updated:Sep 20, 2023

This topic provides an example of how to use the Raw format and describes its parameters and data type mappings.

Background information

The Raw format reads and writes byte-based raw values as a single column. The Raw format is built into Flink. Connectors that support the Raw format include the Apache Kafka connector, the Upsert Kafka connector, and the Object Storage Service (OSS) connector.

Sample code

For example, a Kafka topic contains raw log data such as the following line. You can use Flink SQL to read and analyze the log data.

47.xx.xx.179 - - [28/Feb/2019:13:17:10 +0000] "GET /?p=1 HTTP/2.0" 200 5316 "https://domain.com/?p=1" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.xx.xx.119 Safari/537.36" "2.75"

The following sample code shows how to use the Raw format to read data from a Kafka topic and interpret each record as a UTF-8 encoded string.

CREATE TABLE nginx_log (
  log STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'nginx_log',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw'
);

After you execute the preceding statement, each record is available as a string in the log column. You can then use a user-defined function (UDF) to split each string into fields for further analysis. For example, the following SQL statement uses a UDF named my_split to split the string.

SELECT t.hostname, t.datetime, t.url, t.browser, ...
FROM(
  SELECT my_split(log) as t FROM nginx_log
);

Similarly, you can write a column of the STRING data type to a Kafka topic as a string that is encoded in UTF-8.
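The write direction can be sketched as follows. This is a minimal illustration, not a definitive setup: the sink table name nginx_log_out, its topic name, and the LIKE filter are assumptions introduced for the example, and nginx_log is the source table defined earlier in this topic.

```sql
-- Hypothetical sink table: the single STRING column becomes the Kafka record value,
-- encoded as UTF-8 because raw.charset is left at its default.
CREATE TABLE nginx_log_out (
  log STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'nginx_log_out',  -- assumed topic name
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'raw'
);

-- Copy only the log lines that contain a GET request (illustrative filter).
INSERT INTO nginx_log_out
SELECT log
FROM nginx_log
WHERE log LIKE '%GET %';
```

Because the Raw format maps exactly one column to the record value, the sink table declares a single physical column.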

Parameters

| Parameter | Required | Default value | Data type | Description |
| --- | --- | --- | --- | --- |
| format | Yes | No default value | STRING | The format that you declare to use. If you want to use the Raw format, set this parameter to raw. |
| raw.charset | No | UTF-8 | STRING | The character set that is used to encode the text string. |
| raw.endianness | No | big-endian | STRING | The endianness of the bytes of a numeric value. Valid values: big-endian (default value) and little-endian. |
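As a rough sketch of how the two optional parameters might be set, the following statements declare one table that reads four-byte little-endian integers and one that decodes text as ISO-8859-1. The table names, topic names, and column names are assumptions made up for this example.

```sql
-- Hypothetical table: each Kafka record value is a 4-byte integer in little-endian order.
CREATE TABLE sensor_reading (
  reading INT
) WITH (
  'connector' = 'kafka',
  'topic' = 'sensor_reading',  -- assumed topic name
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw',
  'raw.endianness' = 'little-endian'
);

-- Hypothetical table: each Kafka record value is text encoded in ISO-8859-1 instead of UTF-8.
CREATE TABLE legacy_log (
  log STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'legacy_log',  -- assumed topic name
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw',
  'raw.charset' = 'ISO-8859-1'
);
```

The raw.endianness parameter matters only for multi-byte numeric columns, and raw.charset matters only for character string columns.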

Data type mappings

The following table describes the Flink SQL data types that are supported by the Raw format.

| Flink SQL data type | Description |
| --- | --- |
| CHAR, VARCHAR, and STRING | The text string that is encoded in UTF-8. UTF-8 is the default encoding format. Note: The character set can be specified by the raw.charset parameter. |
| BINARY, VARBINARY, and BYTES | The sequence of bytes. |
| BOOLEAN | A single byte that represents a BOOLEAN value. |
| TINYINT | A single byte of a signed numeric value. |
| SMALLINT | Two bytes that are in the big-endian encoding. big-endian is the default endian format. Note: The endian format can be specified by the raw.endianness parameter. |
| INT | Four bytes that are in the big-endian encoding. big-endian is the default endian format. Note: The endian format can be specified by the raw.endianness parameter. |
| BIGINT | Eight bytes that are in the big-endian encoding. big-endian is the default endian format. Note: The endian format can be specified by the raw.endianness parameter. |
| FLOAT | Four bytes that are in the big-endian encoding and the IEEE 754 format. big-endian is the default endian format. Note: The endian format can be specified by the raw.endianness parameter. |
| DOUBLE | Eight bytes that are in the big-endian encoding and the IEEE 754 format. big-endian is the default endian format. Note: The endian format can be specified by the raw.endianness parameter. |
| RAW | The sequence of bytes serialized by the underlying TypeSerializer of the RAW data type. |
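To illustrate the BYTES mapping, the following sketch reads each record value as an uninterpreted byte sequence and decodes it in the query instead of in the format. The table name raw_payload, its topic, and the column names are assumptions for this example. DECODE is a built-in Flink SQL string function, but verify that it is available in the engine version you use.

```sql
-- Hypothetical table: the record value is passed through unchanged as BYTES.
CREATE TABLE raw_payload (
  payload BYTES
) WITH (
  'connector' = 'kafka',
  'topic' = 'raw_payload',  -- assumed topic name
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw'
);

-- Decode the bytes as UTF-8 text at query time.
SELECT DECODE(payload, 'UTF-8') AS payload_text
FROM raw_payload;
```

Declaring the column as BYTES can be a reasonable choice when the payload encoding is not known in advance or differs between records.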

Others

The Raw format encodes a NULL value as a NULL value of the BYTE[] data type. The Upsert Kafka connector treats a NULL record value as a tombstone message, which represents a DELETE operation on the key. Therefore, if a field can contain NULL values and you use the Upsert Kafka connector, we recommend that you do not use the Raw format as the value.format.
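One way to follow this recommendation, sketched under assumptions, is to keep the Raw format for the key only and use a structured format for the value. The table name, topic name, and columns below are hypothetical. The JSON format is used here only as an example of a value format that serializes a NULL field inside a non-NULL record.

```sql
-- Hypothetical Upsert Kafka table: the key is written with the Raw format,
-- while the value uses JSON so that a NULL state is not written as a tombstone.
CREATE TABLE user_state (
  user_id STRING,
  state STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'user_state',  -- assumed topic name
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'raw',
  'value.format' = 'json'
);
```

With this layout, a row whose state is NULL is written as a JSON object that contains a null field, so it is not mistaken for a delete of the key.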