Realtime Compute for Apache Flink: Parquet

Last Updated: Nov 21, 2024

This topic provides an example of how to use the Parquet format and describes its parameters and data type mappings.

Limits

  • The Parquet format can be used to read and write data in Parquet files.

  • Only the Object Storage Service (OSS) connector can read data in the Parquet format.

Sample code

The following sample code shows how to create a table in the Parquet format by using the OSS connector.

CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://<bucket>/path',
  'format' = 'parquet'
);
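
The following statements are a minimal usage sketch for the table above. The kafka_user_behavior source table and the partition value are hypothetical and not part of the original example; replace them with your own source and data.

-- Write data into the Parquet table, deriving the dt partition column from ts.
INSERT INTO user_behavior
SELECT user_id, item_id, category_id, behavior, ts, DATE_FORMAT(ts, 'yyyy-MM-dd') AS dt
FROM kafka_user_behavior;

-- Read the Parquet data back with partition pruning.
SELECT user_id, behavior
FROM user_behavior
WHERE dt = '2024-11-21';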

Parameters

Parameter: format
  • Required: Yes
  • Default value: No default value
  • Data type: STRING
  • Description: The format that you declare to use. If you want to use the Parquet format, set this parameter to parquet.

Parameter: parquet.utc-timezone
  • Required: No
  • Default value: false
  • Data type: BOOLEAN
  • Description: Specifies whether to use the UTC time zone. You can configure this parameter to control the conversion between epoch time and a time specified by LocalDateTime. Valid values:
      • true: The UTC time zone is used.
      • false: The local time zone is used.

The Parquet format also supports the parameters provided in ParquetOutputFormat. For example, you can configure parquet.compression=GZIP to enable the GZIP compression feature.
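
The following DDL is a minimal sketch of how such parameters are set in the WITH clause of a table definition; the table name and path are placeholders rather than part of the original example.

CREATE TABLE user_behavior_gzip (
  user_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://<bucket>/path',
  'format' = 'parquet',
  -- Use the UTC time zone when TIMESTAMP values are converted.
  'parquet.utc-timezone' = 'true',
  -- Passed through to ParquetOutputFormat to enable GZIP compression.
  'parquet.compression' = 'GZIP'
);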

Data type mappings

The mappings between Parquet data types and Flink SQL data types are compatible with the mappings between Parquet data types and Apache Hive data types, but differ from the mappings between Parquet data types and Apache Spark data types in the following aspects:

  • The TIMESTAMP data type of Flink SQL is mapped to the INT96 data type of Parquet regardless of the precision.

  • The DECIMAL data type of Flink SQL is mapped to the FIXED_LEN_BYTE_ARRAY data type of Parquet, whose byte length is determined by the precision.

The following table describes the mappings between Flink SQL data types and Parquet data types.

Flink SQL data type          Parquet data type        Logical data type of Parquet
CHAR, VARCHAR, and STRING    BINARY                   UTF8
BOOLEAN                      BOOLEAN                  -
BINARY and VARBINARY         BINARY                   -
DECIMAL                      FIXED_LEN_BYTE_ARRAY     DECIMAL
TINYINT                      INT32                    INT_8
SMALLINT                     INT32                    INT_16
INT                          INT32                    -
BIGINT                       INT64                    -
FLOAT                        FLOAT                    -
DOUBLE                       DOUBLE                   -
DATE                         INT32                    DATE
TIME                         INT32                    TIME_MILLIS
TIMESTAMP                    INT96                    -
ARRAY                        -                        LIST
MAP                          -                        MAP
ROW                          -                        STRUCT
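
The following DDL is a minimal sketch that exercises the TIMESTAMP and DECIMAL mappings described above; the table name, columns, and path are placeholders rather than part of the original example.

CREATE TABLE payments (
  -- Stored as the Parquet INT96 type, regardless of the declared precision.
  pay_time TIMESTAMP(3),
  -- Stored as FIXED_LEN_BYTE_ARRAY with the DECIMAL logical type;
  -- the byte length is derived from the declared precision.
  amount DECIMAL(10, 2),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://<bucket>/path',
  'format' = 'parquet'
);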