This topic provides an example of how to use the Parquet format and describes its parameters and data type mappings.
Limits
The Parquet format can be used to read and write Parquet data.
Only the Object Storage Service (OSS) connector supports reading data in the Parquet format.
Sample code
The following sample code shows how to create a table in the Parquet format by using the OSS connector.
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://<bucket>/path',
  'format' = 'parquet'
);
Parameters
Parameter | Required | Default value | Data type | Description |
format | Yes | No default value | STRING | The format to use. To use the Parquet format, set this parameter to parquet. |
parquet.utc-timezone | No | false | BOOLEAN | Specifies whether to use the UTC time zone when converting between epoch time and LocalDateTime values. Valid values: true (the UTC time zone is used) and false (the local time zone is used). |
The Parquet format also supports the parameters provided by ParquetOutputFormat. For example, you can set parquet.compression=GZIP to enable GZIP compression.
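As an illustration of the parameters described above, the following sketch extends the sample table definition with both format options. It assumes that format options are passed as quoted keys in the WITH clause, in the same way as 'format'. The table name user_behavior_gzip is hypothetical, and the path uses the same placeholder as the sample code.

```sql
-- Sketch: a Parquet table that uses GZIP compression and UTC-based timestamps.
-- The table name is hypothetical; replace <bucket> with your own bucket.
CREATE TABLE user_behavior_gzip (
  user_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://<bucket>/path',
  'format' = 'parquet',
  -- ParquetOutputFormat parameter (assumption: same key as named in the text).
  'parquet.compression' = 'GZIP',
  -- Convert between epoch time and LocalDateTime in UTC rather than the local time zone.
  'parquet.utc-timezone' = 'true'
);
```

All option values are written as string literals, including the BOOLEAN value of parquet.utc-timezone.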
Data type mappings
The mappings between Parquet data types and Flink SQL data types are compatible with the mappings between Parquet data types and Apache Hive data types. They differ from the mappings between Parquet data types and Apache Spark data types in the following aspects:
The TIMESTAMP data type of Flink SQL is mapped to the INT96 data type of Parquet regardless of the precision.
The DECIMAL data type of Flink SQL is mapped to a Parquet fixed-length byte array (FIXED_LEN_BYTE_ARRAY) whose length is determined by the precision.
The following table describes the mappings between Flink SQL data types and Parquet data types.
Flink SQL data type | Parquet data type | Logical data type of Parquet |
CHAR, VARCHAR, and STRING | BINARY | UTF8 |
BOOLEAN | BOOLEAN | - |
BINARY and VARBINARY | BINARY | - |
DECIMAL | FIXED_LEN_BYTE_ARRAY | DECIMAL |
TINYINT | INT32 | INT_8 |
SMALLINT | INT32 | INT_16 |
INT | INT32 | - |
BIGINT | INT64 | - |
FLOAT | FLOAT | - |
DOUBLE | DOUBLE | - |
DATE | INT32 | DATE |
TIME | INT32 | TIME_MILLIS |
TIMESTAMP | INT96 | - |
ARRAY | - | LIST |
MAP | - | MAP |
ROW | - | STRUCT |
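To make the mapping table concrete, the following sketch declares one column for several of the Flink SQL data types listed above. The comments repeat the Parquet data type that the table gives for each column. The table name type_mapping_demo and all column names are hypothetical, and the path uses the same placeholder as the sample code.

```sql
-- Sketch: each comment restates the mapping from the table above.
CREATE TABLE type_mapping_demo (
  item_name   STRING,           -- BINARY, logical type UTF8
  is_active   BOOLEAN,          -- BOOLEAN
  raw_bytes   VARBINARY(16),    -- BINARY
  price       DECIMAL(10, 2),   -- FIXED_LEN_BYTE_ARRAY, logical type DECIMAL
  small_count TINYINT,          -- INT32, logical type INT_8
  item_count  INT,              -- INT32
  total_count BIGINT,           -- INT64
  ratio       DOUBLE,           -- DOUBLE
  birthday    DATE,             -- INT32, logical type DATE
  event_time  TIMESTAMP(3),     -- INT96, regardless of the precision
  tags        ARRAY<STRING>,    -- logical type LIST
  attrs       MAP<STRING, INT>  -- logical type MAP
) WITH (
  'connector' = 'filesystem',
  'path' = 'oss://<bucket>/path',
  'format' = 'parquet'
);
```

Declaring event_time as TIMESTAMP(6) or TIMESTAMP(9) instead would not change the Parquet data type, because TIMESTAMP is mapped to INT96 regardless of the precision.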