DLA provides several built-in serializer/deserializer (SerDe) types, which serialize and deserialize data during file processing. Instead of writing your own programs, you can choose one or more SerDes that match the formats of the files in your OSS instances. With different SerDes, DLA can query and analyze OSS files in many formats: plain text files such as CSV and TSV, as well as ORC, Parquet, JSON, RCFile, and Avro files.
When you create an OSS table in DLA, you must use the STORED AS clause to specify the format of the data files in OSS.
In the following example, the data files are plain text (TEXTFILE).
CREATE EXTERNAL TABLE nation (
N_NATIONKEY INT,
N_NAME STRING,
N_REGIONKEY INT,
N_COMMENT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://test-bucket-julian-1/tpch_100m/nation';
After you create the table, you can run SHOW CREATE TABLE to view the original statement for creating the table.
show create table nation;
+-------------------------------+
| Result |
+-------------------------------+
| CREATE EXTERNAL TABLE `nation`(
`n_nationkey` int,
`n_name` string,
`n_regionkey` int,
`n_comment` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS `TEXTFILE`
LOCATION 'oss://bucket-name/tpch_100m/nation' |
The following table lists the SerDe types that DLA currently supports. To create a table over files in any of these formats, use the STORED AS clause. DLA then selects the corresponding SERDE, INPUTFORMAT, and OUTPUTFORMAT.
LazySimpleSerDe |
ColumnarSerDe |
RegexSerDe |
STORED AS RCFILE |
STORED AS AVRO |
STORED AS JSON |
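As a rough illustration of what the STORED AS shorthand stands for: in Apache Hive, STORED AS TEXTFILE corresponds to the explicit SerDe and input/output format classes shown below. Whether DLA accepts this explicit form unchanged is an assumption, and the table name nation_explicit is hypothetical.

```sql
-- Sketch only: the explicit Hive equivalent of
--   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE
-- Assumption: DLA accepts the same class names as Apache Hive.
CREATE EXTERNAL TABLE nation_explicit (   -- hypothetical table name
  N_NATIONKEY INT,
  N_NAME STRING,
  N_REGIONKEY INT,
  N_COMMENT STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '|')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'oss://bucket-name/tpch_100m/nation';
```

In practice the short form is easier to read and less error-prone, so the explicit form is mainly useful for understanding what the shorthand expands to.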
When you use STORED AS to specify the file format, you can also specify the SerDe and any special column separators to match the characteristics of your files. The samples for each file format describe how to specify the column separators.
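For instance, a comma-separated file with quoted fields might be declared as follows. This is a minimal sketch that assumes DLA accepts the Apache Hive OpenCSVSerde class and its separatorChar, quoteChar, and escapeChar properties; the table name nation_csv and the OSS path are hypothetical.

```sql
-- Sketch only: CSV with quoted fields, read through Hive's OpenCSVSerde.
-- Assumption: DLA supports this SerDe class and these property names.
CREATE EXTERNAL TABLE nation_csv (        -- hypothetical table name
  N_NATIONKEY STRING,                     -- in Apache Hive, OpenCSVSerde exposes every column as a string
  N_NAME STRING,
  N_REGIONKEY STRING,
  N_COMMENT STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',                  -- column separator
  'quoteChar'     = '"',                  -- character that wraps quoted fields
  'escapeChar'    = '\\'                  -- escape character inside fields
)
STORED AS TEXTFILE
LOCATION 'oss://bucket-name/path/to/nation_csv/';   -- hypothetical path
```

For simple delimiters with no quoting, ROW FORMAT DELIMITED FIELDS TERMINATED BY, as in the nation example, is usually enough.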