Data Lake Analytics (DLA) provides built-in serializer/deserializer (SerDe) libraries for processing data files in different formats. Instead of writing programs, you select the SerDe library that matches the format of your data files in Object Storage Service (OSS), and DLA uses it to query and analyze those files. The supported formats include TXT (CSV and TSV), ORC, Parquet, JSON, RCFile, and Avro.

When you create an OSS external table in DLA, use the STORED AS clause in the CREATE EXTERNAL TABLE statement to specify the format of the data files in OSS.

For example, the STORED AS clause in the following statement specifies that the file format is TXT.

CREATE EXTERNAL TABLE nation (
    N_NATIONKEY INT, 
    N_NAME STRING, 
    N_REGIONKEY INT, 
    N_COMMENT STRING
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
STORED AS TEXTFILE 
LOCATION 'oss://test-bucket-julian-1/tpch_100m/nation';

After the OSS external table is created, you can execute the SHOW CREATE TABLE statement to view the table creation statement.

show create table nation;
+-------------------------------+
| Result                        |
+-------------------------------+
| CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '|'
   STORED AS `TEXTFILE`
   LOCATION 'oss://test-bucket-julian-1/tpch_100m/nation' |
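
Once the external table exists, the files in OSS can be queried with ordinary SQL. The following is a minimal sketch that assumes the nation table created above; the column selection and the row limit are arbitrary choices for illustration.

```sql
-- Read a few rows from the OSS-backed external table.
-- DLA deserializes the '|'-delimited text files at query time.
SELECT n_nationkey, n_name, n_regionkey
FROM nation
ORDER BY n_nationkey
LIMIT 10;
```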

The following table lists the STORED AS clauses supported by DLA. When you create an external table, you can specify the STORED AS clause in the table creation statement. Then, DLA automatically selects an appropriate SerDe library, input format, and output format for the table that you want to create.

| Clause | Description |
| --- | --- |
| STORED AS TEXTFILE | The external table is stored as a TXT file. This is the default format. Each row in the file corresponds to one record in the table. |
| STORED AS PARQUET | The external table is stored as a Parquet file. |
| STORED AS ORC | The external table is stored as an ORC file. |
| STORED AS RCFILE | The external table is stored as an RCFile file. |
| STORED AS AVRO | The external table is stored as an Avro file. |
| STORED AS JSON | The external table is stored as a JSON file. Esri ArcGIS GeoJSON files are excluded. |
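
Switching to another format only requires changing the STORED AS clause. The sketch below reuses the columns of the nation table for a Parquet-backed table. The table name nation_parquet and the OSS path are hypothetical placeholders, and the statement assumes that Parquet files with a matching schema already exist at that location.

```sql
-- Hypothetical table name and OSS path, for illustration only.
-- No ROW FORMAT clause is needed: Parquet files carry their own layout,
-- and DLA selects the SerDe, input format, and output format from STORED AS.
CREATE EXTERNAL TABLE nation_parquet (
    N_NATIONKEY INT,
    N_NAME STRING,
    N_REGIONKEY INT,
    N_COMMENT STRING
)
STORED AS PARQUET
LOCATION 'oss://your-bucket/tpch_100m/nation_parquet/';
```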

When you use a STORED AS clause to specify the file format, you can also specify a SerDe library and special column delimiters based on the characteristics of the file. For more information, see the descriptions of the individual file formats.
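
As an illustration of combining an explicit SerDe with STORED AS, the following sketch declares a table over quoted CSV files. It assumes that the Apache Hive class org.apache.hadoop.hive.serde2.OpenCSVSerde is available in DLA. The table name, the OSS path, and the property values are hypothetical, so check the description of the CSV format before relying on them.

```sql
-- Hypothetical table name and OSS path, for illustration only.
-- Assumption: DLA accepts the Hive OpenCSVSerde class named below.
-- In Apache Hive, OpenCSVSerde reads every column as a string,
-- so all columns are declared as STRING here.
CREATE EXTERNAL TABLE nation_csv (
    N_NATIONKEY STRING,
    N_NAME STRING,
    N_REGIONKEY STRING,
    N_COMMENT STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',   -- column delimiter
    'quoteChar'     = '"',   -- character that encloses quoted fields
    'escapeChar'    = '\\'   -- escape character inside quoted fields
)
STORED AS TEXTFILE
LOCATION 'oss://your-bucket/tpch_100m/nation_csv/';
```

The ROW FORMAT SERDE clause replaces the ROW FORMAT DELIMITED clause used in the first example, because the SerDe properties, not a FIELDS TERMINATED BY setting, define how each line is split.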