Storage formats and SerDe

Last Updated: Jul 25, 2019

DLA provides built-in serializer/deserializer (SerDe) types that serialize and deserialize data during file processing. Instead of writing your own programs, you can choose one or more SerDes that match the formats of your files in OSS. With different SerDes, DLA can query and analyze OSS files in a range of formats: plain text files such as CSV and TSV, and ORC, Parquet, JSON, RCFile, and Avro files.

When you create an OSS table in DLA, you must use the STORED AS clause to specify the format of the data files in OSS.

In the following example, the data files are plain text (TEXTFILE) with fields separated by a vertical bar (|).

    CREATE EXTERNAL TABLE nation (
        N_NATIONKEY INT,
        N_NAME STRING,
        N_REGIONKEY INT,
        N_COMMENT STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION 'oss://test-bucket-julian-1/tpch_100m/nation';

After you create the table, you can run SHOW CREATE TABLE to view the statement that defines it.

    show create table nation;
    +-------------------------------+
    | Result |
    +-------------------------------+
    | CREATE EXTERNAL TABLE `nation`(
    `n_nationkey` int,
    `n_name` string,
    `n_regionkey` int,
    `n_comment` string)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '|'
    STORED AS `TEXTFILE`
    LOCATION 'oss://bucket-name/tpch_100m/nation'|

The following table lists the SerDe types that DLA currently supports. To create a table over files in any of these formats, use the STORED AS clause; DLA then selects the matching SERDE, INPUTFORMAT, and OUTPUTFORMAT.

SerDe type          Source
LazySimpleSerDe     A data file stored in TXT format.
ColumnarSerDe       A data file stored in ORC format.
RegexSerDe          A data file stored in Parquet format.
STORED AS RCFILE    A data file stored in RCFile format.
STORED AS AVRO      A data file stored in Avro format.
STORED AS JSON      A data file stored in JSON format, excluding the geographical JSON data of Esri ArcGIS.
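
As a minimal sketch of how the clause changes with the file format, the nation table from the earlier example could be declared over RCFile data as shown here. The table name nation_rcfile and the OSS path are hypothetical placeholders. No ROW FORMAT clause is given, on the assumption that DLA picks the SerDe from STORED AS as described above.

    -- Hypothetical table over RCFile data; the name and OSS path are placeholders.
    CREATE EXTERNAL TABLE nation_rcfile (
        N_NATIONKEY INT,
        N_NAME STRING,
        N_REGIONKEY INT,
        N_COMMENT STRING
    )
    -- Only the storage format is stated; the SerDe is left for DLA to select.
    STORED AS RCFILE
    LOCATION 'oss://bucket-name/path/to/nation_rcfile/';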

When you use STORED AS to specify the file format, you can also specify the SerDe and any special column separators to suit the characteristics of your files. The samples for files in different formats show how to specify the column separators.
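
For illustration, the delimited text table from the beginning of this topic could name its SerDe explicitly instead of using ROW FORMAT DELIMITED. This sketch assumes that DLA accepts the Hive-style ROW FORMAT SERDE ... WITH SERDEPROPERTIES syntax, along with the Hive class name and the field.delim property of LazySimpleSerDe. The table name nation_explicit_serde and the OSS path are hypothetical.

    -- Hypothetical table; assumes Hive-compatible SerDe syntax is accepted.
    CREATE EXTERNAL TABLE nation_explicit_serde (
        N_NATIONKEY INT,
        N_NAME STRING,
        N_REGIONKEY INT,
        N_COMMENT STRING
    )
    -- Name the SerDe class directly and pass the column separator as a property.
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = '|')
    STORED AS TEXTFILE
    LOCATION 'oss://bucket-name/path/to/nation/';

Under these assumptions the declaration is intended to behave like FIELDS TERMINATED BY '|'. The explicit form is mainly useful when a file needs SerDe properties that the DELIMITED shorthand cannot express.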