
Data Lake Analytics - Deprecated: PARQUET

Last Updated: May 21, 2020

This topic describes how to create tables for Parquet files in Data Lake Analytics (DLA).

Parquet is a columnar storage file format supported by Apache Hadoop. For the same data, scanning files stored in ORC or Parquet format is faster than scanning files stored in CSV format.
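Part of the reason for the scanning advantage is that a columnar layout lets a query that needs one column skip the others. The toy model below is only an illustration under stated assumptions: it is not Parquet's real on-disk encoding, the rows are hypothetical, and it ignores compression and encodings, which also affect real scan speed. It counts how many stored values each layout has to touch to sum a single column.

```python
# Toy model (assumption): NOT Parquet's on-disk encoding. It only counts how
# many stored values each layout touches when aggregating a single column.

# Hypothetical rows with three columns: (c_custkey, c_name, c_acctbal).
rows = [
    (1, "Customer#000000001", 711.56),
    (2, "Customer#000000002", 121.65),
    (3, "Customer#000000003", 7498.12),
]
NUM_COLS = 3
ACCTBAL = 2  # index of c_acctbal in each row

# Row-oriented layout (like CSV): the fields of each row are stored together.
row_buffer = [value for row in rows for value in row]
# Column-oriented layout (like Parquet or ORC): one buffer per column.
col_buffers = [list(column) for column in zip(*rows)]


def sum_column_row_layout(buffer, num_cols, col_index):
    """Sum one column; every field of every row has to be read."""
    total, touched = 0.0, 0
    for start in range(0, len(buffer), num_cols):
        record = buffer[start:start + num_cols]  # the whole row is read
        touched += len(record)
        total += record[col_index]
    return total, touched


def sum_column_columnar(buffers, col_index):
    """Sum one column; only that column's buffer is read."""
    column = buffers[col_index]
    return sum(column), len(column)


print("row layout values touched:", sum_column_row_layout(row_buffer, NUM_COLS, ACCTBAL)[1])  # 9
print("columnar values touched:", sum_column_columnar(col_buffers, ACCTBAL)[1])  # 3
```

Both layouts produce the same sum; only the amount of data read differs.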

Prerequisites

To prepare Parquet test data, see File format conversion.
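Before pointing a table at an OSS directory, it can help to confirm that the test files really are Parquet. The sketch below assumes you have downloaded one of the files locally. It relies on the Parquet file layout: a file starts with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1`. The helper names `looks_like_parquet` and `looks_like_parquet_bytes` are hypothetical, and the check does not parse the footer metadata itself.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet_bytes(data: bytes) -> bool:
    """Cheap sanity check on raw bytes: leading magic, trailing magic,
    and a footer length that fits inside the file."""
    # Smallest possible layout: magic (4) + footer length (4) + magic (4).
    if len(data) < 12:
        return False
    footer_len = struct.unpack("<I", data[-8:-4])[0]
    return (
        data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
        and footer_len <= len(data) - 12
    )


def looks_like_parquet(path: str) -> bool:
    """Read a local file (fine for small test files) and check it."""
    with open(path, "rb") as f:
        return looks_like_parquet_bytes(f.read())
```

For example, `looks_like_parquet("customer.parquet")` on a correctly converted file should return `True`, while a CSV file returns `False` because it does not start with `PAR1`.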

Procedure

  1. Create an OSS schema.

    CREATE SCHEMA dla_oss_db with DBPROPERTIES(
      catalog = 'oss',
      location = 'oss://dlaossfile1/dla/'
    )
  2. Create a Parquet table.

    CREATE EXTERNAL TABLE customer_parqet_date (
      c_custkey int,
      c_name string,
      c_address string,
      c_nationkey int,
      c_phone string,
      c_acctbal double,
      c_mktsegment string,
      c_comment string
    )
    STORED AS PARQUET
    LOCATION 'oss://dlaossfile1/TPC-H/customer_parquet/'

    STORED AS PARQUET specifies that the data files under LOCATION are stored in Parquet format.

  3. View the Parquet table data.

    SELECT * FROM customer_parqet_date

    (Figure: Parquet data query result)
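If you create several Parquet tables this way, for example one per TPC-H table, the statement in step 2 follows a fixed pattern: a column list, `STORED AS PARQUET`, and an OSS `LOCATION`. The sketch below assembles that text from a list of `(name, type)` pairs. `build_parquet_table_ddl` is a hypothetical helper, not part of DLA; the identifier and `oss://` checks are assumptions added for safety, not documented DLA rules. It only builds the SQL string, which you would still run in DLA yourself.

```python
import re

# Hypothetical helper (not a DLA API): builds the step-2 DDL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_parquet_table_ddl(table, columns, location):
    """Return a CREATE EXTERNAL TABLE statement for Parquet files on OSS.

    columns: list of (name, type) pairs, e.g. [("c_custkey", "int")].
    location: OSS directory holding the Parquet files.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Assumption: restrict names to plain identifiers to avoid quoting issues.
    for name in [table] + [col for col, _ in columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if not location.startswith("oss://") or "'" in location:
        raise ValueError("location must be an oss:// path without quotes")
    column_lines = ",\n".join(f"  {name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


customer_columns = [
    ("c_custkey", "int"), ("c_name", "string"), ("c_address", "string"),
    ("c_nationkey", "int"), ("c_phone", "string"), ("c_acctbal", "double"),
    ("c_mktsegment", "string"), ("c_comment", "string"),
]

print(build_parquet_table_ddl(
    "customer_parqet_date", customer_columns,
    "oss://dlaossfile1/TPC-H/customer_parquet/"))
```

Generating the text from a column list keeps the column definitions in one place; the printed statement matches the one in step 2 apart from indentation.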