Data Lake Analytics - Deprecated: ORC

Last Updated: Jul 29, 2019

This topic describes how to create a table over ORC data in Data Lake Analytics (DLA).

Optimized Row Columnar (ORC) is a columnar storage file format supported by Apache Hive. Compared with CSV files, ORC files take up less storage space and generally improve query performance, because a query can read only the columns it needs.

Prerequisites

To prepare the test data, see File format conversion.

Procedure

  1. Create an OSS schema.

    CREATE SCHEMA dla_oss_db WITH DBPROPERTIES (
      catalog = 'oss',
      location = 'oss://dlaossfile1/dla/'
    );

  2. Create an ORC table.

    CREATE EXTERNAL TABLE orders_orc_date (
      O_ORDERKEY INT,
      O_CUSTKEY INT,
      O_ORDERSTATUS STRING,
      O_TOTALPRICE DOUBLE,
      O_ORDERDATE DATE,
      O_ORDERPRIORITY STRING,
      O_CLERK STRING,
      O_SHIPPRIORITY INT,
      O_COMMENT STRING
    )
    STORED AS ORC
    LOCATION 'oss://dlaossfile1/TPC-H/orders_orc/';

    STORED AS ORC: specifies that the data files under LOCATION are stored in ORC format.

  3. View ORC data.

    SELECT * FROM orders_orc_date;
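
Because ORC is a columnar format, a query that names only the columns it needs generally reads less data than SELECT *. The following is a minimal sketch against the table created in step 2. The choice of columns, the date cutoff, and the LIMIT are illustrative assumptions, not values taken from this topic.

```sql
-- Illustrative only: the date cutoff and LIMIT are arbitrary assumptions.
-- Naming three columns lets the engine skip the other six stored in the ORC files.
SELECT O_ORDERKEY, O_TOTALPRICE, O_ORDERDATE
FROM orders_orc_date
WHERE O_ORDERDATE >= DATE '1996-01-01'
ORDER BY O_TOTALPRICE DESC
LIMIT 10;
```

If the filter matches no rows in your test data, adjust the date to fall within the range of O_ORDERDATE values you loaded.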
