This topic uses orders_tbl as an example to describe how to convert a text file into an Optimized Row Columnar (ORC) file.

Procedure

  1. Create an Object Storage Service (OSS) schema.
     CREATE SCHEMA dla_oss_db with DBPROPERTIES(
      catalog='oss',
      location 'oss://dlaossfile1/'
      )
  2. Create a table named orders_txt in Data Lake Analytics (DLA). The LOCATION parameter specifies the path of the orders.tbl file in OSS.
     CREATE EXTERNAL TABLE orders_txt (
          O_ORDERKEY INT, 
          O_CUSTKEY INT, 
          O_ORDERSTATUS STRING, 
          O_TOTALPRICE DOUBLE, 
          O_ORDERDATE DATE, 
          O_ORDERPRIORITY STRING, 
          O_CLERK STRING, 
          O_SHIPPRIORITY INT, 
          O_COMMENT STRING
      ) 
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' 
      STORED AS TEXTFILE LOCATION 'oss://dlaossfile1/orders/orders.tbl'
  3. Create the destination table orders_orc in DLA and set LOCATION to the required location in OSS.
     CREATE EXTERNAL TABLE orders_orc (
         O_ORDERKEY INT, 
         O_CUSTKEY INT, 
         O_ORDERSTATUS STRING, 
         O_TOTALPRICE DOUBLE, 
         O_ORDERDATE DATE, 
         O_ORDERPRIORITY STRING, 
         O_CLERK STRING, 
         O_SHIPPRIORITY INT, 
         O_COMMENT STRING
     ) 
     STORED AS ORC LOCATION 'oss://dlaossfile1/orders_orc/'
    Note
    • STORED AS ORC: specifies that the destination table is stored as an ORC file.
    • You must set the LOCATION parameter to an existing OSS path and the path must end with a forward slash (/).
  4. Execute the INSERT...SELECT statement to insert data from the orders_txt table into the orders_orc table.
     INSERT INTO orders_orc SELECT * FROM orders_txt;
  5. After the INSERT...SELECT statement is executed, you can view the generated ORC file in OSS.