This topic uses the TPC-H customer.tbl file as an example to describe how to use Data Lake Analytics (DLA) to convert text files stored in Object Storage Service (OSS) to Parquet files.
Procedure
Create an OSS schema.
CREATE SCHEMA dla_oss_db WITH DBPROPERTIES (
catalog = 'oss',
location = 'oss://dlaossfile1/TPC-H/'
);
Create a table named customer_txt in DLA and set LOCATION to the path of customer.tbl in OSS.
CREATE EXTERNAL TABLE customer_txt (
c_custkey int,
c_name string,
c_address string,
c_nationkey int,
c_phone string,
c_acctbal double,
c_mktsegment string,
c_comment string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE LOCATION 'oss://dlaossfile1/TPC-H/customer/customer.tbl';
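To make the column mapping concrete, the Python sketch below mimics, outside DLA, how one '|'-delimited line of customer.tbl lines up with the eight columns declared for customer_txt. It is an illustration under assumptions, not DLA's parser: the sample row is hypothetical, and dropping the trailing empty field is this sketch's own choice. TPC-H dbgen output typically ends each line with a trailing '|', which produces that extra empty field.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """Mirrors the column order and types declared for customer_txt."""
    c_custkey: int
    c_name: str
    c_address: str
    c_nationkey: int
    c_phone: str
    c_acctbal: float
    c_mktsegment: str
    c_comment: str


def parse_customer_line(line: str) -> Customer:
    """Split one '|'-delimited line and convert each field to its column type."""
    fields = line.rstrip("\n").split("|")
    # A trailing '|' yields one extra empty field; this sketch maps only the
    # first eight fields to columns and ignores anything after them.
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    key, name, address, nation, phone, acctbal, segment, comment = fields[:8]
    return Customer(int(key), name, address, int(nation), phone,
                    float(acctbal), segment, comment)


# Hypothetical sample row in the TPC-H customer layout.
sample = "1|Customer#000000001|hypothetical address|15|25-989-741-2988|711.56|BUILDING|sample comment|\n"
row = parse_customer_line(sample)
print(row.c_custkey, row.c_acctbal, row.c_mktsegment)  # 1 711.56 BUILDING
```

The NamedTuple keeps the field order identical to the CREATE EXTERNAL TABLE statement, which is the property the later INSERT...SELECT step relies on.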
Create the target table customer_parquet in DLA and set LOCATION to the target directory in OSS.
Note: LOCATION must be an existing directory in OSS and must end with a slash (/).
CREATE EXTERNAL TABLE customer_parquet (
c_custkey int,
c_name string,
c_address string,
c_nationkey int,
c_phone string,
c_acctbal double,
c_mktsegment string,
c_comment string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS PARQUET LOCATION 'oss://dlaossfile1/TPC-H/customer_parquet/';
STORED AS PARQUET indicates that the table is stored in Parquet format.
Run the INSERT...SELECT statement to insert data from the customer_txt table into the customer_parquet table.
INSERT INTO customer_parquet SELECT * FROM customer_txt;
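INSERT INTO customer_parquet SELECT * FROM customer_txt copies rows by column position, which is why customer_parquet repeats the column list of customer_txt in the same order. The sketch below shows that positional behavior using Python's built-in sqlite3 module as a stand-in engine; it does not involve DLA, OSS, or Parquet, and the sample row is hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in engine here: it demonstrates the positional
# semantics of INSERT ... SELECT *, not DLA behavior and not Parquet output.
COLUMNS = """
    c_custkey INTEGER, c_name TEXT, c_address TEXT, c_nationkey INTEGER,
    c_phone TEXT, c_acctbal REAL, c_mktsegment TEXT, c_comment TEXT
"""

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE customer_txt ({COLUMNS})")
conn.execute(f"CREATE TABLE customer_parquet ({COLUMNS})")

# One hypothetical source row.
conn.execute(
    "INSERT INTO customer_txt VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "Customer#000000001", "hypothetical address", 15,
     "25-989-741-2988", 711.56, "BUILDING", "sample comment"),
)

# Same statement as in the procedure: columns are matched by position,
# so both tables must declare the same columns in the same order.
conn.execute("INSERT INTO customer_parquet SELECT * FROM customer_txt")

src = conn.execute("SELECT * FROM customer_txt").fetchall()
dst = conn.execute("SELECT * FROM customer_parquet").fetchall()
print(dst == src)  # True
```

If the two tables ever declare their columns in a different order, naming the columns explicitly in the SELECT list instead of using SELECT * keeps the mapping independent of the source table's column order.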
View the data in the customer_parquet table, for example:
SELECT * FROM customer_parquet LIMIT 10;
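A Parquet file begins and ends with the 4-byte magic number PAR1, so a locally downloaded copy of one of the files under oss://dlaossfile1/TPC-H/customer_parquet/ can be sanity-checked without any Parquet library. The helper name looks_like_parquet is hypothetical, and the check inspects only the magic bytes; it does not validate the footer metadata or the data pages.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the 4-byte Parquet magic.

    Hypothetical helper: checks the magic bytes only, not the file contents.
    """
    # Lower bound on size: header magic + 4-byte footer length + trailing magic.
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Call looks_like_parquet() with the path of the downloaded copy; DLA generates the file names, so no specific name is assumed here. Running it on customer.tbl itself should return False, because a text file does not start with PAR1.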
After the INSERT...SELECT statement is executed, view the Parquet file created in OSS.
More information