E-MapReduce allows you to use Presto to read Delta tables. Presto can read a Delta table through either DeltaInputFormat or SymlinkTextInputFormat. The DeltaInputFormat method is available only in E-MapReduce.
Use DeltaInputFormat (E-MapReduce only)
To use DeltaInputFormat to read a Delta table, follow these steps:
- Use the Hive client to create a foreign table in your Hive metastore. The foreign table points to a Delta directory.
CREATE EXTERNAL TABLE delta_tbl(id bigint, `date` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'io.delta.hive.DeltaInputFormat' OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat' LOCATION '/tmp/delta_table';
Note
  - Presto cannot create a foreign table in Hive, so you must create the foreign table manually in Hive.
  - If the Delta table is partitioned, create a partitioned foreign table in Hive by using the PARTITIONED BY clause. Each time a new partition is added to the Delta table, run the MSCK REPAIR TABLE command to synchronize the partition information to the foreign table in Hive.
- Start the Presto client to read data.
SELECT * FROM delta_tbl LIMIT 10;
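The partitioned case mentioned in the note for the first step can be sketched as follows. This is a minimal sketch, not a definitive recipe: the table name delta_tbl_part and the path /tmp/delta_table_part are hypothetical, and it assumes that the Delta table is partitioned by the `date` column. Run the statements in the Hive client.

```sql
-- Hypothetical partitioned variant of delta_tbl; the table name and path are assumptions.
-- The partition column moves out of the column list and into PARTITIONED BY.
CREATE EXTERNAL TABLE delta_tbl_part(id bigint)
PARTITIONED BY (`date` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
LOCATION '/tmp/delta_table_part';

-- Run this each time new partitions are added to the Delta table,
-- so that the Hive metastore registers them.
MSCK REPAIR TABLE delta_tbl_part;
```

After MSCK REPAIR TABLE completes, Presto queries against the foreign table should see the newly added partitions.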
Use SymlinkTextInputFormat
To use SymlinkTextInputFormat to read a Delta table, follow these steps:
- Use Spark SQL to create a symlink file for the target Delta table.
GENERATE symlink_format_manifest FOR TABLE delta.`/delta_test/order`
Note Every time the Delta table is updated, you must run the GENERATE statement again so that Presto reads the latest data from the Delta table.
- Use the Hive client to create a foreign table in your Hive metastore. The foreign table points to the _symlink_format_manifest directory under the path of the Delta table.
CREATE EXTERNAL TABLE delta_tbl(id bigint, `date` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/delta_test/order/_symlink_format_manifest/';
- Start the Presto client to read data.
SELECT * FROM delta_tbl LIMIT 10;
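Because the manifest is a snapshot of the data files at the time it was generated, it has to be regenerated after every write to the Delta table. The following Spark SQL sketch shows that sequence. The DELETE statement is purely illustrative: the id column and its value are assumptions, and it presumes that the Spark and Delta versions on your cluster support DELETE in SQL.

```sql
-- 1. A hypothetical write changes the Delta table (any insert, update, or delete counts).
DELETE FROM delta.`/delta_test/order` WHERE id = 1;

-- 2. Regenerate the manifest so that it lists the current data files.
GENERATE symlink_format_manifest FOR TABLE delta.`/delta_test/order`;
```

Until the second statement runs, Presto keeps reading the data files listed in the old manifest.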
FAQ
Q: Does Presto support reading Delta tables created by Spark SQL?
A: No. Presto currently cannot read Delta tables that are created by using the USING syntax of Spark SQL.
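For reference, a table created with the USING syntax in Spark SQL looks roughly like the following. The table name delta_tbl_spark is hypothetical, and the columns and path are borrowed from the examples in this topic for illustration only.

```sql
-- Spark SQL: a Delta table created with the USING syntax (hypothetical table name).
-- According to the answer above, Presto cannot read a table registered this way.
CREATE TABLE delta_tbl_spark(id bigint, `date` string)
USING delta
LOCATION '/tmp/delta_table';
```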