E-MapReduce allows you to use Presto to read Delta tables. Presto can read a Delta table by using DeltaInputFormat or SymlinkTextInputFormat. The DeltaInputFormat method is available only in E-MapReduce.

Use DeltaInputFormat (E-MapReduce only)

To use DeltaInputFormat to read a Delta table, follow these steps:
  1. Use the Hive client to create an external table in your Hive metastore. The external table points to the directory of the Delta table.
    CREATE EXTERNAL TABLE delta_tbl(id bigint, `date` string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
    OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
    LOCATION '/tmp/delta_table';
    Note
    • Presto cannot create an external table in Hive. Therefore, you must manually create the external table in Hive.
    • If the Delta table is a partitioned table, create a partitioned external table in Hive by using the PARTITIONED BY clause. When a new partition is added to the Delta table, run the MSCK REPAIR TABLE command to synchronize the partition information to the external table in Hive. For a partitioned example, see the sketch after these steps.
  2. Start the Presto client to read data.
    SELECT * FROM delta_tbl LIMIT 10;
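
If the Delta table is partitioned, the external table in Hive must be partitioned as well. The following is a minimal sketch of this case, assuming a Delta table that is stored in /tmp/delta_table_partitioned and partitioned by the `date` column; the path, table name, and column layout are illustrative.
    -- Hive: create a partitioned external table that points to a partitioned Delta table.
    -- The path /tmp/delta_table_partitioned and the partition column `date` are assumptions.
    CREATE EXTERNAL TABLE delta_tbl_partitioned(id bigint)
    PARTITIONED BY (`date` string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
    OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
    LOCATION '/tmp/delta_table_partitioned';

    -- Hive: after new partitions are added to the Delta table, synchronize the
    -- partition metadata so that queries can see the new partitions.
    MSCK REPAIR TABLE delta_tbl_partitioned;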

Use SymlinkTextInputFormat

To use SymlinkTextInputFormat to read a Delta table, follow these steps:
  1. Use Spark SQL to generate a symlink manifest for the target Delta table.
    GENERATE symlink_format_manifest FOR TABLE delta.`/tmp/delta_table`;
    Note Every time the Delta table is updated, you must execute the GENERATE statement again so that Presto reads the latest data from the Delta table. For a sketch of this workflow, see the example after these steps.
  2. Use the Hive client to create an external table in your Hive metastore. The external table points to the _symlink_format_manifest directory of the Delta table.
    CREATE EXTERNAL TABLE delta_tbl(id bigint, `date` string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/tmp/delta_table/_symlink_format_manifest/';
  3. Start the Presto client to read data.
    SELECT * FROM delta_tbl LIMIT 10;
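
As a sketch of the refresh workflow mentioned in the note in step 1: after the Delta table is updated, regenerate the manifest before querying the table in Presto. The inserted values below are illustrative.
    -- Spark SQL: append data to the Delta table (the values are illustrative).
    INSERT INTO delta.`/tmp/delta_table` VALUES (11, '2021-01-01');
    -- Spark SQL: regenerate the manifest so that the symlink files point to the
    -- latest data files of the Delta table.
    GENERATE symlink_format_manifest FOR TABLE delta.`/tmp/delta_table`;
    -- Presto: the external table now returns the newly inserted row.
    SELECT * FROM delta_tbl WHERE id = 11;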

FAQ

Q: Does Presto support reading Delta tables created by Spark SQL?

A: No. Presto is currently not compatible with Delta tables that are created by using the USING syntax of Spark SQL.
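
For reference, the USING syntax mentioned above looks like the following sketch; a table created this way cannot be queried from Presto directly, so you must create a separate external table in Hive as described in this topic. The table name is illustrative.
    -- Spark SQL: a Delta table created with the USING syntax (not readable by Presto).
    CREATE TABLE delta_spark_tbl (id bigint, `date` string)
    USING delta
    LOCATION '/tmp/delta_table';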