This topic describes how to use Delta Lake to perform batch reads and writes.

Create a table and write data

  • Scala
    // Create a non-partitioned table and write data to it.
    data.write.format("delta").save("/tmp/delta_table")
    // Create a partitioned table and write data to it.
    data.write.format("delta").partitionedBy("date").save("/tmp/delta_table")
  • SQL
    -- Create a non-partitioned table and write data to it.
    CREATE TABLE delta_table (id INT) USING delta LOCATION "/tmp/delta_table";
    INSERT INTO delta_table VALUES 0,1,2,3,4;
    -- Create a partitioned table and write data to it.
    CREATE TABLE delta_table (
    
    
    id INT, date STRING) USING delta PARTITIONED BY (date) LOCATION "/tmp/delta_table";
    INSERT INTO delta_table PARTITION (date='2019-11-11') VALUES 0,1,2,3,4;
    -- Write data to dynamic partitions.
    INSERT INTO delta_table PARTITION (date) VALUES (0,'2019-11-01'),(1,'2019-11-02'),(2,'2019-11-05'),(3,'2019-11-08'),(4,'2019-11-11');

Write data to a table in append mode

  • Scala
    // Write data to a non-partitioned table in append mode.
    data.write.format("delta").mode("append").save("/tmp/delta_table")
    // Write data to a partitioned table in append mode.
    data.write.format("delta").mode("append").save("/tmp/delta_table")
  • SQL
    -- Write data to a non-partitioned table in append mode.
    INSERT INTO delta_table VALUES 0,1,2,3,4;
    -- Write data to a partitioned table in append mode.
    INSERT INTO delta_table PARTITION (date='2019-11-11') VALUES 0,1,2,3,4;
    -- Write data to dynamic partitions in append mode.
    INSERT INTO delta_table PARTITION (date) VALUES (0,'2019-11-01'),(1,'2019-11-02'),(2,'2019-11-05'),(3,'2019-11-08'),(4,'2019-11-11');

Write data to a table in overwrite mode

  • Scala
    // Write data to a non-partitioned table in overwrite mode.
    data.write.format("delta").mode("overwrite").save("/tmp/delta_table")
    // Write data to a partitioned table in overwrite mode.
    data.write.format("delta").mode("overwrite").option("replaceWhere", "date >= '2019-11-01' AND date <= '2019-11-11'").save("/tmp/delta_table")
  • SQL
    INSERT OVERWRITE TABLE delta_table VALUES 0,1,2,3,4;
    -- Write data to a partitioned table in overwrite mode.
    INSERT OVERWRITE delta_table PARTITION (date='2019-11-11') VALUES 0,1,2,3,4;
    -- Write data to dynamic partitions in overwrite mode.
    INSERT OVERWRITE delta_table PARTITION (date) VALUES (0,'2019-11-01'),(1,'2019-11-02'),(2,'2019-11-05'),(3,'2019-11-08'),(4,'2019-11-11');

Read a table

  • Scala
    spark.read.format("delta").load("/tmp/delta_table")
  • SQL
    SELECT * FROM delta_table;

Read earlier versions of a table

  • Scala
    df1 = spark.read.format("delta").option("timestampAsOf", timestamp_string).load("/tmp/delta_table")
    df2 = spark.read.format("delta").option("versionAsOf", version).load("/tmp/delta_table")
  • SQL

    This operation is not supported in SQL.