This section demonstrates how to validate, merge, and overwrite schemas.

Validate a schema

By default, schema validation is enabled. If the written data omits fields that are defined in the schema of the destination table, those fields are set to null. If the written data contains fields that are not present in the table schema, an exception is thrown, indicating that the data does not match the schema.

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Build a DataFrame whose schema does not match the destination table.
val schema = new StructType().add("f1", LongType)
val df = spark.createDataFrame(List(Row(1L)).asJava, schema)
// The append fails because field f1 is not defined in the table schema.
df.write.format("delta").mode("append").save("/tmp/delta_table")

The write fails with an exception similar to the following:

org.apache.spark.sql.AnalysisException: A schema mismatch detected when writing to the Delta table.
To enable schema migration, please set:
'.option("mergeSchema", "true")'.

Table schema:
root
 |-- id: long (nullable = true)
 |-- date: date (nullable = true)
 |-- name: string (nullable = true)
 |-- sales: string (nullable = true)

Data schema:
root
 |-- f1: long (nullable = true)

Schema validation fails in the following cases:

  • The written data contains fields that are not defined in the schema of the destination table.
  • The data type of a field in the written data is different from that defined in the schema.
  • The written data contains fields that differ only by case, such as Foo and foo.
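
For example, the following is a minimal sketch of the second case, assuming the /tmp/delta_table table shown above: the table defines id as long, so writing it as a string fails.

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Writing id as a string mismatches the table's long type, so the
// append throws an AnalysisException.
val badSchema = new StructType().add("id", StringType)
val badDf = spark.createDataFrame(List(Row("1")).asJava, badSchema)
badDf.write.format("delta").mode("append").save("/tmp/delta_table")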

Merge schemas

To write data to the destination table and update the table schema in the same operation, enable automatic schema merging by setting the mergeSchema option to true. With this option enabled, if the schema of the written data and that of the destination table do not match but meet the conditions for automatic merging, the two schemas are merged during the write; see the sketch after the conditions below.

  • Scala
    df.write.option("mergeSchema", "true")
  • SQL

    Currently, this operation is not supported in SQL.

Conditions for automatic merging include:
  • Columns are added.
  • Data types are converted in a compatible way, including:
    • NullType -> any other type
    • ByteType -> ShortType -> IntegerType

  Note: If the schema of the written data changes the partition columns, it cannot be merged with the schema of the destination table. In that case, you can only overwrite the schema of the destination table.
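
For example, the following is a minimal sketch, assuming the /tmp/delta_table table shown above, that adds a comment column through schema merging:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// The comment column is not in the current table schema; with
// mergeSchema enabled it is appended to the schema, and existing
// rows get null for the new column.
val newSchema = new StructType().add("id", LongType).add("comment", StringType)
val newDf = spark.createDataFrame(List(Row(2L, "merged")).asJava, newSchema)
newDf.write.format("delta").option("mergeSchema", "true").mode("append").save("/tmp/delta_table")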

Overwrite a schema

Schemas cannot be merged in some cases, for example, when a column is deleted or the data type of a column is changed to an incompatible type. In such cases, you must overwrite the schema of the destination table.

  • Scala
    df.write.option("overwriteSchema", "true")

    When you overwrite a schema, you must also overwrite the data; otherwise, the data and the schema may not match. Use df.write.mode("overwrite").option("overwriteSchema", "true") to overwrite both the original data and the schema. The .option("overwriteSchema", "true") setting is required because df.write.mode("overwrite") alone overwrites only the data, not the schema. A complete sketch appears after this list.

  • SQL

    Currently, this operation is not supported in SQL.
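
For example, the following is a minimal sketch, assuming the /tmp/delta_table table shown above, that changes the sales column from string to double, an incompatible conversion that mergeSchema cannot handle:

import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// mode("overwrite") replaces the data; overwriteSchema replaces the schema.
val replacementSchema = new StructType().add("id", LongType).add("sales", DoubleType)
val replacementDf = spark.createDataFrame(List(Row(3L, 19.99)).asJava, replacementSchema)
replacementDf.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/tmp/delta_table")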