Delta Lake supports three categories of parameters, each with a different configuration scope and prefix.
| Category | How to set | Prefix |
|---|---|---|
| Spark SQL parameters | Used to execute SQL statements | spark.sql. |
| Runtime parameters | Set dynamically per session | spark.databricks.delta. |
| Non-runtime parameters | Set as global defaults in the Spark configuration file, or per-table in TBLPROPERTIES | Global: spark.databricks.delta.properties.defaults. Table: delta. |
For non-runtime parameters, a table property takes precedence over the corresponding global default. To set a table property, use the TBLPROPERTIES clause in a CREATE TABLE statement.
-- Set a global default
SET spark.databricks.delta.properties.defaults.deletedFileRetentionDuration = interval 2 weeks;
-- Set a table property (applies to this table only)
CREATE TABLE my_table (id INT, data STRING)
TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 2 weeks');
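
For a table that already exists, the same property can usually be changed after creation. The following is a minimal sketch, assuming your Delta Lake version supports ALTER TABLE ... SET TBLPROPERTIES and SHOW TBLPROPERTIES on Delta tables. It reuses my_table from the example above, and the 30-day value is illustrative only.

```sql
-- Override the retention period on an existing table (illustrative value)
ALTER TABLE my_table
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days');

-- List the table's properties to confirm which value is in effect
SHOW TBLPROPERTIES my_table;
```

Because the table property wins over the global default, the output of SHOW TBLPROPERTIES is the quickest way to check which retention period a given table actually uses.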
Parameter reference
The following tables describe the available parameters grouped by function.
Data retention
| Parameter | Data type | Default | Description |
|---|---|---|---|
| spark.databricks.delta.properties.defaults.deletedFileRetentionDuration or delta.deletedFileRetentionDuration | CalendarInterval | interval 1 week | How long Delta Lake retains tombstones (markers for deleted files) before the underlying files are physically removed. If spark.databricks.delta.retentionDurationCheck.enabled is true, VACUUM throws an exception when it is run against tombstones that are still within this period. The retention period must be at least 1 hour. |
| spark.databricks.delta.properties.defaults.logRetentionDuration or delta.logRetentionDuration | CalendarInterval | interval 30 days | How long Delta log files remain valid. A log file expires when its corresponding data file is compacted, or when this duration elapses. Each time Delta generates a checkpoint, it checks for and deletes expired log files, which prevents unbounded log growth. |
| spark.databricks.delta.retentionDurationCheck.enabled | Boolean | true | Whether Delta enforces the retention period check when deleting tombstones. Set this to false only when you need to remove recently merged small files. We recommend keeping the default value: with the check disabled, recently generated data may be deleted, which causes data read and write failures. |
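
The interaction between the retention period and the retention check can be sketched with VACUUM. This example assumes a hypothetical table named events that uses the default retention of 1 week, and assumes your Delta Lake version accepts the RETAIN ... HOURS and DRY RUN clauses.

```sql
-- Remove files whose tombstones are older than the table's retention period
VACUUM events;

-- Request 168 hours (1 week) explicitly; this matches the default period,
-- so the retention check is expected to pass
VACUUM events RETAIN 168 HOURS;

-- A shorter window is rejected while the check is enabled.
-- Disabling the check affects the current session only and risks deleting
-- files that concurrent readers or writers still need.
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM events RETAIN 1 HOURS DRY RUN;  -- DRY RUN lists candidate files without deleting them
SET spark.databricks.delta.retentionDurationCheck.enabled = true;
```

Running the short-retention command with DRY RUN first is a cautious way to see what would be removed before any data is deleted.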
Schema evolution
| Parameter | Data type | Default | Description |
|---|---|---|---|
| spark.databricks.delta.schema.autoMerge.enabled | Boolean | false | Whether Delta automatically merges the schema of incoming data with the destination table schema on write. We recommend that you do not set this parameter to true. Instead, pass the mergeSchema option explicitly when writing data to control schema evolution per write operation. |
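
The mergeSchema option belongs to the DataFrame writer and has no direct SQL equivalent. If a SQL-only workflow does need schema evolution, one cautious approach is to enable the parameter for a single statement and turn it off again straight away. The sketch below assumes hypothetical tables target_table and updates that share an id column, and assumes your Delta Lake version supports schema evolution in MERGE.

```sql
-- Enable automatic schema merging for this session only
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- Columns present in updates but missing from target_table may be added
MERGE INTO target_table AS t
USING updates AS u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Restore the recommended default so later writes cannot change the schema silently
SET spark.databricks.delta.schema.autoMerge.enabled = false;
```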
Performance tuning
| Parameter | Data type | Default | Description |
|---|---|---|---|
| spark.databricks.delta.snapshotPartitions | Int | 10 | Number of partitions used when reading Delta log metadata. Increase this value for large Delta logs to improve parsing performance; decrease it for small Delta logs. This parameter has a significant effect on Delta table parsing speed. |
| spark.sql.sources.parallelPartitionDiscovery.parallelism | Int | 1000 | Number of parallel tasks Delta uses to scan files. This parameter applies only to the VACUUM command. Decrease it if the number of files is small to avoid unnecessary overhead. This is a Spark SQL parameter. |
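
Both tuning parameters can be adjusted per session before the statement they affect. The values below are illustrative rather than recommendations, and small_table is a hypothetical table with few files and a short Delta log.

```sql
-- Use fewer partitions to parse a small Delta log (default in this reference: 10)
SET spark.databricks.delta.snapshotPartitions = 4;

-- Use fewer parallel tasks for VACUUM file scanning on a table with few files
SET spark.sql.sources.parallelPartitionDiscovery.parallelism = 100;

VACUUM small_table;
```

For a table with a very large Delta log, the same pattern applies in the other direction: raise spark.databricks.delta.snapshotPartitions and compare parsing time before and after the change.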