This topic describes some important parameters in Delta.

Delta has three types of parameters:
  • Spark SQL parameters, which control how Spark SQL executes statements.
  • Runtime parameters, which can be dynamically configured in a session. These parameters are prefixed with spark.databricks.delta.
  • Non-runtime parameters, which can be configured only as global parameters in the Spark configuration file or specified as table parameters in the TBLPROPERTIES clause of a CREATE TABLE statement. Table parameters take precedence over global parameters. Global parameters are prefixed with spark.databricks.delta.properties.defaults, and table parameters are prefixed with delta. The sketch after this list shows the runtime, global, and table scopes.
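
For illustration, here is a minimal PySpark sketch of the three configuration scopes; the table name, schema, and property values are hypothetical examples.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Global (non-runtime) default; normally set in the Spark configuration file.
        .config("spark.databricks.delta.properties.defaults.logRetentionDuration",
                "interval 30 days")
        .getOrCreate()
    )

    # Runtime parameter: can be changed dynamically within a session.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "true")

    # Table parameter: overrides the global default for this table only.
    spark.sql("""
        CREATE TABLE events (id BIGINT, ts TIMESTAMP)
        USING delta
        TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 60 days')
    """)
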
spark.databricks.delta.snapshotPartitions
Default value: 10.
The number of partitions used to process Delta log metadata when the state of a Delta table is computed. Set this parameter to a larger value if the Delta log is large, and to a smaller value if the Delta log is small. This parameter has a significant impact on the performance of parsing a Delta table.
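
For example, the following sketch (reusing the spark session from the sketch above; the table path and value are hypothetical) raises the value for a table with a large Delta log before the table's state is first loaded in the session:

    # Runtime parameter: set it before the snapshot of the table is first computed.
    spark.conf.set("spark.databricks.delta.snapshotPartitions", "50")
    spark.read.format("delta").load("/tmp/delta/large_table").count()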

spark.databricks.delta.retentionDurationCheck.enabled
Default value: true.
Specifies whether to check the retention period when you delete tombstones.
Warning: If you want to delete recently merged small files, you can set this parameter to false to disable the retention period check. However, we recommend that you keep the default value. If the check is disabled, recently generated data may be deleted, which causes data read and write failures.
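
A minimal sketch of this workflow, assuming an active SparkSession named spark and a Delta table named events: disable the check for the current session, vacuum with a short retention period, and re-enable the check afterwards.

    # Disable the retention period check for this session only.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

    # With the check enabled, this statement would throw an exception because
    # 0 hours is below the configured tombstone retention period.
    spark.sql("VACUUM events RETAIN 0 HOURS")

    # Re-enable the check to protect later operations in the session.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "true")
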
spark.databricks.delta.schema.autoMerge.enabled
Default value: false.
Delta checks whether written data matches the schema of the destination table to ensure that the written data is correct. If the schema of your data changes, set the mergeSchema option to true when you write the data so that Delta merges the schema of the written data into the schema of the destination table. You can also set the spark.databricks.delta.schema.autoMerge.enabled parameter to true to merge schemas automatically whenever the data schema changes. However, we recommend that you leave this parameter set to false and use the mergeSchema option to enable merging explicitly.
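For example, the following sketch (the table path, columns, and data are hypothetical) appends a batch that carries a new column and merges that column into the table schema for this write only:

    # The incoming batch has an extra column, new_col, that the table lacks.
    df = spark.createDataFrame([(1, "a")], ["id", "new_col"])
    (df.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")   # merge the new column for this write only
        .save("/tmp/delta/events"))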

spark.databricks.delta.properties.defaults.deletedFileRetentionDuration or delta.deletedFileRetentionDuration
Default value: interval 1 week.
The retention period of Delta tombstones. If the spark.databricks.delta.retentionDurationCheck.enabled parameter is set to true, an exception is thrown when you attempt to delete tombstones that are still within the retention period.
Note: The retention period must be greater than or equal to 1 hour.
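
For example, to shorten the tombstone retention period for a single table (the table name and interval are hypothetical), set the table parameter:

    # Table parameter: overrides the global default for this table only.
    spark.sql("""
        ALTER TABLE events
        SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 2 days')
    """)
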
spark.databricks.delta.properties.defaults.logRetentionDuration or delta.logRetentionDuration
Default value: interval 30 days.
The validity period of Delta log files. A Delta log file expires when both of the following conditions are met:
  • The data files that correspond to the log file have been compacted.
  • The validity period of the log file has ended.
When Delta generates checkpoint files for the Delta log, it checks for expired log files and deletes them to prevent the log from growing indefinitely.
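
For example, to keep a longer log history for a single table (the table name and interval are hypothetical), set the table parameter:

    spark.sql("""
        ALTER TABLE events
        SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 60 days')
    """)
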
spark.sql.sources.parallelPartitionDiscovery.parallelism
Default value: 1000.
The number of parallel tasks that Delta uses to scan files. In Delta, this parameter takes effect only for the VACUUM command. If the number of files is small, set this parameter to a smaller value; an improper setting reduces the efficiency with which the VACUUM command scans files.
Note: This is a Spark SQL parameter.
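
A sketch, assuming a Delta table named events that contains relatively few files, lowering the parallelism before a vacuum:

    # Spark SQL parameter: in Delta, it only affects how VACUUM scans files.
    spark.conf.set("spark.sql.sources.parallelPartitionDiscovery.parallelism", "100")
    spark.sql("VACUUM events")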