This topic describes some important parameters in Delta.
- Spark SQL parameters, which are used to execute SQL statements.
- Runtime parameters, which can be dynamically configured in a session. These parameters are prefixed with `spark.databricks.delta.`.
- Non-runtime parameters, which can only be configured as global parameters in the Spark configuration file or specified as table parameters in the `TBLPROPERTIES` clause of the CREATE TABLE statement. Table parameters take precedence over global parameters. Global parameters are prefixed with `spark.databricks.delta.properties.defaults.`, and table parameters are prefixed with `delta.`.
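The three ways of setting a parameter can be sketched as follows. This is a minimal illustration; the table name `events`, its columns, and the chosen values are hypothetical, not defaults.

```sql
-- Runtime parameter: set dynamically in the current session.
SET spark.databricks.delta.snapshotPartitions = 20;

-- Global non-runtime parameter: add a line to the Spark configuration file
-- (for example, spark-defaults.conf):
--   spark.databricks.delta.properties.defaults.logRetentionDuration  interval 60 days

-- Table parameter: specify it in the TBLPROPERTIES clause of CREATE TABLE.
-- It overrides the corresponding global parameter for this table only.
CREATE TABLE events (id BIGINT, ts TIMESTAMP)
USING delta
TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 60 days');
```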
| Parameter | Description |
| --- | --- |
| spark.databricks.delta.snapshotPartitions | Default value: 10. The number of partitions into which the Delta log metadata is split. Increase this value if the Delta log volume is large, and decrease it if the volume is small. This parameter has a significant impact on the performance of parsing a Delta table. |
| spark.databricks.delta.retentionDurationCheck.enabled | Default value: true. Specifies whether to check the retention period when tombstones are deleted. **Warning:** To delete recently merged small files, you can set this parameter to false to disable the retention period check. However, we recommend that you keep the default value; otherwise, recently generated data may be deleted, which causes data read/write failures. |
| spark.databricks.delta.schema.autoMerge.enabled | Default value: false. Delta checks whether written data matches the schema of the destination table to ensure that the written data is correct. If the schema of your data changes, set this parameter to true to automatically merge the new schema into the schema of the destination table. |
| spark.databricks.delta.properties.defaults.deletedFileRetentionDuration or delta.deletedFileRetentionDuration | Default value: interval 1 week. The retention period of Delta tombstones. If the spark.databricks.delta.retentionDurationCheck.enabled parameter is set to true, an exception is thrown when you delete tombstones that are still within the retention period. **Note:** The retention period must be greater than or equal to 1 hour. |
| spark.databricks.delta.properties.defaults.logRetentionDuration or delta.logRetentionDuration | Default value: interval 30 days. The validity period of Delta log files. Log files that are older than this period are automatically cleaned up after a new checkpoint is written. |
| spark.sql.sources.parallelPartitionDiscovery.parallelism | Default value: 1000. The number of parallel tasks used by Delta to scan files. This parameter takes effect only in the VACUUM command. If the number of files is small, set it to a small value; an improper setting reduces the efficiency with which VACUUM scans files. **Note:** This parameter is a Spark SQL parameter. |
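To see how the schema auto-merge parameter behaves in practice, the following session-level sketch can help; the table and column names are illustrative assumptions.

```sql
-- Allow the schemas of written data and the destination table to be merged
-- automatically instead of failing on a mismatch.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

-- With auto-merge enabled, writing data that carries an extra column
-- (here, the hypothetical column `region`) adds that column to the
-- destination table's schema rather than raising a schema-mismatch error.
INSERT INTO events SELECT id, ts, region FROM staging_events;
```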
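The VACUUM-related parameters above can be combined in one session as sketched below. The table name `events` and the chosen values are illustrative, and disabling the retention period check carries the data-loss risk described in the warning for spark.databricks.delta.retentionDurationCheck.enabled.

```sql
-- Disable the retention period check for this session only (use with caution).
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

-- The table contains few files, so reduce the scan parallelism.
SET spark.sql.sources.parallelPartitionDiscovery.parallelism = 100;

-- Delete tombstone files that are older than 24 hours.
VACUUM events RETAIN 24 HOURS;
```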