During a database backup, the source deduplication feature backs up only the data that has been modified since the last backup and skips data that has already been backed up.

Background information

As data volumes grow rapidly, the following issues with traditional backup methods become more prominent:

  • High bandwidth cost: Migrating a large amount of backup data to the cloud requires high-bandwidth leased lines, which are expensive.
  • High storage cost: You are charged for the storage of a large amount of duplicate data.

Source deduplication is introduced in Database Backup to reduce bandwidth usage and storage space during physical backups.

Supported backup types

Source deduplication can be used for physical backup of self-managed MySQL databases.

Typical scenarios

  • Large data volumes

    When a large amount of data needs to be backed up, you can use source deduplication to ease pressure on bandwidth and storage.

  • Infrequent data modifications

    When only a small amount of data is modified between backups, the source deduplication feature significantly improves data transmission efficiency and reduces storage costs.

    Note: If source deduplication is enabled on a database and no data has been modified since the last backup, no data is transmitted during the backup.

  • Long-term archiving

    Source deduplication is most suitable for scenarios where large amounts of duplicate data exist. However, it can also be used in other scenarios to increase compression ratios and reduce storage costs for long-term archiving.
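
The behavior described in these scenarios can be illustrated with a minimal sketch. It is not the Database Backup implementation: it assumes fixed-size segments identified by SHA-256 hashes, and the names SEGMENT_SIZE, backup, restore, and store are hypothetical. The source side hashes each segment and uploads only segments that the backup store does not already hold.

```python
import hashlib

# Hypothetical, tiny segment size so the example is easy to follow.
SEGMENT_SIZE = 4


def split_segments(data, size=SEGMENT_SIZE):
    """Split data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data, store):
    """Back up data into store; return (manifest, bytes_uploaded).

    store maps a segment hash to the segment bytes and stands in for
    the remote backup storage. Only unseen segments are "uploaded".
    """
    manifest = []
    uploaded = 0
    for segment in split_segments(data):
        digest = hashlib.sha256(segment).hexdigest()
        if digest not in store:
            store[digest] = segment
            uploaded += len(segment)
        manifest.append(digest)
    return manifest, uploaded


def restore(manifest, store):
    """Rebuild the original data from a manifest of segment hashes."""
    return b"".join(store[digest] for digest in manifest)


store = {}
v1 = b"AAAABBBBCCCCDDDD"
m1, up1 = backup(v1, store)   # first backup: all 16 bytes are uploaded
m2, up2 = backup(v1, store)   # nothing modified: 0 bytes are uploaded
v3 = b"AAAABBBBXXXXDDDD"
m3, up3 = backup(v3, store)   # one segment modified: 4 bytes are uploaded
print(up1, up2, up3)          # prints: 16 0 4
```

In this sketch, an unchanged database produces a manifest but no upload, which mirrors the note above. A production system would typically choose segment boundaries and sizes more carefully than this fixed-size split.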

Benefits

Source deduplication offers the following benefits:

  • High compression ratio (uncompressed file size/compressed file size): Test results on MySQL data show that the compression ratio is about 4:1 when source deduplication is enabled, compared with about 2.5:1 for gzip.
  • Fast upload: Data segments can be concurrently uploaded, which significantly increases the backup speed.
  • Low resource consumption: Source deduplication strikes a balance between deduplication ratio and segment size to minimize the consumption of CPU and memory resources.
  • Upload speed throttling: You can limit the data upload speed based on your available bandwidth to keep your services stable.
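
To make the compression ratios above concrete, the following sketch applies the definition (uncompressed file size/compressed file size) to a hypothetical 100 GB data set. The 4:1 and 2.5:1 ratios are the approximate test figures quoted above. The 100 GB size and the function name stored_size are assumptions for illustration only.

```python
def stored_size(original_gb, ratio):
    """Return the size after compression, given ratio = original / compressed."""
    return original_gb / ratio


original_gb = 100.0                       # hypothetical data set size
dedup_gb = stored_size(original_gb, 4.0)  # about 4:1 with source deduplication
gzip_gb = stored_size(original_gb, 2.5)   # about 2.5:1 with gzip

print(dedup_gb, gzip_gb)                  # prints: 25.0 40.0
```

Under these assumptions, the same data set occupies about 25 GB with source deduplication and about 40 GB with gzip. Actual results depend on how much duplicate data the database contains.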

Note

The source deduplication feature is in public preview. To use this feature, join the DingTalk group whose ID is 35585947. One-on-one technical support is provided during the public preview. Each account is offered a free three-month backup schedule.