This topic describes the de-identification algorithms that Sensitive Data Discovery and Protection (SDDP) supports.

Category | Description | Algorithm | Input | Applicable sensitive data and scenario
Hashing | Raw data cannot be retrieved after it is de-identified in this mode.

This type of algorithm is suitable for password de-identification and for scenarios in which you must determine by comparison whether data is sensitive.

You can use common hash algorithms and specify a salt value.

MD5 | Salt value
  • Sensitive data: sensitive key information
  • Scenario: data storage
Secure Hash Algorithm 1 (SHA-1) | Salt value
SHA-256 | Salt value
Hash-based Message Authentication Code (HMAC) | Salt value
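The hashing algorithms listed above can be illustrated with a minimal Python sketch. It is not SDDP's implementation: the function names `salted_hash` and `hmac_hash` are hypothetical, and the way the salt is combined with the value (prefixed to it, or used as the HMAC key with SHA-256) is an assumption, because this topic does not specify it.

```python
import hashlib
import hmac


def salted_hash(value: str, salt: str, algorithm: str = "sha256") -> str:
    """Hash salt + value with MD5, SHA-1, or SHA-256 and return the hex digest.

    Assumption: the salt is prepended to the value before hashing.
    """
    if algorithm not in ("md5", "sha1", "sha256"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    digest.update(salt.encode("utf-8"))
    digest.update(value.encode("utf-8"))
    return digest.hexdigest()


def hmac_hash(value: str, salt: str) -> str:
    """Compute an HMAC over the value.

    Assumption: the salt acts as the HMAC key and SHA-256 is the digest.
    """
    return hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the same value and salt always produce the same digest, two de-identified values can be compared for equality without exposing the raw data, which is the comparison scenario described for this category.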
Masking by using asterisks (*) or number signs (#) | Raw data cannot be retrieved after it is de-identified in this mode.

This type of algorithm is suitable for scenarios in which sensitive data is displayed on a graphical user interface (GUI) or shared with others.

This type of algorithm masks targeted content in sensitive data with asterisks (*) or number signs (#).

Keeps the first N characters and the last M characters. | Values of N and M
  • Sensitive data: sensitive personal information
  • Scenarios:
    • Data usage
    • Data sharing
Keeps characters from the Xth position to the Yth position. | Values of X and Y
Masks the first N characters and the last M characters. | Values of N and M
Masks characters from the Xth position to the Yth position. | Values of X and Y
Masks the characters before the first occurrence of a special character. | At sign (@), ampersand (&), or period (.)
Masks the characters after the first occurrence of a special character. | At sign (@), ampersand (&), or period (.)
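The masking rules above can be sketched in a few lines of Python. The function names are hypothetical, positions X and Y are assumed to be 1-based and inclusive, and the handling of values shorter than N + M is an assumption; SDDP's actual behavior in these edge cases is not described in this topic.

```python
def keep_ends(value: str, n: int, m: int, mask_char: str = "*") -> str:
    """Keep the first n and last m characters and mask everything in between."""
    if n + m >= len(value):
        return value  # assumption: nothing is left to mask
    return value[:n] + mask_char * (len(value) - n - m) + value[len(value) - m:]


def mask_ends(value: str, n: int, m: int, mask_char: str = "*") -> str:
    """Mask the first n and last m characters and keep the middle."""
    if n + m >= len(value):
        return mask_char * len(value)  # assumption: the whole value is masked
    return mask_char * n + value[n:len(value) - m] + mask_char * m


def mask_range(value: str, x: int, y: int, mask_char: str = "*") -> str:
    """Mask characters from position x to position y (1-based, inclusive)."""
    start, end = max(x - 1, 0), min(y, len(value))
    if start >= end:
        return value
    return value[:start] + mask_char * (end - start) + value[end:]


def mask_before(value: str, special: str, mask_char: str = "*") -> str:
    """Mask every character before the first occurrence of the special character."""
    idx = value.find(special)
    if idx == -1:
        return value  # special character absent: value left unchanged
    return mask_char * idx + value[idx:]


def mask_after(value: str, special: str, mask_char: str = "*") -> str:
    """Mask every character after the first occurrence of the special character."""
    idx = value.find(special)
    if idx == -1:
        return value
    return value[:idx + 1] + mask_char * (len(value) - idx - 1)
```

Passing `mask_char="#"` gives the number-sign variant. The output has the same length as the input, so the field format stays recognizable while the masked characters are unrecoverable.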
Substitution (customization supported) | Raw data can be retrieved after it is de-identified by some of the algorithms.

This type of algorithm can be used to de-identify fields in fixed formats, such as ID card numbers.

This type of algorithm substitutes all or part of a field value, either with a mapped value from a mapping table or with a random value drawn from a random interval. If raw data is substituted with mapped values, the raw data can be retrieved after de-identification. If raw data is substituted with random values, the raw data cannot be retrieved after de-identification. SDDP provides multiple built-in mapping tables and allows you to customize substitution algorithms.

Substitutes targeted content in ID card numbers with mapped values. | Mapping table for substituting administrative region IDs
  • Sensitive data:
    • Sensitive personal information
    • Sensitive enterprise information
    • Sensitive device information
  • Scenarios:
    • Data storage
    • Data sharing
Substitutes targeted content in ID card numbers randomly. | Code table for randomly substituting administrative region IDs
Substitutes targeted content in military IDs randomly. | Code table for randomly substituting type codes
Substitutes targeted content in passport numbers randomly. | Code table for randomly substituting purpose fields
Substitutes targeted content in Hong Kong and Macao exit-entry permit numbers randomly. | Code table for randomly substituting purpose fields
Substitutes targeted content in bank card numbers randomly. | Code table for randomly substituting Bank Identification Numbers (BINs)
Substitutes targeted content in telephone numbers randomly. | Code table for randomly substituting administrative region IDs
Substitutes targeted content in mobile numbers randomly. | Code table for randomly substituting mobile network codes
Substitutes targeted content in unified social credit codes randomly. | Code table for randomly substituting registration authority IDs, code table for randomly substituting type codes, and code table for randomly substituting administrative region IDs
Substitutes targeted content in general tables with mapped values. | Mapping table for substituting uppercase letters, mapping table for substituting lowercase letters, mapping table for substituting digits, and mapping table for substituting special characters
Substitutes targeted content in general tables randomly. | Code table for randomly substituting uppercase letters, code table for randomly substituting lowercase letters, code table for randomly substituting digits, and code table for randomly substituting special characters
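The difference between mapped and random substitution can be sketched as follows. Everything in this example is illustrative: `DIGIT_MAP` is a made-up one-to-one mapping table for digits, `REGION_CODES` is a placeholder code table rather than real administrative region IDs, and the function names are hypothetical. SDDP's built-in tables are not reproduced here.

```python
import random

# Hypothetical one-to-one mapping table for digits (a permutation of 0-9).
DIGIT_MAP = dict(zip("0123456789", "3849157260"))
# Inverting the table is what makes mapped substitution reversible.
REVERSE_DIGIT_MAP = {mapped: raw for raw, mapped in DIGIT_MAP.items()}

# Placeholder code table; these are not real administrative region IDs.
REGION_CODES = ["000001", "000002", "000003"]


def substitute_mapped(value: str, table: dict) -> str:
    """Replace each character that appears in the mapping table; keep the rest."""
    return "".join(table.get(ch, ch) for ch in value)


def substitute_random(value: str, start: int, end: int,
                      code_table: list, rng: random.Random) -> str:
    """Replace value[start:end] with an entry picked at random from the code table.

    The original segment is discarded, so the raw value cannot be retrieved.
    """
    return value[:start] + rng.choice(code_table) + value[end:]
```

Applying `substitute_mapped` with `REVERSE_DIGIT_MAP` to a mapped value restores the raw value, whereas the output of `substitute_random` carries no information about the segment it replaced.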
Bit shifting | Raw data can be retrieved after it is de-identified by some of the algorithms.

This type of algorithm can be used to analyze and collect statistics on sensitive datasets.

SDDP provides two types of bit shifting algorithms. One type rounds numbers and dates, and raw data cannot be retrieved after it is de-identified in this mode. The other type bit-shifts text, and raw data can be retrieved after it is de-identified in this mode.

Rounds down a number to the Nth digit before the decimal point. | Value of N
  • Sensitive data: general sensitive information
  • Scenarios:
    • Data storage
    • Data usage
Rounds dates. | Date rounding level
Shifts characters. | Number of places by which targeted bits are moved and shift direction (left or right)
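The rounding and shifting algorithms above can be sketched as follows, under several stated assumptions. "The Nth digit before the decimal point" is read here as keeping precision down to that digit, so N = 3 rounds down to a multiple of 100; the date rounding levels are assumed to be year, month, day, hour, and minute; and character shifting is modeled as a circular shift of the text. The function names are hypothetical, and SDDP may define these operations differently.

```python
import math
from datetime import datetime


def round_down(value: float, n: int) -> int:
    """Round down to the nth digit before the decimal point.

    Assumption: n = 1 keeps the units digit, n = 3 keeps hundreds.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    step = 10 ** (n - 1)
    return math.floor(value / step) * step


def round_date(ts: datetime, level: str) -> datetime:
    """Truncate a timestamp to the given level (assumed level names)."""
    levels = ("year", "month", "day", "hour", "minute")
    if level not in levels:
        raise ValueError(f"unsupported level: {level}")
    # Fields ordered from coarse to fine, with the value each one is reset to.
    reset = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}
    # Every field finer than the chosen level is reset.
    finer = list(reset)[levels.index(level):]
    return ts.replace(**{field: reset[field] for field in finer})


def shift_text(value: str, places: int, direction: str) -> str:
    """Circularly shift characters; shifting back the other way restores the input."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if not value:
        return value
    k = places % len(value)
    if direction == "left":
        return value[k:] + value[:k]
    return value[len(value) - k:] + value[:len(value) - k]
```

Rounding discards the low-order detail, which is why it cannot be undone, while a circular shift only rearranges characters and is undone by the same shift in the opposite direction.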
Encryption | Raw data can be retrieved after it is de-identified in this mode.

This type of algorithm can be used to encrypt sensitive fields that need to be retrieved after encryption.

Common symmetric encryption algorithms are supported.

Data Encryption Standard (DES) algorithm | Encryption key
  • Sensitive data:
    • Sensitive personal information
    • Sensitive enterprise information
  • Scenario: data storage
Triple Data Encryption Standard (3DES) algorithm | Encryption key
Advanced Encryption Standard (AES) algorithm | Encryption key
Shuffling | Raw data cannot be retrieved after it is de-identified in this mode.

This type of algorithm can be used to de-identify structured data columns.

This type of algorithm extracts the values of a field within a specified range from the source table and then does one of the following: rearranges the values within the corresponding column, or randomly selects values from the column within the value range and rearranges the selected values. In this way, the values are mixed up and de-identified.

Randomly shuffles data. | Shuffling method: rearrangement or random selection
  • Sensitive data:
    • Sensitive device information
    • Sensitive location information
  • Scenario: data storage
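The two shuffling methods can be sketched on an in-memory table as follows. The representation of rows as dictionaries and the function names `shuffle_column` and `resample_column` are assumptions made for illustration, and random selection is modeled here as drawing each new value from the column with replacement; this topic does not specify how SDDP performs the selection.

```python
import random


def shuffle_column(rows: list, column: str, rng: random.Random) -> list:
    """Rearrangement: permute the column's values across the rows.

    Every original value still appears exactly once, but usually in a different row.
    """
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def resample_column(rows: list, column: str, rng: random.Random) -> list:
    """Random selection: give each row a value drawn at random from the column.

    Assumption: values are drawn with replacement, so some may repeat or vanish.
    """
    values = [row[column] for row in rows]
    return [{**row, column: rng.choice(values)} for row in rows]
```

Both functions return new rows and leave the other columns untouched, so column-level statistics such as the set of observed values are preserved (exactly for rearrangement, approximately for random selection) while the link between a value and its row is broken.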