This topic describes the data de-identification algorithms that are supported by Data Security Center (DSC).

For each category of algorithms, the following sections provide a description, the supported algorithms and their inputs, and the applicable sensitive data and scenarios.

Hashing
Raw data cannot be retrieved after it is de-identified by using this type of algorithm.

This type of algorithm is applicable to password protection and to scenarios in which you must verify sensitive data by comparison.

You can use common hash algorithms and specify a salt value.

Algorithms and inputs:
  • MD5: salt value
  • Secure Hash Algorithm 1 (SHA-1): salt value
  • SHA-256: salt value
  • Hash-based Message Authentication Code (HMAC): salt value

Applicable sensitive data and scenario:
  • Sensitive data: keys
  • Scenario: data storage
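
The salted hashing described above can be sketched in Python as follows. This is an illustration only, not the DSC implementation: the function names are hypothetical, and both appending the salt to the value and using the salt as the HMAC key (with SHA-256 as the digest) are assumptions.

```python
import hashlib
import hmac


def salted_hash(value: str, salt: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of value + salt using MD5, SHA-1, or SHA-256.

    Assumption: the salt is appended to the value before hashing.
    """
    if algorithm not in ("md5", "sha1", "sha256"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    data = (value + salt).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


def hmac_hash(value: str, salt: str) -> str:
    """Return the HMAC-SHA256 hex digest of value.

    Assumption: the salt serves as the HMAC key.
    """
    return hmac.new(
        salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256
    ).hexdigest()
```

Because the same value and salt always produce the same digest, two de-identified values can be compared for equality without exposing the raw data.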

Redaction by using asterisks (*) or number signs (#)
Raw data cannot be retrieved after it is de-identified by using this type of algorithm.

This type of algorithm is applicable to scenarios in which sensitive data is to be shown on a user interface or shared with others.

This type of algorithm replaces specified characters in sensitive data with asterisks (*) or number signs (#).

Algorithms and inputs:
  • Keeps the first N characters and the last M characters: values of N and M
  • Keeps characters from the Xth position to the Yth position: values of X and Y
  • Redacts the first N characters and the last M characters: values of N and M
  • Redacts characters from the Xth position to the Yth position: values of X and Y
  • Redacts characters that precede the first occurrence of a special character: at sign (@), ampersand (&), or period (.)
  • Redacts characters that follow the first occurrence of a special character: at sign (@), ampersand (&), or period (.)

Applicable sensitive data and scenarios:
  • Sensitive data: sensitive personal information
  • Scenarios:
    • Data usage
    • Data sharing
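
A few of the redaction rules above can be sketched in Python. The function names are hypothetical, and treating X and Y as 1-based, inclusive positions is an assumption; DSC may count positions differently.

```python
def keep_first_last(text: str, n: int, m: int, mask: str = "*") -> str:
    """Keep the first n and the last m characters; mask everything in between."""
    if n + m >= len(text):
        return text  # nothing left to mask
    return text[:n] + mask * (len(text) - n - m) + text[len(text) - m:]


def redact_range(text: str, x: int, y: int, mask: str = "*") -> str:
    """Mask characters from position x to position y (assumed 1-based, inclusive)."""
    start = max(x - 1, 0)
    end = min(y, len(text))
    if start >= end:
        return text
    return text[:start] + mask * (end - start) + text[end:]


def redact_before(text: str, special: str, mask: str = "*") -> str:
    """Mask every character that precedes the first occurrence of special."""
    idx = text.find(special)
    if idx == -1:
        return text
    return mask * idx + text[idx:]


def redact_after(text: str, special: str, mask: str = "*") -> str:
    """Mask every character that follows the first occurrence of special."""
    idx = text.find(special)
    if idx == -1:
        return text
    return text[: idx + 1] + mask * (len(text) - idx - 1)
```

For example, `keep_first_last("13812345678", 3, 4)` returns `138****5678`, and `redact_before("alice@example.com", "@")` returns `*****@example.com`. Passing `mask="#"` switches the mask character to a number sign.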

Substitution (customization supported)
Raw data can be retrieved after it is de-identified by using some of the algorithms.

This type of algorithm can be used to de-identify fields in fixed formats, such as ID card numbers.

This type of algorithm works in one of two ways:
  • Mapped substitution: substitutes the entire value or part of the value of a field with a mapped value by using a mapping table. In this case, raw data can be retrieved after it is de-identified.
  • Random substitution: substitutes the entire value or part of the value of a field randomly based on a random interval. In this case, raw data cannot be retrieved after it is de-identified.

DSC provides multiple built-in mapping tables and allows you to customize substitution algorithms.

Algorithms and inputs:
  • Substitutes specific content in ID card numbers with mapped values: mapping table for substituting the IDs of administrative regions
  • Randomly substitutes specific content in ID card numbers: code table for randomly substituting the IDs of administrative regions
  • Randomly substitutes specific content in the IDs of military officer cards: code table for randomly substituting type codes
  • Randomly substitutes specific content in passport numbers: code table for randomly substituting purpose fields
  • Randomly substitutes specific content in permit numbers of Exit-Entry Permits for Travelling to and from Hong Kong and Macao: code table for randomly substituting purpose fields
  • Randomly substitutes specific content in bank card numbers: code table for randomly substituting Bank Identification Numbers (BINs)
  • Randomly substitutes specific content in landline telephone numbers: code table for randomly substituting the IDs of administrative regions
  • Randomly substitutes specific content in mobile numbers: code table for randomly substituting mobile network codes
  • Randomly substitutes specific content in unified social credit codes: code table for randomly substituting the IDs of registration authorities, code table for randomly substituting type codes, and code table for randomly substituting the IDs of administrative regions
  • Substitutes specific content in general tables with mapped values: mapping table for substituting uppercase letters, mapping table for substituting lowercase letters, mapping table for substituting digits, and mapping table for substituting special characters
  • Randomly substitutes specific content in general tables: code table for randomly substituting uppercase letters, code table for randomly substituting lowercase letters, code table for randomly substituting digits, and code table for randomly substituting special characters

Applicable sensitive data and scenarios:
  • Sensitive data:
    • Sensitive personal information
    • Sensitive information of enterprises
    • Sensitive information of devices
  • Scenarios:
    • Data storage
    • Data sharing
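
The difference between mapped substitution (reversible) and random substitution (irreversible) can be sketched in Python. Everything here is hypothetical: the digit mapping table, the sample region codes, and the function names are made up for illustration and are not the built-in DSC tables.

```python
import random

# Hypothetical mapping table: a one-to-one digit mapping, so it can be inverted.
DIGIT_MAP = dict(zip("0123456789", "7261950438"))
INVERSE_DIGIT_MAP = {v: k for k, v in DIGIT_MAP.items()}

# Hypothetical code table: sample six-digit region codes for random substitution.
REGION_CODES = ["110101", "310104", "440305"]


def substitute_mapped(value: str, start: int, end: int, table: dict) -> str:
    """Substitute value[start:end] character by character using a mapping table."""
    part = "".join(table.get(ch, ch) for ch in value[start:end])
    return value[:start] + part + value[end:]


def substitute_random(value: str, start: int, end: int, code_table: list,
                      rng: random.Random = None) -> str:
    """Replace value[start:end] with a randomly chosen entry from a code table."""
    rng = rng or random.Random()
    return value[:start] + rng.choice(code_table) + value[end:]
```

Applying `substitute_mapped` with `DIGIT_MAP` and then again with `INVERSE_DIGIT_MAP` over the same range restores the original value. The output of `substitute_random` carries no information about the replaced segment, so the original cannot be recovered.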

Rounding
Raw data can be retrieved after it is de-identified by using some of the algorithms.

This type of algorithm can be used to analyze and collect statistics on sensitive datasets.

DSC provides two types of rounding algorithms. One algorithm rounds numbers and dates, and raw data cannot be retrieved after it is de-identified. The other algorithm shifts the characters of text, and raw data can be retrieved after it is de-identified.

Algorithms and inputs:
  • Rounds numbers: numbers are rounded to the Nth digit before the decimal point. Valid values of N: 1 to 19.
  • Rounds dates: dates are rounded to the year, month, day, hour, or minute level.
  • Shifts characters: number of places by which characters are shifted and shift direction (left or right)

Applicable sensitive data and scenarios:
  • Sensitive data: general sensitive information
  • Scenarios:
    • Data storage
    • Data usage
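
The three rounding algorithms can be sketched in Python under several assumptions that the source does not confirm: rounding a number "to the Nth digit" is read as rounding half up to the nearest multiple of 10^N, rounding a date is read as truncating it to the start of the chosen level, and shifting is read as a circular shift of characters. All function names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

_LEVELS = ("year", "month", "day", "hour", "minute")
_RESET = (("month", 1), ("day", 1), ("hour", 0), ("minute", 0),
          ("second", 0), ("microsecond", 0))


def round_number(value, n: int) -> int:
    """Round value to the nearest multiple of 10**n (assumed meaning of N)."""
    if not 1 <= n <= 19:
        raise ValueError("n must be between 1 and 19")
    unit = Decimal(10) ** n
    scaled = (Decimal(str(value)) / unit).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(scaled * unit)


def round_date(dt: datetime, level: str) -> datetime:
    """Truncate dt to the start of the given level (year, month, day, hour, minute)."""
    keep = _LEVELS.index(level)  # raises ValueError for an unknown level
    return dt.replace(**dict(_RESET[keep:]))


def shift_characters(text: str, places: int, direction: str = "left") -> str:
    """Circularly shift the characters of text; the opposite shift restores it."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    if not text:
        return text
    k = places % len(text)
    if direction == "right":
        k = -k
    return text[k:] + text[:k]
```

Rounding discards the low-order digits or time fields, which is why those results cannot be reversed. Shifting only moves characters, so shifting by the same number of places in the opposite direction recovers the original text.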

Encryption
Raw data can be retrieved after it is de-identified by using this type of algorithm.

This type of algorithm can be used to encrypt sensitive fields that need to be retrieved after encryption.

Common symmetric encryption algorithms are supported.

Algorithms and inputs:
  • Data Encryption Standard (DES) algorithm: encryption key
  • Triple Data Encryption Standard (3DES) algorithm: encryption key
  • Advanced Encryption Standard (AES) algorithm: encryption key

Applicable sensitive data and scenario:
  • Sensitive data:
    • Sensitive personal information
    • Sensitive information of enterprises
  • Scenario: data storage

Shuffling
Raw data cannot be retrieved after it is de-identified by using this type of algorithm.

This type of algorithm can be used to de-identify structured data columns.

This type of algorithm extracts values of a field in a specified range from the source table and rearranges the values in a specific column. Alternatively, this type of algorithm randomly selects values from a specific column within the value range and rearranges the selected values. This way, the values are mixed up and de-identified.

Algorithms and inputs:
  • Randomly shuffles data: shuffle method (rearrangement or random selection)

Applicable sensitive data and scenario:
  • Sensitive data:
    • Sensitive information of devices
    • Sensitive location information
  • Scenario: data storage
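
The two shuffle methods can be sketched in Python as follows. The function names are hypothetical, and drawing values with replacement for the random selection method is an assumption; the source does not say how DSC selects values.

```python
import random


def shuffle_column(values, rng: random.Random = None) -> list:
    """Rearrangement: return the same values in a random order."""
    rng = rng or random.Random()
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def resample_column(values, rng: random.Random = None) -> list:
    """Random selection: fill each row with a value drawn at random from the column.

    Assumption: values are drawn with replacement, so some may repeat
    and others may not appear at all.
    """
    rng = rng or random.Random()
    pool = list(values)
    return [rng.choice(pool) for _ in pool]
```

Rearrangement preserves the exact set of values in the column, which keeps column-level statistics intact. Random selection preserves only the value range. In both cases the link between a value and its original row is lost.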