ApsaraDB for OceanBase:Desensitize data

Last Updated:Jun 07, 2024

Background information

Data desensitization uses desensitization algorithms to transform sensitive personal information, such as names, ID card numbers, mobile numbers, landline numbers, bank account numbers, and email addresses, so that the original values are protected.

Concepts

  • Data desensitization: a feature that uses specific algorithms and techniques to process, obfuscate, or replace sensitive data during data processing and storage so that the data cannot be identified or restored, thereby protecting data security and preventing data breaches.

  • Dynamic desensitization: a process of desensitizing sensitive data in real time when a user queries the database, without modifying the source data in the database. Generally, dynamic desensitization is used in a production environment. It avoids data breaches while preserving the integrity and accuracy of the original data. However, because the data is desensitized at query time, dynamic desensitization adds processing overhead and can reduce the query performance of the database.

  • Static desensitization: a process of preprocessing sensitive data and storing the processed data in storage media, such as a database. Static desensitization is usually used in testing, development, and demonstration environments. It protects sensitive data from being viewed by unauthorized personnel and avoids the legal liability of data breaches. Because the data is desensitized ahead of time, queries incur no desensitization overhead. However, the original data cannot be restored from the desensitized data, so the stored copy no longer reflects the exact original values.

  • Desensitization algorithm: an algorithm used to desensitize sensitive data. A desensitization algorithm can effectively protect the security of sensitive data and avoid data breaches while retaining the data formats and structures to facilitate queries and usage.

  • Identification rule: a rule used to automatically identify sensitive data for data desensitization. When you scan to add sensitive columns, sensitive columns are automatically identified based on specified rules.

  • Sensitive column: a column that contains sensitive data in a database table.
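The difference between dynamic and static desensitization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `source_rows` list, the `hide_all` helper, and all function names are hypothetical and are not part of ODC.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical in-memory "table" that stands in for the source database.
source_rows: List[Row] = [
    {"name": "Alice", "email": "alice@example.com"},
]

# Assumed set of sensitive columns for this illustration.
SENSITIVE_COLUMNS = {"email"}


def hide_all(value: str) -> str:
    """Illustrative masking function: replace every character with '*'."""
    return "*" * len(value)


def desensitize(row: Row) -> Row:
    """Return a masked copy of a row; the input row is not modified."""
    return {
        column: hide_all(value) if column in SENSITIVE_COLUMNS else value
        for column, value in row.items()
    }


def query_production(rows: List[Row]) -> List[Row]:
    """Dynamic: mask on every query; the stored rows keep their original values."""
    return [desensitize(row) for row in rows]


# Static: mask once, store the result, and serve later queries from that copy.
test_environment_rows: List[Row] = [desensitize(row) for row in source_rows]


def query_test_environment() -> List[Row]:
    """Static: no masking work at query time, but the originals are gone."""
    return list(test_environment_rows)
```

In the dynamic path the masking cost is paid on each query and `source_rows` stays intact. In the static path the masked copy is all that the test environment ever sees.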

Execution process

image

  1. Log on to the ODC console as the project administrator and choose Security Specifications > Masking Algorithm to view the built-in desensitization algorithms and test the desensitization effect.

  2. Log on to the ODC console as the project administrator and choose Projects > Sensitive Columns. Click Add Sensitive Column and select Add Manually or Scan to Add. Before you select Scan to Add, make sure that identification rules have been created.

  3. When a regular user performs queries in the SQL window, exports a result set, submits an export ticket, or submits a database change ticket, the output data in the sensitive columns is desensitized.

Prerequisites

  • The project administrator or DBA can manage sensitive columns and identification rules.

  • All users can view and test the desensitization effect, but regular users are not allowed to create, edit, or delete desensitization algorithms.

Considerations

  • Data desensitization is not supported in command-line windows.

  • Data desensitization is not supported during PL execution.

  • When you configure an identification rule script, the Groovy script can use only the Java Objects and String classes.

  • When you configure an identification rule script, Groovy closures and built-in closure functions are not supported.

  • If sensitive data is involved when you export data from a MySQL data source, the Mask all algorithm is used for desensitization.

Sensitive column management

Add sensitive columns

Example: Desensitize the email and mobile_phone columns of the student table in the odc_test database.

| Parameter | Example value |
| --- | --- |
| Data source | mysql4.2.4 |
| Source database name | odc_test |
| Table name | student |

  1. In the project collaboration window, choose Projects > Sensitive Columns. You can click Add Sensitive Column and select Add Manually or Scan to Add.

    image

  2. After you manually add or scan to add sensitive columns, click Submit.

    • Method 1: Manually add sensitive columns

      image

    • Method 2: Scan to add sensitive columns

      Note

      Before you scan to add sensitive columns, make sure that identification rules have been created. For more information, see the Identification rule management section in this topic.

      image

  3. In the sensitive column list, you can view and enable added sensitive columns.

    image

Edit a sensitive column

In the sensitive column list shown in the preceding figure, click Edit in the Actions column of a sensitive column to change the desensitization algorithm for the sensitive column.

Delete a sensitive column

In the sensitive column list, click Delete in the Actions column of a sensitive column to delete it.

Identification rule management

Identification rules extend the sensitive data management capability. In addition to adding sensitive columns manually, you can define identification rules so that ODC scans for sensitive columns automatically. An identification rule defines matching conditions, and ODC identifies the columns that meet those conditions as sensitive columns. ODC supports three identification methods: path, regular expression, and script.

  • Path: An identification rule based on a path expression identifies a sensitive column based on its database name, table name, or column name. The database name, table name, and column name are separated with periods (.). Matching conditions are separated with commas (,). Asterisks (*) are used as wildcard characters.

    | Parameter | Required? | Description |
    | --- | --- | --- |
    | Rule Name | Yes | The name of the identification rule, which cannot exceed 64 characters in length. |
    | Rule Status | Yes | The status of the identification rule. Valid values: Enable and Disable. |
    | Matching Rule | Yes | Conditions for matching sensitive columns. Columns that meet the conditions specified here are identified as sensitive columns. For example, the identification rule *.*.mobile_phone specifies to match columns named mobile_phone in any tables and databases. |
    | Exclusion Rule | No | Conditions for excluding data columns. Important: The system determines whether a data column is a sensitive column based on the exclusion conditions first and then the matching conditions. |
    | Masking Algorithm | Yes | The default desensitization algorithm for identified sensitive columns. |
    | Rule Description | No | The description of the identification rule. |

  • Regular expression: An identification rule based on a regular expression identifies a sensitive column based on its database name, table name, column name, or column remarks.

    | Parameter | Required? | Description |
    | --- | --- | --- |
    | Rule Name | Yes | The name of the identification rule, which cannot exceed 64 characters in length. |
    | Rule Status | Yes | The status of the identification rule. Valid values: Enable and Disable. |
    | Identification Object-Database Name | No | The regular expression for matching database names. For example, .* matches databases with any name. |
    | Identification Object-Table Name | No | The regular expression for matching table names. For example, e[a-z]?.* matches tables whose names start with the lowercase letter e. |
    | Identification Object-Column Name | No | The regular expression for matching column names. |
    | Identification Object-Column Remarks | No | The regular expression for matching column remarks. |
    | Masking Algorithm | Yes | The default desensitization algorithm for identified sensitive columns. |
    | Rule Description | No | The description of the identification rule. |

  • Script: An identification rule based on a Groovy script identifies a sensitive column based on its database name, table name, column name, column remarks, or data type.

    Important

    The script must return a Boolean value, namely true or false.

    | Parameter | Required? | Description |
    | --- | --- | --- |
    | Rule Name | Yes | The name of the identification rule, which cannot exceed 64 characters in length. |
    | Rule Status | Yes | The status of the identification rule. Valid values: Enable and Disable. |
    | Groovy Script | Yes | The script that determines whether a data column is a sensitive column. The script is written based on Groovy syntax specifications. |
    | Masking Algorithm | No | The default desensitization algorithm for identified sensitive columns. |
    | Rule Description | No | The description of the identification rule. |

    ODC provides built-in column objects for you to reference in the Groovy script. The following table describes the attributes in a column object.

    | Attribute | Type | Description |
    | --- | --- | --- |
    | schema | String | The name of the database to which the column belongs. |
    | table | String | The name of the table to which the column belongs. |
    | name | String | The name of the column. |
    | comment | String | The comment on the column. |
    | type | String | The data type of the column. |

Here are several sample identification rules that use a script as the identification method:

  • Address:

    // Only string-typed columns are considered.
    if ("varchar".equals(column.type) || "char".equals(column.type)) {
        // Match on the column name.
        if (column.name.indexOf("address") >= 0) {
            return true;
        }
        // Match on the column comment, ignoring case.
        if (column.comment != null &&
                (column.comment.toLowerCase().indexOf("address") >= 0
                        || column.comment.toLowerCase().indexOf("location") >= 0)) {
            return true;
        }
    }
    return false;
    
  • Mobile number:

    // The column object exposes no data length, so only the type, name, and comment are checked.
    if ("varchar".equals(column.type) || "char".equals(column.type)) {
        // Match on the column name.
        if (column.name.indexOf("phone") >= 0 || column.name.indexOf("mobile") >= 0) {
            return true;
        }
        // Match on the column comment, ignoring case.
        if (column.comment != null &&
                (column.comment.toLowerCase().indexOf("phone") >= 0
                        || column.comment.toLowerCase().indexOf("mobile") >= 0)) {
            return true;
        }
    }
    return false;
    
  • ID card number:

    // The column object exposes no data length, so only the type, name, and comment are checked.
    if ("varchar".equals(column.type) || "char".equals(column.type)) {
        // Match on the column name.
        if (column.name.indexOf("id_number") >= 0 || column.name.indexOf("identity_card") >= 0) {
            return true;
        }
        // Match on the column comment, ignoring case.
        if (column.comment != null &&
                (column.comment.toLowerCase().indexOf("identity card") >= 0
                        || column.comment.toLowerCase().indexOf("id card") >= 0)) {
            return true;
        }
    }
    return false;
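
For comparison with the script samples, the regular expression method described earlier in this section can be approximated in Python. This sketch assumes that a rule matches when every configured pattern matches the whole value and that patterns left empty are ignored. ODC may combine or apply the patterns differently. The `RegexRule` class and its field names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegexRule:
    """Hypothetical container for the four optional patterns of a rule."""

    database: Optional[str] = None
    table: Optional[str] = None
    column: Optional[str] = None
    remarks: Optional[str] = None

    def matches(self, database: str, table: str, column: str, remarks: str = "") -> bool:
        checks = [
            (self.database, database),
            (self.table, table),
            (self.column, column),
            (self.remarks, remarks),
        ]
        # Assumption: unset patterns are skipped, and each set pattern must
        # match the entire value (re.fullmatch), not just a substring.
        # A rule with no pattern set therefore matches every column here.
        return all(
            re.fullmatch(pattern, value) is not None
            for pattern, value in checks
            if pattern is not None
        )


rule = RegexRule(database=".*", table="e[a-z]?.*", column=".*phone.*")
print(rule.matches("odc_test", "employee", "mobile_phone"))  # True
print(rule.matches("odc_test", "student", "mobile_phone"))   # False
```

Under these assumptions, the table pattern e[a-z]?.* accepts any table name that starts with a lowercase e, including names such as e_1 that contain non-letter characters.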

Add an identification rule

Example: Add an identification rule for the mobile_phone column of the student table in the odc_test database as the administrator.

  1. In the project collaboration window, choose Projects > Sensitive Columns. Then click Add Sensitive Column and select Scan to Add.

    image

  2. In the Scan to Add Sensitive Columns dialog box, click Identification Rule and select Manage Identification Rule from the drop-down list.

    image

  3. In the Manage Identification Rule dialog box, click Create Identification Rule.

    image

  4. In the Create Identification Rule pane, specify the name, status, identification method, and desensitization algorithm of the rule and click Create.

    image

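    For example, the path-based identification rule odc_test*.student.*a,*.*.mobile_phone contains two matching conditions. The first condition matches columns whose names end with a in the student table of any database whose name starts with odc_test. The second condition matches the mobile_phone column in any table of any database, which includes the mobile_phone column of the student table in the odc_test database.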

  5. In the identification rule list, you can view and enable the added identification rule.

    image
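
The path-based rule used in the steps above can be sketched in Python to show how the periods, commas, and asterisks interact. This is an approximation, not ODC's implementation: it assumes that each condition has exactly three period-separated segments, that an asterisk matches any run of characters within one segment, and that exclusion conditions are checked before matching conditions. The function names are hypothetical, and `fnmatchcase` also interprets `?` and `[...]`, which the path syntax may not.

```python
from fnmatch import fnmatchcase


def matches_condition(condition: str, schema: str, table: str, column: str) -> bool:
    """Check one 'database.table.column' condition, segment by segment."""
    patterns = condition.strip().split(".")
    if len(patterns) != 3:
        return False  # Assumption: malformed conditions never match.
    values = (schema, table, column)
    return all(fnmatchcase(value, pattern) for value, pattern in zip(values, patterns))


def any_condition_matches(rule: str, schema: str, table: str, column: str) -> bool:
    """A rule is a comma-separated list of conditions; one match is enough."""
    return any(
        matches_condition(condition, schema, table, column)
        for condition in rule.split(",")
        if condition.strip()
    )


def is_sensitive(matching_rule: str, exclusion_rule: str,
                 schema: str, table: str, column: str) -> bool:
    """Apply the exclusion conditions first, then the matching conditions."""
    if exclusion_rule and any_condition_matches(exclusion_rule, schema, table, column):
        return False
    return any_condition_matches(matching_rule, schema, table, column)


rule = "odc_test*.student.*a,*.*.mobile_phone"
print(is_sensitive(rule, "", "odc_test", "student", "mobile_phone"))  # True
print(is_sensitive(rule, "", "odc_test", "student", "email"))         # False
```

Splitting on periods before matching keeps an asterisk from spanning a database, table, and column name at once, which a single wildcard match over the whole path would allow.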

Manage identification rules

On the page shown in the preceding figure, click View, Edit, or Delete in the Actions column of an identification rule to view, modify, or delete the identification rule.

View desensitization algorithms

In the project collaboration window, choose Security Specifications > Masking Algorithm and view the desensitization algorithms supported by ODC.

The following table lists the desensitization algorithms supported by ODC.

| Algorithm | Test data | Preview result |
| --- | --- | --- |
| Mask all (system default) | test value | ***** |
| Personal name (Chinese character) | 个人姓名 | **名 |
| Personal name (alphabet) | Personal Name | P** |
| Nickname | Nickname | N***e |
| Email | odc@oceanbase.com | o***@oceanbase.com |
| Address | Hangzhou, Zhejiang Province, China | Hangzhou, Z*** |
| Phone number | 13500000000 | 135******00 |
| Fixed line phone number | 010-12345678 | **********78 |
| Certificate number | 123456789 | 1*******9 |
| Bank card number | 1234 5678 5678 1234 | ***************1234 |
| License plate number | 浙AB1234 | 浙A**234 |
| Device unique identification number | AB123456789CD | ****89CD |
| IP address | 10.123.456.789 | 10.*.*.* |
| MAC address | ab:cd:ef:gh:hi:jk | ab:*:*:*:*:* |
| MD5 | default | c21f969b5f03d33d43e04f8f136e7682 |
| SHA256 | default | 37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f |
| SHA512 | default | 1625cdb75d25d9f699fd2779f44095b6e320767f606f095eb7edab5581e9e3441adbb0d628832f7dc4574a77a382973ce22911b7e4df2a9d2c693826bbd125bc |
| SM3 | default | 40c357923156504f734717d8b4f5623e75209e9572701f4b51ef2a03d9ced863 |
| Rounding | 123.456 | 123 |
| Blanking | default | - |
| Default | abcd1234 | abc**234 |
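
Several of the preview results in the table follow a "keep a few leading or trailing characters and mask the rest" pattern. The Python sketch below reproduces a handful of those previews. The masking behavior is inferred only from the test data and preview columns above, so it is an assumption about the algorithms rather than ODC's actual implementation, and all function names are hypothetical.

```python
def mask_middle(value: str, keep_head: int, keep_tail: int, mask_char: str = "*") -> str:
    """Keep the first keep_head and last keep_tail characters; mask the rest."""
    if len(value) <= keep_head + keep_tail:
        # Too short to keep anything safely: mask the whole value.
        return mask_char * len(value)
    hidden = len(value) - keep_head - keep_tail
    tail = value[-keep_tail:] if keep_tail else ""
    return value[:keep_head] + mask_char * hidden + tail


def mask_phone(value: str) -> str:
    """Assumed rule: keep the first three and last two characters."""
    return mask_middle(value, 3, 2)


def mask_certificate(value: str) -> str:
    """Assumed rule: keep only the first and last characters."""
    return mask_middle(value, 1, 1)


def mask_bank_card(value: str) -> str:
    """Assumed rule: keep only the last four characters."""
    return mask_middle(value, 0, 4)


def mask_email(value: str) -> str:
    """Assumed rule: keep the first character and the domain, with a fixed '***'."""
    local, separator, domain = value.partition("@")
    if not separator:
        return "*" * len(value)  # Not an email address: mask everything.
    return local[:1] + "***" + separator + domain


print(mask_phone("13500000000"))           # 135******00
print(mask_certificate("123456789"))       # 1*******9
print(mask_bank_card("1234 5678 5678 1234"))  # ***************1234
print(mask_email("odc@oceanbase.com"))     # o***@oceanbase.com
```

You can compare real output against these guesses on the Masking Algorithm page, where each built-in algorithm can be tested with your own input.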

Scenarios

Added sensitive columns are desensitized in the SQL window and during data export and data changes.

Scenario 1: Data desensitization during data export

Example: When you export data from the student table in the odc_test database, data is automatically desensitized.

  1. After you export the student table by submitting a ticket in the ODC console, click View in the Actions column of the export task in the export task list.

    image

  2. In the Ticket Details panel, click Download in the lower-right corner.

    image

  3. View the downloaded student table in your local disk.

    image

Scenario 2: Data desensitization during data changes

Example: When you insert data into the student table, data is automatically desensitized.

  1. In the ODC console, submit a database change ticket to insert data into the student table.

  2. In the left-side navigation pane of the SQL window, locate the odc_test database and view the desensitized data in the student table.

    image

Scenario 3: Data desensitization in an SQL window

Example: When you insert data into the student table, data is automatically desensitized.

  1. In the SQL window, write an SQL statement to insert data into the student table.

    image

  2. On the result tab, you can view the desensitized data in the student table.
