All Products
Search
Document Center

Manage sensitive data

Last Updated: May 29, 2020

Background

As described in the Data query in the SQLConsole topic, when you query a table and the table contains sensitive fields and confidential fields that you have no permission to access, such fields are displayed in the ****** format in the query result for full masking.

However, in some cases, developers and testers may need to view a part of sensitive data for troubleshooting. In combination with the masking algorithms provided by Data Management Service (DMS) and the de-identification methods provided when you apply for corresponding permissions on sensitive fields, the sensitive data management feature described in this topic can meet the preceding requirement.

Applicable scope

Currently, data masking is applicable only to relational databases and online analytical processing (OLAP) databases. You cannot use data masking for NoSQL databases such as MongoDB and Redis databases.

Introduction to data masking

Data masking allows you to flexibly de-identify exported data and data returned in queries such as cross-database query. For example, you can mask some digits of a mobile number like this: 136xxxxx799.

DMS provides masking algorithms to mask data. You can customize the masking effects of sensitive fields based on such algorithms.

Masking algorithms

A masking algorithm is the core of partial de-identification. It determines the de-identification mode of a sensitive field, including:

  • Method of partial de-identification, the content to be de-identified, and the content to be retained
  • Format of the de-identified content, such as ****** or XXXXXX

Algorithm types

Currently, DMS provides the following three masking algorithms:

  • Fixed positions: masks characters in the specified positions, for example, masks the first N characters, Mth to Nth characters, or the last N characters.
  • Fixed characters: masks specified characters, for example, masks “Hangzhou, Zhejiang” in a mailing address.
  • Full masking: masks all characters, which is an extension of the original masking algorithm. With full masking, de-identified data can be displayed in various formats instead of merely being in the ****** format.

Currently, you can only use the masking algorithms provided by DMS instead of customizing an algorithm.

Supported data types

The masking algorithms work on fields. Currently, the field types supported by masking algorithms are not limited. However, we recommend that you use masking algorithms for fields of the string type that are in the fixed format and with a limited length or fields of the numeric type, such as mobile numbers, email addresses, ID card numbers, and names. This is because long strings consume computing resources of a database when the algorithms are in operation.

De-identification methods

To implement partial de-identification, the permissions on sensitive fields are divided into query, export, and change permissions. If you have been granted the permission to query a sensitive field of a table in a database, the values of the field appear in plaintext when you query the field in the SQLConsole.

In addition to masking algorithms, DMS also provides de-identification methods when you apply for permissions on sensitive fields. You can use the de-identification methods in combination with the masking algorithms for partial de-identification.

Semi-sensitization

Assume that you apply for the permission to query a sensitive field. If you set the de-identification method to semi-sensitization and configure a masking algorithm for the sensitive field to realize partial de-identification, values of the field are de-identified by using the masking algorithm and displayed accordingly instead of appearing in the ****** format when you query the field in the SQLConsole.

Plaintext

Similarly, if you set the de-identification method to plaintext, values of the sensitive field appear in plaintext when you query the field in the SQLConsole.

The following flowchart shows the preceding logic. Logic flowchart

Entrance to sensitive data management

To use the data masking feature, you can log on to the DMS console and choose System Management > Security > Sensitive Data in the top navigation bar to go to the Sensitive Data page. Only the DMS administrator, security administrator, and database administrator (DBA) can go to the Sensitive Data page and use the data masking feature. The Sensitive Data page displays all the sensitive fields defined in your enterprise. On this page, you can configure a masking algorithm for a sensitive field. In addition, you can adjust the security level of a sensitive field.

Note

    1. On the Sensitive Data page, you can only view fields whose security level is sensitive or confidential. Fields whose security level is internal do not appear on this page. If no fields appear on this page, no sensitive fields are defined in your enterprise.

    1. On this page, you can only adjust the security levels of sensitive fields and confidential fields.Entrance to sensitive data management

Algorithm configuration

You can click Add Algorithm in the Actions column of a field to configure a masking algorithm for the field.

Fixed positions

Fixed positions
If you set Algorithm Type to Fixed Position, you must configure the following items for this algorithm:

  • Masking String

    • The string to display as the de-identified content, such as ***, xxx, or Sensitive data.
  • Masking Position

    • The positions of the characters to be masked in the field values. The positions are in the format of coordinates, for example:

      • (1,4): masks the first four characters, that is, the first to fourth characters, in the field values. You can also enter (4) for short.
      • (8, 10): masks the eighth to tenth characters.
      • (-4): masks the last four characters. This format is applicable to field values whose length is not fixed. If the values of a sensitive field are of fixed length, you can also specify the position in the format of a coordinate.
    • You can specify a maximum of three positions, for example:

      (1, 4), (8, 10), (-4)

      The preceding value indicates to mask the first four characters, the eighth to tenth characters, and the last four characters.

Fixed characters

If you set Algorithm Type to Fixed Character, you must configure the following items for this algorithm:

  • Masking String: the same as that used for masking characters in the fixed positions.

  • Character to Be Replaced: the characters to be masked, in the format of a string. You can specify a maximum of three strings. For example, a string to be replaced can be:

    Test:The preceding value indicates that the test string in the field values is replaced with the specified masking string.

Important notes

  • Data masking may fail if you do not have permissions on the sensitive field specified in an SQL statement or the sensitive field for which you have the corresponding permission in semi-sensitization mode is involved in a functional operation.

    • For example, the phone field is a sensitive field for which you have the corresponding permission in semi-sensitization mode. If you execute the select substr(phone, 10) from base_user statement, data masking fails. This is because the phone field is involved in a functional operation.
    • If you want to perform a functional operation on a sensitive field and de-identify the field values at the same time, you must apply for the corresponding permission on the field in plaintext mode.
  • The field types supported by data masking are not limited. However, masking algorithms may be inapplicable to some field types in certain scenarios.

    • Scenario 1
      • DMS applies the same security level and masking algorithm to fields of the same name that reside in tables and databases with the same names, for example:
        • In Instance A, the c1 field in the t1 table in the dmstest database is a sensitive field and is of the varchar type.
        • In Instance B, the c1 field in the t1 table in the dmstest database will also be identified as a sensitive field. Its data type is int.
      • If you configure a masking algorithm for the c1 field of the varchar type, the masking algorithm is also used for the c1 field of the int type. However, the algorithm may be inapplicable to the c1 field of the int type. DMS does not limit the field types supported by its masking algorithms because the field types may vary.
    • Scenario 2
      • Masking algorithms may be inapplicable to fields of a non-string type, such as the date type. This is because fields of the date type must be converted to those of a string type before the field values are de-identified. During the conversion, the date format is changed. Different databases such as MySQL, PostgreSQL, and SQL Server provide different conversion logic for fields of the date type. As a result, the masking effect may not be as expected. Currently, it is found that errors may occur in the date format after masking algorithms are applied to fields of the date type in SQL Server, leading to a poor masking effect.