PolarDB: Overview

Last Updated: Apr 18, 2024

This topic provides an overview of the dynamic data masking feature provided by PolarProxy.

Prerequisites

The version of PolarProxy must be V2.4.12 or later. For more information about how to view the current version of PolarProxy and upgrade it, see Minor version update.
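If you check the version in a script, compare the numeric components instead of the raw strings: compared as strings, "V2.4.9" sorts after "V2.4.12". The following Python sketch assumes the version is reported in the V<major>.<minor>.<patch> form used above. `parse_polarproxy_version` and `supports_dynamic_masking` are hypothetical helper names, not part of any PolarDB API.

```python
import re


def parse_polarproxy_version(version: str) -> tuple[int, ...]:
    """Parse a version string such as "V2.4.12" into (2, 4, 12).

    Assumes the "V<major>.<minor>.<patch>" format shown in this topic.
    """
    match = re.fullmatch(r"[Vv]?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


MINIMUM_VERSION = parse_polarproxy_version("V2.4.12")


def supports_dynamic_masking(version: str) -> bool:
    # Tuples compare element by element, so (2, 4, 9) < (2, 4, 12),
    # whereas the string "V2.4.9" would compare greater than "V2.4.12".
    return parse_polarproxy_version(version) >= MINIMUM_VERSION


print(supports_dynamic_masking("V2.4.9"))   # False
print(supports_dynamic_masking("V2.4.12"))  # True
```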

Data masking solutions

If you want to authorize third parties to generate reports, analyze data, perform development and test activities, or perform other database-related operations, you may need to obtain the latest customer data from databases in the production environment in real time. To avoid disclosing personal information, data must be masked before it is provided to third parties. Alibaba Cloud provides the following data masking solutions: dynamic data masking and static data masking. PolarProxy uses dynamic data masking.

Table 1. Comparison of data masking solutions

Dynamic data masking

  • Description: When your application initiates a data query request, PolarDB masks the sensitive data that is queried before it returns the data to the application. To achieve this, you must specify the database account, the database name, and the tables or columns that require data masking before the data is queried.

  • Advantages: You do not need to change the applications in your business system, which reduces costs. Your applications can query real-time data from production databases.

  • Limits: Query performance on production databases is slightly lower than on mirror databases because PolarProxy masks the sensitive data in real time.

Static data masking

  • Description: All data in a production database is exported to a mirror database, and the sensitive data is encrypted or masked during the export.

  • Advantages: Your applications query data from mirror databases instead of production databases. As a result, data masking does not affect the services that require access to production databases.

  • Limits: To mask sensitive data, you must develop a set of components in the data import toolkit, which incurs high development costs. Data in mirror databases is not up to date with data in production databases.

How it works

After you configure data masking rules in the PolarDB console, the console writes the rules to PolarProxy. When your application connects to a database by using an account specified in the data masking rules and queries the specified columns, PolarProxy masks the data that is queried from the database and returns the masked data to the client.

[Figure 1]

The preceding figure shows the following data masking rules:

  • The data masking rules take effect only when you use the testAcc account to query data from a database.

  • PolarProxy masks only the data that is queried in the name and age columns.

Note

If a column in the query result is masked, all values of the column are masked. For example, if you execute the SELECT * FROM t1 statement and the t1 table contains the name and age columns, all values of these two columns in the query result are masked.

If your application uses the testAcc account to connect to a database and queries data in the name, age, and hobby columns of a table, PolarProxy masks data in the name and age columns and returns the masked data together with the unmasked data in the hobby column.
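To make the matching logic concrete, the following Python sketch models a rule as a set of accounts plus a set of column names. `MaskingRule` and `apply_rule` are hypothetical names, not a PolarProxy API. For brevity, the sketch ignores the database and table parts of a rule and replaces every masked value with asterisks instead of applying the type-specific methods described later in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingRule:
    """Hypothetical rule: the accounts it applies to and the columns it masks."""

    accounts: frozenset[str]
    columns: frozenset[str]


def apply_rule(rule: MaskingRule, account: str, row: dict[str, object]) -> dict[str, object]:
    """Return a copy of `row` with every column covered by `rule` masked."""
    if account not in rule.accounts:
        # The rule does not apply to this account: return the raw values.
        return dict(row)
    # Mask every value in a covered column; leave other columns unchanged.
    return {
        column: "*" * len(str(value)) if column in rule.columns else value
        for column, value in row.items()
    }


rule = MaskingRule(accounts=frozenset({"testAcc"}), columns=frozenset({"name", "age"}))
row = {"name": "Alice", "age": 30, "hobby": "chess"}
print(apply_rule(rule, "testAcc", row))   # {'name': '*****', 'age': '**', 'hobby': 'chess'}
print(apply_rule(rule, "otherAcc", row))  # {'name': 'Alice', 'age': 30, 'hobby': 'chess'}
```

Because the decision depends only on the account and the column, every value in a covered column is masked, which matches the SELECT * behavior described in the note.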

PolarProxy uses different methods to mask different types of data. The following table describes the data masking methods.

Integer data types: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT

  Masking method: PolarProxy returns a random value in the format defined by the data type of the raw data.

  • Raw value: 12345

  • Masked value (randomly generated): 28175

Decimal data types: DECIMAL, FLOAT, and DOUBLE

  Masking method: PolarProxy returns a random value in the format defined by the data type of the raw data.

  • Raw value: 1.2345

  • Masked value (randomly generated): 8.2547

Date and time data types: DATE, TIME, DATETIME, TIMESTAMP, and YEAR

  Masking method: PolarProxy returns a random value in the format defined by the data type of the raw data.

  • Raw value: 2021-01-01 00:00:00

  • Masked value (randomly generated): 4926-12-13 17:23:07

Other data types

  Masking method: PolarProxy replaces the data with asterisks (*).

  • Raw value: Elon Musk

  • Masked value: *********
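The following Python sketch shows one way to implement type-based masking that always produces a value valid for the column's declared type. It is an illustration under stated assumptions, not PolarProxy's implementation: integer types draw from the signed MySQL range of the type, decimal values keep their digit layout (consistent with the 1.2345 example), DATE, DATETIME, and TIMESTAMP draw from the MySQL supported ranges, TIME is simplified to a time of day, and YEAR draws from 1901 to 2155. `mask_value` is a hypothetical name.

```python
import random
import string
from datetime import datetime, timedelta
from decimal import Decimal

# Signed ranges of the MySQL integer types (assumes signed columns).
INT_RANGES = {
    "TINYINT": (-(2**7), 2**7 - 1),
    "SMALLINT": (-(2**15), 2**15 - 1),
    "MEDIUMINT": (-(2**23), 2**23 - 1),
    "INT": (-(2**31), 2**31 - 1),
    "BIGINT": (-(2**63), 2**63 - 1),
}

# Supported MySQL ranges for DATETIME and TIMESTAMP values.
DATETIME_RANGES = {
    "DATETIME": (datetime(1000, 1, 1), datetime(9999, 12, 31, 23, 59, 59)),
    "TIMESTAMP": (datetime(1970, 1, 1, 0, 0, 1), datetime(2038, 1, 19, 3, 14, 7)),
}


def _random_datetime(lo: datetime, hi: datetime) -> datetime:
    """Pick a random datetime between lo and hi, inclusive, at one-second granularity."""
    span = int((hi - lo).total_seconds())
    return lo + timedelta(seconds=random.randint(0, span))


def _random_digits_like(text: str) -> str:
    """Replace each digit with a random digit; keep the sign and decimal point."""
    return "".join(random.choice(string.digits) if ch.isdigit() else ch for ch in text)


def mask_value(sql_type: str, value: object) -> object:
    """Return a masked stand-in for `value` based on its declared SQL type."""
    sql_type = sql_type.upper()
    if sql_type in INT_RANGES:
        return random.randint(*INT_RANGES[sql_type])
    if sql_type in ("DECIMAL", "FLOAT", "DOUBLE"):
        # Keep the raw value's count of integer and fractional digits.
        masked = Decimal(_random_digits_like(format(Decimal(str(value)), "f")))
        return masked if isinstance(value, Decimal) else float(masked)
    if sql_type in DATETIME_RANGES:
        return _random_datetime(*DATETIME_RANGES[sql_type])
    if sql_type == "DATE":
        return _random_datetime(*DATETIME_RANGES["DATETIME"]).date()
    if sql_type == "TIME":
        # Simplification: a time of day, although MySQL TIME can exceed 24 hours.
        day = datetime(2000, 1, 1)
        return _random_datetime(day, day + timedelta(hours=23, minutes=59, seconds=59)).time()
    if sql_type == "YEAR":
        return random.randint(1901, 2155)
    # Other types: one asterisk per character of the raw value.
    return "*" * len(str(value))


print(mask_value("VARCHAR", "Elon Musk"))  # *********
```

Drawing integers from the type's range, rather than preserving digits, guarantees that a masked TINYINT never overflows; preserving digits for decimals keeps a DECIMAL(M,D) value within its declared precision and scale.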

Considerations

  • The dynamic data masking feature applies only to cluster endpoints, which include the default cluster endpoint and custom cluster endpoints. If you connect to a database by using the primary endpoint and query data, dynamic data masking does not take effect. For more information about how to view a cluster endpoint, see View the endpoint and port number.

  • If query results contain data that must be masked and the size of a single row exceeds 16 MB, the query session is closed.

    For example, suppose that you query the name and description columns of the person table, a data masking rule covers the name column, and the description value in one row is larger than 16 MB. When you execute the SELECT name, description FROM person statement, the query session is closed.

  • If a masked column is passed as an input parameter to a function, data masking does not take effect.

    For example, a data masking rule is created to mask the sensitive data in the name column. When you execute the SELECT CONCAT(name, '') FROM person statement, your application can still read the raw values of the name column.

  • If a masked column is used in a query that contains the UNION operator, data masking may not take effect.

    For example, a data masking rule is created to mask the sensitive data in the name column. When you execute the SELECT hobby FROM person UNION SELECT name FROM person statement, your application can still read the raw values of the name column.
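One plausible explanation for the function and UNION limitations is that masking is decided from result-set column metadata rather than from the values themselves; the appendix notes that PolarProxy analyzes the column definition data in the result set. The following Python sketch assumes that the match uses the origin database, table, and column fields that the MySQL client/server protocol carries in each column definition. `ColumnDefinition` and `columns_to_mask` are hypothetical names. A computed expression such as CONCAT(name, '') typically has no origin column, and the result column of a UNION is described by the first SELECT (hobby in the example above), so neither matches a rule on name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDefinition:
    """A few fields of a MySQL protocol column definition (hypothetical model).

    `schema`, `org_table`, and `org_name` identify the physical column that a
    result column comes from; they are typically empty for computed expressions.
    """

    name: str       # Column name or alias shown to the client.
    schema: str     # Database that contains the origin table.
    org_table: str  # Origin table.
    org_name: str   # Origin column.


def columns_to_mask(
    defs: list[ColumnDefinition], rule_columns: set[tuple[str, str, str]]
) -> list[bool]:
    """For each result column, report whether a (database, table, column) rule matches."""
    return [(d.schema, d.org_table, d.org_name) in rule_columns for d in defs]


rule = {("db1", "person", "name")}
# SELECT name, description FROM person
print(columns_to_mask([ColumnDefinition("name", "db1", "person", "name"),
                       ColumnDefinition("description", "db1", "person", "description")], rule))
# [True, False]
# SELECT CONCAT(name, '') FROM person: no origin column, so nothing matches.
print(columns_to_mask([ColumnDefinition("CONCAT(name, '')", "", "", "")], rule))  # [False]
```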

Enable the dynamic data masking feature

For more information, see Manage data masking rules.

Appendix: Impacts on cluster performance

The dynamic data masking feature affects the performance of clusters in the following scenarios.

Note

In the following table, the change in read-only queries per second (QPS) is used to measure the performance impact.

Your account is not included in the data masking rule, and your query does not hit the rule

  • Impact on performance: Data masking does not take effect on queries made by your account, so cluster performance is not affected.

Your account is included in the data masking rule, but your query does not hit the rule

  • Impact on performance: PolarProxy analyzes only the column definition data in the result set and does not mask the raw data in the query results. This causes a performance overhead of approximately 6%: after the dynamic data masking feature is enabled, the read-only QPS decreases by approximately 6%.

Your account is included in the data masking rule, and your query hits the rule

  • Impact on performance: PolarProxy analyzes the column definition data in the result set and masks the raw data in the query results. The performance overhead depends on the size of the result set: the more rows the query returns, the greater the overhead. If the query returns a single row, the overhead is approximately 6%.
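As a worked example of the documented figures, the following Python sketch maps each scenario to its approximate overhead. With a hypothetical baseline of 10,000 read-only QPS, a 6% overhead leaves roughly 9,400 QPS. The function names are illustrative only, and the sketch returns None where no figure is documented (multi-row results that hit a rule).

```python
from typing import Optional

# Approximate read-only QPS overhead stated in the appendix.
DOCUMENTED_OVERHEAD = 0.06


def estimated_overhead(account_in_rule: bool, query_hits_rule: bool, rows: int) -> Optional[float]:
    """Approximate fractional drop in read-only QPS, or None if no figure is documented."""
    if not account_in_rule:
        return 0.0  # Masking does not apply to this account.
    if not query_hits_rule:
        return DOCUMENTED_OVERHEAD  # Only column definitions are analyzed.
    if rows <= 1:
        return DOCUMENTED_OVERHEAD  # Single-row result that is masked.
    return None  # Grows with result-set size; no number is documented.


def estimated_qps(baseline_qps: float, overhead: float) -> float:
    """Apply a fractional overhead to a baseline read-only QPS."""
    return baseline_qps * (1 - overhead)


# Hypothetical baseline of 10,000 read-only QPS without masking.
overhead = estimated_overhead(account_in_rule=True, query_hits_rule=False, rows=100)
print(round(estimated_qps(10_000, overhead)))  # 9400
```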