Data Security Guard is a DataWorks service that provides features such as identifying and masking sensitive data, adding watermarks to data, managing data permissions, identifying data risks, and tracing data leak sources. The features help you manage sensitive data and ensure data security. This topic describes the usage procedure and limits of Data Security Guard.
In Data Security Guard, you can configure sensitive data identification rules, identify sensitive data based on rules, view identification results, and process sensitive data. You can identify and manage sensitive data before, during, and after the event that generates sensitive data.
Step 1: Identify sensitive data before the event that generates sensitive data.
Before sensitive data is generated, you can use Data Security Guard to specify the categories and sensitivity levels of asset data and configure multiple sensitive data identification rules to identify sensitive data and related data risks.
Specify the category and sensitivity level of data
You can specify a category and a sensitivity level for your data based on the data value, content sensitivity, impacts, and distribution scope. This way, you can manage the data based on the data category and data sensitivity level. The data management principles and data development requirements vary based on the data sensitivity level.
DataWorks provides built-in data category and data sensitivity level templates. You can configure custom data categories and data sensitivity levels based on your business requirements.
Configure sensitive data identification rules
You can define a sensitive field type and then configure a sensitive data identification rule for the sensitive field type based on the source and purpose of data. This helps you identify sensitive data in the current workspace. Content that meets the conditions in the sensitive data identification rule is considered sensitive data.
The following identification methods are supported:
Identification based on data content: Sensitive data is identified based on built-in rules, custom models, sample libraries, and regular expressions.
Identification based on metadata: Sensitive data is identified based on field names and comments. You can use wildcard characters to configure prefixes, suffixes, and inclusion relationships.
Identification based on combined conditions: You can use the OR, AND, and other relationships to configure a sensitive data identification rule that contains multiple conditions.
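The three identification methods above can be illustrated with a minimal Python sketch. This is not the Data Security Guard rule engine. The rule dictionary, the function names, and the sample phone-number pattern are all hypothetical and only show how a content check, a metadata check, and an AND/OR combination could fit together.

```python
import re
from fnmatch import fnmatch

# Hypothetical rule shape; the real rule model of Data Security Guard is not shown in this topic.
phone_rule = {
    "content_regex": r"1\d{10}",   # illustrative pattern: 11 digits starting with 1
    "name_wildcard": "*phone*",    # wildcard on the field name (inclusion relationship)
    "relation": "AND",             # how the two conditions are combined: "AND" or "OR"
}

def content_matches(value, pattern):
    """Content-based check: the whole value must match the regular expression."""
    return re.fullmatch(pattern, value) is not None

def metadata_matches(field_name, wildcard):
    """Metadata-based check: compare the field name with a wildcard, ignoring case."""
    return fnmatch(field_name.lower(), wildcard.lower())

def is_sensitive(field_name, sample_values, rule):
    """Combine the content condition and the metadata condition with AND or OR."""
    content_hit = any(content_matches(v, rule["content_regex"]) for v in sample_values)
    metadata_hit = metadata_matches(field_name, rule["name_wildcard"])
    if rule["relation"] == "AND":
        return content_hit and metadata_hit
    return content_hit or metadata_hit
```

With the AND relationship, a field named user_phone that contains matching values is flagged, whereas a field named remark with the same values is not. Switching the relationship to OR flags both.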
Configure other settings
System configurations: You can configure settings such as the access control mode of Data Security Guard, the traceable period for data risks based on data watermarks, the data scope of risk identification management, and the email address and webhook URL that receive alert notifications about data risk identification results.
User group configurations: You can add multiple accounts that have the same data access permissions to a user group at the same time. When you configure a data masking rule, you can add the user group to a whitelist to allow the accounts in the user group to view the original data that is not masked.
Step 2: Manage sensitive data when the event that generates sensitive data is happening.
After you configure and enable a sensitive data identification rule, DataWorks automatically identifies sensitive data that meets the conditions in the rule. You can view the identification results in Data Security Guard.
Configure access control policies
Configure pass-through or blocking policies based on IP addresses or database users.
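As a rough illustration of how pass-through and blocking policies based on IP addresses or database users might be evaluated, consider the following Python sketch. The policy format, the first-match-wins order, and the default action are assumptions made for this example and do not describe the actual behavior of Data Security Guard.

```python
import ipaddress

# Hypothetical policy list, evaluated in order; the first matching policy wins.
# A value of None means "any" for that condition.
POLICIES = [
    {"action": "block", "cidr": "203.0.113.0/24", "db_user": None},
    {"action": "pass", "cidr": None, "db_user": "etl_service"},
]

def decide(client_ip, db_user, policies, default="block"):
    """Return "pass" or "block" for a request from client_ip made by db_user."""
    ip = ipaddress.ip_address(client_ip)
    for policy in policies:
        ip_ok = policy["cidr"] is None or ip in ipaddress.ip_network(policy["cidr"])
        user_ok = policy["db_user"] is None or policy["db_user"] == db_user
        if ip_ok and user_ok:
            return policy["action"]
    # No policy matched: fall back to the assumed default action.
    return default
```

In this sketch, a request from 203.0.113.7 is blocked even for etl_service because the IP-based policy is listed first.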
Configure data masking rules
You can configure data masking rules for identified sensitive data. Sensitive data is displayed based on the configured data masking rules. Data masking rules vary based on the data sensitivity level.
Data masking types:
Dynamic data masking: DataWorks masks sensitive data in query results.
Static data masking: DataWorks masks sensitive data before sensitive data is stored in a database.
Data masking methods include original format-based encryption, masking out, hash-based encryption, character replacement, range change, rounding, and leaving the value empty.
In scenarios in which the original data must be returned, you can configure a whitelist to allow specific accounts to view plaintext information.
You can select a data masking type and a data masking method based on your business requirements.
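To make a few of the masking methods more concrete, the following Python sketch implements simplified versions of masking out, hash-based masking, and rounding, plus a whitelist check of the kind used with dynamic data masking. The function names, default parameters, and salt are hypothetical. The actual algorithms used by DataWorks are not described in this topic.

```python
import hashlib

def mask_out(value, keep_head=3, keep_tail=4, fill="*"):
    """Masking out: keep the first and last characters and cover the middle."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * middle + value[len(value) - keep_tail:]

def hash_mask(value, salt="demo-salt"):
    """Hash-based masking: replace the value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def round_mask(number, digits=-2):
    """Rounding: reduce precision, for example to the nearest hundred."""
    return round(number, digits)

def query_view(value, account, whitelist):
    """Dynamic masking at query time: whitelisted accounts see the original value."""
    return value if account in whitelist else mask_out(value)
```

For example, query_view("13800001111", "analyst", {"auditor"}) returns the masked string, whereas the same call for the auditor account returns the plaintext value.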
Manage risk identification rules
You can use the built-in risk identification rules in Data Security Guard after you enable them. You can also configure a custom risk identification rule that compares the number of occurrences of an event with a specified threshold. For example, if a rule specifies a comparison on data amount or operation frequency, the system automatically detects high-risk operations and sends alert notifications when the conditions in the rule are met.
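A frequency-based rule of this kind can be sketched as a sliding-window counter. The following Python example is only an illustration under stated assumptions: the event format (timestamp in seconds, user), the function name, and the parameters max_queries and window_seconds are hypothetical and are not part of Data Security Guard.

```python
from collections import defaultdict, deque

def find_risky_users(events, max_queries, window_seconds):
    """Return the users who exceed max_queries events within any time window.

    events: iterable of (timestamp_seconds, user) tuples sorted by timestamp.
    """
    recent = defaultdict(deque)  # per-user timestamps that are still inside the window
    risky = set()
    for ts, user in events:
        window = recent[user]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        # Compare the occurrence count with the threshold.
        if len(window) > max_queries:
            risky.add(user)
    return risky
```

A real rule would also trigger an alert notification when a user is flagged. That part is omitted here.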
Process risk identification results
You can view the details of identified risky operations, and mark the operations as risky, not risky, or risk handled.
Step 3: Audit risky operations and trace data leak sources.
You can process and manage sensitive data based on risk identification results to ensure data security.
Audit risky operations
Data Security Guard records all operations that involve sensitive data, including details such as IP addresses, port information, and database users, and provides sensitive data lineages. You can audit risky operations based on this information.
You can manually correct sensitive data identification results that are obtained based on sensitive data identification rules.
Trace data leak sources based on watermarks
If a data leak occurs, the watermark information of data in the leaked data file can be extracted to trace users who caused the data leak.
Only DataWorks Standard Edition or a more advanced edition supports Data Security Guard. For information about how to activate DataWorks, see Activate DataWorks. The Data Security Guard features that you can use vary based on the DataWorks edition. For more information, see Differences among DataWorks editions.
You can use Alibaba Cloud accounts or RAM users that are granted the following permissions to enable Data Security Guard:
Permissions defined in the AdministratorAccess and AliyunDataWorksFullAccess policies. For more information, see Grant permissions to a RAM role.
Users who are assigned the tenant administrator role or the tenant-level security administrator role can use all features of Data Security Guard.
Users who are assigned the workspace-level security administrator role can use related features in the workspaces on which the users have access permissions. For example, when the users use the data lineage feature to modify a sensitive field type, they can select only the workspaces on which they have access permissions. If the users want to use the Data Security Guard features in a workspace on which they do not have access permissions, they must apply for the required permissions. For more information, see Manage permissions on workspace-level services.
In Data Security Guard, the sensitive data identification and dynamic data masking features are supported only for the following compute engines: E-MapReduce (EMR), MaxCompute, Cloudera's Distribution including Apache Hadoop (CDH), and Hologres.
Take note of the following limits on an EMR compute engine:
You can use the data masking feature only when you preview data in Data Map. The data masking feature is not supported in DataStudio or DataAnalysis.
The sensitive data identification and data masking features are supported only for specific types of EMR clusters and EMR tables. The following table describes the details.
Note: In the table, one icon indicates that the data preview feature is supported, and another icon indicates that the data preview feature is not supported.
Table columns: EMR cluster type, Metadata storage type, Data storage type: OSS, Data storage type: OSS-HDFS, Data storage type: HDFS
DataLake clusters: Data Lake Formation (DLF), RDS instance, MySQL
Custom clusters: DLF, RDS instance, MySQL
Other clusters: --
Note:
The features are available only in the following regions: China (Hangzhou), China (Shanghai), China East 2 Finance, China (Beijing), China (Shenzhen), China South 1 Finance, China (Chengdu), China North 2 Ali Gov 1, China (Hong Kong), US (Silicon Valley), Singapore, Malaysia (Kuala Lumpur), and Germany (Frankfurt).
If you want to use Data Security Guard in an EMR cluster, you must upgrade exclusive resource groups for scheduling. You can join the DataWorks DingTalk group to request technical support for the upgrade.
By default, Data Security Guard uses an Alibaba Cloud account to sample data. If Lightweight Directory Access Protocol (LDAP) authentication is enabled for your EMR cluster and Ranger or DLF-Auth is used to manage table permissions, you must configure mappings between the Alibaba Cloud account and the cluster account. This ensures that the Alibaba Cloud account has the required permissions to access tables in the EMR cluster. For more information, see Configure mappings between DataWorks member accounts and EMR cluster accounts.
Go to the Data Security Guard page
Go to the DataStudio page.
Log on to the DataWorks console. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
Click the icon in the upper-left corner, choose, and then click Try now to go to the Data Security Guard page.
Note:
If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.
If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.