DataWorks: Overview

Last Updated: Feb 29, 2024

Data Security Guard is a DataWorks service that provides features such as identifying and masking sensitive data, adding watermarks to data, managing data permissions, identifying data risks, and tracing data leak sources. These features help you manage sensitive data and ensure data security. This topic describes the usage procedure and limits of Data Security Guard.

Procedure

In Data Security Guard, you can configure sensitive data identification rules, identify sensitive data based on rules, view identification results, and process sensitive data. You can identify and manage sensitive data before, during, and after the event that generates sensitive data. The following figure shows the usage procedure and related features of Data Security Guard.

[Figure: usage procedure and related features of Data Security Guard]

  1. Step 1: Identify sensitive data before the event that generates sensitive data.

    Before sensitive data is generated, you can use Data Security Guard to specify the categories and sensitivity levels of asset data and configure multiple sensitive data identification rules to identify sensitive data and related data risks.

    The following entries list each operation, its description, and, where available, a reference topic.

    Specify the category and sensitivity level of data

    You can specify a category and a sensitivity level for your data based on the data value, content sensitivity, impacts, and distribution scope. This way, you can manage the data based on the data category and data sensitivity level. The data management principles and data development requirements vary based on the data sensitivity level.

    DataWorks provides built-in data category and data sensitivity level templates. You can configure custom data categories and data sensitivity levels based on your business requirements.

    References: Specify the category and sensitivity level of sensitive data

    Configure sensitive data identification rules

    You can define a sensitive field type and then configure a sensitive data identification rule for the sensitive field type based on the source and purpose of data. This helps you identify sensitive data in the current workspace. Content that meets the conditions in the sensitive data identification rule is considered sensitive data.

    The following identification methods are supported:

    • Identification based on data content: Sensitive data is identified based on built-in rules, custom models, sample libraries, and regular expressions.

    • Identification based on metadata: Sensitive data is identified based on field names and comments. You can use wildcard characters to configure prefixes, suffixes, and inclusion relationships.

    • Identification based on combined conditions: You can use the OR, AND, and other relationships to configure a sensitive data identification rule that contains multiple conditions.
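The three identification methods above can be illustrated with a minimal Python sketch. It is not the Data Security Guard implementation: the phone-number pattern, the wildcard list, the function names, and the `min_hit_ratio` parameter are all assumptions made for illustration.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical content pattern: a phone number written as 3-4-4 digit groups.
PHONE_PATTERN = re.compile(r"^\d{3}-\d{4}-\d{4}$")

# Hypothetical metadata wildcards: a prefix, a suffix, and an inclusion match.
METADATA_PATTERNS = ("phone*", "*_phone", "*mobile*")


def matches_content(value: str) -> bool:
    """Content-based identification with a regular expression."""
    return bool(PHONE_PATTERN.match(value))


def matches_metadata(field_name: str, comment: str = "") -> bool:
    """Metadata-based identification on the field name or its comment."""
    candidates = (field_name.lower(), comment.lower())
    return any(
        fnmatchcase(text, pattern)
        for text in candidates
        for pattern in METADATA_PATTERNS
    )


def is_sensitive(field_name, comment, samples, min_hit_ratio=0.8):
    """Combined condition: metadata match AND enough sampled values match."""
    if not samples:
        return False
    hits = sum(matches_content(value) for value in samples)
    content_ok = hits / len(samples) >= min_hit_ratio
    return matches_metadata(field_name, comment) and content_ok
```

In this sketch the two conditions are joined with AND. Replacing `and` with `or` in `is_sensitive` gives the OR relationship instead.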

    Configure other settings

    • System configurations: You can configure settings such as the access control mode of Data Security Guard, the traceable period for data risks based on data watermarks, the data scope of risk identification management, and the email and webhook URL that are used to receive an alert notification that contains data risk identification results.

    • User group configurations: You can add multiple accounts that have the same data access permissions to a user group at the same time. When you configure a data masking rule, you can add the user group to a whitelist to allow the accounts in the user group to view the original data that is not masked.

  2. Step 2: Manage sensitive data while the event that generates sensitive data is in progress.

    After you configure and enable a sensitive data identification rule, DataWorks automatically identifies sensitive data that meets the conditions in the rule. You can view the identification results in Data Security Guard.

    The following entries list each operation, its description, and, where available, a reference topic.

    Configure access control policies

    You can configure pass-through or blocking policies based on IP addresses or database users.
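As a rough illustration of pass-through and blocking policies keyed on IP addresses or database users, the following Python sketch evaluates an ordered policy list. The `Policy` class, the `decide` function, the first-match-wins ordering, and the default action are assumptions for illustration and do not describe how Data Security Guard evaluates its policies.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    action: str                      # "pass" or "block"
    network: Optional[str] = None    # CIDR block, or None to match any IP
    db_user: Optional[str] = None    # database user, or None to match any user


def decide(policies, client_ip: str, db_user: str, default: str = "pass") -> str:
    """Return the action of the first matching policy (assumed ordering)."""
    ip = ipaddress.ip_address(client_ip)
    for policy in policies:
        ip_ok = policy.network is None or ip in ipaddress.ip_network(policy.network)
        user_ok = policy.db_user is None or policy.db_user == db_user
        if ip_ok and user_ok:
            return policy.action
    return default


policies = [
    Policy(action="block", network="10.0.0.0/8"),
    Policy(action="pass", db_user="etl"),
]
```

With these example policies, a request from `10.1.2.3` is blocked regardless of the database user, because the network policy is listed first.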

    Configure data masking rules

    You can configure data masking rules for identified sensitive data. Sensitive data is displayed based on the configured data masking rules. Data masking rules vary based on the data sensitivity level.

    Data masking types:

    • Dynamic data masking: DataWorks masks sensitive data in query results.

    • Static data masking: DataWorks masks sensitive data before sensitive data is stored in a database.

    Data masking methods include original format-based encryption, masking out, hash-based encryption, character replacement, range change, rounding, and leaving values empty.

    In scenarios in which the original data must be returned, you can configure a whitelist to allow specific accounts to view plaintext information.

    You can select a data masking type and a data masking method based on your business requirements.

    References: Create a data masking rule
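To make some of the masking methods above concrete, the following Python sketch shows masking out, hash-based masking, rounding, and leaving a value empty, plus a whitelist check in the style of dynamic data masking. All function names, default parameters, and the salt are hypothetical. The actual algorithms used by Data Security Guard are not described in this topic.

```python
import hashlib


def mask_out(value: str, keep_head: int = 3, keep_tail: int = 4, fill: str = "*") -> str:
    """Masking out: keep the first and last characters, hide the middle."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    tail_start = len(value) - keep_tail
    return value[:keep_head] + fill * (tail_start - keep_head) + value[tail_start:]


def hash_mask(value: str, salt: str = "demo-salt") -> str:
    """Hash-based masking: replace the value with a salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def round_amount(value: float, digits: int = -2) -> float:
    """Rounding: reduce numeric precision (here, to the nearest hundred)."""
    return round(value, digits)


def leave_empty(value: str) -> str:
    """Leave empty: drop the value entirely."""
    return ""


def dynamic_view(value, user, whitelist, masker):
    """Dynamic masking: whitelisted users see plaintext, others see masked data."""
    return value if user in whitelist else masker(value)
```

For example, `dynamic_view("13800001234", "bob", {"alice"}, mask_out)` returns the masked string, while the same call for `"alice"` returns the original value.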

    Manage risk identification rules

    You can enable and use the built-in risk identification rules in Data Security Guard. You can also configure custom risk identification rules that compare the number of occurrences of an event with a specified threshold. For example, if a rule specifies a data amount or frequency comparison, the system automatically detects high-risk operations and sends alert notifications when the conditions in the rule are met.
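The threshold comparison described above can be sketched as follows. This is a simplified illustration under assumed names: `RiskRule`, `evaluate`, the event tuples, and the strictly-greater-than comparison are not taken from the product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskRule:
    name: str
    event_type: str   # the kind of event the rule counts
    threshold: int    # flag a user when the count is strictly greater than this


def evaluate(rule: RiskRule, events) -> list:
    """Return the users whose number of matching events exceeds the threshold.

    `events` is an iterable of (user, event_type) tuples.
    """
    counts = Counter(user for user, event_type in events if event_type == rule.event_type)
    return sorted(user for user, count in counts.items() if count > rule.threshold)


rule = RiskRule(name="frequent-export", event_type="export", threshold=2)
events = (
    [("alice", "export")] * 3
    + [("bob", "export")]
    + [("bob", "query")] * 5
)
flagged = evaluate(rule, events)
```

Here only `alice` is flagged, because her three export events exceed the threshold of two, while the query events of `bob` are not counted by this rule.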

    Process risk identification results

    You can view the details of identified risky operations, and mark the operations as risky, not risky, or risk handled.

  3. Step 3: Audit risky operations and trace data leak sources.

    You can process and manage sensitive data based on risk identification results to ensure data security.

    The following entries list each operation, its description, and, where available, a reference topic.

    Audit risky operations

    Data Security Guard records all operations that involve sensitive data, including details such as IP addresses, port information, and database users, and provides sensitive data lineages. You can audit risky operations based on this information.

    You can manually correct sensitive data identification results that are obtained based on sensitive data identification rules.

    Trace data leak sources based on watermarks

    If a data leak occurs, the watermark information of data in the leaked data file can be extracted to trace users who caused the data leak.

    References: Trace data leak sources

Limits

Edition

Only DataWorks Standard Edition or a more advanced edition supports Data Security Guard. For information about how to activate DataWorks, see Activate DataWorks. The Data Security Guard features that you can use vary based on the DataWorks edition. For more information, see Differences among DataWorks editions.

Permissions

You can use Alibaba Cloud accounts or RAM users that are granted the following permissions to enable Data Security Guard:

Note
  • Users who are assigned the tenant administrator role or the tenant-level security administrator role can use all features of Data Security Guard.

  • Users who are assigned the workspace-level security administrator role can use related features in the workspaces on which the users have access permissions. For example, when the users use the data lineage feature to modify a sensitive field type, they can select only the workspaces on which they have access permissions. If the users want to use the Data Security Guard features in a workspace on which they do not have access permissions, they must apply for the required permissions. For more information, see Manage permissions on workspace-level services.
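The role scoping described in the note can be summarized in a short Python sketch. The role identifiers and the function name are hypothetical and are not DataWorks API values. The sketch only mirrors the rule that tenant-level roles apply everywhere and the workspace-level role applies only to accessible workspaces.

```python
# Hypothetical role identifiers used only for this illustration.
TENANT_WIDE_ROLES = frozenset({"tenant_administrator", "tenant_security_administrator"})
WORKSPACE_ROLE = "workspace_security_administrator"


def can_use_data_security_guard(roles, workspace, accessible_workspaces) -> bool:
    """Check whether a user may use Data Security Guard features in a workspace."""
    if TENANT_WIDE_ROLES & set(roles):
        # Tenant-level roles can use all features in every workspace.
        return True
    # The workspace-level role is limited to workspaces the user can access.
    return WORKSPACE_ROLE in roles and workspace in accessible_workspaces
```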

Features

In Data Security Guard, the sensitive data identification and dynamic data masking features are supported only for the following compute engines: E-MapReduce (EMR), MaxCompute, Cloudera's Distribution including Apache Hadoop (CDH), and Hologres.

Take note of the following limits on an EMR compute engine:

  • The sensitive data identification and data masking features are supported only for specific types of EMR clusters and EMR tables. The following table describes the details.

    Note: In the following table, Supported indicates that the data preview feature is supported, and Not supported indicates that the data preview feature is not supported.

    | EMR cluster type | Metadata storage type | Data storage type: OSS | Data storage type: OSS-HDFS | Data storage type: HDFS |
    | --- | --- | --- | --- | --- |
    | DataLake clusters | Data Lake Formation (DLF) | Supported | Supported | Not supported |
    | DataLake clusters | RDS instance | Supported | Supported | Supported |
    | DataLake clusters | MySQL | Supported | Supported | Supported |
    | Custom clusters | DLF | Supported | Supported | Not supported |
    | Custom clusters | RDS instance | Supported | Supported | Supported |
    | Custom clusters | MySQL | Supported | Supported | Supported |
    | Other clusters | - | Not supported | Not supported | Not supported |
    Note

    The features are available only in the following regions: China (Hangzhou), China (Shanghai), China East 2 Finance, China (Beijing), China (Shenzhen), China South 1 Finance, China (Chengdu), China North 2 Ali Gov 1, China (Hong Kong), US (Silicon Valley), Singapore, Malaysia (Kuala Lumpur), and Germany (Frankfurt).

  • If you want to use Data Security Guard in an EMR cluster, you must upgrade exclusive resource groups for scheduling. You can join the DataWorks DingTalk group to request technical support for the upgrade.

  • By default, Data Security Guard uses an Alibaba Cloud account to sample data. If Lightweight Directory Access Protocol (LDAP) authentication is enabled for your EMR cluster and Ranger or DLF-Auth is used to manage table permissions, you must configure mappings between the Alibaba Cloud account and the cluster account. This ensures that the Alibaba Cloud account has the required permissions to access tables in the EMR cluster. For more information, see Configure mappings between DataWorks member accounts and EMR cluster accounts.
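The EMR support matrix in the first limit above can be encoded as a lookup table, which is convenient for checking a cluster configuration before you enable the features. The following Python sketch assumes the short keys `"DataLake"`, `"Custom"`, `"DLF"`, `"RDS"`, and `"MySQL"`, which are illustrative labels rather than product identifiers. Any combination that is not listed is treated as not supported.

```python
# Support matrix keyed by (cluster type, metadata storage type).
# Values map each data storage type to whether it is supported.
EMR_SUPPORT = {
    ("DataLake", "DLF"):   {"OSS": True, "OSS-HDFS": True, "HDFS": False},
    ("DataLake", "RDS"):   {"OSS": True, "OSS-HDFS": True, "HDFS": True},
    ("DataLake", "MySQL"): {"OSS": True, "OSS-HDFS": True, "HDFS": True},
    ("Custom", "DLF"):     {"OSS": True, "OSS-HDFS": True, "HDFS": False},
    ("Custom", "RDS"):     {"OSS": True, "OSS-HDFS": True, "HDFS": True},
    ("Custom", "MySQL"):   {"OSS": True, "OSS-HDFS": True, "HDFS": True},
}


def is_supported(cluster_type: str, metadata_storage: str, data_storage: str) -> bool:
    """Return True only for combinations listed as supported in the matrix."""
    row = EMR_SUPPORT.get((cluster_type, metadata_storage), {})
    return row.get(data_storage, False)
```

For example, `is_supported("DataLake", "DLF", "HDFS")` returns `False`, which matches the Not supported cell for DataLake clusters that use DLF metadata and HDFS storage.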

Go to the Data Security Guard page

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. Click the icon in the upper-left corner, choose All Products > Data Governance > Data Security Guard, and then click Try now to go to the Data Security Guard page.

    Note
    • If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.

    • If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.