DataWorks: Risk identification rule management (new version)

Last Updated: Aug 16, 2023

The risk identification rule management feature provides multidimensional association analysis methods and algorithms. These intelligent data analysis methods are used to identify data risks and send you alert notifications based on risk identification rules. This feature also allows you to audit risk identification rules in a visualized manner. DataWorks provides built-in risk identification rules for you to directly use in multiple scenarios. You can also create custom risk identification rules based on your business requirements. This topic describes how to create and manage a risk identification rule.

Background information

After DataWorks ingests data from data sources, the Data Security Guard service filters the data. The old version of the risk identification rule management feature can identify data risks only when sensitive data is involved. It cannot identify data risks in operation audit scenarios or in scenarios that require the aggregation of event statistics. To resolve this issue, DataWorks provides a new version of the feature, which has the following benefits:

  • Ease of use

    The feature can be used to identify the following types of data risks: data access risks, data export risks, data operation risks, and other data risks. The feature also allows you to create a risk identification rule based on a combination of risk identification dimensions, such as access time, sensitivity type, and number of access requests, to identify different types of data risks.

  • High precision

    The feature supports the aggregation of event statistics. A risk identification rule can compare the number of occurrences of an event in a time window with a specified threshold value, which helps reduce the number of false positives. For example, a rule can identify a data risk if an event occurs at least three times within 10 minutes.

  • Fine-grained management

    When you use this feature, you can set the risk level of a data risk to High, Medium, or Low. You can perform fine-grained management on data risks based on their risk levels.

  • High flexibility

    DataWorks provides common risk identification rules for you to directly use in multiple scenarios. You can also create custom risk identification rules based on your business requirements. For more information, see Built-in risk identification rules and Create a risk identification rule.
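The High precision example above (an event that occurs at least three times within 10 minutes) can be illustrated with a small sliding-window counter. This is a minimal sketch, not the DataWorks implementation: the `OccurrenceWindow` class and its method names are hypothetical, events are assumed to arrive in time order, and an event exactly one window length old is assumed to fall outside the window.

```python
from collections import deque
from datetime import datetime, timedelta


class OccurrenceWindow:
    """Hypothetical helper that counts events inside a sliding time window."""

    def __init__(self, window: timedelta, threshold: int):
        self.window = window
        self.threshold = threshold
        self._timestamps = deque()

    def record(self, ts: datetime) -> bool:
        """Record one event and report whether the threshold is reached."""
        self._timestamps.append(ts)
        # Drop events that are no longer inside the window that ends at ts.
        cutoff = ts - self.window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# "At least three times in 10 minutes", as in the example above.
window = OccurrenceWindow(window=timedelta(minutes=10), threshold=3)
start = datetime(2023, 8, 16, 9, 0)
results = [window.record(start + timedelta(minutes=m)) for m in (0, 4, 8, 30)]
print(results)  # [False, False, True, False]
```

Only the third event completes three occurrences inside one 10-minute window. The fourth event arrives after the earlier ones have aged out, so the count starts over.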

The positions of parameters for a risk identification rule in the DataWorks console differ between the old and new versions of the risk identification rule management feature. For more information, see Comparison of the positions of parameters for a risk identification rule in the old and new versions of the risk identification rule management feature.

Limits

  • Version

    • Only users of DataWorks Professional Edition or a more advanced edition can use the new version of the risk identification rule management feature.

    • Only DataWorks Enterprise Edition or a more advanced edition provides built-in risk identification rules.

  • Switch between the old and new versions of the risk identification rule management feature

    • The old version of the risk identification rule management feature expires on June 30, 2022. The actual expiration time that is displayed on the Custom Identification Rules page takes precedence. After the expiration time elapses, the created risk identification rules and identified data risks are automatically cleared. You can only use the new version of the risk identification rule management feature after June 30, 2022. Export and back up the risk identification rules and identified data risks that you want to use at the earliest opportunity. For more information about the export and backup operations, see Risk identification rule management (old version).

    • The new version of the risk identification rule management feature can also be used before the old version expires. You can switch from the old version to the new version before the expiration time. After you switch to the new version, the created risk identification rules and identified data risks in the old version are not automatically synchronized to the new version. You must create them again in the new version.

  • Alert notification method

    An alert notification can be sent by email or webhook URL.

    Note

    DataWorks supports the webhook URL-based alerting method for DingTalk, Enterprise WeChat, and Lark. Only DataWorks Enterprise Edition or a more advanced edition allows users to use Enterprise WeChat or Lark to receive an alert notification that is sent based on a webhook URL.

Go to the Custom Identification Rules page

  1. Go to the Data Security Guard page.

    1. Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > DataStudio. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

    2. Click the icon in the upper-left corner and choose All Products > Data governance > Data Security Guard.

    3. Click Try now to go to the Data Security Guard page.

  2. Go to the Custom Identification Rules page.

    In the left-side navigation pane of the Data Security Guard page, choose Rule Change > Custom Identification Rules. The page for the old version of the risk identification rule management feature appears. You can click Try New Version in the upper-right corner of the Data Security Guard page to go to the page for the new version of the risk identification rule management feature and create and manage risk identification rules.

    The new version provides common risk identification rules for you to directly use in multiple scenarios. You can also create custom risk identification rules based on your business requirements. For more information, see Built-in risk identification rules and Create a risk identification rule.

Built-in risk identification rules

The following table describes built-in risk identification rules provided by the new version of the risk identification rule management feature.

| Rule name | Data risk type | Risk level | Rule configuration |
| --- | --- | --- | --- |
| Query a large number of sensitive data records in non-business hours | Data Access Risk | Low | Identifies a data risk if the number of sensitive data records queried in the following periods of time exceeds 10,000: Monday to Friday, 22:00 to 24:00; Saturday to Sunday, 00:00 to 24:00. |
| Use similar SQL statements to query data | Data Access Risk | Low | Identifies a data risk if similar SQL statements are used to query data 10 or more times within 10 minutes. |
| Query a large number of sensitive data records at a time | Data Access Risk | Medium | Identifies a data risk if the number of sensitive data records queried in a single request exceeds 10,000. |
| Export a large number of sensitive data records at a time | Data Export Risk | High | Identifies a data risk if the number of sensitive data records exported at a time exceeds 10,000. |
| Export a large number of sensitive data records in non-business hours | Data Export Risk | High | Identifies a data risk if the number of sensitive data records exported in the following periods of time exceeds 10,000: Monday to Friday, 22:00 to 24:00; Saturday to Sunday, 00:00 to 24:00. |
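
The non-business hours used by the built-in rules can be expressed as a simple time check. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the record count is assumed to be already aggregated for the period, and how DataWorks performs that aggregation is not described in this topic.

```python
from datetime import datetime


def is_non_business_hours(ts: datetime) -> bool:
    """True for Monday to Friday 22:00-24:00 and for all of Saturday and Sunday."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return ts.hour >= 22


def triggers_rule(ts: datetime, sensitive_records: int, limit: int = 10_000) -> bool:
    """Hypothetical check: non-business hours and more than `limit` records."""
    return is_non_business_hours(ts) and sensitive_records > limit


wednesday_night = datetime(2023, 8, 16, 23, 0)
wednesday_afternoon = datetime(2023, 8, 16, 14, 0)
saturday_morning = datetime(2023, 8, 19, 10, 0)

print(is_non_business_hours(wednesday_night))      # True
print(is_non_business_hours(wednesday_afternoon))  # False
print(triggers_rule(saturday_morning, 10_001))     # True
print(triggers_rule(saturday_morning, 10_000))     # False
```

The comparison is strict because the rules are described as triggering when the count exceeds 10,000, so exactly 10,000 records does not trigger the rule in this sketch.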

Create a risk identification rule

  1. Plan and prepare for the creation of a risk identification rule.

    You can create a risk identification rule to identify data risks by specifying fine-grained risk identification conditions in dimensions such as data location, data property, user information, and operation time based on your business requirements. The following table describes the preparations that you must make in advance if you want to use the fine-grained risk identification conditions in the data property and user information dimensions to identify data risks.

    | Risk identification dimension | Fine-grained risk identification condition | Description |
    | --- | --- | --- |
    | Data property | Sensitivity level of sensitive data | To identify a data risk at a specified sensitivity level, you must define the sensitivity level for sensitive data in advance. For more information, see Manage data sensitivity levels. |
    | Data property | Data category | To identify a data risk of a specified data category, you must define the data category for sensitive data in advance. For more information, see Identify sensitive data. |
    | Data property | Sensitive field type | To identify a data risk of a specified sensitive field type, you must define the sensitive field type for sensitive data in advance. For more information, see Identify sensitive data. |
    | User information | User group | To identify a data risk for a specified user group that belongs to the current Alibaba Cloud account, you must configure the user group in advance. For more information, see Create and manage user groups. |
    | User information | RAM role | To identify a data risk for a specified RAM user that belongs to the current Alibaba Cloud account, you must add the RAM user to the current Alibaba Cloud account in advance. For more information, see Create a RAM user. |

  2. In the upper-right corner of the Custom Identification Rules page, click + Risk identification rule.

  3. In the New risk identification rule panel, configure the parameters.

    Note

    You can create only a risk identification rule of the statistical association type. A risk identification rule of this type can be used to calculate and aggregate the number of occurrences of a single event, and compare this number with the threshold value that is specified for event occurrences. The risk identification rule is triggered if the number of occurrences exceeds the specified threshold value. For example, a risk identification rule specifies to identify a data risk if the number of sensitive data records that are queried by a user with limited permissions in non-business hours exceeds 10,000.

    1. Configure parameters in the Basic information step.

      Parameter

      Description

      Rule name

      The name of the risk identification rule. The name can be 1 to 30 characters in length and cannot contain special characters.

      Rule Type

      The type of the risk identification rule. Valid values:

      • Data Access Risk: a data risk that occurs when data is accessed.

      • Data Export Risk: a data risk that occurs when data is exported.

      • Data Operation Risk: a data risk that occurs when data is created, modified, or deleted.

      • Other: a data risk of other types.

      Rule level

      The risk level of the risk identification rule. Valid values: Low, Medium, and High. You can set this parameter to High for important data based on your business requirements.

      Description information

      The description of the risk identification rule. The description can be 1 to 100 characters in length.

    2. Click Next.

    3. Configure risk identification conditions and their threshold values.

      • Configure risk identification conditions.

        DataWorks allows you to create a risk identification rule to identify data risks by specifying fine-grained risk identification conditions in dimensions such as data location, data property, user information, and operation time based on your business requirements.

        Note

        You can add a maximum of 10 risk identification conditions. After you select a risk identification dimension, click + Add comparison relation to add a risk identification condition from the selected risk identification dimension. You can repeatedly perform this operation to add multiple risk identification conditions. The logical relationship between the risk identification conditions is AND.

        • Data location

          Specifies the range of locations for data risks. The location range can be accurate to the field name.

          Parameter

          Description

          Required

          Whether to filter out the selected location

          Specifies whether to filter out data risks identified in the selected location. Valid values:

          • ≠: specifies to filter out the selected location. The risk identification rule in which this condition is specified will not identify data risks in the selected location.

          • =: specifies to identify data risks only in the selected location. The risk identification rule in which this condition is specified is used to identify data risks only in the selected location.

          Yes.

          Compute Engine Instance Name

          The compute engine instance that is specified in the risk identification rule.

          Note
          • You can create a risk identification rule to identify data risks only in a MaxCompute compute engine instance.

          • You can specify only one compute engine instance in each risk identification condition. If you want to identify data risks in multiple compute engine instances, you can click + Add comparison relation to add a risk identification condition and specify the compute engine instance in which you want to identify data risks in the condition. You can repeatedly perform the operations to add multiple risk identification conditions and specify the compute engine instance for each added condition.

          Yes.

          Project Name

          The project that is specified in the risk identification rule. You must set Project Name to the project that belongs to the specified compute engine instance. You can select the project from the Project Name drop-down list. You can also enter the name of the project to search for the project.

          Note
          • The drop-down list displays a maximum of 100 project names.

          • A fuzzy match is supported if you search for a project by name. After you enter a keyword in the search box, projects whose names contain the keyword are displayed.

          • You can specify only one project in each risk identification condition. If you want to identify data risks in multiple projects, you can click + Add comparison relation to add a risk identification condition and specify the project in which you want to identify data risks in the condition. You can repeatedly perform the operations to add multiple risk identification conditions and specify the project for each added condition.

          Yes.

          Table Name

          The name of the table that is specified in the risk identification rule. You can specify one or more table names. If you specify multiple table names, separate them with commas (,). You must abide by the following requirements when you specify a table name:

          • A table name can be up to 30 characters in length. The total length of all table names cannot exceed 100 characters.

          • You can use an asterisk (*) as a wildcard. For example, you can enter *name to match all tables whose names end with name.

          No. If you do not configure this parameter, the risk identification rule identifies data risks in all tables that belong to the specified project by default.

          Field Name

          The name of the field that is specified in the risk identification rule. You can specify one or more field names. If you specify multiple field names, separate them with commas (,). You must abide by the following requirements when you specify a field name:

          • A field name can be up to 30 characters in length. The total length of all field names cannot exceed 100 characters.

          • You can use an asterisk (*) as a wildcard. For example, you can enter *name to match all fields whose names end with name.

          No. If you do not configure this parameter, the risk identification rule identifies data risks in all fields by default.

        • Data property

          Specifies the property that is used to filter and identify data risks.

          Parameter

          Description

          Property

          The category of the property that you specify to identify data risks based on your business requirements. The following property categories are supported:

          • Data grading: specifies the sensitivity level of the data risk that you want to identify. You need to define the sensitivity level for sensitive data in advance. For more information, see Manage data sensitivity levels.

          • Data category: specifies the data category for the data risk that you want to identify. You need to define the data category for sensitive data in advance. For more information, see Identify sensitive data.

          • Sensitive field type: specifies the sensitive field type for the data risk that you want to identify. You need to define the sensitive field type for sensitive data in advance. For more information, see Identify sensitive data.

          Whether to filter out the selected property

          Specifies whether to filter out data risks of the selected property. Valid values:

          • ≠: specifies to filter out the selected property. The risk identification rule in which this condition is specified will not identify data risks of the selected property.

          • =: specifies to identify only data risks of the selected property. The risk identification rule in which this condition is specified is used to identify only data risks of the selected property.

        • User information

          Specifies the category of the user information that is used to filter and identify data risks.

          Parameter

          Description

          Information category

          The category of the user information that you specify to identify data risks.

          • User group: specifies the name of the user group that belongs to the current Alibaba Cloud account. You need to configure the user group in advance. For more information, see Create and manage user groups.

          • RAM role: specifies the RAM user that belongs to the current Alibaba Cloud account. You need to add the RAM user to the current Alibaba Cloud account in advance. For more information, see Create a RAM user.

          • Username: specifies the username of the current Alibaba Cloud account.

          Whether to filter out the selected user information

          • ≠: specifies to filter out the selected user information. The risk identification rule in which this condition is specified will not identify data risks of the selected user information.

          • =: specifies to identify only data risks of the selected user information. The risk identification rule in which this condition is specified is used to identify only data risks of the selected user information.

        • Operation time

          Specifies the time range within which risky operations are performed on data.

          Parameter

          Description

          Select Time Range

          The time range within which risky operations are performed on data. You can select one or more days of a week and select one or more hours on that day or days. The time range is accurate to the hour.

          Whether to filter out the selected time range

          • ≠: specifies to filter out the selected time range. The risk identification rule in which this condition is specified will not identify risky operations that are performed on data in the selected time range.

          • =: specifies to identify only risky operations in the selected time range. The risk identification rule in which this condition is specified is used to identify risky operations that are performed on data in the selected time range.

      • Configure threshold values for the risk identification conditions.

        DataWorks allows you to calculate and aggregate the number of occurrences of an event, and compare this number with the specified threshold value of event occurrences. You can also specify a time window for the threshold comparison condition to identify data risks. You can click + Add threshold comparison to add a threshold comparison condition to identify data risks. You can repeatedly perform this operation to add multiple threshold comparison conditions.

        Parameter

        Description

        Threshold Category

        • Data volume: specifies to identify a data risk based on the number of data records on which you perform operations. If the number of data records on which you perform operations exceeds the threshold value that you specify, the risk identification rule that contains the threshold comparison condition is triggered. The number of data records can range from 1 to 10,000,000. Default value: 1.

        • Number of occurrences: specifies to identify a data risk based on the number of occurrences of a single event in a specified time range. If the number of occurrences of a single event in a specified time range exceeds the specified threshold value of event occurrences, the risk identification rule that contains the threshold comparison condition is triggered. The number of occurrences of an event is an integer that ranges from 1 to 10,000. Default value: 10.

          Note

          DataWorks categorizes and identifies a single event.

        Time window

        The time range within which an event occurs. Default value: 10 minutes. Valid values:

        • minutes: Valid values for this option range from 1 to 59.

        • hour: Valid values for this option range from 1 to 23.

        • day: Valid values for this option range from 1 to 7.

        Note

        This parameter is required only if the Threshold Category parameter is set to Number of occurrences.
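
The value ranges for the threshold comparison parameters above can be captured in a small validation routine. This is a minimal sketch for checking a planned configuration before you enter it in the console. The `ThresholdComparison` class, the category keys, and the `validate` function are hypothetical names, not part of any DataWorks API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Ranges taken from the parameter descriptions above.
THRESHOLD_RANGES = {"data_volume": (1, 10_000_000), "occurrences": (1, 10_000)}
WINDOW_RANGES = {"minutes": (1, 59), "hour": (1, 23), "day": (1, 7)}


@dataclass
class ThresholdComparison:
    category: str                      # "data_volume" or "occurrences"
    value: int
    window_unit: Optional[str] = None  # needed only for "occurrences"
    window_size: Optional[int] = None


def validate(t: ThresholdComparison) -> List[str]:
    """Return a list of problems; an empty list means the settings are in range."""
    if t.category not in THRESHOLD_RANGES:
        return [f"unknown threshold category: {t.category}"]
    errors = []
    low, high = THRESHOLD_RANGES[t.category]
    if not low <= t.value <= high:
        errors.append(f"value must be between {low} and {high}")
    # The time window applies only to the number-of-occurrences category.
    if t.category == "occurrences":
        if t.window_unit not in WINDOW_RANGES:
            errors.append("time window unit must be minutes, hour, or day")
        else:
            w_low, w_high = WINDOW_RANGES[t.window_unit]
            if t.window_size is None or not w_low <= t.window_size <= w_high:
                errors.append(f"{t.window_unit} must be between {w_low} and {w_high}")
    return errors


print(validate(ThresholdComparison("occurrences", 10, "minutes", 10)))  # []
print(validate(ThresholdComparison("occurrences", 10, "minutes", 60)))
print(validate(ThresholdComparison("data_volume", 0)))
```

The first call uses the documented defaults (10 occurrences in 10 minutes) and passes. The other two calls each report one out-of-range setting.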

    4. Click Next.

    5. Configure an alert notification method.

      You can specify an alert notification method to receive an alert notification at the earliest opportunity when a data risk is identified and handle the risk based on the alert notification. You can set an alert notification method to email or webhook URL.

      Note

      Before you select an alert notification method, make sure that you have configured the email address and webhook URL in Configure system settings.

    6. Click Save. A risk identification rule is created.

      A created custom risk identification rule does not automatically take effect. You must go to the Risk identification rule page and click Revalidate to manually make the rule take effect.
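
The way risk identification conditions combine in step 3 (a maximum of 10 conditions joined by AND, table names with an asterisk wildcard, and an option to filter out the selected location) can be sketched as follows. This is an illustration only. The function names and the event dictionary layout are hypothetical, and the sketch assumes that separators do not count toward the 100-character limit and that name matching is case-sensitive, neither of which is stated in this topic.

```python
import re
from typing import Callable, Dict, List

Event = Dict[str, str]
Condition = Callable[[Event], bool]


def parse_names(spec: str) -> List[str]:
    """Split a comma-separated name list and enforce the documented length limits."""
    names = [n.strip() for n in spec.split(",") if n.strip()]
    if any(len(n) > 30 for n in names):
        raise ValueError("a name can be up to 30 characters in length")
    if sum(len(n) for n in names) > 100:
        raise ValueError("the total length of all names cannot exceed 100 characters")
    return names


def name_matches(pattern: str, name: str) -> bool:
    """Match a name against a pattern in which only * is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def location_condition(project: str, table_spec: str = "", exclude: bool = False) -> Condition:
    """Build a data location condition; exclude=True filters out the location."""
    patterns = parse_names(table_spec)

    def check(event: Event) -> bool:
        # An empty table list means all tables in the project.
        in_location = event["project"] == project and (
            not patterns or any(name_matches(p, event["table"]) for p in patterns)
        )
        return not in_location if exclude else in_location

    return check


def rule_matches(conditions: List[Condition], event: Event) -> bool:
    """All conditions must hold, because they are combined with AND."""
    if len(conditions) > 10:
        raise ValueError("a rule can contain a maximum of 10 conditions")
    return all(condition(event) for condition in conditions)


event = {"project": "sales", "table": "customer_name"}
in_sales_names = location_condition("sales", "*name")
not_audit_tables = location_condition("sales", "audit_*", exclude=True)

print(rule_matches([in_sales_names, not_audit_tables], event))  # True
print(name_matches("*name", "name_list"))                       # False
```

Because the conditions are joined by AND, adding a condition can only narrow the set of events that a rule matches.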

Manage a risk identification rule

On the Custom Identification Rules page, you can view the created rules and the details about the rules. You can also modify desired rules or perform operations on multiple rules at a time.

Section

Description

1

You can specify conditions to search for your desired rules in this section. The conditions include data risk type, risk level, built-in rule or not, and risk identification name.

Note

A fuzzy match is supported if you search for rules by name. After you enter a keyword in the search box, rules whose names contain the keyword are displayed.

2

You can perform the following operations in this section:

  • View the basic information about a rule: You can view the basic information about a created rule. The basic information includes the risk type, risk level, effective status, and the numbers of triggered risks, pending risks, and handled risks. You can learn the risk handling results of the current tenant based on the numbers of triggered risks, pending risks, and handled risks.

  • View the details about and modify a rule: You can click View details to view the configuration details about a rule. You can also modify the rule configurations based on your business requirements.

  • Revalidate a rule: You can click the Revalidate icon to revalidate an invalid rule.

    Note

    You can perform this operation only on invalid rules.

3

You can perform operations such as Batch effective, Batch invalidation, and Batch delete on multiple rules at a time in this section. You can click the Switch icon to switch between different operations.

Note

You cannot delete built-in risk identification rules. You can delete only custom risk identification rules that are in the invalid state.
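
The restrictions on revalidating and deleting rules described in this section can be summarized as two small predicates. This is a minimal sketch with a hypothetical `Rule` record, intended only to make the allowed operations explicit.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    built_in: bool   # True for built-in rules, False for custom rules
    effective: bool  # False means the rule is in the invalid state


def can_revalidate(rule: Rule) -> bool:
    """Only invalid rules can be revalidated."""
    return not rule.effective


def can_delete(rule: Rule) -> bool:
    """Only custom rules in the invalid state can be deleted."""
    return not rule.built_in and not rule.effective


built_in_rule = Rule("Use similar SQL statements to query data", built_in=True, effective=False)
custom_active = Rule("my_rule", built_in=False, effective=True)
custom_invalid = Rule("my_old_rule", built_in=False, effective=False)

print(can_delete(built_in_rule))   # False
print(can_delete(custom_active))   # False
print(can_delete(custom_invalid))  # True
```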

Comparison of the positions of parameters for a risk identification rule in the old and new versions of the risk identification rule management feature

The following table describes the positions of parameters for a risk identification rule in the old and new versions of the risk identification rule management feature.

Note

For more information about the configurations of a risk identification rule in the new version of the risk identification rule management feature, see Create a risk identification rule. For more information about the configurations of a risk identification rule in the old version of the risk identification rule management feature, see Rule Settings tab.

| No. | Configuration item | Position in the old version | Position in the new version |
| --- | --- | --- | --- |
| 1 | Rule name | Basic Settings > Rule Name | Basic information > Rule name |
| 2 | Rule owner | Basic Settings > Owner. By default, the owner of the rule is the current Alibaba Cloud account. | This configuration item does not exist. DataWorks records the owner of the rule. |
| 3 | Rule description | Basic Settings > Description | Basic information > Description information |
| 4 | Compute engine instance for which the rule takes effect | Rule Settings > Engine | To specify a compute engine instance in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location from the drop-down list. |
| 5 | Project for which the rule takes effect | Rule Settings > Project | To specify a project in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location from the drop-down list. |
| 6 | Data category for the data risk that you want to identify | Rule Settings > Classification | In the Conditions section of the rule definition step, click Select condition and select Data property. Select Data classification as a property category. |
| 7 | Sensitivity level of the data risk that you want to identify | Rule Settings > Level | In the Conditions section of the rule definition step, click Select condition and select Data property. Select Data grading as a property category. |
| 8 | Sensitive field type for the data risk that you want to identify | Rule Settings > Sensitive field type | In the Conditions section of the rule definition step, click Select condition and select Data property. Select Sensitive field type as a property category. |
| 9 | Type of the operation that is performed on data | Rule Settings > Export Type. Valid values: All Export, Download Via Tunnel, and Table Activity. | Basic information > Rule Type. Valid values: Data Access Risk, Data Export Risk, Data Operation Risk, and Others. |
| 10 | Table for which the rule takes effect | Rule Settings > Table Name | To specify a table in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location. |
| 11 | Field for which the rule takes effect | Rule Settings > Field | To specify a field in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location. |
| 12 | Users for which a risk identification rule is triggered when the users access data that is specified in the rule | Rule Settings > Visitors | To specify an information category in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select User information. |
| 13 | Maximum number of data records that are specified in a risk identification rule | Rule Settings > Operated Data Volume | In the Conditions section of the rule definition step, click Select condition and select a condition. In the Threshold comparison section for the selected condition, select Data volume in a threshold comparison condition. |
| 14 | Time range that is specified in a risk identification rule | Rule Settings > Date | To specify a time range, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Operation time. |
| 15 | Alert notification method for a risk identification rule | Not supported | In the Alert Notification Method section of the Alert Settings step, select an alert notification method. |

What to do next

After the risk identification rule is created and takes effect, you can go to the Data Risks page to view the details of risks that are identified based on the rule and handle the risks at the earliest opportunity.