Dataphin: Create and manage identification rules

Last Updated: Jun 23, 2026

Identification rules detect and classify sensitive business data that requires a high level of security.

Limitations

By default, identification rules do not automatically scan views. You can enable view scanning in the rule execution configuration, or manually add or batch-import identification results for views.

Permissions

  • Security administrators and custom global roles with the Identification Rule-Manage permission can create and manage identification rules.

  • You can manage only the rules that you own. Management tasks include editing, deleting, resetting, testing, transferring, manually running, enabling, and disabling the rules.

Create an identification rule

  1. On the Dataphin homepage, choose Governance > Data Security from the top navigation bar.

  2. In the left-side navigation pane, choose Data Identification > Identification Rules. On the Identification Rules page, click Create Identification Rule.

  3. In the Create Identification Rule dialog box, configure the parameters.

    Parameter

    Description

    Basic settings

    Rule Name

    The name of the identification rule. Naming requirements:

    • Can contain Chinese characters, letters, digits, and underscores (_).

    • Must not exceed 12 characters.

    Description

    A custom description for the rule, up to 128 characters long.

    Data classification and sensitivity level

    Data category

    Select the data categories to scan. Options include all categories, all categories under a specified directory, or specific categories.

    • All Categories: All active data categories in the current tenant.

    • All Categories Under a Specified Directory: All active data categories in the specified directory and its subdirectories.

    • Specified Data Categories: Select a parent directory to filter the active data categories in that directory and its subdirectories, then select the categories you need. To select categories from additional directories, click Add a Category Group.

    Note

    When the compute engine is StarRocks, Aliyun EMR Serverless Spark, or OushuDB (supported in single-tenant, multi-engine deployment mode) and the field type is HyperLogLog (HLL), content-based identification is not supported.

    Scan scope

    Data source

    The scope of assets to scan. Select assets from a Compute Source or a Data Source.

    • Compute Source: Allows you to select Dataphin tables within a specific data domain or project.

    • Data Source: You can select only data sources for which a metadata collection task has been configured. For a list of supported data sources, see Supported data sources for Dataphin.

    Compute source table scan scope

    This option appears only when you select Compute Source.

    • The logical relationship between conditions can be AND or OR.

    • You can define the scope by Data Domain, Project, or Data Table.

    • The matching conditions include All, Belongs to, Does not belong to, Contains, Does not contain, Regex (case-insensitive), and Regular Expression.

      • All: Selects all assets within the current Dataphin instance.

      • Belongs to/Does not belong to: Select one or more specific resources.

      • Contains/Does not contain: Matches by keyword. For example, to match a user information table, you can enter user_info.

      • Regex (case-insensitive): Enter a regular expression. For example, to match all items whose names contain test, use the expression .*test.*. The match is case-insensitive.

      • Regular Expression: Enter a regular expression. For example, to match all items whose names contain test, use the expression .*test.*. Unlike Regex (case-insensitive), the match is case-sensitive.

    Note

    • You can add up to five scope rules with a maximum of two nested levels.

    • You can select up to 100 data domains or projects.

    Data source table scan scope

    This option appears only when you select Data Source.

    • Data Source: Select one or more data sources to scan.

    • Data Scope: You can choose to scan All tables or Specified tables. If you choose Specified tables, you can add filter conditions based on full table name, asset inventory tag, table description, or db/schema to refine the asset scope. You can add up to 10 filter conditions with a logical relationship of AND or OR.

      • Full table name/Table description/db/schema: The available filter conditions are prefix match, suffix match, contains (only for table descriptions), and belongs to (only for db/schema).

        • Prefix match, Suffix match, Contains: You can enter up to 256 characters.

        • Belongs to: Allows you to select up to 500 assets of the corresponding type from the current source.

      • Asset Inventory Tag: The available filter conditions are Contains any and Contains all.

        • Contains any: Matches an asset if it has at least one of the selected inventory tags.

        • Contains all: Matches an asset only if it has all of the selected inventory tags.

  4. Click OK to create the identification rule.

    The new rule appears in the identification rule list and is enabled by default. It runs automatically the next day according to its execution schedule.
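
The naming limits and the compute source matching conditions from step 3 can be checked offline before you fill in the dialog box. The following is a minimal sketch, not Dataphin's implementation. The helper names (`is_valid_rule_name`, `matches`) and the condition keys are hypothetical. It assumes that "Chinese characters" means the basic CJK Unified Ideographs block, that an empty name is invalid, that Contains is a case-sensitive substring test, and that a regular expression must match the whole name (which is why the documented example is .*test.* rather than test).

```python
import re

# Assumption: "Chinese characters" = basic CJK Unified Ideographs (U+4E00-U+9FFF).
# Allowed: Chinese characters, letters, digits, underscores; 1 to 12 characters.
RULE_NAME_RE = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_]{1,12}")


def is_valid_rule_name(name: str) -> bool:
    """Check a rule name against the documented character set and length limit."""
    return RULE_NAME_RE.fullmatch(name) is not None


def matches(condition: str, value: str, asset_name: str) -> bool:
    """Evaluate one scope condition against an asset name (hypothetical helper)."""
    if condition == "contains":
        return value in asset_name
    if condition == "does_not_contain":
        return value not in asset_name
    if condition == "regex_case_insensitive":
        # Assumption: the pattern must match the full name.
        return re.fullmatch(value, asset_name, flags=re.IGNORECASE) is not None
    if condition == "regex":
        return re.fullmatch(value, asset_name) is not None
    raise ValueError(f"unsupported condition: {condition}")


if __name__ == "__main__":
    print(is_valid_rule_name("phone_rule"))   # within the 12-character limit
    print(is_valid_rule_name("bad-name"))     # hyphen is not an allowed character
    # The same pattern behaves differently under the two regex conditions.
    print(matches("regex_case_insensitive", ".*test.*", "ods_TEST_orders"))
    print(matches("regex", ".*test.*", "ods_TEST_orders"))
```

Running a pattern through both regex conditions against a few real table names is a quick way to confirm which condition you need before the rule's first scheduled run.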

Identification rule list

  1. The identification rule list displays the name, data category, owner, last updated time, and status of each rule. Click the Description button to view details about data sampling, identification results, and result management.

  2. You can search for rules by name or apply filters for data category, owner, or Owned by me.

  3. You can perform the following actions on a target identification rule.

    Actions

    Description

    Enabled

    Toggle the switch in the Enabled column. When a rule is enabled, it runs according to its scheduled and real-time scan settings and generates execution records. When a rule is disabled, it no longer runs automatically, but you can still trigger it manually for a specific scope.

    Note

    Disabling a rule does not affect previously generated identification results.

    Reset

    Click Reset in the Actions column or at the bottom of the page. This clears all existing tagging results within the rule's scan scope, then reruns the identification process to generate the latest results.

    View Details

    Click View Details in the Actions column to see the configuration details of the rule.

    Edit

    Click Edit in the Actions column to modify the rule's information.

    Manual Run

    In the Actions column, click the More icon and select Manual Run, or click Manual Run at the bottom of the page to run the selected rule. If auto-inheritance based on data lineage is enabled, identification results can be automatically inherited. For more details, see Data lineage-based inheritance.

    When you run a batch manual scan, you can run both enabled and disabled rules. The available execution scopes are All rules (including disabled rules) and Enabled rules only.

    Copy

    Click Copy in the Actions column to quickly create a duplicate of the rule.

    Transfer

    In the Actions column, click the More icon and select Transfer, or click Transfer at the bottom of the page. Select a new owner for the rule and click OK. You can transfer an identification rule only to a security administrator.

    Delete

    In the Actions column, click the More icon and select Delete, or click Delete at the bottom of the page. Deleting a rule removes all data classification and sensitivity level tags that were applied by it. This removal takes effect the next day. Previously generated execution records are not affected.

    Test

    When you test a rule on a specified project, data source, or table, the test evaluates the classification, sensitivity level, and rule-based tags for assets within that scope; assets outside the scope are ignored. A default test extracts sample data from the rule's scan scope and evaluates the sample in the same way.

    Click Test at the bottom of the page and select the projects, data sources, or data tables you want to test. You can select up to 10 projects or 10 tables.

    After the test is complete, click View Test Results to see the details.

    Note

    • Test runs only display results for the sample data and do not apply any actual tags.

    • Test runs consume computing resources for data scans and computations. To minimize resource usage and execution time, define a precise test scope. Execution time varies depending on the number and complexity of rules in the selected scope.

    • A test evaluates only whether a single identification rule can identify sensitive data. In a real scan, multiple matching rules are evaluated and one is chosen based on priority, so test tagging results may differ from actual results.
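
The last note explains why a test result can differ from a real scan result. The following sketch illustrates that difference under stated assumptions. The `Rule` class, the `hits` mapping, and both functions are hypothetical, and the sketch assumes that a smaller priority number wins, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    category: str
    priority: int  # Assumption: a smaller number means a higher priority.


def single_rule_test(rule, field, hits):
    """Test mode: report the rule's category if this one rule matches the field."""
    return rule.category if field in hits[rule.name] else None


def real_scan(rules, field, hits):
    """Real scan: among all matching rules, keep only the highest-priority one."""
    matching = [r for r in rules if field in hits[r.name]]
    if not matching:
        return None
    return min(matching, key=lambda r: r.priority).category


if __name__ == "__main__":
    phone = Rule("phone_rule", "Phone number", priority=1)
    id_no = Rule("id_rule", "ID number", priority=2)
    # Hypothetical match results: both rules match the same field.
    hits = {"phone_rule": {"contact"}, "id_rule": {"contact"}}

    print(single_rule_test(id_no, "contact", hits))    # the tested rule matches on its own
    print(real_scan([phone, id_no], "contact", hits))  # the higher-priority rule wins
```

In this example the test of `id_rule` reports a match, but a real scan tags the field with the category of `phone_rule` because both rules match and `phone_rule` has the higher priority.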

Manually trigger an identification rule

  1. On the Identification Rules page, click Manual Rule Scan to open the Manual Rule Scan dialog box.

  2. In the Manual Rule Scan dialog box, configure the parameters.

    Parameter

    Description

    Scan scope

    Define the scan scope by selecting one of the following options: Full Database Scan, Scan by Project, Scan by Data Source, or Scan by Table.

    • Full Database Scan: Scans all data within the Dataphin instance.

    • Scan by Project: Scans all data within the selected projects.

    • Scan by Data Source: Scans all data within the selected data sources.

    • Scan by Table: Scans all data within the selected data tables. You can select up to 10 tables from a project or data source.

    Rule execution scope

    Define which rules to run by selecting either Enabled rules only or All rules (including disabled rules).

    • Enabled rules only: Includes all identification rules in Dataphin that are currently enabled.

    • All rules (including disabled rules): Includes all identification rules in Dataphin, regardless of their status.

    Note

    • A manual scan triggers auto-inheritance only if you have enabled auto-inheritance and selected the Triggered by rule execution scenario in the auto-inheritance configuration. For details, see Auto-inheritance configuration.

    • When auto-inheritance is enabled in this way, downstream fields inherit the sensitivity level of their direct upstream fields based on data lineage, which expands scan coverage and improves result consistency across related data.

    • Enabling auto-inheritance expands the scan scope and consumes additional computing resources. Configure this feature based on your business needs.

  3. Click OK to start the scan on the selected assets.

    Go to the Execution Records page to monitor progress. Scan duration varies depending on the amount of data being scanned.
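
The inheritance behavior described in the note for step 2 can be pictured with a small sketch. This is an illustration only, not Dataphin's algorithm. The level names, the field names, and the `inherit_levels` function are hypothetical. The sketch assumes that a field's own identification result is kept, and that a field with several direct upstream fields takes the most sensitive level among them.

```python
# Hypothetical sensitivity levels, ordered from least to most sensitive.
LEVEL_ORDER = {"L1": 1, "L2": 2, "L3": 3, "L4": 4}


def inherit_levels(identified, upstream_of):
    """Return levels after one round of inheritance from direct upstream fields.

    identified:  field -> level produced by rule execution
    upstream_of: downstream field -> list of its direct upstream fields
    """
    result = dict(identified)
    for field, parents in upstream_of.items():
        if field in identified:
            continue  # Assumption: the field's own identification result is kept.
        parent_levels = [identified[p] for p in parents if p in identified]
        if parent_levels:
            # Assumption: with several upstream fields, take the most sensitive level.
            result[field] = max(parent_levels, key=LEVEL_ORDER.__getitem__)
    return result


if __name__ == "__main__":
    identified = {"ods.user.phone": "L3", "ods.user.city": "L1"}
    upstream_of = {
        "dwd.user.contact": ["ods.user.phone", "ods.user.city"],
        "dwd.user.note": ["ods.user.memo"],  # upstream field has no level
    }
    for field, level in sorted(inherit_levels(identified, upstream_of).items()):
        print(field, level)
```

Only direct upstream fields are consulted, which mirrors the wording of the note. Fields whose upstream fields have no identification result stay untagged.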

Next steps