Classify and grade data using AI large language models - Data Security Center

Data Security Center (DSC) now supports sensitive data classification and grading based on a hybrid architecture that combines large language models (LLMs), expert small models, and traditional regular expression rules. Compared with the previous method, which relied only on regular expressions and keyword rules, the hybrid approach improves detection coverage and accuracy.

Major improvements include the following:

Upgraded detection capabilities
Combines the Qwen LLM with domain-specific expert small models to automatically detect more than 800 data types in both structured data, such as database fields, and unstructured data, such as documents, images, and logs.
Improved detection performance
Overcomes the limitations of traditional rule-based methods when detecting content with implied semantics, varied formats, or context-sensitive information. This significantly improves both accuracy and recall.
Flexible configuration and efficient response
Supports custom classification and grading policies and provides millisecond-level inference responses to meet compliance and administration requirements in various business scenarios.
Seamless integration and deployment
Integrates with your existing data security system with one click through a cloud-native architecture. You can quickly enable intelligent classification and grading capabilities without modifying your infrastructure.

The release of this capability marks a shift in sensitive data detection from a rule-driven to an AI-driven approach. It provides core support for enterprises to build an accurate, efficient, and scalable data security system.

Procedure

If data classification and grading is enabled

DSC provides a bonus AI detection quota based on your purchased Data Identification quota:

For every 10,000 tables purchased for Data Identification - Database Table Quantity, you receive a bonus of 60,000 AI Text Detection calls.
For every 1 TB of storage purchased for Data Identification - Storage Identification Capacity, you receive a bonus of 4,000 AI Text Detection calls and 4,000 AI Image Detection calls.
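The bonus rules above are simple arithmetic, sketched below in Python. This is an illustration, not an official DSC calculator. The function and class names are hypothetical, and the sketch assumes that only complete blocks (10,000 tables or 1 TB) earn a bonus, which the rules above do not state explicitly. Check the console for your actual quota.

```python
from dataclasses import dataclass

# Bonus rates taken from the rules above.
TEXT_CALLS_PER_10K_TABLES = 60_000
TEXT_CALLS_PER_TB = 4_000
IMAGE_CALLS_PER_TB = 4_000


@dataclass(frozen=True)
class BonusQuota:
    text_calls: int
    image_calls: int


def bonus_quota(tables_purchased: int, storage_tb_purchased: int) -> BonusQuota:
    """Estimate the bonus AI detection calls for a Data Identification purchase."""
    if tables_purchased < 0 or storage_tb_purchased < 0:
        raise ValueError("purchased quantities must be non-negative")
    # Assumption: partial blocks of fewer than 10,000 tables earn no bonus.
    table_blocks = tables_purchased // 10_000
    text_calls = (table_blocks * TEXT_CALLS_PER_10K_TABLES
                  + storage_tb_purchased * TEXT_CALLS_PER_TB)
    image_calls = storage_tb_purchased * IMAGE_CALLS_PER_TB
    return BonusQuota(text_calls=text_calls, image_calls=image_calls)


if __name__ == "__main__":
    # 20,000 tables and 5 TB of storage.
    print(bonus_quota(20_000, 5))
```

Under these assumptions, purchasing 20,000 tables and 5 TB of storage yields 140,000 AI Text Detection calls and 20,000 AI Image Detection calls.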

If you have purchased DSC and enabled the classification and grading feature for your data assets, the system automatically enables the complimentary AI detection capabilities for data classification and grading.

If data classification and grading is not enabled or you have not purchased Data Security Center

If you have not purchased Data Security Center, you can purchase it from the Data Security Center purchase page. If you purchase the Value-added Service Only edition, you must separately enable the Data Identification feature. This feature is included by default in the Premium and Enterprise editions. For more information about how to purchase and select an edition, see Purchase Data Security Center.

Enable the classification and grading feature in one of the following ways:

One-click AI data detection: Go to the Overview page of the Data Security Center console. In the One-click AI Data Detection dialog box that appears, follow the on-screen instructions to enable the classification and grading feature for all your cloud assets with one click.
Enable AI-powered data detection for a specific asset: Go to the Data Security Center console. In the navigation pane on the left, choose Asset Center. Then, click Asset synchronization and find the target asset. In the Classification and Grading column for the asset, enable the feature. For more information, see Asset Center (New Version).

After the classification and grading feature is enabled, the system automatically enables the complimentary AI detection capabilities for data classification and grading. In the navigation pane on the left, you can choose Classification and Grading > Asset Insight to view the results.

Quota consumption rules and pricing

Offline scan scenarios:
- Scanning 1 TB of text files consumes approximately 100,000 text detection calls.
- Scanning 10,000 data tables consumes approximately 3,000,000 text detection calls.
- Scanning 1 TB of images consumes approximately 1,000,000 image detection calls.
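To size a quota for an offline scan, the approximate figures above can be scaled to your own data volumes. The Python sketch below assumes that consumption scales linearly with volume, which is an assumption rather than a documented guarantee. The function name is hypothetical, and the results are rough estimates only.

```python
# Approximate consumption rates taken from the offline scan figures above.
TEXT_CALLS_PER_TB_OF_TEXT = 100_000
TEXT_CALLS_PER_TABLE = 3_000_000 // 10_000  # about 300 calls per table
IMAGE_CALLS_PER_TB_OF_IMAGES = 1_000_000


def estimate_offline_calls(text_tb: float, tables: int, image_tb: float) -> dict:
    """Roughly estimate the detection calls consumed by an offline scan.

    Assumption: consumption scales linearly with data volume.
    """
    if text_tb < 0 or tables < 0 or image_tb < 0:
        raise ValueError("volumes must be non-negative")
    text_calls = round(text_tb * TEXT_CALLS_PER_TB_OF_TEXT
                       + tables * TEXT_CALLS_PER_TABLE)
    image_calls = round(image_tb * IMAGE_CALLS_PER_TB_OF_IMAGES)
    return {"text_calls": text_calls, "image_calls": image_calls}


if __name__ == "__main__":
    # 0.5 TB of text files, 2,000 tables, and 0.1 TB of images.
    print(estimate_offline_calls(0.5, 2_000, 0.1))
```

For 0.5 TB of text files, 2,000 tables, and 0.1 TB of images, this estimate gives about 650,000 text detection calls and 100,000 image detection calls.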
API call scenarios:
- Text detection: Each text input that contains at least one piece of sensitive information consumes one call. The text for a single input cannot exceed 2,000 characters. Text that exceeds the limit is not scanned.
- Image detection: Each scanned image consumes one call. A single image cannot exceed 10 MB. Images that exceed the limit are not scanned.
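Because inputs that exceed these limits are not scanned, it can help to check them on the client side before calling the API. The Python sketch below is a minimal illustration with hypothetical helper names. It does not call any DSC API. It assumes that 10 MB means 10 × 1024 × 1024 bytes and that characters are counted the way Python's `len` counts them. Neither assumption is confirmed by this document.

```python
from typing import List

MAX_TEXT_CHARS = 2_000
# Assumption: "10 MB" is interpreted as 10 MiB.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def text_is_scannable(text: str) -> bool:
    """Return True if the text is within the single-input character limit."""
    return len(text) <= MAX_TEXT_CHARS


def image_is_scannable(image_bytes: bytes) -> bool:
    """Return True if the image is within the single-image size limit."""
    return len(image_bytes) <= MAX_IMAGE_BYTES


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> List[str]:
    """Split long text into chunks that each fit within the limit.

    Fixed-offset splitting is naive: it can cut a sensitive value in two,
    so splitting on sentence or record boundaries may work better.
    """
    return [text[i:i + limit] for i in range(0, len(text), limit)]


if __name__ == "__main__":
    chunks = split_text("x" * 4_500)
    print([len(chunk) for chunk in chunks])
```

Each chunk is a separate input. According to the rule above, a chunk consumes a call only if it contains at least one piece of sensitive information.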

Feature	Pricing
AI Text Detection	Number of calls:
AI Image Detection	Number of calls:

Handling AI detection quota depletion

When your AI detection quota is depleted, the following effects occur:

Text detection: All Identification Models stop invoking the LLM, which reduces detection accuracy by 15% to 30%.
Image detection: DSC can no longer detect sensitive data in images.

At this point, the DSC console prompts you to upgrade. You can follow the on-screen instructions or click the Upgrade button on the Overview page to start the upgrade process. On the upgrade page, in the AI Data Security Detection section, select the required number of AI Text Detection and AI Image Detection calls to complete the upgrade.
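When you choose how many calls to select on the upgrade page, one simple approach is to compare the calls you expect to consume with the quota you have left. The Python sketch below shows that comparison. The function name and the input values are hypothetical, and the sketch assumes that text and image quotas are tracked separately.

```python
def additional_calls_needed(expected_text: int, remaining_text: int,
                            expected_image: int, remaining_image: int) -> dict:
    """Return the shortfall of text and image detection calls.

    A shortfall of 0 means the remaining quota covers the expected usage.
    """
    return {
        "text_calls": max(0, expected_text - remaining_text),
        "image_calls": max(0, expected_image - remaining_image),
    }


if __name__ == "__main__":
    # Hypothetical numbers: 650,000 text calls expected with 140,000 left,
    # and 100,000 image calls expected with 120,000 left.
    print(additional_calls_needed(650_000, 140_000, 100_000, 120_000))
```

In this hypothetical case, the text quota is short by 510,000 calls and the image quota is sufficient.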

Note

To save quota, you can also manually control whether Identification Models or OSS Identification Tasks invoke the LLM.

Identification Models: Go to Classification and Grading > Config. On the Identification Models tab, find the target detection model and click the switch icon in the AI Model Invocation column to disable LLM invocation. Disabling LLM invocation is not recommended unless necessary. For more information, see View and configure detection templates.
OSS Identification Tasks: Go to Classification and Grading > Tasks. On the Identification Tasks tab, select OSS and then click Create. On the Create page, turn off the AI-Powered Image Detection switch. For more information, see Scan for sensitive data using a detection task.

Appendix: Detection models that support LLM invocation

The following detection models support LLM invocation:

Address
- Address (Malaysia)
- Address (English)
- Address (the Chinese mainland)
- Residential Address
Name
- Name (Malaysia)
- Name (English)
- Name (Traditional Chinese)
- Name (Simplified Chinese)
- Personal Name
Identity/Certificate
- Passport Number (the Chinese mainland)
- U.S. Social Security Number (SSN)
- ID Card Number (Hong Kong (China))
- Passport
- ID Card
Contact Information
- Landline Number (United States)
- Landline Number (the Chinese mainland)
- Personal Phone Number
Bank/Payment
- Credit Card Number
- Bank Card Number (the Chinese mainland)
- Bank Account
Organization/Enterprise Qualification
- Tax Registration Certificate Number
- Unified Social Credit Code
- Organization Code
- Business License Number