Detect Threats Faster with Agentic SOC Monitoring & Alerts

This topic describes how to use the cloud observability feature of Agentic SOC and Simple Log Service (SLS) to set up automated monitoring and alerting for core Agentic SOC metrics, such as service health and log usage, to improve service availability and Operations and Maintenance (O&M) efficiency.

Scenarios

Agentic SOC is a core security service whose stability and health are crucial. During daily O&M, you may encounter the following issues:

Service availability risks: Issues such as interruptions in Agentic SOC log ingestion or abnormal operations of core modules may not be detected promptly. This can degrade or disable security analysis capabilities.
Cost management difficulties: If log ingestion traffic exceeds expectations, it can result in unexpected storage and query costs in SLS. Effective monitoring and early alerting are necessary.
Low O&M efficiency: The lack of a unified monitoring view and alerting mechanism hinders the integration of the Agentic SOC operational state into your existing O&M system.

Workflow

This solution uses the cloud observability feature of Agentic SOC to deliver runtime logs to SLS. The alerting feature of SLS is then used to monitor the logs and send notifications.

Log generation: Agentic SOC modules, such as usage metering and module health, generate monitoring logs at runtime.
Log delivery: After you enable the cloud observability feature, Agentic SOC delivers these monitoring logs in real time to your specified SLS project.
Log storage: The logs are stored in an SLS Logstore.
Alerting and monitoring: You can create an alert rule in SLS to periodically execute a query statement (SQL). The system then determines whether the trigger conditions are met based on the query results.
Notification sending: When an alert is triggered, a notification is sent to a specified channel, such as text message, DingTalk, or email, using an action policy.
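
The alerting and notification stages above amount to a periodic check cycle. The following Python sketch is a simplified model of one cycle, not the SLS implementation: `run_check`, the in-memory log list, and the `count` field are hypothetical stand-ins for the SLS query engine, a Logstore, and a query result.

```python
from typing import Callable, Iterable

Log = dict  # one monitoring log entry, modeled as a plain dict (assumption)


def run_check(
    logs: Iterable[Log],
    query: Callable[[Iterable[Log]], dict],
    condition: Callable[[dict], bool],
    notify: Callable[[str], None],
) -> bool:
    """Run one check cycle: query the stored logs, evaluate the trigger
    condition on the query result, and notify if the condition is met."""
    result = query(logs)
    if condition(result):
        notify(f"alert triggered, query result: {result}")
        return True
    return False


# Hypothetical usage: alert when no logs arrived in the checked window.
sent: list[str] = []
triggered = run_check(
    logs=[],
    query=lambda entries: {"count": sum(1 for _ in entries)},
    condition=lambda row: row["count"] == 0,
    notify=sent.append,
)
```

In the real service, the query is the statement configured in the alert rule, and the notification is sent through the action policy.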

Steps

Step 1: Enable the cloud observability feature

First, you need to enable the cloud observability feature in the Agentic SOC console to deliver monitoring logs to Simple Log Service (SLS).

Go to the Cloud Observability configuration page
1. Go to Security Center console > System Settings > Feature Settings, and in the top-left corner of the page, select the region where your assets are located: Chinese Mainland or Outside Chinese Mainland.
2. On the Settings tab, click Cloud Observability.
Turn on Cloud Observability
On the Cloud Observability configuration tab, in the Basic Settings area, turn on the Enable Cloud Observability switch.
Configure log storage information
In the Detailed Configuration area on the Cloud Observability configuration tab, configure the following settings:
- Monitoring Module: Turn on the switches for the logs that you want to deliver.
  - Module Health: Monitors the operational state, connection status, and performance of each functional module.
  - Usage Metering: Monitors the usage of log ingestion traffic and log storage capacity.
- Log Storage Location:
  - Region: During the initial setup, you must select a region to store logs for Cloud Observability.
    Warning
    The log storage region cannot be changed after the initial configuration. The system automatically creates a dedicated SLS Project and Logstore in this region.
  - Project: The system automatically creates a project based on the region. The format is sas-observability-AccountUID-RegionID.
  - Logstore Mapping: The system automatically creates two Logstores.
    - health-log: Stores Module Health logs.
    - metering-log: Stores Usage Metering logs.
- Data Retention Days: Specify the retention period for cloud observability data in Simple Log Service. The default value is 30 days. You can change this value as needed.
  Note
  A longer retention period results in higher storage costs.
Save Configuration: Click Save Configuration. After the configuration is saved, Agentic SOC starts to deliver logs to the specified SLS Project.
Important
Log storage for the cloud observability feature incurs additional fees, which are billed by SLS.
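
Because the project and Logstore names follow a fixed pattern, a script that reads these logs can derive the names instead of hard-coding them. The following Python sketch only illustrates the naming format described above. The function name, the account UID, and the region ID are placeholder assumptions, not values from your account.

```python
# Logstores that the system creates, keyed by monitoring module.
LOGSTORES = {
    "Module Health": "health-log",
    "Usage Metering": "metering-log",
}


def observability_project(account_uid: str, region_id: str) -> str:
    """Build the project name in the format
    sas-observability-AccountUID-RegionID."""
    return f"sas-observability-{account_uid}-{region_id}"


# Placeholder values for illustration only.
project = observability_project("1234567890123456", "cn-hangzhou")
print(project)
print(LOGSTORES["Usage Metering"])
```

Confirm the actual project name in the Cloud Observability configuration tab before you use it in automation.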

Step 2: Configure alert notification rules

Procedure

On the Cloud Observability tab, click Alert Center in the lower-right corner to navigate to the Alert Center configuration page of the Cloud Observability project.

On the Alert Rules tab, click Create Alert. The parameters are described as follows:

Note

For more information, see Create an alert monitoring rule.

Parameter	Description
Rule Name	The name of the alert rule.
Check Frequency	Simple Log Service checks query and analysis results at the frequency you configure. Hourly: Checks the query and analysis results every hour. Daily: Checks the query and analysis results at a fixed time every day. Weekly: Checks the query and analysis results at a fixed time on a specific day of the week. Fixed Interval: Checks the query and analysis results at a fixed interval. Cron: Checks the query and analysis results at an interval specified by a cron expression. Note The minimum precision for a cron expression in an alert rule is one minute. The format uses the 24-hour clock. For example: `0/5 * * * *`: Checks every 5 minutes, starting from the 0th minute. `0 0/1 * * *`: Checks every hour, starting from 00:00. `0 18 * * *`: Checks every day at 18:00. `0 0 1 * *`: Checks at 00:00 on the first day of every month. For more information about the syntax of cron expressions, see Cron jobs.
Query Statistics	Click the input box. In the Query Statistics dialog box, configure a query statement. Associated Report tab: Select a dashboard. Advanced Configuration tab: From the Type list, select a data type: Logstore: stores logs. For more information about query and analysis configurations, see Get started with log query and analysis. Metricstore: stores metrics. For more information about query and analysis configurations, see Query and analyze metric data. Resource Data: configures external data that is associated with the alert rule. For more information, see Create resource data. If you set Type to Logstore or Metricstore and specify a query statement, you can specify whether to enable Dedicated SQL. For more information, see High-performance and fully accurate query and analysis (Dedicated SQL). Auto: Dedicated SQL is not used by default. If a query concurrency limit is reached or the query results are inaccurate, Simple Log Service automatically retries the query using Dedicated SQL. Enable: Dedicated SQL is always used for query and analysis. Disable: Dedicated SQL is disabled. If you configure multiple query statements, you can specify a Set Operation to associate the query results. For more information, see Configure query statements.
Group Evaluation	Simple Log Service can group query and analysis results. For more information, see Configure group evaluation. Custom Label: Simple Log Service groups query and analysis results based on the fields that you specify. After the results are grouped, the trigger condition is evaluated for each group. In each check cycle, if the results in a group meet the trigger condition, an alert is generated for that group. You can specify multiple fields. No Grouping: In each check cycle, only one alert is generated when the trigger condition is met. Auto Label: If you select Metricstore in the Query Statistics section to monitor the query and analysis results of metrics, Simple Log Service automatically groups the results. After the results are grouped, the trigger condition is evaluated for each group. In each check cycle, if the results in a group meet the trigger condition, an alert is generated for that group.
Trigger Condition	Configure the trigger condition and alert severity. Trigger condition Data exists: An alert is triggered if data exists in the query and analysis results. A specific number of data entries exists: An alert is triggered if N data entries exist in the query and analysis results. Data matches the expression: An alert is triggered if data that matches the conditional expression exists in the query and analysis results. A specific number of data entries matches the expression: An alert is triggered if N data entries that match the conditional expression exist in the query and analysis results. Severity This parameter is mainly used for alert denoising and notification control. You can add conditions based on alert severity when you create an alert policy or action policy. For more information, see Set alert severity. Simple configuration: Select an alert severity. All alerts generated by this rule will have the same severity. Conditional configuration: Click Add to set the alert severity based on different conditions. For more information about the syntax of conditional expressions, see Syntax of conditional expressions.
Add Label	Add identifying attributes to generated alerts in key-value format. Labels are mainly used for alert denoising and notification control. You can add conditions based on labels when you create an alert policy or action policy. For more information, see Add labels and annotations.
Add Annotation	Add non-identifying attributes to generated alerts in key-value format. Annotations are mainly used for alert denoising and notification control. You can add conditions based on annotations when you create an alert policy or action policy. For more information, see Add labels and annotations. You can also turn on the Auto-add Annotations switch. The system then automatically adds information such as __count__ to alerts. For more information, see Automatic annotations.
Recovery Notifications	If you turn on the Recovery Notifications switch, a recovery alert is triggered when an alert is resolved. For example, you create an alert rule to monitor the CPU metrics of each host. An alert is triggered if the CPU utilization exceeds 95%. A recovery notification is sent after the CPU utilization drops to a normal value (less than or equal to 95%). For more information, see Set up recovery notifications.
Advanced Configuration > Continuous Trigger Threshold	The number of consecutive checks in which the trigger condition is met before an alert is generated. Checks where the condition is not met are not counted.
Advanced Configuration > No-data Alert	If you turn on the No-data Alert switch, an alert is generated if the number of times that no data is returned in the query and analysis results exceeds the Continuous Trigger Threshold. If multiple query statements are used, the count is based on the result of the set operation. For more information, see No-data alerts.
Outputs	Configure where alert events are sent. You can configure one or more outputs. Event Store: Writes alert events to the eventstore. CloudMonitor Event Center: Writes alert events to the Event Center of CloudMonitor. You can then use CloudMonitor to manage alerts and send notifications. SLS Notification: Sends alert events to the notification service of SLS. You can then use alert policies and action policies to manage alerts and send notifications.
Outputs - EventStore	Enable: If you turn on this switch, alerts are written to the specified EventStore. Region: The region where the destination EventStore is located. Project: The project to which the destination EventStore belongs. Event Store: The eventstore that stores alerts. Authorization Method: Default Role: Click Go to Authorize and complete the authorization as prompted. This grants Simple Log Service the permissions of the AliyunLogETLRole system role to write alerts to the destination EventStore. For more information, see Grant permissions using a default role. Custom Role: Assume a custom role to write alerts to the destination EventStore. Enter the Alibaba Cloud Resource Name (ARN) of the role. For more information, see Grant permissions using a custom role.
Outputs - CloudMonitor Event Center	Enable: If you turn on this switch, alerts are sent to the Event Center of CloudMonitor. For more information, see View system events.
Outputs - SLS Notification	Enable: If you turn on this switch, alerts are sent to the notification service of SLS for management and notification. Alert Policy Simple Mode Simple Log Service uses the built-in dynamic alert policy (sls.builtin.dynamic) to manage alerts by default. Configure an action group. After you configure an action group, Simple Log Service automatically creates an action policy named `Rule Name-Action Policy`. All alerts triggered by this alert rule use this action policy to send notifications. For more information about how to configure an action group, see Notification channels. Important You can modify this action policy on the Action Policies page. For more information, see Action policies. If you add a conditional expression when you modify the action policy, the Alert Policy mode automatically changes to Standard Mode. Repeat Interval: Within this interval, duplicate alerts trigger the action policy only once. This means only one alert notification is sent. Standard Mode Simple Log Service uses the built-in dynamic alert policy (sls.builtin.dynamic) to manage alerts by default. Select a built-in or custom action policy to send alert notifications. For more information about how to create an action policy, see Action policies. Repeat Interval: Within this interval, duplicate alerts trigger the action policy only once. This means only one alert notification is sent. Advanced Mode Select a built-in or custom alert policy to manage alerts. For more information about how to create an alert policy, see Create an alert policy. Select a built-in or custom action policy to send alert notifications. For more information about how to create an action policy, see Action policies. You can also enable or disable Custom Action Policy. For more information, see Dynamic action policy mechanism. Repeat Interval: Within this interval, duplicate alerts trigger the action policy only once. This means only one alert notification is sent.

After you complete the configuration, click OK.
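
To make the Cron option of Check Frequency more concrete, the following Python sketch tests whether a point in time matches a cron expression. It assumes the common five-field layout (minute, hour, day of month, month, day of week) and supports only `*`, a single number, and `START/STEP`. It is a simplified illustration, not the parser that Simple Log Service uses, and the function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field. Supports '*', 'N', and 'START/STEP'."""
    if field == "*":
        return True
    if "/" in field:
        start_text, step_text = field.split("/")
        start = 0 if start_text == "*" else int(start_text)
        step = int(step_text)
        return value >= start and (value - start) % step == 0
    return value == int(field)


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if the moment matches all five cron fields.

    Simplification: all fields must match, which differs from cron
    implementations that combine day of month and day of week with OR.
    """
    minute, hour, day, month, weekday = expression.split()
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        # Cron uses 0 for Sunday; isoweekday() uses 7 for Sunday.
        and field_matches(weekday, moment.isoweekday() % 7)
    )


# Every 5 minutes starting from minute 0: 10:15 matches, 10:17 does not.
print(cron_matches("0/5 * * * *", datetime(2024, 1, 1, 10, 15)))
print(cron_matches("0/5 * * * *", datetime(2024, 1, 1, 10, 17)))
```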

Configuration examples

Traffic drop to zero

Scenario: Log ingestion traffic suddenly drops to 0, and no more data is written to Agentic SOC.
Solution: Every 10 minutes, the system checks the log volume of the last 10 minutes. If the log volume is 0, the system determines that data reporting is interrupted and triggers an alert. The alert is sent to the specified recipient by text message. A 10-minute cool-down period limits duplicate notifications while the anomaly persists, so that data link anomalies are detected and handled promptly without repeated messages.
Configuration parameters:
- Check Frequency: Fixed Interval of 10 minutes.
- Query Statistics: Click Add. On the Advanced Configuration tab of the Query Statistics dialog box, use the following configuration:
  - Type: Logstore
  - Authorization Method: Default.
  - Logstore: metering-log
  - Dedicated SQL: Disable.
  - Query Interval: 10 minutes (relative to the hour). The query SQL is as follows:
```
* and type: log_traffic |
select if(t.log_size is null, 0, t.log_size) from (select sum(log_size) log_size from log) t
```
- Group Evaluation: No Grouping.
- Trigger Condition: Data matches the expression. The Evaluation Expression is _col0<=0.
- Outputs: Select SLS Notification and turn on the Enable switch.
  - Alert Policy:
    - Mode: Simple Mode.
    - Action Group:
      - Channel: Text Message. For information about how to configure other channels, see Notification channel overview.
      - Recipient Type: Static Recipient.
      - Content Template: SLS built-in content template.
      - Sending Period: Any.
  - Repeat Interval: 10 minutes.
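
The logic of this example can be sketched in Python as follows. `traffic_col0` mirrors the query (the sum of log_size, with an empty result reported as 0), and `TrafficDropRule` applies the `_col0<=0` expression together with the 10-minute cool-down. The class and function names are hypothetical, and the sketch assumes that a notification is allowed again as soon as the interval has fully elapsed. This topic does not state how Simple Log Service treats that boundary.

```python
from datetime import datetime, timedelta
from typing import Optional


def traffic_col0(log_sizes: list[Optional[int]]) -> int:
    """Mirror the query: sum(log_size), with a NULL sum reported as 0."""
    values = [size for size in log_sizes if size is not None]
    return sum(values) if values else 0


class TrafficDropRule:
    """Model of the alert rule with a cool-down between notifications."""

    def __init__(self, repeat_interval: timedelta = timedelta(minutes=10)):
        self.repeat_interval = repeat_interval
        self.last_notified: Optional[datetime] = None

    def check(self, now: datetime, log_sizes: list[Optional[int]]) -> bool:
        """Return True if a notification should be sent for this check."""
        if traffic_col0(log_sizes) > 0:  # trigger expression: _col0<=0
            return False
        within_cooldown = (
            self.last_notified is not None
            and now - self.last_notified < self.repeat_interval
        )
        if within_cooldown:
            return False  # duplicate alert inside the cool-down period
        self.last_notified = now
        return True


rule = TrafficDropRule()
start = datetime(2024, 1, 1, 0, 0)
print(rule.check(start, []))                         # no traffic in window
print(rule.check(start + timedelta(minutes=5), []))  # inside the cool-down
```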

Access issues

Scenario: The access status of a data source in the Integration Center is abnormal.
Solution: The system checks the Module Health Logstore every 15 minutes. An alert is triggered if the Logstore contains data in which the value of status is not equal to normal.
Configuration parameters:
- Check Frequency: Fixed Interval of 15 minutes.
- Query Statistics: Click Add. On the Advanced Configuration tab of the Query Statistics dialog box, use the following configuration:
  - Type: Logstore
  - Authorization Method: Default.
  - Logstore: health-log
  - Dedicated SQL: Disable.
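  - Query Interval: 15 minutes (relative to the hour). Use a query statement that counts the entries whose status is not equal to normal and returns the result in a field named count.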
- Trigger Condition: Data matches the expression. The Evaluation Expression is count>0.
- Outputs: Select SLS Notification and turn on the Enable switch.
  - Alert Policy:
    - Mode: Simple Mode.
    - Action Group:
      - Channel: Text Message. For information about how to configure other channels, see Notification channel overview.
      - Recipient Type: Static Recipient.
      - Content Template: SLS built-in content template.
      - Sending Period: Any.
  - Repeat Interval: 15 minutes.
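
The check in this example can be sketched in Python as follows. The status field and the normal value come from the description above. The sample entries, the abnormal value, and the function names are hypothetical. Entries without a status field are not counted, which is an assumption of this sketch.

```python
def abnormal_count(health_logs: list[dict]) -> int:
    """Count entries whose status is present and not equal to 'normal'."""
    return sum(
        1
        for entry in health_logs
        if entry.get("status") is not None and entry["status"] != "normal"
    )


def should_alert(health_logs: list[dict]) -> bool:
    """Apply the evaluation expression count>0."""
    return abnormal_count(health_logs) > 0


# Hypothetical entries from the last 15 minutes of health-log.
window = [
    {"status": "normal"},
    {"status": "abnormal"},  # placeholder value for a failing data source
    {"status": "normal"},
]
print(abnormal_count(window))
print(should_alert(window))
```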

Costs and risks

Cost description: After you enable the Cloud Observability feature, monitoring logs are continuously delivered to Simple Log Service (SLS). This incurs fees for log storage (with a default retention period of 30 days) and query and analysis. These fees are billed by SLS.
Key risk: The log storage region cannot be changed in the console after the initial configuration. Therefore, choose the region carefully during the initial configuration. An incorrect region may increase data link latency and management complexity.