AnalyticDB for MySQL enhances materialized view observability across the full lifecycle, offering latency alerts, lineage analysis, task stacking diagnosis, and quantitative value assessment to improve O&M efficiency and resource governance. - AnalyticDB

Background

AnalyticDB for MySQL supports materialized views, which can use either an incremental refresh or a full refresh based on query characteristics. A materialized view encapsulates the table structure and processing logic, and includes a built-in scheduling capability. You only need to define the business logic and data freshness requirements, without managing complex data pipelines and data lineage.

In a production environment, you may encounter the following challenges when using materialized views to build data pipelines at scale:

Difficult root cause analysis in data lineage: When a delay occurs in a downstream table within a cascading refresh pipeline, it is difficult to quickly pinpoint the problematic upstream node.
Delayed failure detection: Materialized view tasks can fail for valid reasons, such as schema changes in an upstream table. If external systems are not notified promptly, downstream services may consume incorrect data, leading to business losses.
Lack of data for governance: Numerous materialized views continuously consume CPU and storage resources and may incur additional computing costs. However, without a way to quantify their business value, it is difficult to identify and remove unused views.
Unpredictable latency fluctuations: A sudden increase in upstream data volume can cause a refresh task to take longer than usual, leading to task stacking for subsequent jobs. You need to quickly identify these abnormal batches.

Limitations

Your cluster must run kernel version 3.2.6 or later.

Note

To view and update the minor version, go to the Configuration Information section on the Cluster Information page in the AnalyticDB for MySQL console.

Proactive alerting: detect refresh failures and latency

The AnalyticDB for MySQL kernel can detect abnormal events in materialized view tasks, such as failures or latency, and automatically send alert events to Alibaba Cloud Monitor. You can receive these alerts in real time through channels like SMS, DingTalk, or Lark.

Configure latency alerting

Make sure that you have created a materialized view.
In the left-side navigation pane, choose Data Management > Data Catalog, click the database that contains the materialized view, and then click the Materialized Views tab.
Click the name of the materialized view to go to its details page.

On the Refresh Policy tab of the view details page, find the Refresh Latency Alerting setting and click to configure it. In the Define refresh latency alerting rule dialog box, turn on the Enable refresh latency alerting switch and set the Maximum tolerated latency, for example, 5 minutes. An alert is triggered if a refresh task is still running after its next scheduled start time plus the configured maximum tolerated latency. The dialog box also displays an alert message template in JSON format, which includes fields such as avgDurationInMs, mvSchemaAndName, and refreshState. Click OK to save your settings.
After you enable alerting, go to Alibaba Cloud Monitor and complete the event subscription.
- Subscription type: System event
- Product: AnalyticDB for MySQL V3.0
- Event type: Abnormal
- Event name: Materialized view refresh task failed & timed out
- Event level: Critical/Warning

Data lineage: locate abnormal nodes

For materialized view pipelines that use nested refreshes, the console provides a DAG graph that lets you recursively view all upstream tables of a materialized view, including base tables and other materialized views.

In the left-side navigation pane, choose Data Management > Data Catalog. Click the database that contains the materialized view, and then click the Materialized Views tab. Click the name of the materialized view and go to the Lineage tab.

The Data Lineage tab contains two sub-tabs: Base Tables and Upstream Source Table Topology. In the topology graph, the connection lines between nodes are labeled with the refresh method, such as COMPLETE.

Task stacking analysis

Materialized view refresh tasks may experience delays due to common causes like I/O resource contention during peak write hours or insufficient computing resources. Quickly determining whether a current task is abnormal is crucial for operations.

View the refresh history of a materialized view in the console

On the details page of the materialized view, you can view the refresh history on the Refresh Policy tab.

Refresh task status descriptions
- PENDING: Waiting for the scheduling cycle to start. Resources have not yet been allocated.
- RUNNING: The task has entered the scheduling cycle and is either waiting for or has been allocated computing resources.
- FINISHED: The refresh task completed successfully.
- FAILED: The refresh task failed. The system automatically retries up to 10 times. If it still fails, the task for the current cycle is skipped. For incremental views, the next refresh scans the missing data to ensure eventual consistency.
- TIMEOUT: The task ran for too long. The system marks it as a timeout and triggers it again. Submit a ticket for troubleshooting.
Two refresh policies are available. For more information about their differences, see How to select a refresh policy.
- COMPLETE: Full refresh.
- FAST: Incremental refresh.
Query auto-refresh records using SQL
```
-- Query auto-refresh tasks for a specified materialized view.
-- mv_schema: The database where the materialized view is located.
-- mv_name:   The name of the materialized view.
SELECT * FROM information_schema.mv_auto_refresh_jobs WHERE mv_schema = 'db_name' AND mv_name = 'mv_name';
```
To determine if a task is abnormal:
1. Find the latest refresh task, which is usually in the PENDING or RUNNING status.
2. Calculate the average duration of historical tasks (end time - start time).
3. Compare the average duration with the current task's duration. If the task is in the RUNNING state, calculate its duration by using the formula: current time - start time.
4. If the current duration is significantly longer than the historical average, the task may be abnormal and requires further diagnosis.
For more information, see Query the refresh history of a materialized view.

Quantify value for cost governance

When many materialized views exist, you need to identify low-value or unused views to optimize resource usage.

On the details page of the materialized view, you can view usage metrics on the Query Popularity tab.
- Explicit access (explicit_hit): A user explicitly queries the materialized view in SQL. This includes references from parent views to child views during a nested refresh.
- Implicit access (implicit_hit): The query rewrite mechanism automatically uses the materialized view to accelerate a query.
Note
- If a materialized view is not accessed for 30 consecutive days, the system sends a reminder.
- These statistics cover the last 7 days and are updated hourly by default.

Query usage data using SQL.

-- Query hit statistics for a specified materialized view.
-- mv_name: The name of the materialized view.
-- explicit_hit:  The number of explicit hits where the user directly queries the materialized view.
-- implicit_hit: The number of implicit hits where the optimizer automatically rewrites a query and uses the materialized view for acceleration.
SELECT explicit_hit, implicit_hit
FROM mv_info
WHERE mv_name = 'mv_db';

You can use this data to evaluate the actual business value of a materialized view, to make precise data governance decisions.