Backend and monitoring developers have likely experienced this anxiety: when the log data volume reaches a certain size, previously smooth queries start to fail or hang. The monitoring service sends a flood of alerts, or stakeholders need data urgently, but the log API you invoke stalls and eventually returns a request timeout.
Recently, we collaborated with a power user (a large business team) to implement Simple Log Service (SLS) materialized views in their core log scenarios, and compared performance before and after the feature was enabled in the production environment. Both the hard performance numbers and the day-to-day user experience show a significant difference.
This article combines real business scenarios and result data to review how we optimized several slow queries that frequently timed out, bringing their responses down to seconds.
This is a very typical automated monitoring scenario. The user's monitoring service invokes the log API at high frequency via the SDK to pull invocation latency data between services.
The difficulty lies in "high concurrency + dynamic conditions." The monitoring program sends a large number of requests in a short time, and the query conditions change from request to request: it may query columnx:"abc" in one second and columnx:"abd" the next. This usage pattern puts significant pressure on the backend. Before optimization, the average query took 4,100 ms, creating a vicious cycle: slow queries -> thread pool backlog -> concurrent requests competing further for resources -> eventually widespread timeouts.
SQL after removing business semantics:
query | select
  column1, column2, column3,
  (timestamp - timestamp % 3600) as time_slot,
  count(*) as cnt,
  avg(metric_val) as avg_lat
from log
group by column1, column2, column3, time_slot
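The key trick in this query is the expression `timestamp - timestamp % 3600`, which rounds every record's timestamp down to the start of its hour so that records can be grouped into hourly slots. A minimal Python sketch of the same bucketing and aggregation (the tuple layout and function names are illustrative, not part of SLS):

```python
from collections import defaultdict

def hour_slot(ts: int) -> int:
    """Round a Unix timestamp down to the start of its hour,
    mirroring (timestamp - timestamp % 3600) in the SQL above."""
    return ts - ts % 3600

def aggregate(records):
    """records: iterable of (column1, timestamp, metric_val) tuples.
    Returns {(column1, time_slot): (cnt, avg_val)}, mirroring the
    count(*) / avg(metric_val) aggregation in the SQL."""
    stats = defaultdict(lambda: [0, 0.0])  # key -> [count, running sum]
    for col1, ts, val in records:
        key = (col1, hour_slot(ts))
        stats[key][0] += 1
        stats[key][1] += val
    return {k: (cnt, total / cnt) for k, (cnt, total) in stats.items()}
```

A materialized view effectively runs this aggregation once, ahead of time, so each incoming query only reads the small pre-computed result instead of rescanning the raw logs.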
After using materialized views: the query duration plummeted to 46 ms, an 89-fold performance improvement. More importantly, no matter how high the SDK concurrency is or how the query conditions change, only the pre-computed result needs to be read, so the response time is very stable and the timeout problem under high concurrency is completely eliminated.
Anyone who has worked with data knows that count(distinct) is a notoriously resource-intensive operation, especially at large data volumes.
User SQL:
query | select
  project_id,
  count(1) as event_cnt,
  count(distinct hash_val) as issue_cnt
from log
group by project_id
This SQL deduplicates issue signatures (represented by hash values) to produce a distinct count, and it struggles when the data volume is large.
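The reason exact count(distinct) is expensive is that the engine must remember every unique value it has seen per group. A small Python sketch of that cost model (record layout is illustrative):

```python
from collections import defaultdict

def distinct_issue_count(records):
    """records: iterable of (project_id, hash_val) pairs.
    Returns {project_id: (event_cnt, issue_cnt)}, mirroring
    count(1) and count(distinct hash_val) in the SQL above.

    Note that exact distinct counting must hold every unique hash
    per group in memory, which is why the cost grows with data
    volume rather than with the size of the final result."""
    events = defaultdict(int)   # project_id -> event_cnt
    issues = defaultdict(set)   # project_id -> set of unique hash_vals
    for project_id, hash_val in records:
        events[project_id] += 1
        issues[project_id].add(hash_val)
    return {p: (events[p], len(issues[p])) for p in events}
```

A materialized view shifts this memory- and scan-heavy work to ingestion time, so the interactive query only merges small pre-aggregated states.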
Before optimization: this query took an average of 16.8 s. If the time range was slightly extended (for example, to view the trend over the past month), or peak traffic was slightly higher, the query often failed.
After optimization: accelerated by materialized views, the query time dropped to 2.2 s, an 8-fold performance improvement, transforming this feature from "frequently unusable" to "reliably available."
This is the scenario with the largest performance improvement in this optimization. The user needs to view the comparative change in operation log read latency (comparing data from 1 day ago, 3 days ago, and 7 days ago).
User SQL:
type:read |
select
  time,
  diff[1] as day1,
  diff[2] as day2,
  diff[3] as day3,
  diff[4] as day7
from (
  select
    time,
    ts_compare(avg_latency, 86400, 172800, 604800) as diff
  from (
    select
      avg(latency) as avg_latency,
      date_trunc('hour', __time__) as time
    from log
    group by time
  )
  group by time
  order by time
)
This SQL involves ts_compare and multiple layers of subquery nesting. When the query time range is large, the computational load is very high.
Before optimization: duration was 54.3 s. Any slight jitter in the backend service caused the user's request to time out, rendering the query essentially unusable.
After optimization: duration is 958 ms. From a wait of nearly one minute down to under one second, performance improved 56-fold. This change from "unqueryable" to "near-instantaneous results" is the most tangible for O&M personnel waiting on the data.
A cost-benefit analysis
The ROI (Return on Investment) of this optimization is very high:
● High utilization rate: in one day, these views served a cumulative total of 10,223 queries.
● Extremely low cost: you may worry that storing a copy of the results is expensive. In fact, the added storage cost is less than 0.1% of the raw log storage fee, which is negligible.
Based on this practical experience, we have also summarized the three scenarios best suited to SLS materialized views. If your business fits the following situations, enable materialized views directly:
A picture is worth a thousand words. We have condensed the core performance metrics and best-fit scenarios of this practice into the following infographic, which we hope serves as a reference for your own performance optimization.
