Simple Log Service supports SQL statements for analyzing query results. This topic describes the basic syntax of SQL analytic statements.
Basic syntax
Query statements and analytic statements are separated by a vertical bar (|). The structure is as follows:
query statement|analytic statement
Query statements can function independently, while analytic statements must accompany query statements. The log analysis feature enables data analysis within search results or across all data in a logstore.
We recommend limiting a query statement to 30 conditions or fewer.
If an analytic statement contains no FROM or WHERE clause, the data of the current logstore is analyzed by default. Analytic statements are case-insensitive, do not support the OFFSET clause, and do not need to end with a semicolon.
| Statement type | Description |
| --- | --- |
| Query statement | A query statement specifies one or more search conditions. It can consist of keywords, numeric values, numeric ranges, a space, or an asterisk (*). A space or an asterisk (*) indicates no conditions. |
| Analytic statement | An analytic statement aggregates or analyzes data in search results or in all logs of a logstore. For details about the analysis functions and syntax that Simple Log Service supports, see the corresponding function and syntax reference topics. |
Examples of SQL analytic statements:
* | SELECT status, count(*) AS PV GROUP BY status
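A query statement can also narrow the data before analysis. For example, the following statement counts only the logs whose status is 404 (a minimal sketch, assuming a status field is configured in the index):
status: 404 | SELECT count(*) AS PV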
SQL functions and SQL clauses
SQL functions are commonly used for calculations, transformations, and formatting of data, such as summing, averaging, string manipulation, and date processing. SQL functions are typically embedded within SQL clauses.
SQL clauses are used to construct complete SQL queries or data manipulation statements, determining data sources, conditions, grouping, sorting, and more.
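As a minimal sketch of the two working together, the following statement embeds the date_trunc and date_format functions inside the SELECT and GROUP BY clauses to count logs per minute (the field usage mirrors the examples below):
* | SELECT date_format(date_trunc('minute', __time__), '%H:%i') AS t, count(*) AS PV GROUP BY t ORDER BY t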
Example 1. Query logs from the previous day
Use the current_date function to return the current date. Then, use the date_add function to subtract a specified time interval from the current date. Display the results using a pie chart for a more intuitive view of the data.
Query and analysis statements:
* | SELECT * FROM log WHERE __time__ < to_unixtime(current_date) AND __time__ > to_unixtime(date_add('day', -1, current_date))
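The same pattern extends to other whole-day windows. For example, shifting both bounds back by one more day (a sketch that reuses the same functions) returns the logs of the day before yesterday:
* | SELECT * FROM log WHERE __time__ < to_unixtime(date_add('day', -1, current_date)) AND __time__ > to_unixtime(date_add('day', -2, current_date))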
Example 2. View the distribution of log source IPs
Use the ip_to_province function to obtain the province that corresponds to each IP address. Use group by to aggregate the addresses and the count function to count the occurrences of each address. Display the results using a pie chart.
Query and analysis statements:
* |
select
count(1) as c,
ip_to_province(remote_addr) as address
group by
address
limit
100
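ip_to_province typically returns an empty value for intranet IP addresses. To restrict the aggregation to public addresses, the following variant is a sketch that assumes the ip_to_domain function, which classifies an IP address as intranet or internet:
* | select count(1) as c, ip_to_province(remote_addr) as address where ip_to_domain(remote_addr) != 'intranet' group by address limit 100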
Example 3. Analyze Nginx inbound and outbound traffic
Use the date_trunc function to truncate __time__ to the hour (__time__ is a system field that indicates the log collection time, as a UNIX timestamp in seconds by default). Use the date_format function to format the truncated result. Use group by to aggregate the truncated time and the sum function to calculate the total traffic per hour. Display the results using a line chart, with the x-axis set to time and the left y-axis to net_out and net_in.
Query and analysis statements:
* | select sum(body_bytes_sent) as net_out, sum(request_length) as net_in, date_format(date_trunc('hour', __time__), '%m-%d %H:%i') as time group by date_format(date_trunc('hour', __time__), '%m-%d %H:%i') order by time limit 10000
Example 4. View the top 10 Nginx access paths
Use the split_part function to split request_uri at the question mark (?) and take the first part to obtain the request path. Aggregate by this path using group by and count the accesses to each path using the count function. Sort the counts using order by, where desc indicates descending order. Display the results using a column chart.
Query and analysis statements:
* | select count(1) as pv, split_part(request_uri, '?', 1) as path group by path order by pv desc limit 10
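The same grouping can carry additional metrics per path. The following sketch also reports the average request size, reusing the request_length field from Example 3:
* | select split_part(request_uri, '?', 1) as path, count(1) as pv, round(avg(request_length), 2) as avg_request_bytes group by path order by pv desc limit 10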
Example 5. View the PV trend by request method
Use the date_trunc function to truncate the time to the minute, then group by request_method and aggregate the PV. Sort by time and display the results using a flow chart, with the x-axis as time, the y-axis as PV, and the aggregation column as request_method.
Query and analysis statements:
* | select date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as t, request_method, count(*) as pv group by t, request_method order by t asc limit 10000
Example 6. Compare today's PV with yesterday's
First, calculate the total PV using the count function. Then, use the compare function to compare today's PV with the PV of the same period yesterday (an offset of 86400 seconds).
Query and analysis statements:
* | select diff[1] as today, round((diff[3] - 1.0) * 100, 2) as growth FROM ( SELECT compare(pv, 86400) as diff FROM ( SELECT COUNT(1) as pv FROM log ) )
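The second argument of the compare function is the offset in seconds, so other comparison windows only require a different offset. For example, an offset of 604800 seconds (7 days) compares today's PV with the same period last week:
* | select diff[1] as today, round((diff[3] - 1.0) * 100, 2) as growth FROM ( SELECT compare(pv, 604800) as diff FROM ( SELECT COUNT(1) as pv FROM log ) )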
Example 7. Predict Nginx access log PV
Use __time__ - __time__ % 60 (subtract from the timestamp the remainder of dividing it by 60) to obtain a minute-aligned timestamp named stamp. Use group by to aggregate stamp and the count function to calculate the count per minute. Use the result as a subquery and the ts_predicate_simple function to predict the next 6 points. Display the query results using a time series chart.
Query and analysis statements:
* | select ts_predicate_simple(stamp, value, 6) from ( select __time__ - __time__ % 60 as stamp, COUNT(1) as value from log GROUP BY stamp order by stamp ) LIMIT 1000
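To smooth the input curve, the alignment interval can be widened. The following sketch aligns the timestamps to 5-minute buckets (% 300) before predicting the next 6 points:
* | select ts_predicate_simple(stamp, value, 6) from ( select __time__ - __time__ % 300 as stamp, COUNT(1) as value from log GROUP BY stamp order by stamp ) LIMIT 1000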
Example 8. Analyze HTTP_USER_AGENT and display sorted by PV
Group and aggregate by http_user_agent, then query the number of requests from each agent and the total traffic returned to clients. Because the unit is bytes, use the round function to convert the values to MB and keep two decimal places. Use case when to classify status into 2xx, 3xx, 4xx, and 5xx categories and calculate the proportion of each category. Display the results in a table for a more intuitive view of the data.
Query and analysis statements:
* |
select
http_user_agent as "User Agent",
count(*) as pv,
round(sum(request_length) / 1024.0 / 1024, 2) as "Request Message Traffic (MB)",
round(sum(body_bytes_sent) / 1024.0 / 1024, 2) as "Returned Client Traffic (MB)",
round(
sum(
case
when status >= 200
and status < 300 then 1
else 0
end
) * 100.0 / count(1),
6
) as "2xx Proportion (%)",
round(
sum(
case
when status >= 300
and status < 400 then 1
else 0
end
) * 100.0 / count(1),
6
) as "3xx Proportion (%)",
round(
sum(
case
when status >= 400
and status < 500 then 1
else 0
end
) * 100.0 / count(1),
6
) as "4xx Proportion (%)",
round(
sum(
case
when status >= 500
and status < 600 then 1
else 0
end
) * 100.0 / count(1),
6
) as "5xx Proportion (%)"
group by
"User Agent"
order by
pv desc
limit
100
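If only one status class matters, the case when blocks can be collapsed. The following compact variant is a sketch that assumes the count_if aggregate function from the underlying Presto dialect:
* | select http_user_agent as "User Agent", count(*) as pv, round(count_if(status >= 500) * 100.0 / count(1), 2) as "5xx Proportion (%)" group by "User Agent" order by pv desc limit 100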
Example 9. Analyze the proportion of error requests in Nginx logs
First, obtain the number of error requests with a status code of 400 or greater and the total number of requests in a subquery, then calculate the ratio in the outer query. Display the results using a statistical chart.
Query and analysis statements:
* | select round((errorCount * 100.0 / totalCount), 2) as errorRatio from ( select sum( case when status >= 400 then 1 else 0 end ) as errorCount, count(1) as totalCount from log )
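To observe how the error ratio evolves over time, the same case when expression can be grouped per minute, reusing the date_trunc and date_format pattern from Example 3:
* | select t, round(errorCount * 100.0 / totalCount, 2) as errorRatio from ( select date_format(date_trunc('minute', __time__), '%H:%i') as t, sum( case when status >= 400 then 1 else 0 end ) as errorCount, count(1) as totalCount from log group by t ) order by t limit 1440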