This topic describes how to optimize queries to improve query efficiency.
Increase the number of shards
Shorten the time range and reduce the data volume of a query
- A long time range slows down a query.
If you want to accelerate calculation, you can shorten the time range of a query.
- A large data volume slows down a query.
We recommend that you reduce the data volume of a query.
Query data repeatedly
If the result of a query is inaccurate, you can query the data repeatedly. Each time you perform a query, the underlying acceleration mechanism analyzes the data based on the existing query results. Therefore, repeated queries can return more accurate results.
Optimize analytic statements
- The GROUP BY clause is used to group analysis results based on one or more columns of the string type.
- The GROUP BY clause is used to group analysis results based on more than five columns.
- Operations that generate strings are included in the analytic statement.
- Do not include operations that generate strings.
For example, if you use the date_format function to generate a timestamp in a specified format, the query is inefficient. We recommend that you use the date_trunc function or the time_series function to generate a timestamp.
* | select date_format(from_unixtime(__time__) , '%H_%i') as t, count(1) group by t
- Do not group analysis results based on one or more columns of the string type.
For example, if you use the GROUP BY clause to group analysis results based on one or more columns of the string type, the workload for hash calculation is heavy. The workload for hash calculation accounts for more than 50% of the overall workload for calculation. Examples:
- Efficient query statement
* | select count(1) as pv , from_unixtime(__time__-__time__%3600) as time group by __time__-__time__%3600
- Inefficient query statement
* | select count(1) as pv , date_trunc('hour',__time__) as time group by time
The two query statements are used to calculate the number of logs per hour. In the second statement, the timestamps are converted to strings. Then, the analysis results are grouped based on the strings. For example, 2021-12-12 00:00:00 is a timestamp in the string format. In the first statement, the timestamps on the hour are obtained, the analysis results are grouped based on the timestamps, and then the timestamps are converted to strings.
- Efficient query statement
- If you want to group analysis results based on multiple columns, we recommend that
you place a field that has a larger number of values before a field that has a smaller
number of values.
For example, if the number of values for a specified field is 13 and the number of values for the uid field is 100 million, we recommend that you place the uid field before the specified field in the GROUP BY clause. Examples:
- Efficient query statement
* | select province,uid,count(1) group by uid,province
- Inefficient query statement
* | select province,uid,count(1) group by province,uid
- Efficient query statement
- Use approximate functions.
The performance of approximate functions is better than the performance of functions with specified precision. Approximate functions sacrifice precision for speed. Examples:
- Efficient query statement
* |select approx_distinct(ip)
- Inefficient query statement
* | select count(distinct(ip))
- Efficient query statement
- Specify only the columns that you want to query in an analytic statement.
In an analytic statement, we recommend that you query only the columns that are required in calculation. If you want to query all columns, use the search syntax. Examples:
- Efficient query statement
* |select a,b c
- Inefficient query statement
* |select *
- Efficient query statement
- Place the columns that are not used for grouping in an aggregate function.
For example, the values of the userid and username columns must match each other. You can use the GROUP BY clause to group only the values of the userid column. Examples:
- Efficient query statement
* | select userid, arbitrary(username), count(1) group by userid
- Inefficient query statement
* | select userid, username, count(1) group by userid,username
- Efficient query statement
- Do not use the IN clause.
We recommend that you do not use the IN clause in an analytic statement. You can use OR in a search statement. Examples:
- Efficient query statement
key: a or key: b or key: c | select count(1)
- Inefficient query statement
* | select count(1) where key in ('a','b')
- Efficient query statement