Simple Log Service: Query acceleration

Last Updated: Jan 19, 2024

Simple Log Service supports global cache and concurrent computing to accelerate queries on metrics. This topic describes the principles of global cache and concurrent computing and the related parameters.

Principles

The following content describes the principles of global cache and concurrent computing.

Global cache

By default, the Prometheus Query compute engine does not cache query results, and all data is recomputed each time you run a query. This is inefficient when the data volume is large and the query time range is long. The global cache feature allows part of the results of a query to be reused. If the PromQL statement and the step parameter of a query are the same as those of a previous query, the two queries are considered the same, and the results of the previous query can be reused. The cached results are reused for the overlapping time range, and only the data beyond that range is queried separately.

Important
  • After you enable the global cache feature, the feature aligns the query time range that is specified by the start and end parameters to integer multiples of the value of the step parameter and matches the aligned range against the cache, as illustrated in the sketch after these notes. This greatly increases the cache hit ratio and improves query efficiency. After a query is complete, the query results are updated in the cache.

  • To protect data integrity, incomplete query results are not cached. Query results for data that falls outside the time range of the cached data are also not cached.
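
The following Python sketch is illustrative only and is not part of the Simple Log Service implementation. The helper names are hypothetical; the code only shows how a query range defined by start, end, and step might be aligned to integer multiples of step so that previously cached intervals can be matched and reused, while the uncached tail is queried separately.

# Illustrative sketch only; the helper names are hypothetical and do not
# reflect the internal implementation of Simple Log Service.

def align_range(start, end, step):
    """Snap [start, end] to integer multiples of step."""
    aligned_start = (start // step) * step    # round start down to a step boundary
    aligned_end = -(-end // step) * step      # round end up to a step boundary
    return aligned_start, aligned_end

def split_cached_and_fresh(start, end, step, cached_until):
    """Return the sub-range that can be served from cache and the sub-range to recompute."""
    aligned_start, aligned_end = align_range(start, end, step)
    boundary = min(max(cached_until, aligned_start), aligned_end)
    cached_part = (aligned_start, boundary)   # reused from the cache
    fresh_part = (boundary, aligned_end)      # computed and then written to the cache
    return cached_part, fresh_part

if __name__ == "__main__":
    # Results up to 1690877500 are cached; only the tail of the range is recomputed.
    print(split_cached_and_fresh(1690876800, 1690877800, 10, cached_until=1690877500))
    # -> ((1690876800, 1690877500), (1690877500, 1690877800))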

Concurrent computing

By default, the data that is obtained by executing a standard Prometheus query statement is computed on a single server in a single coroutine. When many time series, long query time ranges, or complex computational logic are involved, queries are slow. Simple Log Service adapts the Prometheus engine to support concurrent and distributed Prometheus queries. The concurrent computing feature splits a PromQL query by time interval or by time series and schedules the subqueries to multiple servers for execution. Compared with a query on a single server, the performance of a query on multiple servers is improved by 2 to 10 times.

  • Splitting by time interval

    In this example, the execution interval of a query is 12 hours. The query is split into 6 subqueries that cover 2-hour intervals. The 6 subqueries are executed concurrently, and their results are merged, as illustrated in the sketch after this list.

    query:    sum(metric)
    interval: 12h
    step:     2m
  • Splitting by time series

    In this example, a metric contains 500,000 time series, and the number of global concurrent tasks is set to 10. A total of 10 tasks are concurrently executed. Each task computes the data related to 50,000 time series. After the tasks are executed, the results of the tasks are merged.
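
For illustration, the following Python sketch mimics the two splitting strategies described above: splitting a 12-hour range into 2-hour subranges, and sharding a set of time series across a fixed number of concurrent tasks. The helper names are hypothetical and do not reflect the internal implementation of the time series compute engine.

# Illustrative sketch only; the helper names are hypothetical.

def split_by_time(start, end, piece_interval):
    """Split [start, end) into subranges of piece_interval seconds each."""
    pieces = []
    cursor = start
    while cursor < end:
        pieces.append((cursor, min(cursor + piece_interval, end)))
        cursor += piece_interval
    return pieces

def split_by_series(series_ids, total_parallel_count):
    """Shard a list of time series across total_parallel_count concurrent tasks."""
    return [series_ids[i::total_parallel_count] for i in range(total_parallel_count)]

if __name__ == "__main__":
    # A 12-hour query split into 6 subqueries of 2 hours each.
    print(split_by_time(0, 12 * 3600, 2 * 3600))
    # 10 tasks, each computing the data of an equal share of 500,000 time series.
    shards = split_by_series(list(range(500_000)), 10)
    print([len(shard) for shard in shards])  # 10 shards of 50,000 series each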

Important
  • You do not need to determine whether concurrent computing is applicable to a PromQL query. The time series compute engine of Simple Log Service automatically identifies the PromQL queries to which concurrent computing is applicable and schedules the queries to multiple nodes for execution.

  • Concurrent computing is suitable for scenarios in which the query time range is long or the number of time series for a metric is large. If the data volume involved in a query is small, concurrent computing may not improve and may even compromise query performance. If the query time range is long or the number of time series for a metric is large in your business scenario, we recommend that you configure the parameters for concurrent computing in HTTP FormValue mode.

  • In aggregation scenarios, aggregation by using without() improves computing performance only to a small extent. For example, if you use agg without() metric{} to perform a query, the aggregation effectively groups by all labels. The number of samples in the result set is barely reduced because a large amount of intermediate results must still be collected and aggregated, so the performance gain is marginal. If a large number of time series are involved, performance may even degrade. Similarly, if you perform the by operation on many labels, the performance gain is also marginal.

Configuration description

The following list describes the parameters that you can configure for query acceleration. You can configure the parameters in MetricsConfig or HTTP FormValue mode.

parallel_config (parameters of concurrent computing)

  • enable

    Description: Specifies whether to enable concurrent computing. By default, concurrent computing is disabled.
    MetricsConfig: Supported. FormValue: Supported.
    Remarks: Concurrent computing splits a query into multiple subqueries and schedules the subqueries to multiple child compute nodes for execution. The results of the subqueries are aggregated on the primary compute node.

  • mode

    Description: The mode in which concurrent computing is configured. Valid values:
      auto: The system automatically selects the degree of concurrency based on the results of the most recent queries.
      static: You must manually configure time splitting and the degree of concurrency.
    MetricsConfig: Supported. FormValue: Unsupported.
    Remarks: In most cases, you can set mode to auto. To configure concurrent computing in static mode, we recommend that you consult the technical support of Simple Log Service.

  • timePieceInterval

    Description: The time interval based on which a query is split. Unit: seconds. Valid values: 3600 to 86400 × 30. Default value: 21600 (6 hours).
    MetricsConfig: Supported. FormValue: Supported.
    Remarks: In the Simple Log Service console, you must specify a value that is accurate to the hour.

  • timePieceCount

    Description: The number of subqueries that are obtained after splitting based on the specified time interval. Valid values: 1 to 16. Default value: 8.
    MetricsConfig: Supported. FormValue: Supported.

  • totalParallelCount

    Description: The number of global concurrent tasks. Valid values: 2 to 64. Default value: 8.
    MetricsConfig: Supported. FormValue: Supported.
    Remarks: When a query is split by time series, this parameter specifies the total number of tasks that can be generated for a metric. For example, if a metric contains 5 million time series and the number of global concurrent tasks is set to 10, a total of 10 tasks are concurrently executed and each task computes the data of 500,000 time series.

  • parallelCountPerHost

    Description: The number of concurrent tasks on a server. Valid values: 1 to 8. Default value: 2.
    MetricsConfig: Supported. FormValue: Supported.
    Remarks: When a query is split by time series, the tasks that are obtained after splitting are scheduled to different servers for execution. This parameter specifies the number of tasks that can run on a single server.

query_cache_config (parameters of global cache)

  • enable

    Description: Specifies whether to enable global cache. By default, global cache is disabled.
    MetricsConfig: Supported. FormValue: Supported.
    Remarks: After you enable global cache, the feature can reuse some results of the same query.

Configure query acceleration in MetricsConfig mode

MetricsConfig is an independent parameter provided by Simple Log Service for each Metricstore. You can configure the MetricsConfig parameter in the Simple Log Service console or by using SDKs. After you configure or update the MetricsConfig parameter, you must wait for 3 minutes for the configuration to take effect.

Configure the query acceleration settings of a Metricstore in the Simple Log Service console

On the Metricstore Attribute page, configure the query acceleration settings. For more information about how to access the Metricstore Attribute page, see Modify the configurations of a Metricstore.

You can configure concurrent computing in the following modes:

  • In auto mode, Simple Log Service estimates the degree of concurrency for the current concurrent computing task based on the amount of data that the same query pulled in the most recent query tasks.

  • In static mode, you can manually configure the degree of concurrency. To configure concurrent computing in static mode, we recommend that you consult technical support of Simple Log Service.

Configure the MetricsConfig parameter of a Metricstore by using SDKs

Simple Log Service SDK for Java provides an operation that you can use to modify the MetricsConfig parameter. You can configure the fields for the MetricsConfig parameter in the JSON format. Example:

{
  "parallel_config": {
    "enable": true,
    "mode": "static",
    "parallel_count_per_host": 2,
    "time_piece_count": 8,
    "time_piece_interval": 21600,
    "total_parallel_count": 8
  },
  "query_cache_config": {
    "enable": true
  }
}
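
The following Python sketch is illustrative only. It assembles the same JSON payload and checks the values against the ranges listed in the configuration description above; the build_metrics_config helper is hypothetical, and the call that submits the payload through the SDK is not shown.

import json

# Illustrative sketch only: assemble the MetricsConfig payload and check it
# against the documented value ranges. The helper is hypothetical, and the
# SDK call that submits the payload is not shown.
def build_metrics_config(enable_parallel=True, mode="auto",
                         time_piece_interval=21600, time_piece_count=8,
                         total_parallel_count=8, parallel_count_per_host=2,
                         enable_cache=True):
    assert mode in ("auto", "static"), "mode must be auto or static"
    assert 3600 <= time_piece_interval <= 86400 * 30, "timePieceInterval out of range"
    assert 1 <= time_piece_count <= 16, "timePieceCount out of range"
    assert 2 <= total_parallel_count <= 64, "totalParallelCount out of range"
    assert 1 <= parallel_count_per_host <= 8, "parallelCountPerHost out of range"
    return json.dumps({
        "parallel_config": {
            "enable": enable_parallel,
            "mode": mode,
            "time_piece_interval": time_piece_interval,
            "time_piece_count": time_piece_count,
            "total_parallel_count": total_parallel_count,
            "parallel_count_per_host": parallel_count_per_host,
        },
        "query_cache_config": {"enable": enable_cache},
    }, indent=2)

print(build_metrics_config(mode="static"))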

The following table describes the mappings between the parameters used to configure query acceleration, the fields of the MetricsConfig parameter in the JSON format, and the key fields in the FormValue parameter.

Category              Parameter               MetricsConfig               FormValue
parallel_config       enable                  enable                      x-sls-parallel-enable
parallel_config       mode                    mode                        None
parallel_config       timePieceInterval       time_piece_interval         x-sls-parallel-time-piece-interval
parallel_config       timePieceCount          time_piece_count            x-sls-parallel-time-piece-count
parallel_config       totalParallelCount      total_parallel_count        x-sls-parallel-count
parallel_config       parallelCountPerHost    parallel_count_per_host     x-sls-parallel-count-per-host
query_cache_config    enable                  enable                      x-sls-global-cache-enable

Configure query acceleration in HTTP FormValue mode

In addition to the MetricsConfig mode, you can configure query acceleration in HTTP FormValue mode. In HTTP FormValue mode, the settings take effect only for the current request. For more information about the Metricstore-related HTTP API operations, see API operations for metric queries.

The following examples describe how to configure global cache and concurrent computing in HTTP FormValue mode:

Global cache

Add x-sls-global-cache-enable=true to enable global cache.

https://{project}.{sls-endpoint}/prometheus/{project}/{metricstore}/api/v1/query_range?query=sum(up)&start=1690876800&end=1690877800&step=10&x-sls-global-cache-enable=true

Concurrent computing

Add x-sls-parallel-enable=true&x-sls-parallel-count=16 to enable concurrent computing and set the degree of concurrency to 16.

https://{project}.{sls-endpoint}/prometheus/{project}/{metricstore}/api/v1/query_range?query=sum(up)&start=1690876800&end=1690877800&step=10&x-sls-parallel-enable=true&x-sls-parallel-count=16
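
As a sketch only, the following Python code sends the same query_range request with the FormValue parameters appended as query parameters. The project, endpoint, and Metricstore names are placeholders, and the authentication that Simple Log Service requires (for example, a signed request or an SDK client) is omitted.

import requests  # assumes the requests package is installed

# Illustrative sketch only: placeholder project, endpoint, and Metricstore names;
# the required Simple Log Service authentication is omitted.
PROJECT = "your-project"              # placeholder
SLS_ENDPOINT = "your-sls-endpoint"    # placeholder regional endpoint
METRICSTORE = "your-metricstore"      # placeholder

url = f"https://{PROJECT}.{SLS_ENDPOINT}/prometheus/{PROJECT}/{METRICSTORE}/api/v1/query_range"
params = {
    "query": "sum(up)",
    "start": 1690876800,
    "end": 1690877800,
    "step": 10,
    # Query acceleration FormValue parameters:
    "x-sls-global-cache-enable": "true",    # enable global cache
    "x-sls-parallel-enable": "true",        # enable concurrent computing
    "x-sls-parallel-count": 16,             # degree of concurrency
}
response = requests.get(url, params=params)  # authentication headers omitted
print(response.status_code)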