AnalyticDB for MySQL provides the paging cache feature. You can use the paging cache feature to improve the efficiency of paged queries that involve large amounts of data in the LIMIT, OFFSET, and ORDER BY clauses. This helps resolve the performance issues and resource bottlenecks of deep paged queries. This topic describes how to use the paging cache feature to optimize the performance of paged queries.
Prerequisites
An AnalyticDB for MySQL cluster of V3.2.3 or later is created.
To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
Overview
To resolve the performance issues caused by deep paged queries, AnalyticDB for MySQL provides the paging cache feature. The first time you initiate a paged query, the system queries data from the database and stores the query results in a temporary cache table. When you initiate subsequent paged queries that share the same SQL pattern, the system reads data from the temporary cache table to prevent repeated sorting operations. This effectively resolves the performance issue caused by deep paged queries and prevents OOM errors caused by the ORDER BY clause. AnalyticDB for MySQL automatically clears the cached data that is no longer required based on the eviction policy to ensure that resources are properly utilized.
The paging cache feature is suitable for the following scenarios:
Usage
Configure a cache database
Before you use the paging cache feature to cache query results, we recommend that you specify a database to store temporary cache tables for paged queries. If you do not specify a database, temporary cache tables are stored in the connected internal database. When you enable the paging cache feature, a temporary cache table is automatically created.
You cannot specify an external database as the cache database.
For example, you can specify the paging_cache database as the cache database. You can also specify other databases.
SET ADB_CONFIG PAGING_CACHE_SCHEMA=paging_cache;Enable the paging cache feature for paged queries
If multiple paged queries share the same SQL pattern, you can add a hint to the SQL statements to improve performance. The first time you initiate a paged query that contains a hint, the system creates a temporary cache table to store the paged query results. When you initiate subsequent paged queries that share the same SQL pattern, you can add the same hint to the SQL statements. This way, the system reads data from the temporary cache table without the need for repeated database access.
Limits
After you eliminate the LIMIT and OFFSET clauses, the number of data entries on which you want to perform paged queries must be less than 100 million.
If the number of data entries to be queried exceeds 100 million, Submit a ticket to adjust the upper limit of the number of data entries.
Methods to enable the paging cache feature
You can use one of the following hints to enable the paging cache feature:
paging_id=<paging_id>The
paging_idparameter specifies the cache table that is created for a set of paged queries that share the same SQL pattern but involve different values in theLIMITandOFFSETclauses. The client must generate a unique ID to uniquely identify a cache table for a set of paged queries.If the specified
paging_idparameter does not exist, a cache table is created.If the queried
paging_idparameter exists and the SQL pattern involved in the query matches the SQL pattern that corresponds to thepaging_idparameter, the query hits the cache.If the queried
paging_idparameter exists but the SQL pattern involved in the query does not match the SQL pattern that corresponds to thepaging_idparameter, an error occurs. You can check whether thepaging_idparameter is already used. For more information, see the "Query information about cache tables" section of this topic.
NoteThe
paging_idparameter must meet the following naming conventions: The parameter name must be 1 to 127 characters in length and can contain letters, digits, and underscores (_). The parameter name must start with a letter or an underscore (_). The parameter name cannot contain single quotation marks ('), double quotation marks ("), exclamation points (!), or spaces. The parameter name cannot be an SQL reserved word.For example, you can set the paging ID to
paging123for the result of a set of paged queries that use the paging cache feature./*paging_id=paging123*/ SELECT * FROM t_order ORDER BY id LIMIT 0, 100;paging_cache_enabled=trueThis method does not require frequent modifications of hints. The server uses the SQL pattern that eliminates the
LIMITandOFFSETclauses to generate a paging ID to identify a set of paged queries.The dependency on the SQL pattern matching limits the flexibility of this method. If no cache table exists for the paged queries that share the same SQL pattern excluding the LIMIT and OFFSET clauses, a cache table is created. If a cache table exists for the paged queries that share the same SQL pattern excluding the LIMIT and OFFSET clauses, the cache table is used to query data.
Sample statement:
/*paging_cache_enabled=true*/ SELECT * FROM t_order ORDER BY id LIMIT 0, 100;
If a cache table fails to be created, you must clear the cache data of paged queries and re-initiate a paged query to create a cache table.
Query information about cache tables
You can query information about all cache tables of paged queries within the current cluster, such as the paging ID, cache size, and cache status.
SELECT * FROM INFORMATION_SCHEMA.KEPLER_PAGING_CACHE_STATUS_MERGED;Specify the maximum number of cache tables
You can specify the maximum number of cache tables within the current cluster. Default value: 100.
SET ADB_CONFIG PAGING_CACHE_MAX_TABLE_COUNT=100;If you attempt to create a cache table when the total number of cache tables exceeds the limit, an error occurs. Sample error message:
Paging cache count exceeds the limit. Please clean up unused caches or increase the related parameter using SET ADB_CONFIG PAGING_CACHE_MAX_TABLE_COUNT=xxx.Clear the cache data that is no longer required or increase the maximum number of cache tables based on the error message.
Specify a validity period for a cache table
You can specify a validity period for a cache table. Unit: seconds. After the validity period ends, the cache table becomes invalid. When you initiate subsequent paged queries that share the same SQL pattern, the system performs database access and updates the cache table. In most cases, you can specify a validity period for a cache table in report concurrency control scenarios.
For example, you can use the /*paging_cache_enabled=true, paging_cache_validity_interval=300*/ hint to allow a cache table to be valid within 300 seconds after creation. Sample statement:
/*paging_cache_enabled=true, paging_cache_validity_interval=300*/ SELECT * FROM t_order ORDER BY id LIMIT 0, 100;Clear the cache data of paged queries
When you use the paging cache feature to cache paged query results, the cache data is cached in the hot storage space of AnalyticDB for MySQL. If specific cache data is no longer required, you can clear the cache data to increase the available storage.
Manual clearing
Cache data of paged queries specified by an SQL pattern
Use the
/*paging_cache_enabled=true, invalidate_paging_cache=true*/hint to clear the cache data of paged queries specified by an SQL pattern.Sample statement:
/*paging_cache_enabled=true,invalidate_paging_cache=true*/ SELECT * FROM t_order ORDER BY id LIMIT 0, 100;Cache data of paged queries specified by the
paging_idparameterClear the cache data of paged queries specified by the
paging_idparameter.Sample statement:
CLEAN_PAGING_CACHE paging123;NoteFor information about how to obtain the value of the
paging_idparameter, see the "Query information about cache tables" section of this topic.
Automatic clearing
You can specify a cache expiration time to clear the cache data of paged queries that is not accessed within the specified time range. Default value: 600. Unit: seconds. By default, the cache data that is not accessed within 10 minutes is automatically cleared.
SET ADB_CONFIG PAGING_CACHE_EXPIRATION_TIME=600;Disable the paging cache feature for a cluster
After you disable the paging cache feature for a cluster, all hints related to this feature become invalid, and paged queries are processed by using the original query methods.
SET ADB_CONFIG PAGING_CACHE_ENABLE=false;After you disable the paging cache feature, you can execute the SHOW ADB_CONFIG KEY=PAGING_CACHE_ENABLE; statement to check whether the configuration takes effect.
Common errors and troubleshooting
Paging cache prepare failed, and cache is not available
Error message:
Paging cache prepare failed, and cache is not available. Please use /*paging_cache_enabled=true,invalidate_paging_cache=true*/ to clean the unavailable cache or set a specific pagingId with /*paging_id=xxx*/ to gen a new cache. Note that the old and new cache data may be inconsistent.Cause: When you use the paging cache feature to query data, exceptions may occur, such as node restarting and scaling. If a paged query hits a cache table that fails to be created, the server does not perform database access or automatically re-create a cache table. The server throws an error.
Solution: To ensure data consistency, perform the following operations: In data export scenarios, we recommend that you clear the exported data and the unavailable cache data and then re-initiate a paged query to re-create a cache table. In other scenarios, we recommend that you clear the unavailable cache data or specify a new value for the paging_id parameter and then re-initiate a paged query to re-create a cache table.
Performance comparison
Use a TPC-H data set of 100 GB to evaluate the optimization effect of the paging cache feature for paged queries in data export scenarios.
This test involves 1 million entries of exported data. Each page contains 100,000 entries. Execute the following statements to perform paged queried on the first page:
-- General paged query without using the paging cache feature
SELECT * FROM lineitem ORDER BY l_orderkey,l_linenumber LIMIT 0,100000;
-- Paged query by using the paging cache feature (ORDER BY eliminated in the data export scenario)
/*paging_cache_enabled=true*/ SELECT * FROM lineitem LIMIT 0,100000;Test result:
The query requests are executed in single concurrency mode. During the data export process, the average response time of general paged queries is 54,391 ms. After the paging cache feature is enabled, the average response time is 525 ms. The performance is improved by approximately 103 times. The CPU utilization and the memory usage are significantly decreased.
The paging cache feature significantly reduces the response time of paged queries during the data export process and effectively decreases the consumption of CPU and memory resources.
