CloudDBA provides the latency insight feature, which collects millisecond-level latency statistics for all commands run and all custom events executed on ApsaraDB for Redis instances. You can use latency insight to troubleshoot anomalies and performance issues of your instances.

Features

Redis 2.8.13 introduced latency monitoring to help users identify and troubleshoot latency issues. However, the latency monitor retains only the data from the last 160 seconds, and for each second it keeps only the highest latency of each event.
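To make these two limits concrete, the following Python sketch models the behavior described above: for each event, only samples at or above a threshold are stored, several spikes within the same second collapse into a single maximum, and only the most recent 160 one-second samples survive. This is a simplified model, not the Redis source. The class and method names are hypothetical, and the threshold plays the role of the `latency-monitor-threshold` configuration parameter.

```python
from collections import deque

HISTORY_LEN = 160  # one-second samples kept per event in this model


class EventHistory:
    """Per-event history: at most one (second, max latency) sample per second."""

    def __init__(self) -> None:
        # deque(maxlen=...) silently drops the oldest sample once it is full.
        self.samples = deque(maxlen=HISTORY_LEN)

    def add(self, second: int, latency_ms: int) -> None:
        if self.samples and self.samples[-1][0] == second:
            # Same second: keep only the highest latency seen so far.
            prev = self.samples[-1][1]
            self.samples[-1] = (second, max(prev, latency_ms))
        else:
            self.samples.append((second, latency_ms))


class LatencyMonitor:
    """Hypothetical model of the built-in Redis latency monitor."""

    def __init__(self, threshold_ms: int) -> None:
        self.threshold_ms = threshold_ms  # like latency-monitor-threshold
        self.events = {}

    def record(self, event: str, second: int, latency_ms: int) -> None:
        if latency_ms < self.threshold_ms:
            return  # fast enough: never stored
        self.events.setdefault(event, EventHistory()).add(second, latency_ms)


monitor = LatencyMonitor(threshold_ms=100)
monitor.record("command", 1000, 150)
monitor.record("command", 1000, 300)  # same second: only 300 survives
monitor.record("command", 1001, 50)   # below threshold: dropped
print(list(monitor.events["command"].samples))  # [(1000, 300)]
```

In the usage lines, the 150 ms spike is overwritten by the 300 ms spike in the same second, and the 50 ms sample is never stored.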

To address these limitations, ApsaraDB for Redis provides latency insight. Latency insight records the execution durations of all commands and of up to 27 events, and retains all latency statistics from the last three days. For more information about the events, see the "Common events" section of this topic. Latency insight provides the following benefits:
  • Persistent: persists latency data so that you can trace past latency spikes.
  • High-precision: monitors all events with millisecond precision.
  • High-performance: collects data asynchronously, with minimal impact on instance performance.
  • Real-time: supports real-time data queries and aggregation.
  • Multidimensional: provides comprehensive latency data that you can analyze by event, time, and latency.

Prerequisites

The ApsaraDB for Redis instance must run one of the following minor versions. For information about how to update the minor version, see Update the minor version.
  • Minor version 1.6.9 or later if the instance is a performance-enhanced instance of the ApsaraDB for Redis Enhanced Edition (Tair). For more information about performance-enhanced instances, see Performance-enhanced instances. If you want to collect statistics about Tair module commands, update the minor version to 1.7.28 or later.
  • Minor version 5.1.4 or later if the instance is a Community Edition instance that uses the 5.0 major version.
  • Minor version 0.1.15 or later if the instance is a Community Edition instance that uses the 6.0 major version.
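If you script this check across many instances, comparing minor versions as strings is a common pitfall: "0.1.9" sorts after "0.1.15". The following Python sketch compares the versions from the list above numerically. The instance-type keys and the `supports_latency_insight` function are hypothetical, the sketch assumes that minor versions are dot-separated integers, and you still need to obtain each instance's minor version yourself, for example from the console.

```python
# Minimum minor versions, copied from the Prerequisites list above.
# The dictionary keys are hypothetical labels, not official type names.
MIN_MINOR_VERSION = {
    "tair-performance-enhanced": "1.6.9",
    "community-5.0": "5.1.4",
    "community-6.0": "0.1.15",
}
TAIR_MODULE_MIN_VERSION = "1.7.28"  # required to collect Tair module commands


def parse_version(version: str) -> tuple:
    """Turn "1.6.9" into (1, 6, 9) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def supports_latency_insight(instance_type: str, minor_version: str,
                             tair_modules: bool = False) -> bool:
    """Return True if the minor version meets the latency insight prerequisite."""
    required = MIN_MINOR_VERSION[instance_type]
    if tair_modules and instance_type == "tair-performance-enhanced":
        required = TAIR_MODULE_MIN_VERSION
    return parse_version(minor_version) >= parse_version(required)


print(supports_latency_insight("community-6.0", "0.1.9"))   # False
print(supports_latency_insight("community-6.0", "0.1.15"))  # True
print(supports_latency_insight("tair-performance-enhanced", "1.7.0",
                               tair_modules=True))          # False
```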

Procedure

  1. Log on to the ApsaraDB for Redis console and go to the Instances page. In the top navigation bar, select the region in which the instance is deployed. Then, find the instance and click the instance ID.
  2. In the left-side navigation pane, choose CloudDBA > Latency Insight.
  3. On the page that appears, specify the time range to query and then click Search. The default time range is the last 5 minutes.
    Note: You can query only data from the last three days, and the time range cannot exceed one hour.
  4. In the table, click the name of an event or a number in the row of an event. A chart appears that shows the trend of the corresponding metric.
    To change the metrics shown on the chart, select metric names from the drop-down list above the chart.
    Note: Only commands and events whose duration exceeds the threshold are recorded and displayed. For the threshold of each event, see the "Common events" section of this topic.
    Metric | Description
    Event | The name of the event. Example values: ExpireCycle, EventLoop, Ping, Scan, Commands, and Info. For more information, see the "Common events" section of this topic.
    Total | The total number of occurrences of the event.
    Average Latency (μs) | The average latency of the event. Unit: μs.
    Maximum Latency (μs) | The maximum latency of the event. Unit: μs.
    Aggregation of Instances (Latency < 1ms) | The number of occurrences of the event whose latency is lower than 1 ms. Click the expand icon to view finer-grained counts for the <1μs, <2μs, <4μs, <8μs, <16μs, <32μs, <64μs, <128μs, <256μs, and <512μs categories. Note: An occurrence whose latency is from 0 μs to 1 μs is counted under the <1μs category, and an occurrence whose latency is from 1 μs to 2 μs is counted under the <2μs category. The other categories follow the same pattern.
    <2ms, <4ms, ..., >33s | The number of occurrences of the event whose latency is greater than or equal to 1 ms, broken down by latency category. Note: An occurrence whose latency is from 1 ms to 2 ms is counted under the <2ms category, and an occurrence whose latency is higher than 33 s is counted under the >33s category. The other categories follow the same pattern.
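The query limits in step 3 can be checked before a time range is submitted, for example in a script that prepares query windows. The following Python sketch is a hypothetical helper, not part of any ApsaraDB for Redis API. It assumes that a range of exactly one hour is allowed and that "the last three days" is measured back from the current time.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=3)  # only the last three days can be queried
MAX_SPAN = timedelta(hours=1)  # one query can cover at most one hour


def default_range(now: datetime) -> tuple:
    """The default query window: the last 5 minutes."""
    return now - timedelta(minutes=5), now


def validate_query_range(start: datetime, end: datetime, now: datetime) -> None:
    """Raise ValueError if [start, end] breaks the limits described in step 3."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    if start < now - RETENTION:
        raise ValueError("only data from the last three days can be queried")
    if end - start > MAX_SPAN:
        raise ValueError("the time range cannot exceed one hour")


now = datetime.now(timezone.utc)
validate_query_range(*default_range(now), now)  # passes silently
try:
    validate_query_range(now - timedelta(hours=2), now, now)
except ValueError as err:
    print(err)  # the time range cannot exceed one hour
```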
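The counting method described in the table can be expressed as a small bucketing function. The following Python sketch assigns each latency sample to a category and computes the Total, Average Latency, and Maximum Latency values for a list of samples. It assumes that every category bound is a power of two (1 μs to 512 μs below 1 ms, and 2 ms to 32,768 ms above it), that latencies from 512 μs up to 1 ms are counted only in the <1ms aggregate, and that the console's >33s category starts at 32,768 ms. None of these bounds is confirmed by this topic, and the function names are hypothetical. Labels use "us" for μs.

```python
from collections import Counter

SUB_MS_BOUNDS_US = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
MS_BOUNDS_MS = [2 ** k for k in range(1, 16)]  # 2, 4, ..., 32768 ms


def latency_category(latency_us: float) -> str:
    """Return the category that one latency sample (in microseconds) falls into."""
    if latency_us < 0:
        raise ValueError("latency cannot be negative")
    if latency_us < 1000:
        for bound in SUB_MS_BOUNDS_US:
            if latency_us < bound:  # e.g. 1 us <= latency < 2 us -> "<2us"
                return f"<{bound}us"
        return "<1ms"  # assumption: no finer category from 512 us to 1 ms
    latency_ms = latency_us / 1000
    for bound in MS_BOUNDS_MS:
        if latency_ms < bound:  # e.g. 1 ms <= latency < 2 ms -> "<2ms"
            return f"<{bound}ms"
    return f">={MS_BOUNDS_MS[-1]}ms"  # assumed to match the console's >33s


def summarize(samples_us: list) -> dict:
    """Compute the Total, Average, and Maximum values plus per-category counts."""
    if not samples_us:
        return {"total": 0, "avg_us": 0.0, "max_us": 0.0, "categories": Counter()}
    return {
        "total": len(samples_us),
        "avg_us": sum(samples_us) / len(samples_us),
        "max_us": max(samples_us),
        "categories": Counter(latency_category(s) for s in samples_us),
    }


stats = summarize([1.5, 3, 1500])
print(stats["total"], stats["avg_us"], stats["max_us"])  # 3 501.5 1500
print(dict(stats["categories"]))  # {'<2us': 1, '<4us': 1, '<2ms': 1}
```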

Common events

Category | Name | Threshold | Description
Memory eviction | EvictionDel | 30 ms | The amount of time it takes to evict a key.
Memory eviction | EvictionLazyFree | 30 ms | The amount of time it takes to evict a key by using the Lazyfree feature.
Memory eviction | EvictionCycle | 30 ms | The amount of time it takes to complete one eviction cycle.
Memory defragmentation | ActiveDefragCycle | 100 ms | The amount of time it takes to complete one active memory defragmentation cycle.
Rehash | Rehash | 100 ms | The amount of time it takes to perform a rehash.
Data structure upgrade | ZipListConvertHash | 30 ms | The amount of time it takes to convert a hash from ziplist encoding to hashtable (dictionary) encoding.
Data structure upgrade | IntsetConvertSet | 30 ms | The amount of time it takes to convert a set from intset encoding to hashtable encoding.
Data structure upgrade | ZipListConvertZset | 30 ms | The amount of time it takes to convert a sorted set from ziplist encoding to skiplist encoding.
Append-only file (AOF) | AofWriteAlone | 30 ms | The amount of time it takes to perform an AOF write when no child process is running and no fsync operation is pending.
Append-only file (AOF) | AofWrite | 30 ms | The amount of time it takes to perform each AOF write. An AOF write is of the AofWriteAlone, AofWriteActiveChild, or AofWritePendingFsync type.
Append-only file (AOF) | AofFsyncAlways | 30 ms | The amount of time it takes to perform an fsync operation on the AOF when the appendfsync option is set to always.
Append-only file (AOF) | AofFstat | 30 ms | The amount of time it takes to obtain status information about the AOF.
Append-only file (AOF) | AofRename | 30 ms | The amount of time it takes to rename the AOF.
Append-only file (AOF) | AofReWriteDiffWrite | 30 ms | The amount of time it takes the parent process to write incremental data to the AOF after its child process finishes rewriting the AOF.
Append-only file (AOF) | AofWriteActiveChild | 30 ms | The amount of time it takes to perform an AOF write while a child process is running.
Append-only file (AOF) | AofWritePendingFsync | 30 ms | The amount of time it takes to perform an AOF write while an fsync operation is in progress.
Redis database (RDB) file | RdbUnlinkTempFile | 50 ms | The amount of time it takes to delete a temporary RDB file after a bgsave child process terminates.
Others | Commands | 30 ms | The amount of time it takes to run a command that is not tagged with fast.
Others | FastCommand | 30 ms | The amount of time it takes to run a command that is tagged with fast, such as GET or EXISTS.
Others | EventLoop | 50 ms | The amount of time it takes to complete one iteration of the main event loop.
Others | Fork | 100 ms | The amount of time it takes to fork a child process, as measured in the parent process.
Others | Transaction | 50 ms | The actual amount of time it takes to execute a transaction.
Others | PipeLine | 50 ms | The amount of time it takes to process a multi-threaded pipeline.
Others | ExpireCycle | 30 ms | The amount of time it takes to complete one periodic cycle that deletes expired keys.
Others | SlotRdbsUnlinkTempFile | 30 ms | The amount of time it takes to delete a temporary RDB file of a slot after a bgsave child process terminates.
Others | LoadSlotRdb | 100 ms | The amount of time it takes to load an RDB file of a slot.
Others | SlotreplTargetcron | 50 ms | The amount of time it takes a child process to load an RDB file of a slot into a temporary database and then migrate the data to the destination database.
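The Threshold column determines which occurrences are recorded at all. The following Python sketch applies a few of the thresholds from the table to a list of (event, latency) pairs. The function names are hypothetical, only a subset of the 27 events is included, and the sketch assumes that an occurrence is recorded only when its duration is strictly greater than the threshold.

```python
# A subset of the Threshold column from the Common events table, in ms.
THRESHOLDS_MS = {
    "EvictionDel": 30,
    "ActiveDefragCycle": 100,
    "Rehash": 100,
    "RdbUnlinkTempFile": 50,
    "Commands": 30,
    "FastCommand": 30,
    "EventLoop": 50,
    "Fork": 100,
}


def should_record(event: str, latency_ms: float) -> bool:
    """Return True if one occurrence of the event is slow enough to be recorded."""
    threshold = THRESHOLDS_MS[event]  # KeyError for events not in the subset
    # Assumption: "longer than the threshold" means strictly greater than.
    return latency_ms > threshold


def slow_occurrences(occurrences: list) -> list:
    """Keep only the (event, latency_ms) pairs that exceed their threshold."""
    return [(event, ms) for event, ms in occurrences if should_record(event, ms)]


sample = [("Rehash", 120), ("EventLoop", 40), ("Commands", 45), ("Fork", 100)]
print(slow_occurrences(sample))  # [('Rehash', 120), ('Commands', 45)]
```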
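The relationship between AofWrite and its three subtypes can be sketched as a classification of each write. The following Python function is a simplified model of how open-source Redis samples AOF write latency, where a pending fsync takes precedence over a running child process (open-source Redis checks for a pending fsync only when appendfsync is set to everysec). Whether ApsaraDB for Redis uses the same precedence is an assumption, and the function name is hypothetical.

```python
def aof_write_events(fsync_pending: bool, child_active: bool) -> list:
    """Return the event names under which one AOF write would be counted.

    Every write is counted under AofWrite and under exactly one subtype.
    A pending fsync is checked first, then a running child process.
    """
    if fsync_pending:
        subtype = "AofWritePendingFsync"
    elif child_active:
        subtype = "AofWriteActiveChild"
    else:
        subtype = "AofWriteAlone"
    return [subtype, "AofWrite"]


print(aof_write_events(fsync_pending=False, child_active=True))
# ['AofWriteActiveChild', 'AofWrite']
```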