Why do I receive a memory usage alert or an OOM error when monitoring data does not show high memory usage? - ApsaraDB for Redis

This topic describes how to troubleshoot the following situation: a memory alert for an ApsaraDB for Redis instance reports high memory usage, or your application encounters out-of-memory (OOM) errors, but the performance monitoring data shows low memory usage.

Problem description

Symptom 1:

You receive a memory alert for Redis indicating that memory usage exceeds the threshold (for example, the average value is greater than or equal to 90% for three consecutive periods), but the monitoring page in the console shows memory usage significantly below the threshold.

Symptom 2:

Your application receives the "command not allowed when used memory > 'maxmemory'" error, but the monitoring page in the console shows that memory is not fully used, or that only one data shard has high memory usage.

Causes

Why is the monitored memory usage different from the reported memory usage?

If the memory usage displayed on the monitoring page differs from the memory usage in the alert, your instance may be a cluster instance, and you may be viewing monitoring data at the instance level instead of the data node level. An alert can be triggered by a single data node even if the instance as a whole uses much less memory.

Check whether the instance details in the alert contain nodeId = <Instance ID>-db-<Number>. If they do, the alert refers only to the data node identified by <Instance ID>-db-<Number>: the memory usage of that node, not of the whole instance, exceeds the threshold.
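If you handle many alerts, a small helper can extract the node ID from the alert text. The following Python sketch assumes that the alert contains a literal nodeId = <Instance ID>-db-<Number> string as described above. The regular expression and the function name parse_node_id are hypothetical and may need adjusting to your actual alert format.

```python
import re
from typing import Optional

# Hypothetical pattern: assumes the alert text contains "nodeId = <instance-id>-db-<n>".
# Adjust it if your alert format differs.
NODE_ID_PATTERN = re.compile(
    r"nodeId\s*=\s*(?P<node>(?P<instance>[\w-]+?)-db-(?P<index>\d+))"
)


def parse_node_id(alert_text: str) -> Optional[dict]:
    """Return the node ID, instance ID, and shard index, or None if no nodeId is present."""
    match = NODE_ID_PATTERN.search(alert_text)
    if match is None:
        # No data node is named, so the alert is not tied to a specific node.
        return None
    return {
        "node_id": match.group("node"),
        "instance_id": match.group("instance"),
        "shard_index": int(match.group("index")),
    }
```

The returned node_id is the value to select on the Data Node tab in the steps that follow.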

To check whether the memory usage of this data node matches the value in the alert, perform the following steps:

Log on to the ApsaraDB for Redis console and go to the Instances page. In the top navigation bar, select the region in which the instance that you want to manage resides. Then, find the instance and click the instance ID.
In the left-side navigation pane, click Performance Monitor.
Click the Data Node tab and select the data node that corresponds to <Instance ID>-db-<Number>. Check whether the memory usage of the data node is the same as the memory usage in the alert information.
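The following Python sketch illustrates why one nearly full data node may not be visible in the instance-level view. For illustration only, it assumes that instance-level memory usage equals the total used memory divided by the total memory capacity of all data nodes; the console may compute this value differently. The node names and sizes are hypothetical.

```python
from typing import Dict


def node_usage(used: Dict[str, int], capacity: Dict[str, int]) -> Dict[str, float]:
    """Memory usage of each data node as a percentage of that node's own capacity."""
    return {node: 100.0 * used[node] / capacity[node] for node in used}


def instance_usage(used: Dict[str, int], capacity: Dict[str, int]) -> float:
    """Assumed instance-level view: total used memory divided by total capacity."""
    return 100.0 * sum(used.values()) / sum(capacity.values())


# Hypothetical example: four shards with 1,000 MB each; only db-0 is nearly full.
capacity = {f"r-demo-db-{i}": 1000 for i in range(4)}
used = {"r-demo-db-0": 950, "r-demo-db-1": 300, "r-demo-db-2": 300, "r-demo-db-3": 300}

print(node_usage(used, capacity)["r-demo-db-0"])  # 95.0
print(instance_usage(used, capacity))             # 46.25
```

In this example, db-0 would trigger a 90% alert while the instance-level figure stays below 50%.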

Why is the memory usage of a data node significantly higher than the memory usage of other data nodes?

If the memory usage of one or more data nodes in a Redis cluster instance is significantly higher than that of other data nodes, data skew may have occurred. You can use the instance diagnostics feature to check whether data skew exists on the instance.
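If you export per-node memory metrics yourself, a rough check can flag nodes that use far more memory than the average. This is a minimal Python sketch. The ratio_threshold value of 1.5 is an arbitrary assumption and is not the criterion used by the instance diagnostics feature.

```python
from statistics import mean
from typing import Dict, List


def skewed_nodes(used_mb: Dict[str, float], ratio_threshold: float = 1.5) -> List[str]:
    """Return nodes whose used memory exceeds ratio_threshold times the cluster average.

    ratio_threshold is a hypothetical value for illustration only.
    """
    average = mean(used_mb.values())
    if average == 0:
        return []  # An empty cluster cannot be skewed.
    return sorted(node for node, used in used_mb.items() if used / average > ratio_threshold)


print(skewed_nodes({"db-0": 950, "db-1": 300, "db-2": 300, "db-3": 300}))  # ['db-0']
```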

Why does memory skew occur?

In most cases, memory skew occurs due to the following reasons:

Large keys exist.
A Redis cluster instance uses the cyclic redundancy check (CRC) algorithm to calculate the slot to which a key belongs, and writes the key to the data node that owns that slot.
A single key always resides on one data node. If a key stores a large number of fields, or fields with large values, the key can become excessively large and cause memory skew even if keys are evenly distributed across data nodes.
Hash tags are used.
When you use hash tags such as user:{1000}:name, Redis performs the CRC calculation only on the string enclosed in the curly braces (1000 in this example), so all keys with the same hash tag map to the same slot and reside on the same data node. If a large number of keys share the same hash tag, data may be concentrated on a single data node and cause memory skew.
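The slot calculation and the hash tag rule described above can be sketched in Python. This sketch follows the open-source Redis Cluster specification (CRC16 in the XMODEM variant, modulo 16,384 slots) and assumes that your cluster instance uses the same key-to-slot mapping. The function names are illustrative.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Slot = CRC16 of the hashed part of the key, modulo 16384."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        # Use the text between the first "{" and the first "}" after it,
        # but only if that text is non-empty; otherwise hash the whole key.
        end = raw.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(hash_slot("user:{1000}:name") == hash_slot("user:{1000}:email"))  # True
```

Because user:{1000}:name and user:{1000}:email are both hashed on 1000, they always map to the same slot, regardless of how many such keys exist.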

Solutions

Check whether large keys exist and split the large keys

Identify large keys

You can use the offline key analysis feature to identify large keys. For more information, see Use the offline key analysis feature.

For other methods of identifying large keys and for how to handle them, see Identify and handle large keys and hotkeys.
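If you prefer a script, the following Python sketch iterates over keys with SCAN and measures each key with MEMORY USAGE. It assumes a redis-py client (redis.Redis provides scan_iter and memory_usage) and that the MEMORY USAGE command is available on your instance. The 10 MB threshold is an arbitrary example. MEMORY USAGE returns an estimate, and scanning every key adds load, so consider running the script during off-peak hours. On a cluster instance, depending on how you connect, you may need to run it against each data node.

```python
from typing import Iterable, List, Protocol, Tuple


class KeyScanner(Protocol):
    """Subset of the redis-py client API used here (redis.Redis provides both methods)."""

    def scan_iter(self, match=None, count=None) -> Iterable: ...

    def memory_usage(self, key, samples=None): ...


def find_large_keys(client: KeyScanner, threshold_bytes: int = 10 * 1024 * 1024,
                    top_n: int = 20) -> List[Tuple[str, int]]:
    """Return up to top_n (key, estimated bytes) pairs at or above the threshold, largest first."""
    large = []
    # SCAN iterates incrementally, unlike KEYS, so it avoids blocking the server for long.
    for key in client.scan_iter(count=1000):
        size = client.memory_usage(key) or 0  # None if the key disappeared between calls
        if size >= threshold_bytes:
            name = key.decode() if isinstance(key, bytes) else key
            large.append((name, size))
    large.sort(key=lambda item: item[1], reverse=True)
    return large[:top_n]
```

For example, find_large_keys(redis.Redis(host="<endpoint>", port=6379, password="<password>")) returns the largest keys found on the connected endpoint.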

Split large keys

For example, you can split a Hash key that contains tens of thousands of fields into multiple Hash keys, each with a moderate number of fields. In a Redis cluster architecture, the smaller keys can be distributed to different data shards, which significantly improves memory balance among shards.
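One way to split a large Hash key is to assign each field to one of N bucket keys by hashing the field name. The following Python sketch is an example under stated assumptions: the bucket count of 16 and the key format <base_key>:<bucket> are hypothetical, and your application must use the same mapping for both writes (HSET) and reads (HGET).

```python
import zlib


def bucket_key(base_key: str, field: str, buckets: int = 16) -> str:
    """Map a field of a large Hash to one of `buckets` smaller Hash keys.

    The bucket keys (for example, "user:profile:7") carry no hash tag, so each one
    is hashed on its full name and can be placed on a different data shard.
    """
    # zlib.crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(field.encode()) % buckets
    return f"{base_key}:{bucket}"


print(bucket_key("user:profile", "uid1001"))  # one of user:profile:0 ... user:profile:15
```

zlib.crc32 is used instead of hash() because string hashing in Python is randomized per process, which would send the same field to different buckets after a restart. Do not include a hash tag in base_key; otherwise all bucket keys would map to the same slot again.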

Check whether hash tags are used

If hash tags are used, consider splitting a hash tag into multiple hash tags based on your business requirements so that data is distributed more evenly across data nodes.
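For example, if many keys share a single tag such as {orders}, you could derive a sub-tag from the entity ID. Keys of the same entity then still share one slot, while keys of different entities are spread across several tags. The following Python sketch shows one such naming scheme; the function name, key layout, and the choice of 8 sub-tags are hypothetical.

```python
def split_tag_key(tag: str, entity_id: int, suffix: str, sub_tags: int = 8) -> str:
    """Build a key whose hash tag is one of `sub_tags` variants of `tag`.

    Keys for the same entity_id share a tag (and therefore a slot); keys for
    different entities are spread across sub_tags different tags.
    """
    return f"{{{tag}:{entity_id % sub_tags}}}:{entity_id}:{suffix}"


print(split_tag_key("orders", 1003, "items"))  # {orders:3}:1003:items
```

The trade-off is that keys belonging to different entities no longer share a slot, so operations that relied on all {orders} keys being in one slot need to be reviewed.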

Upgrade instance specifications

As a temporary workaround, you can upgrade the instance specifications to increase the memory allocated to each shard. This relieves memory pressure on the skewed shard but does not eliminate the skew. For more information, see Change the configurations of an instance.
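To estimate how much memory each shard needs after the upgrade, size the shards for the most heavily used shard rather than for the average. A minimal Python sketch, assuming an 80% target usage as an example headroom goal (not an ApsaraDB for Redis requirement):

```python
def required_shard_capacity_mb(hottest_used_mb: int, target_percent: int = 80) -> int:
    """Smallest per-shard memory (MB) that keeps the hottest shard at or below target_percent."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-hottest_used_mb * 100 // target_percent)


print(required_shard_capacity_mb(7200))  # 9000
```

For example, if the hottest shard uses 7,200 MB, each shard needs at least 9,000 MB to stay at or below 80%. Choose an instance type whose per-shard memory is at least this value.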

Important

ApsaraDB for Redis runs a data skew precheck when you change instance specifications. If the selected instance type cannot accommodate the skewed data, ApsaraDB for Redis reports an error. In this case, select an instance type with higher specifications and try again.
Upgrading the instance specifications may alleviate memory usage skew. However, skew may also occur in bandwidth and CPU usage.