Tair: Observability

Last Updated: Mar 12, 2024

Tair provides observability that covers more dimensions and categories, and includes more advanced features, than open source Redis.

Background information

Observability is the ability to access monitoring data, analyze issues, and perform systematic diagnostics based on three pillars of data: metrics, traces, and logs.

  • Metrics: A metric is a numeric value of a dimension that is measured over a period of time to display specific states and trends of a system.

  • Logs: A log is a record of discrete events that happened during the runtime of an application.

  • Traces: A trace records the end-to-end lifecycle of a request.

Tair integrates metrics, traces, and logs to provide data analytics. The following table compares the observability of Tair, ApsaraDB for Redis, and open source Redis. The following list describes the symbols that are used in the table.

  • The ✔️ symbol indicates that the feature is supported.

  • The ❌ symbol indicates that the feature is not supported.

  • The ➖ symbol indicates that no features are involved.

| Observability | Feature | Open source Redis | ApsaraDB for Redis | Tair |
|---|---|---|---|---|
| Metric | Query monitoring data | ✔️ | ✔️ (fine-grained) | ✔️ (fine-grained) |
| Log | Query run logs of an instance | ✔️ | ✔️ | ✔️ |
| Log | Query slow logs | ✔️ | ✔️ | ✔️ |
| Log | Query audit logs | ❌ | ✔️ | ✔️ |
| Log | Latency insights | ❌ | ✔️ | ✔️ |
| Trace | ➖ | ➖ | ➖ | ➖ |
| Analytics | Use the real-time key statistics feature | ❌ | ✔️ | ✔️ |
| Analytics | Use the offline key analysis feature | ❌ | ✔️ | ✔️ |
| Analytics | Create a diagnostic report | ❌ | ✔️ | ✔️ |

Note

Typically, tracing analysis requires a middleware or specific code modifications on your client.

Metrics

Open source Redis provides a variety of metrics, including memory-related metrics (such as memory distribution, memory usage, and memory fragmentation ratio), statistics-related metrics (such as the number of connections and commands, network traffic, and synchronization status), CPU utilization, and keyspace information. In addition to the metrics supported by open source Redis, Tair provides more fine-grained metrics, including read queries per second (QPS) and write QPS. For more information about these metrics, see Query monitoring data.
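
As a rough illustration of the open source metrics described above, the following Python sketch parses text in the `key:value` format that the Redis INFO command returns and derives the memory fragmentation ratio from it. The field names are standard INFO fields, but the sample values are invented for the example, and the helper names `parse_info` and `fragmentation_ratio` are hypothetical. The sketch does not show how Tair computes its fine-grained metrics.

```python
# Invented sample in the format returned by the Redis INFO command.
SAMPLE_INFO = """\
# Memory
used_memory:1048576
used_memory_rss:1572864
# Clients
connected_clients:12
# Stats
total_commands_processed:50000
instantaneous_ops_per_sec:250
"""


def parse_info(text: str) -> dict[str, str]:
    """Parse 'key:value' lines, skipping blank lines and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_ratio(fields: dict[str, str]) -> float:
    """Resident set size divided by the memory allocated by Redis."""
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


info = parse_info(SAMPLE_INFO)
ratio = fragmentation_ratio(info)
print(f"clients={info['connected_clients']} "
      f"ops/s={info['instantaneous_ops_per_sec']} frag={ratio:.2f}")
```

In practice the INFO text would come from a client library or the command line rather than from a hard-coded string.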

The fine-grained metrics provided by Tair also have the following benefits in implementing observability:

Logs

Tair allows you to view the run logs, slow logs, audit logs, and latency insights of an instance.

  • Run logs

    Run logs of a Tair instance record, line by line, the persistence, synchronous replication, and debugging operations that take place while the instance is running, together with any error messages that are returned.

    You can go to the details page of an instance in the Tair console and choose Logs > Active Logs in the left-side navigation pane to view the run logs of the instance. For more information, see View active logs.

  • Slow logs

    Slow logs record requests that take longer to execute than the threshold specified in Tair. The execution duration of a request does not include the amount of time that the request spends in queue or in transmission. Slow log statistics include execution timestamps, execution durations, command parameters, and client information. You can view slow logs of an instance, identify commands in the instance that take longer than required to run, and optimize these commands to prevent congestion.

    You can go to the details page of an instance in the Tair console and choose Logs > Slow Logs in the left-side navigation pane to view the slow logs of the instance. For more information, see Query slow logs.

  • Audit logs

    Tair provides audit logs based on Log Service. For more information about Log Service, see What is Log Service? Audit logs include statistics such as log types, execution durations, database numbers, client IP addresses, account names, command details, and extension information. Audit logs allow you to search and analyze online operation logs (including logs about sensitive operations related to the FLUSHALL, FLUSHDB, and DEL commands), slow logs, and run logs, and export these logs.

    You can go to the details page of an instance in the Tair console and choose Logs > Audit Log in the left-side navigation pane to view the audit logs of the instance. For more information, see Enable the new audit log feature.

  • Latency insights

    Tair provides an advanced latency insights feature. This feature records up to 27 events and the execution durations of all Tair commands, and retains all latency statistics from the last three days.

    You can go to the details page of an instance in the Tair console and choose CloudDBA > Latency Insights in the left-side navigation pane to view the latency insights of the instance. For more information, see Latency insights.
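
The slow log behavior described above can be sketched as follows: keep only the entries whose execution time exceeds a threshold, then count them per command to see which commands to optimize first. This is a minimal sketch, not the Tair implementation. The `LogEntry` class, the 10,000-microsecond threshold, and the sample entries are all assumptions made for the example. The fields mirror the slow log statistics listed above (execution timestamp, execution duration, command parameters, and client information).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:  # hypothetical record type, not a Tair API
    timestamp: int      # Unix time at which the command ran
    duration_us: int    # execution time only (no queue or network time)
    command: list[str]  # command name followed by its parameters
    client: str         # client address


def slow_entries(entries, threshold_us):
    """Return the entries whose execution time exceeds the threshold."""
    return [e for e in entries if e.duration_us > threshold_us]


def slowest_commands(entries):
    """Count entries per command name, most frequent first."""
    return Counter(e.command[0].upper() for e in entries).most_common()


# Invented sample data, not taken from a real instance.
entries = [
    LogEntry(1710201600, 25_000, ["KEYS", "*"], "10.0.0.5:51234"),
    LogEntry(1710201601, 300, ["GET", "user:1"], "10.0.0.6:40022"),
    LogEntry(1710201602, 18_000, ["HGETALL", "cart:9"], "10.0.0.5:51234"),
    LogEntry(1710201603, 40_000, ["keys", "session:*"], "10.0.0.7:38110"),
]

slow = slow_entries(entries, threshold_us=10_000)
print(slowest_commands(slow))
```

Grouping by command name rather than by full command text keeps the summary short when the same command is issued with many different parameters.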

Analytics

Tair integrates metrics, traces, and logs to provide data analytics, which is a critical feature of Tair.

  • Hotkey and large key analysis

    If a key receives significantly more requests than other keys, the key is considered a hotkey. If a hotkey is not handled in a timely manner, it may result in skewed requests or even cache breakdowns. If a key contains a large number of members or occupies a large amount of memory, the key is considered a large key. If a large key is not handled in a timely manner, commands that involve the key take longer to run and an out-of-memory (OOM) error may occur.

    You can use the Real-time Key Statistics feature to identify hotkeys and large keys. The feature displays hotkeys and large keys in real time and allows you to view hotkeys and large keys that were generated within the last four days. It is highly precise and has minimal impact on performance. You can view the amount of memory that a key occupies and the frequency at which a key is requested, and then troubleshoot hotkeys and large keys to optimize instances.

    You can go to the details page of an instance in the Tair console and choose CloudDBA > Real-time Key Statistics in the left-side navigation pane to view statistics about hotkeys and large keys of the instance. For more information, see Use the real-time key statistics feature.

  • Offline key analysis

    The Offline Key Analysis feature processes offline Redis Database (RDB) files that contain any data structure and that come from any instance architecture or Tair version. The analysis does not affect the online services provided by Tair. The feature can process a combination of 10% large keys and 90% small keys four times faster than redis-rdb-tools, and a combination of medium keys and large keys 20 times faster than redis-rdb-tools. During processing, memory usage is kept within 1 GB to prevent OOM errors that may occur due to large key processing. The feature also allows you to search for the longest subelement to troubleshoot issues.

    You can go to the details page of an instance in the Tair console and choose CloudDBA > Offline Key Analysis in the left-side navigation pane to view the offline key analysis of the instance. For more information, see Use the offline key analysis feature.

  • Instance diagnostics

    Tair integrates statistics such as performance metrics, slow logs, and key analysis to provide the diagnostic reports feature. This feature performs one-stop diagnostics to evaluate the health of instances based on multiple metrics (such as performance metrics, skewed request statistics, and slow logs) and provides suggestions. This feature improves the automatic O&M capabilities of Tair instances and reduces instance usage costs.

    You can go to the details page of an instance in the Tair console and choose CloudDBA > Diagnostic Reports in the left-side navigation pane to perform diagnostics on the instance. For more information, see Create a diagnostic report.
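
The hotkey and large key definitions given above can be illustrated with a short Python sketch. The source does not state the thresholds that Tair uses, so every number here is an assumption: a key counts as hot when its request count exceeds 10 times the median, and as large when it has more than 5,000 members or occupies more than 10 MB. The `KeyStats` class and the sample data are hypothetical as well.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class KeyStats:  # hypothetical record type, not a Tair API
    name: str
    requests: int      # requests observed in the sampling window
    members: int       # number of elements stored in the key
    memory_bytes: int  # memory occupied by the key


def find_hotkeys(stats, factor=10.0):
    """Keys whose request count exceeds `factor` times the median count."""
    baseline = median(s.requests for s in stats)
    return [s.name for s in stats if s.requests > factor * baseline]


def find_large_keys(stats, max_members=5_000, max_bytes=10 * 1024 * 1024):
    """Keys that exceed either the member-count or the memory threshold."""
    return [s.name for s in stats
            if s.members > max_members or s.memory_bytes > max_bytes]


# Invented sample data, not taken from a real instance.
stats = [
    KeyStats("user:1", 120, 1, 256),
    KeyStats("user:2", 95, 1, 256),
    KeyStats("promo:banner", 48_000, 1, 2_048),
    KeyStats("feed:all", 110, 250_000, 64 * 1024 * 1024),
    KeyStats("cart:9", 80, 12, 4_096),
]

print("hotkeys:", find_hotkeys(stats))
print("large keys:", find_large_keys(stats))
```

The median is used as the baseline because a single very hot key would inflate a mean and could hide itself.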