By Mengshi
The PolarDB-X INFORMATION_SCHEMA database contains an extremely useful view named TABLE_DETAIL.
Introduction to the TABLE_DETAIL view:
Some key fields in the TABLE_DETAIL view include TABLE_SCHEMA, TABLE_NAME, STORAGE_INST_ID, PARTITION_NAME, and SUBPARTITION_TEMPLATE_NAME, along with the size counters DATA_LENGTH, INDEX_LENGTH, and DATA_FREE and the access counters ROWS_READ, ROWS_INSERTED, ROWS_UPDATED, and ROWS_DELETED, all of which appear in the queries in this article.
By applying various aggregate functions and GROUP BY clauses to these fields, this view can provide a wealth of information. This article demonstrates several common use cases with specific examples.
By using GROUP BY on TABLE_NAME, you can aggregate data at the table level. In this example, you can view the total data volume of each table, and by sorting the results, you can identify which tables occupy the most space in the current instance.
Of course, you can also add filter conditions to the WHERE clause, such as by database name (TABLE_SCHEMA) or table name (TABLE_NAME), to view the data distribution at a finer granularity.
SELECT
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
GROUP BY
    TABLE_NAME
ORDER BY
    `DATA_SIZE(GB)` DESC;
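For example, a minimal variant of the query above (assuming a placeholder database name my_db) restricts the result to a single database by filtering on TABLE_SCHEMA:
SELECT
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE
    TABLE_SCHEMA = 'my_db'  -- placeholder database name
GROUP BY
    TABLE_NAME
ORDER BY
    `DATA_SIZE(GB)` DESC;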
By changing the aggregation field to STORAGE_INST_ID, you can calculate the size of each data node. This makes it easy to identify data skew issues among data nodes.
SELECT
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`,
    STORAGE_INST_ID
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
GROUP BY
    STORAGE_INST_ID
ORDER BY
    `DATA_SIZE(GB)` DESC;
When you find that one data node occupies more space than others, indicating data skew, you can analyze the problem by performing the following steps:
1. On the specified data node, view the space occupied by each table to identify the largest tables:
SELECT
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`,
    COUNT(*) AS partitions,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE
    STORAGE_INST_ID = 'pxc-xdb-s-xxxxxxxxx'
GROUP BY
    TABLE_NAME
ORDER BY
    `DATA_SIZE(GB)` DESC;
2. If a top table is a singleton table, it is the likely cause of the data skew. To resolve this, you can move the table to another data node or convert it into a partitioned table, as sketched below.
A table with a COUNT(*) of 1 in the result is typically a singleton table, unless its partition count happens to equal the number of data nodes so that each node holds exactly one partition. A table with a COUNT(*) greater than 1 is typically a partitioned table.
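The following is a minimal sketch of the conversion, assuming an AUTO-mode database; the table name tbl and the partition key user_id are illustrative, and the exact DDL should be verified against the documentation for your PolarDB-X version:
-- Confirm that the table is a singleton table: its definition contains no PARTITION BY clause.
SHOW CREATE TABLE tbl;

-- Illustrative repartitioning DDL (verify against your version): spread the table across
-- data nodes by hashing on a frequently used key; adjust the key and partition count to your workload.
ALTER TABLE tbl PARTITION BY KEY(`user_id`) PARTITIONS 16;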
3. If a top table is a partitioned table, use the PARTITION_NAME field in the GROUP BY clause to calculate the data volume of each partition. This allows you to determine whether the imbalance is caused by data skew among the partitions themselves:
SELECT
    SUM((DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024)) AS `TOTAL_SIZE(GB)`,
    PARTITION_NAME
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE
    TABLE_NAME = 'tbl'
GROUP BY
    PARTITION_NAME
ORDER BY
    `TOTAL_SIZE(GB)` DESC;
In some business scenarios, a table is subpartitioned by time to facilitate the cleanup of historical data. For tables with this structure, you can obtain more interesting information from the TABLE_DETAIL view.
The table's partition structure is as follows:
PARTITION BY KEY(`user_id`)
PARTITIONS 16
SUBPARTITION BY RANGE(TO_DAYS(`xxxx_time`))
(SUBPARTITION `p202501` VALUES LESS THAN (739648),
SUBPARTITION `p202502` VALUES LESS THAN (739676),
SUBPARTITION `p202503` VALUES LESS THAN (739707),
SUBPARTITION `p202504` VALUES LESS THAN (739737),
SUBPARTITION `p202505` VALUES LESS THAN (739768),
SUBPARTITION `p202506` VALUES LESS THAN (739798))
Because each physical subpartition corresponds to one record in the TABLE_DETAIL view, this table has 16 × 6 = 96 records in the view.
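You can confirm this directly with a quick count, using the same table name tbl as in the other examples:
SELECT
    COUNT(*) AS partition_records
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE
    TABLE_NAME = 'tbl';
-- Expected result: 96 (16 partitions x 6 subpartitions)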
The key subpartition-related field used below is SUBPARTITION_TEMPLATE_NAME, which records the subpartition template (for example, p202501) from which each physical subpartition was created.
By performing aggregate operations on this field, you can accomplish tasks such as the following:
1. View monthly data volume:
By using SUBPARTITION_TEMPLATE_NAME as the GROUP BY field, you can obtain the data volume for each month and clearly observe the trend of monthly data changes:
SELECT
    SUBPARTITION_TEMPLATE_NAME,
    SUM((DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024)) AS `TOTAL_SIZE(GB)`
FROM
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE
    TABLE_NAME = 'tbl'
GROUP BY
    SUBPARTITION_TEMPLATE_NAME
ORDER BY
    `TOTAL_SIZE(GB)` DESC;
2. Since the table rolls on a monthly basis, you can predict how much data is about to be deleted and how much data will remain on each data node after the deletion:
SELECT
    t1.space,
    t1.space - t3.space_to_deleted AS available_space,
    t1.storage_inst_id,
    t3.cancel_cnt
FROM (
    -- t1: total space currently occupied on each data node
    SELECT
        storage_inst_id,
        SUM((data_length + index_length + data_free) / (1024 * 1024 * 1024)) AS space
    FROM
        information_schema.table_detail
    GROUP BY
        storage_inst_id
) t1
JOIN (
    -- t3: space held on each data node by the subpartition that is about to be deleted
    SELECT
        storage_inst_id,
        COUNT(*) AS cancel_cnt,
        SUM((data_length + index_length + data_free) / (1024 * 1024 * 1024)) AS space_to_deleted
    FROM
        information_schema.table_detail
    WHERE
        table_name = 'tbl'
        AND subpartition_template_name = 'p202501'
    GROUP BY
        storage_inst_id
) t3
ON t1.storage_inst_id = t3.storage_inst_id
ORDER BY
    t1.space;
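Once the predicted numbers look acceptable, the monthly cleanup itself is typically a subpartition drop. The statement below is only a sketch under the assumption that your PolarDB-X version supports dropping a templated subpartition by its template name; check the partition-management documentation before running it:
-- Illustrative cleanup DDL (verify syntax for your version): drop the oldest monthly subpartition.
ALTER TABLE tbl DROP SUBPARTITION p202501;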
In PolarDB-X, a singleton table is stored on a fixed data node by default. These tables may not occupy much space, but can have high queries per second (QPS). Sometimes, you may observe that an individual data node has elevated QPS or CPU utilization. In this case, you can use the ROWS_READ, ROWS_INSERTED, ROWS_UPDATED, and ROWS_DELETED fields in TABLE_DETAIL to help find these highly accessed tables:
SELECT
    SUM(rows_read + rows_inserted + rows_updated + rows_deleted) AS access_count,
    table_name,
    COUNT(*) AS partitions
FROM
    information_schema.table_detail
WHERE
    storage_inst_id = 'pxc-xdb-s-xxxxxxxx'
GROUP BY
    table_name
HAVING
    partitions = 1
ORDER BY
    access_count DESC;
Note: These fields represent the cumulative number of rows accessed in the partition since the data node started. They are not directly equivalent to metrics such as QPS and CPU utilization, but serve as a valuable reference. Typically, you can examine the tables with the highest counts, from top to bottom, and use information from SQL Explorer for further investigation and confirmation.
Q: Is the data in the TABLE_DETAIL view exact?
A: The information in this view, such as the number of rows, is calculated from statistics, so it is not an exact value. In a few scenarios, the data may be missing or exhibit large deviations, but it is sufficient for reference purposes.
Q: Is it safe and reliable to query the TABLE_DETAIL view?
A: When the system contains a large total number of partitions, querying this view can be time-consuming. It is safe to query, but avoid doing so with high concurrency, for example, by executing a large number of queries on it from within a program.