
PolarDB-X Best Practice Series (9): N Ingenious Uses for the TABLE_DETAIL View

This article demonstrates several common use cases by using specific examples.

By Mengshi

The PolarDB-X INFORMATION_SCHEMA database contains an extremely useful view named TABLE_DETAIL.

Introduction to the TABLE_DETAIL view:

  1. Each partition (or each subpartition, for subpartitioned tables) corresponds to a single record. For example, if a database has 32 tables, and each table has 16 partitions with 4 subpartitions each (PARTITION BY ... PARTITIONS 16 SUBPARTITION BY ... SUBPARTITIONS 4), the database has 32 × 16 × 4 = 2,048 records (see the verification query after this list).
  2. Global indexes are also partitioned. Therefore, each partition of a global index also corresponds to a record.
  3. A table with only one record is a singleton table.
  4. A broadcast table has a conceptual "partition" on each data node, so the number of records in a broadcast table is typically equal to the number of data nodes. Exception: The Locality attribute may be used to restrict some databases to specific data nodes.
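
As a quick sanity check, the following sketch counts the TABLE_DETAIL records per table; the database name mydb is a placeholder for illustration:

SELECT 
    TABLE_NAME,
    COUNT(*) AS record_cnt
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_SCHEMA = 'mydb'
GROUP BY 
    TABLE_NAME;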

Some key fields in the TABLE_DETAIL view are listed as follows:

  1. DATA_LENGTH, INDEX_LENGTH, and DATA_FREE: The sum of these three fields (DATA_LENGTH + INDEX_LENGTH + DATA_FREE) represents the space occupied by a partition, in bytes.
  2. TABLE_ROWS: the estimated number of rows in the partition.
  3. STORAGE_INST_ID: the instance ID of the corresponding data node.
  4. TABLE_SCHEMA and TABLE_NAME: the database name and table name.
  5. INDEX_NAME: for records that correspond to partitions of a global index, the name of that index.
  6. Additional fields will be introduced in the following examples.

By applying various aggregate functions and using the GROUP BY clause on different fields, this view can provide a wealth of information. This article will demonstrate several common use cases by using specific examples.

Calculate Table Sizes to Find the Largest Tables

By using GROUP BY on TABLE_NAME, you can aggregate data at the table level. In this example, you can view the total data volume of each table, and by sorting the results, you can identify which tables occupy the most space in the current instance.

Of course, you can also add filter conditions to the WHERE clause, such as by database name (TABLE_SCHEMA) or table name (TABLE_NAME), to view the data distribution at a finer granularity.

SELECT 
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`, 
    TABLE_NAME
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
GROUP BY 
    TABLE_NAME
ORDER BY 
    `DATA_SIZE(GB)` DESC;
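
As mentioned above, a filter in the WHERE clause narrows the statistics to a single database. In this sketch, mydb is a placeholder database name:

SELECT 
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`, 
    TABLE_NAME
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_SCHEMA = 'mydb'
GROUP BY 
    TABLE_NAME
ORDER BY 
    `DATA_SIZE(GB)` DESC;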

Check for Data Imbalance among Data Nodes and Find the Largest Data Nodes

SELECT 
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`, 
    STORAGE_INST_ID
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
GROUP BY 
    STORAGE_INST_ID
ORDER BY 
    `DATA_SIZE(GB)` DESC;

By changing the aggregation field to STORAGE_INST_ID, you can calculate the size of each data node. This makes it easy to identify data skew issues among data nodes.
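
To quantify the imbalance, a minimal sketch along the same lines compares the largest data node against the average; a ratio close to 1 indicates well-balanced data:

SELECT 
    MAX(node_size) / AVG(node_size) AS skew_ratio
FROM (
    SELECT 
        STORAGE_INST_ID,
        SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) AS node_size
    FROM 
        INFORMATION_SCHEMA.TABLE_DETAIL
    GROUP BY 
        STORAGE_INST_ID
) t;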

Diagnose Data Skew among Data Nodes

When you find that one data node occupies more space than others, indicating data skew, you can analyze the problem by performing the following steps:

1.  On the specified data node, view the space occupied by each table to identify the largest tables:

SELECT 
    SUM(DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024) AS `DATA_SIZE(GB)`,
    COUNT(*) AS partitions,
    TABLE_NAME
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    STORAGE_INST_ID = 'pxc-xdb-s-xxxxxxxxx'
GROUP BY 
    TABLE_NAME
ORDER BY 
    `DATA_SIZE(GB)` DESC;

2.  If a top table is a singleton table, it is the likely cause of the data skew. To resolve this, you can perform operations such as moving the table or converting it into a partitioned table.

A table whose partitions value (the COUNT(*) in the query above) is 1 is typically a singleton table, unless it is a partitioned table whose partition count happens to equal the number of data nodes, leaving exactly one partition on this node. A table with a value greater than 1 is a partitioned table. The query below removes this ambiguity.
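
To confirm, a quick sketch counts the suspect table's partitions across the entire instance (tbl is a placeholder table name; grouping by INDEX_NAME keeps partitions of global indexes separate from those of the base table). A singleton table returns 1:

SELECT 
    TABLE_NAME,
    INDEX_NAME,
    COUNT(*) AS total_partitions
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_NAME = 'tbl'
GROUP BY 
    TABLE_NAME, INDEX_NAME;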

3.  If a top table is a partitioned table, use the PARTITION_NAME field in the GROUP BY clause to calculate the data volume of each partition. This allows you to determine whether the imbalance is caused by data skew among the partitions themselves:

SELECT 
    SUM((DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024)) AS `TOTAL_SIZE(GB)`,
    PARTITION_NAME
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_NAME = 'tbl'
GROUP BY 
    PARTITION_NAME
ORDER BY 
    `TOTAL_SIZE(GB)` DESC;

Tables with Rolling Time-based Partitions

In some business scenarios, a table is subpartitioned by time to facilitate the cleanup of historical data. For tables with this structure, you can obtain more interesting information from the TABLE_DETAIL view.

The table's partition structure is as follows:

PARTITION BY KEY(`user_id`)
PARTITIONS 16
SUBPARTITION BY RANGE(TO_DAYS(`xxxx_time`))
(SUBPARTITION `p202501` VALUES LESS THAN (739648),
 SUBPARTITION `p202502` VALUES LESS THAN (739676),
 SUBPARTITION `p202503` VALUES LESS THAN (739707),
 SUBPARTITION `p202504` VALUES LESS THAN (739737),
 SUBPARTITION `p202505` VALUES LESS THAN (739768),
 SUBPARTITION `p202506` VALUES LESS THAN (739798))

According to the calculation method described above, the table has 16 × 6 = 96 records in the TABLE_DETAIL view.

First, let's introduce several fields related to subpartitioning in TABLE_DETAIL:

  1. PARTITION_NAME: the name of the first-level partition only, with values such as p1 and p2. In this table, PARTITION_NAME has 16 distinct values.
  2. SUBPARTITION_TEMPLATE_NAME: the templated subpartition name, with values such as p202501 and p202502. In this table, it has 6 distinct values, each appearing in 16 records.
  3. SUBPARTITION_NAME: the physical subpartition name, composed of the partition name and the subpartition template name, such as p1p202505. Each SUBPARTITION_NAME value corresponds to exactly one record in TABLE_DETAIL, so this table has 96 distinct SUBPARTITION_NAME values (the query after this list verifies these counts).
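
As a sketch (with tbl as a placeholder table name), these counts can be verified directly:

SELECT 
    COUNT(DISTINCT PARTITION_NAME) AS partition_cnt,
    COUNT(DISTINCT SUBPARTITION_TEMPLATE_NAME) AS template_cnt,
    COUNT(DISTINCT SUBPARTITION_NAME) AS subpartition_cnt,
    COUNT(*) AS record_cnt
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_NAME = 'tbl';

For the table above, the expected result is 16, 6, 96, and 96, respectively.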

By performing aggregate operations on the fields above, you can accomplish many tasks:

1.  View monthly data volume:

By using SUBPARTITION_TEMPLATE_NAME as the GROUP BY field, you can obtain the data volume for each month and clearly observe the trend of monthly data changes:

SELECT 
    SUBPARTITION_TEMPLATE_NAME,
    SUM((DATA_LENGTH + INDEX_LENGTH + DATA_FREE) / (1024 * 1024 * 1024)) AS `TOTAL_SIZE(GB)`
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_NAME = 'tbl'
GROUP BY 
    SUBPARTITION_TEMPLATE_NAME
ORDER BY 
    `TOTAL_SIZE(GB)` DESC;
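
Similarly, replacing the size expression with TABLE_ROWS gives an estimated row count per month (tbl is again a placeholder):

SELECT 
    SUBPARTITION_TEMPLATE_NAME,
    SUM(TABLE_ROWS) AS `ESTIMATED_ROWS`
FROM 
    INFORMATION_SCHEMA.TABLE_DETAIL
WHERE 
    TABLE_NAME = 'tbl'
GROUP BY 
    SUBPARTITION_TEMPLATE_NAME
ORDER BY 
    SUBPARTITION_TEMPLATE_NAME;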

2.  Since the table rolls on a monthly basis, you can predict how much data is about to be deleted and how much data will remain on each data node after the deletion. The query below assumes that the oldest subpartition, p202501, is the one about to be dropped:

SELECT 
    t1.space,
    t1.space - t3.space_to_delete AS remaining_space,
    t1.storage_inst_id,
    t3.partitions_to_delete
FROM (
    -- total space currently occupied on each data node, in GB
    SELECT 
        storage_inst_id,
        SUM((data_length + index_length + data_free) / (1024 * 1024 * 1024)) AS space
    FROM 
        information_schema.table_detail
    GROUP BY 
        storage_inst_id
) t1
JOIN (
    -- space occupied on each data node by the subpartition about to be dropped, in GB
    SELECT 
        storage_inst_id,
        COUNT(*) AS partitions_to_delete,
        SUM((data_length + index_length + data_free) / (1024 * 1024 * 1024)) AS space_to_delete
    FROM 
        information_schema.table_detail
    WHERE 
        table_name = 'tbl'
        AND subpartition_template_name = 'p202501'
    GROUP BY 
        storage_inst_id
) t3
ON t1.storage_inst_id = t3.storage_inst_id
ORDER BY 
    t1.space;

Find Singleton Tables with High Traffic

In PolarDB-X, a singleton table is stored on a fixed data node by default. These tables may not occupy much space, but can have high queries per second (QPS). Sometimes, you may observe that an individual data node has elevated QPS or CPU utilization. In this case, you can use the ROWS_READ, ROWS_INSERTED, ROWS_UPDATED, and ROWS_DELETED fields in TABLE_DETAIL to help find these highly accessed tables:

SELECT 
    SUM(rows_read + rows_inserted + rows_updated + rows_deleted) AS access_count,
    table_name,
    COUNT(*) AS partitions
FROM 
    information_schema.table_detail
WHERE 
    storage_inst_id = 'pxc-xdb-s-xxxxxxxx'
GROUP BY 
    table_name
HAVING 
    partitions = 1
ORDER BY 
    access_count DESC;

Note: These fields represent the cumulative number of rows accessed in the partition since the data node started. They are not directly equivalent to metrics such as QPS and CPU utilization, but serve as a valuable reference. Typically, you can examine the tables with the highest counts, from top to bottom, and use information from SQL Explorer for further investigation and confirmation.

FAQ

Q: Is the data in the TABLE_DETAIL view exact?

A: The information in this view, such as the number of rows, is calculated from statistics, so it is not an exact value. In a few scenarios, the data may be missing or exhibit large deviations, but it is sufficient for reference purposes.

Q: Is it safe and reliable to query the TABLE_DETAIL view?

A: When the system contains a large total number of partitions, querying this view can be time-consuming. It is safe to query, but avoid doing so with high concurrency, for example, by executing a large number of queries on it from within a program.
