This topic describes the common statements for health checks in Hologres.
Health checks related to table planning
Check whether an excessive number of table groups or shards exist.
If a Hologres instance has fewer than 256 CPU cores, we recommend that you use only the default table group. If a Hologres instance has at least 256 CPU cores, you can retain two or three table groups. We recommend that you retain no more than three table groups for a database. The number of shards must be greater than the number of worker nodes and less than 60% of the number of CPU cores. If you use the replication feature in Hologres, you must proportionally decrease the number of shards or proportionally increase the number of worker nodes. The following statements are provided for you to check the information:
-- Query the number of table groups.
SELECT COUNT(DISTINCT tablegroup_name) FROM hologres.hg_table_group_properties;

-- Check the configurations of each table group. We recommend that you include no more than 3,000 tables in each table group. The number of tables is specified by the table_num property.
SELECT tablegroup_name AS table_group
      ,max(property_value) FILTER (WHERE property_key = 'shard_count') AS shard_count
      ,max(property_value) FILTER (WHERE property_key = 'replica_count') AS replica_count
      ,max(property_value) FILTER (WHERE property_key = 'is_default_tg') AS is_default
      ,max(property_value) FILTER (WHERE property_key = 'table_num') AS table_num
FROM hologres.hg_table_group_properties
GROUP BY tablegroup_name;

If a database has more than three table groups, we recommend that you specify a core table group and merge the redundant table groups into it. Alternatively, you can specify a primary table group and a secondary table group that contains dimension tables. For more information, see Use the resharding feature to migrate tables to a new table group.
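When you consolidate table groups, a table is assigned to a table group when it is created. The following is a minimal sketch of that pattern; the table group name tg_default and the table and column names are placeholders, and the table_group property is assumed to be settable with SET_TABLE_PROPERTY inside the creation transaction:

```sql
BEGIN;
CREATE TABLE public.orders (
    order_id bigint NOT NULL,
    shop_id  bigint,
    PRIMARY KEY (order_id)
);
-- Assign the new table to an existing table group at creation time.
CALL set_table_property('public.orders', 'table_group', 'tg_default');
COMMIT;
```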
Check whether a schema has an excessive number of tables.
If a schema has an excessive number of tables, the system generates an excessive number of small files, which leads to memory overuse. The following statement is provided for you to check the information:
-- Check the number of tables in each schema.
SELECT table_namespace AS schema
      ,COUNT(DISTINCT table_name) AS total_tables
      ,COUNT(DISTINCT table_name) FILTER (WHERE property_key = 'storage_format') AS inner_tables
FROM hologres.hg_table_properties
WHERE table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
GROUP BY table_namespace
ORDER BY table_namespace;

If a schema has an excessive number of small partitioned tables, merge the tables into a standard table.
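Merging many small partitioned tables into one standard table can be sketched as follows. The table and column names are placeholders, the target table is assumed to have the same schema as the partitioned parent, and dropping a partitioned parent is assumed to drop its child partitions as in PostgreSQL:

```sql
-- Standard (non-partitioned) table that replaces many small partitions.
BEGIN;
CREATE TABLE public.events_merged (
    event_id bigint,
    event_dt text,
    payload  text
);
CALL set_table_property('public.events_merged', 'orientation', 'column');
COMMIT;

-- Copy the data from the partitioned parent table, then drop it.
INSERT INTO public.events_merged SELECT event_id, event_dt, payload FROM public.events;
DROP TABLE public.events;  -- also removes its child partitions
```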
Check whether table statistics are updated in time.
The following statement is provided for you to check the information:
-- Retrieve tables whose statistics were not updated in the previous day.
SELECT schema_name
      ,table_name
      ,total_rows
      ,analyze_timestamp
FROM hologres_statistic.hg_table_statistic
WHERE analyze_timestamp < now() - interval '1 days'
ORDER BY schema_name
        ,table_name
LIMIT 200;

If the statistics of tables whose data has been updated are not updated in time, execute the ANALYZE tablename statement to update the statistics.
Check whether a Hologres instance has an excessive number of resource groups.
An excessive number of resource groups may affect the use of CPU and memory resources. We recommend that you configure no more than three resource groups for a Hologres instance. Make sure that more than 30% of resources are allocated to the default resource group.
The following statement is provided for you to check the information:
SELECT * FROM pg_holo_resource_groups WHERE property_key='worker_limit';
Health checks related to table design
Check whether row-oriented tables are used only in required scenarios.
Row-oriented tables are used only in specific scenarios, such as scenarios in which a Flink data source is joined with a dimension table. Make sure that you do not use row-oriented tables in unnecessary scenarios. The following statement is used to query all row-oriented tables:
SELECT table_namespace AS schema
      ,table_name AS tables
FROM hologres.hg_table_properties
WHERE property_key = 'orientation'
  AND property_value = 'row'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog');

If you use row-oriented tables in unnecessary scenarios, change the row-oriented tables into column-oriented tables or row-column hybrid tables.
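The orientation of a table cannot be changed in place, so converting a row-oriented table generally means recreating it and copying the data. A hedged sketch of that procedure, with placeholder table and column names:

```sql
-- Recreate the table with column orientation.
BEGIN;
CREATE TABLE public.dim_user_col (
    user_id bigint NOT NULL,
    name    text,
    PRIMARY KEY (user_id)
);
CALL set_table_property('public.dim_user_col', 'orientation', 'column');
COMMIT;

-- Copy the data, then swap the tables.
INSERT INTO public.dim_user_col SELECT user_id, name FROM public.dim_user_row;
-- After verifying the copied data:
DROP TABLE public.dim_user_row;
ALTER TABLE public.dim_user_col RENAME TO dim_user_row;
```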
Check whether the distribution key is specified for a table and whether more than two columns constitute the distribution key.
We recommend that you specify one to two columns to constitute the distribution key. The following statements are provided for you to check the information:
-- Query all tables that use more than two columns to constitute the distribution key.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS distribution_key
FROM hologres.hg_table_properties
WHERE property_key = 'distribution_key'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
  AND array_length(string_to_array(property_value,','),1) > 2;

-- Query all tables for which the distribution key is not specified.
SELECT DISTINCT table_namespace AS schema
      ,table_name AS tables
FROM hologres.hg_table_properties a
WHERE property_key = 'storage_format'
  AND NOT EXISTS (
      SELECT 1
      FROM hologres.hg_table_properties b
      WHERE b.property_key = 'distribution_key'
        AND a.table_namespace = b.table_namespace
        AND a.table_name = b.table_name)
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
ORDER BY schema, tables;

If a table uses more than two columns to constitute the distribution key, recreate the table and specify one to two columns to constitute the distribution key.
Check whether a table has more than 20 dictionary encoding columns.
We recommend that you enable dictionary encoding for no more than 20 columns that have low cardinality. If you are unsure about which columns you want to specify, enable automatic dictionary encoding for the whole table to prevent all columns from being specified as dictionary encoding columns.
The following statement is provided for you to check the information:

-- Query all tables that have more than 20 dictionary encoding columns and for which automatic dictionary encoding is disabled.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS dictionary_encoding_columns
FROM hologres.hg_table_properties
WHERE property_key = 'dictionary_encoding_columns'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
  AND array_length(string_to_array(property_value,','),1) > 20
  AND property_value NOT LIKE '%:auto';

If a table has more than 20 dictionary encoding columns, call the SET_TABLE_PROPERTY or UPDATE_TABLE_PROPERTY function to modify the dictionary_encoding_columns property. For more information, see SET_TABLE_PROPERTY or ALTER TABLE.
Check whether bitmap indexes are configured for more than 30 columns.
We recommend that you configure bitmap indexes only for columns that are used in equality filter conditions. If you configure bitmap indexes for an excessive number of STRING columns, excessive storage space and memory are consumed.
-- Query all tables in which bitmap indexes are configured for more than 30 columns.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS bitmap_columns
FROM hologres.hg_table_properties
WHERE property_key = 'bitmap_columns'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
  AND array_length(string_to_array(property_value,','),1) > 30;

If bitmap indexes are configured for more than 30 columns, call the SET_TABLE_PROPERTY or UPDATE_TABLE_PROPERTY function to modify the bitmap_columns property. For more information, see SET_TABLE_PROPERTY or ALTER TABLE.
Check whether more than two columns are used to constitute the clustering key of a table.
Clustering key-based queries follow the leftmost matching principle. Therefore, we recommend that you specify no more than two columns to constitute the clustering key. Otherwise, clustering key-based queries cannot be performed in specific scenarios. The following statement is provided for you to check the information:
-- Query all tables that use more than two columns to constitute the clustering key.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS clustering_key
FROM hologres.hg_table_properties
WHERE property_key = 'clustering_key'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
  AND array_length(string_to_array(property_value,','),1) > 2;

If a table uses more than two columns to constitute the clustering key, recreate the table and specify one or two columns that are used for data sorting to constitute the clustering key.
Check whether only one event time column is specified as the segment key for a table. The event time column stores the timestamps of real-time data writes.
The segment key is used to split files. We recommend that you specify only one event time column of a timestamp or integer type as the segment key. The following statement is provided for you to check the information:
-- Query all tables that use more than one event time column to constitute the segment key.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS segment_key
FROM hologres.hg_table_properties
WHERE property_key = 'segment_key'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
  AND array_length(string_to_array(property_value,','),1) > 1;

If a table uses more than one event time column to constitute the segment key, recreate the table and specify only one event time column as the segment key.
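The distribution key, clustering key, and segment key discussed in the preceding checks are all set when a table is created. The following is a minimal sketch of that pattern; the table and column names are placeholders:

```sql
BEGIN;
CREATE TABLE public.dwd_orders (
    order_id   bigint NOT NULL,
    shop_id    bigint,
    order_time timestamptz
);
CALL set_table_property('public.dwd_orders', 'orientation', 'column');
-- One or two columns constitute the distribution key.
CALL set_table_property('public.dwd_orders', 'distribution_key', 'order_id');
-- No more than two columns constitute the clustering key.
CALL set_table_property('public.dwd_orders', 'clustering_key', 'shop_id');
-- A single event time column serves as the segment key.
CALL set_table_property('public.dwd_orders', 'segment_key', 'order_time');
COMMIT;
```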
Check whether the time to live (TTL) of a table is less than seven days.
The TTL of a table is the maximum period of time that a data record can be stored in the table. The data record is recycled after the maximum period of time expires. Therefore, we recommend that you set the TTL to at least seven days. Otherwise, duplicate data records may exist. The following statement is provided for you to check the information:
-- Query all tables whose TTL is less than seven days. The TTL is specified by the time_to_live_in_seconds property.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS time_to_live_in_seconds
FROM hologres.hg_table_properties
WHERE property_key = 'time_to_live_in_seconds'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog')
  AND property_value::bigint < 604800;

If the TTL of a table is less than seven days, call the SET_TABLE_PROPERTY function to modify the TTL. For more information, see SET_TABLE_PROPERTY.
Check whether the binary logging feature is enabled only in necessary scenarios.
The binary logging feature is useful but resource-consuming, and it affects the performance of data writes. If the feature is enabled, a row-oriented table consumes far fewer resources than a column-oriented table. Therefore, we recommend that you enable the feature for column-oriented tables only in necessary scenarios. The following statements are provided for you to check the information:
-- Query all tables for which the binary logging feature is enabled.
SELECT table_namespace AS schema
      ,table_name AS tables
FROM hologres.hg_table_properties
WHERE property_key = 'binlog.level'
  AND property_value = 'replica'
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog');

-- Query all tables for which the TTL of binary logs is set to more than seven days. We recommend that you shorten the TTL.
SELECT table_namespace AS schema
      ,table_name AS tables
      ,property_value AS "binlog.ttl"
FROM hologres.hg_table_properties
WHERE property_key = 'binlog.ttl'
  AND property_value::bigint > 604800
  AND table_namespace NOT IN ('hologres','hologres_statistic','pg_catalog');

If the binary logging feature is not properly configured, call the SET_TABLE_PROPERTY function to modify the feature settings and the TTL of binary logs. For more information, see SET_TABLE_PROPERTY.
Check whether data is unevenly distributed to shards.
In Hologres, the data of a table is stored in multiple shards. If some shards have significantly more data than other shards, the data is unevenly distributed. In this case, we recommend that you reconfigure the distribution key. The following statement is provided for you to check the information:
SELECT hg_shard_id, COUNT(*) FROM table_name GROUP BY hg_shard_id;

If the distribution of data is significantly uneven, change the columns that constitute the distribution key and reinsert the data.
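To quantify the skew instead of eyeballing per-shard counts, you can compare the largest shard with the average shard. A hedged sketch that builds on the per-shard query above; table_name is a placeholder:

```sql
-- Ratio of the largest shard to the average shard.
-- Values far above 1.0 indicate uneven distribution.
SELECT max(cnt)::numeric / avg(cnt) AS skew_ratio
FROM (
    SELECT hg_shard_id, COUNT(*) AS cnt
    FROM table_name
    GROUP BY hg_shard_id
) t;
```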
Health checks related to runtime status
Check the resource usage.
You can analyze CPU utilization, memory usage, and connection usage by using CloudMonitor. For more information, see Hologres metrics.
Check the statistics about query statements.
You can check the proportions, success rates, and latencies of different types of query statements. The following statements are provided for you to check the information:
-- Check the number of DML statements, including SELECT, INSERT, UPDATE, and DELETE statements, executed on each database in the previous seven days.
SELECT datname, query_date, count(*)
FROM hologres.query_log
WHERE query_date > to_char(current_date - interval '7 days', 'YYYYMMDD')
  AND command_tag IN ('SELECT','INSERT','UPDATE','DELETE')
GROUP BY datname, query_date
ORDER BY datname, query_date DESC;

-- Check the execution status of DML statements in the previous day.
SELECT datname, query_date, command_tag,
       count(*) FILTER (WHERE status = 'SUCCESS') AS success,
       count(*) FILTER (WHERE status = 'FAILED') AS failed
FROM hologres.query_log
WHERE query_date > to_char(current_date - interval '1 days', 'YYYYMMDD')
  AND command_tag IN ('SELECT','INSERT','UPDATE','DELETE')
GROUP BY datname, query_date, command_tag
ORDER BY datname, query_date DESC;

-- Check the query latencies of DML statements in the previous day.
SELECT datname, query_date, command_tag, count(*), AVG(duration) AS duration
FROM hologres.query_log
WHERE query_date > to_char(current_date - interval '1 days', 'YYYYMMDD')
  AND command_tag IN ('SELECT','INSERT','UPDATE','DELETE')
  AND status = 'SUCCESS'
GROUP BY datname, query_date, command_tag
ORDER BY datname, query_date DESC;

Check slow queries.
Check the slow queries that consumed the most time in the previous day. The following statement is provided for you to check the information:
-- Query the slow queries that consumed the most time in the previous day.
SELECT status AS "Status",
       duration AS "Time consumed (ms)",
       optimization_cost AS "Optimization time (ms)",
       start_query_cost AS "Startup time (ms)",
       get_next_cost AS "Execution time (ms)",
       duration - optimization_cost - start_query_cost - get_next_cost AS "Others (ms)",
       query_id AS "QueryID",
       query
FROM hologres.hg_query_log
WHERE query_start > current_date - interval '1 days'
  AND command_tag IN ('SELECT')
  AND duration > 1000
ORDER BY duration DESC,
         start_query_cost DESC,
         optimization_cost DESC,
         get_next_cost DESC,
         duration - optimization_cost - start_query_cost - get_next_cost DESC
LIMIT 200;

Check the slow queries that consumed the largest amount of resources in the previous day. The following statement is provided for you to check the information:
-- Query the slow queries that consumed the largest amount of resources in the previous day.
SELECT status AS "Status",
       duration AS "Time consumed (ms)",
       query_start AS "Started at",
       (read_bytes / 1048576)::text || ' MB' AS "Read bytes",
       (memory_bytes / 1048576)::text || ' MB' AS "Memory",
       (shuffle_bytes / 1048576)::text || ' MB' AS "Shuffle",
       (cpu_time_ms / 1000)::text || ' s' AS "CPU time",
       physical_reads AS "Physical reads",
       query_id AS "QueryID",
       query
FROM hologres.hg_query_log
WHERE query_start > current_date - interval '1 days'
  AND command_tag IN ('SELECT','INSERT','UPDATE','DELETE')
  AND duration > 1000
ORDER BY duration DESC,
         read_bytes DESC,
         shuffle_bytes DESC,
         memory_bytes DESC,
         cpu_time_ms DESC,
         physical_reads DESC
LIMIT 500;
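To see when slow queries cluster during the day, the same hg_query_log view can be aggregated by hour. The following sketch uses only the columns already referenced in the statements above:

```sql
-- Count slow queries per hour in the previous day to locate load spikes.
SELECT date_trunc('hour', query_start) AS query_hour,
       count(*) AS slow_queries,
       avg(duration)::bigint AS avg_duration_ms
FROM hologres.hg_query_log
WHERE query_start > current_date - interval '1 days'
  AND duration > 1000
GROUP BY query_hour
ORDER BY query_hour;
```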