Hologres V1.3 and later collect statistics on tables in your Hologres instance on a daily basis and store the statistics in the system table named hologres.hg_table_info. This way, you can query and analyze the statistics on your tables and take optimization measures based on the results of statistical analysis. This topic describes how to query and analyze table statistics in Hologres.
Limits
Hologres V1.3 and later allow you to query table statistics. If the version of your Hologres instance is earlier than V1.3, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
The hologres.hg_table_info table is updated with a delay of one day. Data of the current day may be updated before 05:00 on the next day. On the day when your Hologres instance is upgraded from V1.1 to V1.3, table statistics will not be generated. If you query the table statistics on this day, the following error message is reported:
meta warehouse store currently not available
. In this case, you need to query table statistics the next day after the upgrade is complete.
Usage notes
By default, the table statistics of the previous 30 days are retained.
For a non-partitioned internal table in Hologres, you can query the detailed statistics on the table, including the storage space, the number of files, the accumulated number of times that the table was accessed, and the number of rows.
For other objects in Hologres, such as views, materialized views, foreign tables, and parent tables, you can query only the basic information. For example, you can query the number of partitions of a parent table, the name of the table to which a foreign table maps, and the definition of a view or materialized view.
The hologres.hg_table_info table belongs to the meta warehouse system of Hologres. Failure in querying the hologres.hg_table_info table does not affect queries in instances. Therefore, the stability of the hologres.hg_table_info table is not specified in the service level agreement (SLA) of Hologres.
hologres.hg_table_info table
Table statistics are stored in the system table named hologres.hg_table_info. If the version of your Hologres instance is V1.3 or later, table statistics are collected on a daily basis by default. The following information describes the fields that are contained in the hologres.hg_table_info
table.
Field | Data type | Description
--------------------+--------------------------+-----------------------------------------------------------------------
db_name | text | The name of the database to which the table belongs.
schema_name | text | The name of the schema to which the table belongs.
table_name | text | The name of the table.
table_id | text | The unique ID of the table. If the table is a foreign table, the unique ID is in the format of Database name.Schema name.Table name.
type | text | The type of the table, which can be TABLE, FOREIGN TABLE, PARTITION, VIEW, or MATERIALIZED VIEW.
partition_spec | text | The partitioning condition. The field is valid only if the current table is a child table.
is_partition | boolean | Indicates whether the table is a child table.
owner_name | text | The username of the user who owns the table. This field corresponds to the username field of the hologres.hg_query_log table. You can join the two tables based on the owner_name and usename fields.
create_time | timestamp with time zone | The time when the table was created.
last_ddl_time | timestamp with time zone | The time when the table statistics were last updated.
last_modify_time | timestamp with time zone | The time when the table data was last modified.
last_access_time | timestamp with time zone | The time when the table was last accessed.
view_def | text | The definition of the view. This field is valid only if the current table is a view.
comment | text | The description of the table or the view.
hot_storage_size | bigint | The storage space that is occupied by the hot data of the table, in bytes.
cold_storage_size | bigint | The storage space that is occupied by the cold data of the table, in bytes.
hot_file_count | bigint | The number of hot data files of the table.
cold_file_count | bigint | The number of cold data files of the table.
table_meta | jsonb | The metadata of the table. The value is in the JSON format.
row_count | bigint | The number of rows in the table or the partition.
collect_time | timestamp with time zone | The time when the current statistics were collected.
partition_count | bigint | The number of child tables. This field is valid only if the current table is a parent table.
parent_schema_name | text | The name of the schema to which the parent table belongs. This field is valid only if the current table is a child table.
parent_table_name | text | The name of the parent table. This field is valid only if the current table is a child table.
total_read_count | bigint | The accumulated number of times that data was read from the table. The number can be affected by SELECT, INSERT, UPDATE, and DELETE operations and does not indicate an accurate value.
total_write_count | bigint | The accumulated number of times that data was written to the table. The number can be affected by INSERT, UPDATE, and DELETE operations and does not indicate an accurate value.
Specific fields may be empty. If a table was created before the version of the relevant Hologres instance was upgraded to V1.3 or later, certain statistics about the creation of the table failed to be collected. Such statistics can be collected for tables that are created after the version of the relevant Hologres instance is upgraded to V1.3 or later.
If the value of the
type
field isVIEW
, thecreate_time
field and thelast_ddl_time
field are empty.If the value of the
type
field is VIEW, FOREIGN TABLE, or PARTITION, the following fields are empty:last_modify_time
,last_access_time
,hot_file_count
,cold_file_count
,total_read_count
, andtotal_write_count
.You can obtain the storage size of a table from the
hologres.hg_table_info
system table or use thepg_relation_size
function. If you use the two methods to query the storage size of a table, the query results may be different. This case is normal. This is because statistics in thehologres.hg_table_info
system table are collected on a daily basis and the query result of thepg_relation_size
function does not contain the storage size of binary logs.
Grant query permissions to a user
Only an authorized user can query table statistics. The following section describes how to grant the relevant query permissions to a user in different scenarios.
Query the table statistics of all databases in a Hologres instance.
Assign the superuser role to a user.
After you assign the superuser role to a user, the user can view the table statistics of all databases in the Hologres instance.
-- Replace Account ID with the Alibaba Cloud account ID of the user. For a RAM user, prefix the account ID with p4_. ALTER USER "Account ID" SUPERUSER;
Add users to the
pg_stat_scan_table
user group.In addition to superusers, users in the
pg_stat_scan_tables
(for versions earlier than Hologres V1.3.44) orpg_read_all_stats
(for Hologres V1.3.44 and later) user group also have permissions to query the table statistics of all databases in the relevant Hologres instance. If you are a superuser, you can add regular users to the user group. Syntax:-- For versions earlier than Hologres V1.3.44 GRANT pg_stat_scan_tables TO "Alibaba Cloud account ID";-- Use the standard PostgreSQL authorization model to grant related permissions to the user. CALL spm_grant('pg_stat_scan_tables, 'Alibaba Cloud account ID'); -- Use the simple permission model (SPM) to grant relevant permissions to the user. CALL slpm_grant('pg_stat_scan_tables, 'Alibaba Cloud account ID'); -- Use the schema-level permission model (SLPM) to grant relevant permissions to the user. -- For Hologres V1.3.44 and later GRANT pg_read_all_stats TO "Account ID"; -- Use the standard PostgreSQL authorization model to grant relevant permissions to the user. CALL spm_grant('pg_read_all_stats', 'Account ID'); -- Use the simple permission model (SPM) to grant relevant permissions to the user. CALL slpm_grant('pg_read_all_stats', 'Account ID'); -- Use the schema-level permission model (SLPM) to grant related permissions to the user.
Query the table statistics of a database.
Users in the
db_admin
user group have the permissions to query the table statistics of the relevant database. If you are a superuser, you can add regular users to this user group by using the SPM or SLPM.NoteA regular user can query the statistics on only the tables that the user owns in the relevant database.
CALL spm_grant('<db_name>_admin', 'Account ID'); -- Use the SPM to grant related permissions to the user. CALL slpm_grant('<db_name>.admin', 'Account ID'); -- Use the SLPM to grant related permissions to the user.
Execute SQL statements to query the trends of table statistics
Scenario 1: Query the trends of table statistics by day in a Hologres instance
-- Query the trends of statistics of all tables in an instance in the previous week. The statistics include the occupied storage space, file quantity, accumulated numbers of times that tables were accessed, and number of row records.
SELECT
db_name,
schema_name,
table_name,
collect_time :: date as collect_date,
hot_storage_size,
cold_storage_size,
hot_file_count,
cold_file_count,
total_read_count,
total_write_count,
row_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
;
-- Use an aggregate function to query the trends of statistics on a parent table.
SELECT
db_name,
parent_schema_name AS schema_name,
parent_table_name AS table_name,
collect_time :: date AS collect_date,
sum(hot_file_count) AS hot_file_count,
sum(cold_file_count) AS cold_file_count,
sum(hot_storage_size) AS hot_storage_size,
sum(cold_storage_size) AS cold_storage_size,
sum(total_read_count) AS total_read_count,
sum(total_write_count) AS total_write_count,
sum(row_count) AS row_count,
count(*) AS partition_count
FROM
hologres.hg_table_info
WHERE
is_partition
GROUP BY
collect_date,
db_name,
parent_schema_name,
parent_table_name
;
Scenario 2: Query the access information of tables that occupy large disk space
-- Query the access information about the top 10 tables that occupy large disk space.
SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND (
cold_storage_size is NOT NULL
OR hot_storage_size IS NOT NULL
)
ORDER BY
total_storage_size DESC
LIMIT
10;
Scenario 3: Query the access trends of tables that occupy large disk space
-- Query the previous-week access trends of the top 10 tables that occupy large disk space on the previous day.
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
collect_time,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
)
SELECT
d.*
FROM
(
SELECT
db_name,
schema_name,
table_name,
total_storage_size
FROM
tmp_table_info
WHERE
collect_time > (current_date-interval '1 day')::timestamptz -- Specify the previous day.
and collect_time < current_date::timestamptz
ORDER BY
total_storage_size DESC
LIMIT
10
) t -- The top 10 tables that occupy large disk space on the previous day.
left join (
SELECT
db_name,
schema_name,
table_name,
collect_time :: date as collect_date,
total_read_count,
total_write_count,
total_storage_size,
row_count,
total_storage_size,
total_read_count,
total_write_count,
row_count
FROM
tmp_table_info
) d -- The data size changes by day in the previous seven days.
ON t.db_name = d.db_name
AND t.schema_name = d.schema_name
AND t.table_name = d.table_name
ORDER BY
t.total_storage_size desc,
d.collect_date desc,
d.db_name,
d.schema_name,
d.table_name;
Scenario 4: Query the access information and data size changes of tables that occupy large disk by day
-- Query the access information and data size changes of the top 10 tables that occupy large disk space by day in the previous week.
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
collect_time,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
)
SELECT
d.*
FROM
(
SELECT
db_name,
schema_name,
table_name,
total_storage_size
FROM
tmp_table_info
WHERE
collect_time > (current_date-interval '1 day')::timestamptz -- Specify the previous day.
and collect_time < current_date::timestamptz
ORDER BY
total_storage_size DESC
LIMIT
10
) t -- The top 10 tables that occupy large disk space on the previous day.
left join (
SELECT
db_name,
schema_name,
table_name,
collect_time :: date as collect_date,
first_value(total_read_count) OVER w - last_value(total_read_count) OVER w AS day_read_count,
first_value(total_write_count) OVER w - last_value(total_write_count) OVER w AS day_write_count,
first_value(row_count) OVER w - last_value(row_count) OVER w AS day_modify_count
FROM
tmp_table_info
WINDOW w AS (
PARTITION BY
db_name,
schema_name,
table_name
ORDER BY
collect_time DESC
ROWS BETWEEN
CURRENT ROW AND 1 FOLLOWING -- The differences between two consecutive days.
)
) d -- The data size changes by day in the previous seven days.
ON t.db_name = d.db_name
AND t.schema_name = d.schema_name
AND t.table_name = d.table_name
ORDER BY
t.total_storage_size desc,
d.collect_date desc,
d.db_name,
d.schema_name,
d.table_name;
Scenario 5: Query the access information and data size changes of tables that occupy large disk space in the previous week
-- Query the access information and data size changes of the top 10 tables that occupy large disk space by day in the previous week.
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
collect_time,
hot_storage_size + cold_storage_size as total_storage_size,
row_count,
total_read_count,
total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
)
SELECT
d.*
FROM
(
SELECT
db_name,
schema_name,
table_name,
total_storage_size
FROM
tmp_table_info
WHERE
collect_time > (current_date-interval '1 day')::timestamptz -- Specify the previous day.
and collect_time < current_date::timestamptz
ORDER BY
total_storage_size DESC
LIMIT
10
) t -- The top 10 tables that occupy large disk space on the previous day.
left join (
SELECT
db_name,
schema_name,
table_name,
collect_time,
first_value(total_read_count) OVER w - last_value(total_read_count) OVER w AS week_read_count,
first_value(total_write_count) OVER w - last_value(total_write_count) OVER w AS week_write_count,
first_value(row_count) OVER w - last_value(row_count) OVER w AS week_modify_count
FROM
tmp_table_info
WINDOW w AS (
PARTITION BY
db_name,
schema_name,
table_name
ORDER BY
collect_time DESC
ROWS BETWEEN
CURRENT ROW AND 6 FOLLOWING -- The differences between two consecutive days in the previous week.
)
) d -- The data size changes by day in the previous seven days.
ON t.db_name = d.db_name
AND t.schema_name = d.schema_name
AND t.table_name = d.table_name
WHERE
d.collect_time > (current_date-interval '1 day')::timestamptz -- Specify the previous day.
and d.collect_time < current_date::timestamptz
ORDER BY
t.total_storage_size desc,
d.db_name,
d.schema_name,
d.table_name;
Scenario 6: Query the sizes of tables that were least accessed in the previous week
-- Query the size of the top 10 tables that were least accessed in the previous week.
SELECT
*
FROM
(
SELECT
collect_time,
schema_name,
table_name,
hot_storage_size + cold_storage_size as total_storage_size,
first_value(total_read_count) OVER w - last_value(total_read_count) OVER w AS recent_read_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date-interval '1 week')::timestamptz -- Specify the previous week.
AND type = 'TABLE'
AND (
hot_storage_size IS NOT NULL
OR cold_storage_size IS NOT NULL
)
WINDOW w AS (
PARTITION BY
db_name,
schema_name,
table_name
ORDER BY
collect_time DESC
ROWS BETWEEN
CURRENT ROW AND 6 FOLLOWING -- Specify the previous seven days.
)
) tmp
WHERE
collect_time > (current_date - interval '1 day')::timestamptz
and collect_time < current_date::timestamptz -- Only the total access data in the previous seven days is retained.
ORDER BY
recent_read_count
LIMIT
10;
Scenario 7: Query the number of rows that change on the day when the table was last modified
-- Query the time when the table was last modified, and the change in the number of rows compared with the modification before the last modification.
-- If the instance contains a large number of tables, we recommend that you filter the tmp_table_info table to prevent too long query time due to too large data.
with tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time,
last_modify_time
FROM
hologres.hg_table_info
WHERE
last_modify_time IS NOT NULL
AND type = 'TABLE'
-- Add filter conditions to limit the size of the tmp_table_info table.
-- Example 1: collect_time > (current_date - interval '14 day'):: timestamptz
-- Example 2: table_name like ''
-- Example 3: type = 'PARTITION'
)
SELECT
end_data.db_name AS db_name,
end_data.schema_name AS schema_name,
end_data.table_name AS table_name,
(end_data.row_count - start_data.row_count) AS modify_row_count,
end_data.row_count AS current_rows,
end_data.last_modify_time AS last_modify_time
FROM
(
SELECT
db_name,
schema_name,
table_name,
row_count,
last_modify_time
FROM
tmp_table_info
WHERE
collect_time > (current_date-interval '1 day')::timestamptz -- Query the last modification time of the table that was recorded in the previous day.
) end_data
left join (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time
FROM
tmp_table_info
) start_data ON (
end_data.db_name = start_data.db_name
AND end_data.schema_name = start_data.schema_name
AND end_data.table_name = start_data.table_name
AND end_data.last_modify_time::date = (start_data.collect_time + interval '1 day')::date
)
;
Scenario 8: Query tables that occupy large disk space due to too many small files
-- Query the number of files for each table and the disk space occupied by the table and sort the tables based on the average file size.
-- The table_group parameter displays only the shard count of the current database. The parameter is left empty for other databases.
SELECT
db_name,
schema_name,
table_name,
cold_storage_size + hot_storage_size as total_storage_size,
cold_file_count + hot_file_count as total_file_count,
(cold_storage_size + hot_storage_size) / (cold_file_count + hot_file_count) AS avg_file_size,
tmp_table_info.table_meta ->> 'table_group' as table_group,
tg_info.shard_count
FROM
hologres.hg_table_info tmp_table_info
left join (
SELECT
tablegroup_name,
property_value as shard_count
FROM
hologres.hg_table_group_properties
WHERE
property_key = 'shard_count'
) tg_info on tmp_table_info.table_meta ->> 'table_group' = tg_info.tablegroup_name
WHERE
collect_time > (current_date - interval '1 day')::timestamptz
AND type = 'TABLE'
AND (
cold_storage_size IS NOT NULL
OR hot_storage_size IS NOT NULL
)
AND (
cold_file_count IS NOT NULL
OR hot_file_count IS NOT NULL
)
AND cold_file_count + hot_file_count <> 0
ORDER BY
avg_file_size
;