You can diagnose the causes of slow queries and failed queries by querying and analyzing slow query logs. Then, you can take measures to optimize queries. This topic describes how to query and analyze slow query logs in Hologres.

Limits

When you query slow query logs in Hologres, take note of the following limits:
  • This feature is supported only by Hologres V0.10 and later. You can check the version of your Hologres instance on the instance details page in the Hologres console. If your instance version is earlier than V0.10, submit a ticket to update your instance.
  • To ensure system stability and prevent traffic overloads, a maximum of 10,000 slow query logs are returned for each query.
  • You can query the logs of slow queries that are executed by using DML statements and all queries that are executed by using DDL statements. Supported DML statements include INSERT, SELECT, UPDATE, and DELETE. Supported DDL statements include CREATE, ALTER, DROP, and GRANT.
  • The return values of the memory_bytes, shuffle_bytes, and cpu_time_ms fields are not precise values. The return values are only for reference.
  • Logs of failed queries within a Hologres instance of V0.10 do not contain specific runtime statistics, such as the memory size, disk reads, CPU time consumed, the number of physical reads, and the value of the query_stats field.

Query the hologres.hg_query_log table

Slow query logs are stored in the hologres.hg_query_log system table. If the version of your Hologres instance is V0.10 or later, Hologres collects the logs of slow queries that are executed by using DML statements and consume more than 100 ms and all queries that are executed by using DDL statements. By default, only queries that consume more than 1 second are returned. If you want to query slow queries that consume 100 ms to 1 second, you can specify the configuration items described in the "Configuration items" section. The following table describes the fields that are contained in the hologres.hg_query_log table.
   Column           |           Type           |               Description                
--------------------+--------------------------+-----------------------------------------
 usename            | text                     | The name of the user who initiated the query.
 status             | text                     | The status of the query. The value indicates whether the query is successful.
 query_id           | text                     | The ID of the query.
 datname            | text                     | The name of the database for which data was queried.
 command_tag        | text                     | The type of the statement used in the query. Examples: INSERT, SELECT, UPDATE, DELETE, EXPLAIN, ANALYZE, BEGIN, COMMIT, ROLLBACK, CREATE TABLE, DROP TABLE, ALTER TABLE, COMMENT, CALL, and GRANT.
 duration           | integer                  | The amount of time consumed by the query, in milliseconds.
 message            | text                     | The returned error message.
 query_start        | timestamp with time zone | The time when the query was initiated.
 query_date         | text                     | The date when the query was initiated.
 query              | text                     | The statement used in the query.
 result_rows        | bigint                   | The number of rows that were returned.
 result_bytes       | bigint                   | The number of bytes that were returned.
 read_rows          | bigint                   | The number of rows that were read by the query.
 read_bytes         | bigint                   | The number of bytes that were read by the query.
 affected_rows      | bigint                   | The number of rows that were affected by the DML statement.
 affected_bytes     | bigint                   | The number of bytes that were affected by the DML statement.
 memory_bytes       | bigint                   | The peak memory usage of a single node. The value is not a precise value.
 shuffle_bytes      | bigint                   | The number of bytes that were shuffled. The value is not a precise value.
 cpu_time_ms        | bigint                   | The total CPU time, in milliseconds. The value is not a precise value.
 physical_reads     | bigint                   | The number of physical reads.
 pid                | integer                  | The process identifier (PID) of the query.
 application_name   | text                     | The type of the application involved in the query.
 engine_type        | text[]                   | The engine used in the query.
 client_addr        | text                     | The IP address from which the query was initiated.
 table_write        | text                     | The table to which data was written.
 table_read         | text[]                   | The table from which data was read.
 session_id         | text                     | The ID of the query session.
 session_start      | timestamp with time zone | The time when the query session started.
 command_id         | text                     | The ID of the command or statement.
 optimization_cost  | integer                  | The amount of time consumed to generate the execution plan of the query.
 start_query_cost   | integer                  | The amount of time consumed to start the query.
 get_next_cost      | integer                  | The amount of time consumed to execute the query.
 extended_cost      | text                     | The amount of time consumed by other actions of the query.
 plan               | text                     | The execution plan of the query.
 statistics         | text                     | The runtime statistics of the query.
 visualization_info | text                     | The visualization information about the execution plan.
 query_detail       | text                     | The extended information about the query in the JSON format.
 query_extinfo      | text[]                   | The extended information about the query in the form of an array.

Grant query permissions to users

Only authorized users can query slow query logs. When you grant query permissions to users, take note of the following rules:

  • A user can be granted permissions to query the slow query logs of all databases in a Hologres instance.
    • Grant the superuser permissions to the user.
      After the superuser permissions on a Hologres instance are granted to a user, the user can query the slow query logs of all databases in the Hologres instance.
      ALTER USER "Account ID" SUPERUSER;-- Replace Account ID with the Alibaba Cloud account ID of the user. If the user is a RAM user, prefix the account ID with p4_. 
    • Add the user to the pg_read_all_stats user group.
      In addition to superusers, users in the pg_read_all_stats user group have permissions to query the slow query logs of all databases in the relevant Hologres instance. If you are a superuser, you can add regular users to this user group. To add a user to this user group, execute one of the following statements:
      GRANT pg_read_all_stats TO "Account ID"; -- Use the standard PostgreSQL authorization model to grant related permissions to the user.
      CALL spm_grant('pg_read_all_stats', 'Account ID'); -- Use the simple permission model (SPM) to grant related permissions to the user.
      CALL slpm_grant('pg_read_all_stats', 'Account ID'); -- Use the schema-level permission model (SLPM) to grant related permissions to the user.
  • A user can be granted permissions to query the slow query logs of a specific database in a Hologres instance.
    Users in the db_admin user group have permissions to query the slow query logs of the relevant database. If you are a superuser, you can add regular users to this user group by using the SPM or SLPM.
    CALL spm_grant('<db_name>_admin', 'Account ID'); -- Use the SPM to grant related permissions to the user.
    CALL slpm_grant('<db_name>.admin', 'Account ID'); -- Use the SLPM to grant related permissions to the user.
  • A regular user has permissions to query the logs of slow queries that were initiated by the regular user in the database within the relevant Alibaba Cloud account.

View slow query logs in the HoloWeb console

You can view slow query logs in the HoloWeb console in a visualized manner.
Note Slow query logs within up to seven days can be queried in the HoloWeb console.
  1. Log on to the HoloWeb console. For more information, see HoloWeb quick start.
  2. In the top navigation bar, click Diagnostics and Optimization.
  3. In the left-side navigation pane, click Historical Slow Query.
  4. In the upper part of the Historical Slow Query page, set the query conditions.
    Historical Slow QueryThe following table describes the parameters.
    Parameter Required Description
    Instance Yes The name of the instance for which you want to query slow query logs. By default, the connected instance is selected.
    Database No The name of the database for which you want to query slow query logs. If you have the required permissions, you can query slow query logs of the specified database. Otherwise, you can query slow query logs only within your Alibaba Cloud account.
    Table No The name of the table for which you want to query slow query logs. If you have the required permissions, you can query slow query logs of the specified table. Otherwise, you can query slow query logs only within your Alibaba Cloud account.
    User No The Alibaba Cloud account within which you want to query slow query logs.
    List Limit Yes The number of slow query logs to be returned. A maximum of 2,000 slow query logs can be returned each time.
    Running No The execution duration of the query. By default, Hologres collects the logs of queries that consume more than 1 second.
    Metrics No The dimension from which the query trend is displayed. Valid values: Slow Query and Failed Query.
    Query No The statement used in the query. You can use the percent sign (%) to query all statements.
    Type No The type of the statement used in the query. The statement types in categories such as DDL and DML are supported.
    Time Yes The time range to query. Default value: 10 Minutes. The maximum time range in which slow query logs can be queried is seven days.
    PID No The ID of the query.
  5. Click Query and view the query results in the Query Trend Analysis and Queries sections.
    • Query Trend Analysis
      The Query Trend Analysis section displays the trends of slow queries or failed queries in the specified time range. This allows you to find the periods during which slow queries and failed queries frequently occurred and troubleshoot issues in a centralized manner. Query Trend Analysis
    • Queries
      The Queries section displays the details of slow queries and failed queries. Queries
      Parameter Description
      ID The ID of the Alibaba Cloud account used to execute the SQL statement.
      Database The name of the database for which data was queried.
      User Name The name of the user who executed the SQL statement.
      Type The type of the SQL statement used in the query.
      Query The statement used in the query. You can click the statement to view its details.
      Status The status of the query.
      Start Time The start time of the query.
      Duration The total execution duration of the query, including the optimization duration, the startup duration of the query, and the duration for returning the query result.
      Operation
      • Details: goes to the details page of the query. You can view the details of the query on this page. The details can help you optimize the SQL statement.
      • SQL Editor: opens a new SQL editor for the query.
      You can also click Customize Columns to select the fields to be displayed in the query list. Customize Columns

Analyze slow query logs

You can obtain slow query logs by querying the hologres.hg_query_log table. The following section describes the SQL statements that are commonly used to analyze slow query logs. You can perform the following operations:
  • Execute the following statement to query the total number of slow queries. By default, the number of slows queries for the last 30 days is returned.
    select count(*) from hologres.hg_query_log;
    The following sample code shows the returned result. The result indicates that 44 slow queries were initiated in the last 30 days. Slow queries are queries that consume more time than the specified threshold.
    count
    -------
        44
    (1 row)
  • Execute the following statement to query the number of slow queries initiated by each user:
    select username as "User",
           count(1) as "Number of slow queries"
    from hologres.hg_query_log
    group by usename
    order by count(1) desc;
    The following sample code shows the returned result. The value of the count(1) field indicates the number of slow queries initiated by each user.
    User                   | Number of slow queries
    -----------------------+-----
     1111111111111111      |  27
     2222222222222222      |  11
     3333333333333333      |   4
     4444444444444444      |   2
    (4 rows)
  • Execute the following statement to query the details of a specified slow query:
    select * from hologres.hg_query_log where query_id = '13001450118416xxxx';
    The following sample code shows the returned result. For more information about the fields, see Query the hologres.hg_query_log table.
           usename      | status  |      query_id      | datname| command_tag | duration | message |      query_start       | query_date |                                             query                                                  | result_rows | result_bytes | read_rows |read_bytes | affected_rows | affected_bytes | memory_bytes | shuffle_bytes | cpu_time_ms | physical_reads |   pid   | application_name | engine_type |  client_addr  | table_write | table_read |   session_id   |     session_start      | command_id | optimization_cost | start_query_cost | get_next_cost | extended_cost | plan | statistics | visualization_info | query_detail | query_extinfo
    -----------------------+---------+--------------------+---------+-------------+----------+---------+------------------------+------------+---------------------------------------------------------------------------------------------------------+-------------+--------------+-----------+------------+---------------+----------------+--------------+---------------+-------------+----------------+---------+------------------+-------------+---------------+-------------+------------+-----------------+------------------------+------------+-------------------+------------------+---------------+---------------+------+------------+--------------------+--------------+---------------
     p4_11111111111xxxx | SUCCESS | 13001450118416xxxx | dbname | SELECT      |      149 |         | 2021-03-30 23:45:01+08 | 20210330   | explain analyze select  * from tablename where user_id = '20210330010xxxx' limit 1000;             |        1000 |       417172 |         0 |         0 |            -1 |             -1 |     26731376 |     476603616 |      321626 |              0 | 1984913 | psql             | {HQE}      | 33.41.xxx.xxx |             |            | 6063475a.1e4991 | 2021-03-30 23:44:26+08 | 0          |                58 |               22 |            67 |               |      |            |                    |              |
    (1 row)
  • Execute the following statement to query the slow queries that consume a large amount of time in a recent time range, such as the last 10 minutes. You can specify a time range based on your business requirements.
    select status as "Status",
           duration as "Time consumed (ms)",
           query_start as "Started at",
           (read_bytes/1048576)::text || ' MB' as "Megabytes read",
           (memory_bytes/1048576)::text || ' MB' as "Memory",
           (shuffle_bytes/1048576)::text || ' MB' as "Shuffle",
           (cpu_time_ms/1000)::text || ' s' as "CPU time",
           physical_reads as "Physical reads",
           query_id as "QueryID",
           query::char(30)
     from hologres.hg_query_log
     where query_start >= now() - interval '10 min'
     order by duration desc,
              read_bytes desc,
              shuffle_bytes desc,
              memory_bytes desc,
              cpu_time_ms desc,
              physical_reads desc
     limit 100;
    The following sample code shows the returned result:
    Status    | Time consumed (ms) |    Started at            | Bytes | Memory  | Shuffle | CPU time | Physical reads |      Query ID       |             query
    ---------+---------+------------------------+--------+-------+---------+---------+------+--------------------+--------------------------------
     SUCCESS |  149    | 2021-03-30 23:45:01+08 | 0 MB   | 25 MB | 454 MB  | 321 s   |    0 | 13001450118416xxxx | explain analyze select  * from
     SUCCESS |  137    | 2021-03-30 23:49:18+08 | 247 MB | 21 MB | 213 MB  | 803 s   | 7771 | 13001491818416xxxx | explain analyze select  * from
     FAILED  |   53    | 2021-03-30 23:48:43+08 | 0 MB   | 0 MB  | 0 MB    | 0 s     |    0 | 13001484318416xxxx | select ds::bigint / 0 from pub
    (3 rows)
  • Execute the following statement to query the slow queries that consume a large amount of time at different query stages in a recent time range, such as the last 10 minutes. You can specify a time range based on your business requirements.
    select status as "Status",
           duration as "Time consumed (ms)",
           optimization_cost as "Optimization time (ms)",
           start_query_cost as "Startup time (ms)",
           get_next_cost as "Execution time (ms)",
           duration-optimization_cost-start_query_cost-get_next_cost as "Others (ms)",
           query_id as "QueryID",
           query::char(30)
     from hologres.hg_query_log
     where query_start >= now() - interval '10 min'
     order by duration desc,
              start_query_cost desc,
              optimization_cost,
              get_next_cost desc,
              duration-optimization_cost-start_query_cost-get_next_cost desc
     limit 100;
    The following sample code shows the returned result:
    Status     | Total time (ms) | Optimization time (ms) | Startup time (ms) | Execution time (ms) | Others (ms) |      Query ID       |             query
    ---------+----------+--------------+--------------+--------------+--------------+--------------------+--------------------------------
     SUCCESS |     4572 |          521 |          320 |         3726 |            5 | 6000260625679xxxx  | -- /* user: wang ip: xxx.xx.x
     SUCCESS |     1490 |          538 |           98 |          846 |            8 | 12000250867886xxxx | -- /* user: lisa ip: xxx.xx.x
     SUCCESS |     1230 |          502 |           95 |          625 |            8 | 26000512070295xxxx | -- /* user: zhang ip: xxx.xx.
    (3 rows)
  • Execute the following statement to query the first failed queries:
    select status as "Status",
           regexp_replace(message, '\n', ' ')::char(150) as "Error message",
           duration as "Time consumed (ms)",
           query_start as "Started at",
           query_id as "QueryID",
           query::char(100) as "Query"
     from hologres.hg_query_log
     where query_start between '2021-03-25 17:00:00'::timestamptz
                       and '2021-03-25 17:42:00'::timestamptz + interval '2 min'
       and status = 'FAILED'
     order by query_start asc
     limit 100;
    The following sample code shows the returned result:
    Status    |                                                                     Error message                                                                           | Time consumed (ms)  |        Started at        |     Query ID       | Query
    --------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------+------------------------+-------------------+-------
     FAILED | Query:[1070285448673xxxx] code: kActorInvokeError msg: "[holo_query_executor.cc:330 operator()] HGERR_code XX000 HGERR_msge internal error: status { c |  1460 | 2021-03-25 17:28:54+08 | 1070285448673xxxx | S...
     FAILED | Query:[1016285560553xxxx] code: kActorInvokeError msg: "[holo_query_executor.cc:330 operator()] HGERR_code XX000 HGERR_msge internal error: status { c |   131 | 2021-03-25 17:28:55+08 | 1016285560553xxxx | S...
    (2 rows)

Configuration items

Hologres provides the following configuration items that allow you to configure the collection of slow query logs:
  • log_min_duration_statement
    • Description
      This configuration item is used to specify a threshold of the amount of time consumed by queries for log collection. By default, this configuration item returns only queries that consume 1 second and longer. You can modify this configuration to query slow queries that consume a different amount of time. The minimum value is 100 ms. You can use this configuration item to specify the scope of slow query logs to be collected. If the amount of time consumed by a slow query is greater than or equal to the specified value, logs of the slow query are collected. The logs contain information such as the SQL statement and the amount of time consumed. A value of -1 indicates that no slow query logs are collected.
      Note Only superusers can modify this configuration item.
    • Examples
      • Execute the following statement to collect the logs of slow queries that consume more than 10 seconds for the current database as a superuser:
        ALTER DATABASE dbname SET log_min_duration_statement = '10s';
      • Execute the following statement to collect the logs of slow queries that consume more than 10 seconds for the current session as a regular user:
        SET log_min_duration_statement = '10s';
  • log_min_duration_query_stats
    • Description
      This configuration item is used to specify a threshold of the amount of time consumed by queries for the collection of runtime statistics. By default, this configuration item returns slow queries that consume more than 10 seconds. You can use this configuration item to specify the scope of slow queries for which runtime statistics are to be collected. If the amount of time consumed by a slow query is greater than or equal the specified value, the runtime statistics of the slow query are collected. The runtime statistics include the statistics of all query stages. A value of -1 indicates that no runtime statistics of slow queries are collected. Such statistics require large storage space. If Hologres collects more runtime statistics, the query and analysis of slow query logs may slow down. Therefore, we recommend that you specify a threshold of at least 10 seconds in general scenarios. Specify a threshold less than 10 seconds only when you troubleshoot specific issues.
      Note Regular users can modify this configuration item based on business requirements.
    • Examples
      • Execute the following statement to collect the runtime statistics of slow queries that consume more than 10 seconds for the current database as a superuser:
        ALTER DATABASE dbname SET log_min_duration_query_stats = '10s';
      • Execute the following statement to collect the runtime statistics of slow queries that consume more than 10 seconds for the current session as a regular user:
        SET log_min_duration_query_stats = '10s';
  • log_min_duration_query_plan
    • Description
      This configuration item is used to specify a threshold of the amount of time consumed by queries for the collection of execution plans. By default, Hologres collects the execution plans of slow queries that consume more than 10 seconds. If the amount of time consumed by a slow query is greater than or equal the specified value, the execution plan of the slow query is collected. By default, you can execute the EXPLAIN statement to query execution plans in real time and you do not need to collect such data. A value of -1 indicates that no execution plans of slow queries are collected.
      Note Regular users can modify this configuration item based on business requirements.
    • Examples
      • Execute the following statement to collect the execution plans of slow queries that consume more than 10 seconds for the current database as a superuser:
        ALTER DATABASE dbname SET log_min_duration_query_plan = '10s';
      • Execute the following statement to collect the execution plans of slow queries that consume more than 10 seconds for the current session as a regular user:
        SET log_min_duration_query_plan = '10s';

Troubleshooting

  • Problem description:

    In Hologres V1.1, detailed statistics such as the numbers of returned rows and read rows cannot be returned in queried slow query logs.

  • Possible cause:

    The collected slow query logs are incomplete.

  • Solution:
    In Hologres V1.1.36 to V1.1.49, you can use a Grand Unified Configuration (GUC) parameter to return complete slow query logs. Complete slow query logs are returned for Hologres V1.1.49 and later.
    Note If the version of your Hologres instance is earlier than V1.1.36, submit a ticket to update your instance.
    -- Return detailed statistics in the slow query logs of a database. You need to execute the statement only once for each database. This operation is recommended. 
    alter databse <db_name> set hg_experimental_force_sync_collect_execution_statistics = on;
    
    -- Return detailed statistics in the slow query logs of the current session.
    set hg_experimental_force_sync_collect_execution_statistics = on;
    The db_name parameter specifies the name of the database.