Hologres provides a system view named hologres.hg_worker_info. You can query this view to obtain the numbers of workers, table groups, and shards in the current database and the allocation of shards among workers, which helps you determine whether resources are unevenly allocated. This topic describes how to use hologres.hg_worker_info to query the resource allocation status.

Background information

For information about the concepts of workers, table groups, and shards in Hologres and the relationships among them, see Architecture and Basic concepts. We recommend that the shard count of a table group be a multiple of the number of workers. If the shard count is not a multiple of the number of workers, the number of shards allocated to each worker varies. For example, if a table group has 16 shards and the instance has 3 workers, two workers are assigned 5 shards each and the third is assigned 6, so the workers carry unequal loads. This load imbalance among workers leads to inefficient use of resources. The Hologres console displays worker-level monitoring metrics. To help you check whether resources are unevenly allocated, Hologres V1.3 and later provide the hologres.hg_worker_info system view, which you can query to obtain the numbers of workers, table groups, and shards in the current database and the allocation of shards among workers.

Limits

  • Only Hologres V1.3.23 and later support the hologres.hg_worker_info system view. You can view the version of your Hologres instance on the instance details page in the Hologres console. If the version of your Hologres instance is earlier than V1.3.23, upgrade your instance.
  • hologres.hg_worker_info shows the real-time allocation of shards among workers. You cannot query historical data.
  • A newly created table group obtains a worker_id with a delay of 10 to 20 seconds. If you query the system view immediately after you create a table group, the worker_id field may be empty.
  • If a table group contains no tables, no resources are allocated to workers for the table group. In this case, the worker_id field in the query result is displayed as id/0.
  • You can query only the information of workers, table groups, and shards in the current database. You cannot query the information across databases.

Description

The following table describes the fields that are contained in the hologres.hg_worker_info system view.
 Field            | Type   | Description
------------------+--------+-------------------------------------------------------
 worker_id        | TEXT   | The ID of the worker to which the shard is allocated.
 table_group_name | TEXT   | The name of the table group in the current database.
 shard_id         | BIGINT | The ID of the shard in the table group.
Execute the following statement to query the hologres.hg_worker_info system view and the shard allocation among workers:
select * from hologres.hg_worker_info;
The following sample code shows the query result.
Note: In the query result, xx_tg_internal is the built-in table group of an instance, which is used to manage metadata. You can ignore this table group.
 worker_id  | table_group_name | shard_id
------------+------------------+----------
 bca90a00ef | tg1              |        0
 ea405b4a9c | tg1              |        1
 bca90a00ef | tg1              |        2
 ea405b4a9c | tg1              |        3
 bca90a00ef | db2_tg_default   |        0
 ea405b4a9c | db2_tg_default   |        1
 bca90a00ef | db2_tg_default   |        2
 ea405b4a9c | db2_tg_default   |        3
 ea405b4a9c | db2_tg_internal  |        3
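To summarize the resource allocation at a glance, you can aggregate the system view yourself. The following statement is a simple sketch that counts the workers, table groups, and shards in the current database. The filter on table group names is optional and assumes that you want to exclude built-in table groups such as xx_tg_internal.
-- Count the workers, table groups, and shards in the current database.
-- Each row in hologres.hg_worker_info represents one shard of one table group.
select count(distinct worker_id)        as worker_count,
       count(distinct table_group_name) as table_group_count,
       count(1)                         as shard_count
from hologres.hg_worker_info
where table_group_name not like '%tg_internal';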
                    

Best practice: Troubleshoot the issue of uneven allocation of computing resources (load imbalance among workers)

In Hologres, data is distributed among shards. A worker may access the data of one or more shards during computing. Within an instance, a shard can be accessed by only one worker at a time. If the total number of shards accessed by each worker varies, the loads on workers are unbalanced. In this case, the CPU Utilization (%) metric of Worker Nodes shows that the CPU utilization of one or more workers is low, as shown in the following figure. Figure: Load imbalance among workers. CPU load imbalance among workers can result from various causes. You can use the hologres.hg_worker_info system view to further analyze the issue. The following section describes possible causes of and solutions to the issue:
  • Cause 1: Shards are unevenly allocated to workers after a worker fails over.
    According to Basic concepts, if a worker fails over due to reasons such as an out-of-memory (OOM) error, the system allocates its shards to other workers to recover queries. After the terminated worker recovers, the system allocates some shards back to it. This can leave shards unevenly allocated among workers. You can query the hologres.hg_worker_info system view to obtain the number of shards allocated to each worker in the current database and check whether computing resources are unevenly allocated. However, workers are shared across the instance. Therefore, to fully assess the resource allocation, query the shard count per worker in each database, sum the results to obtain the total number of shards allocated to each worker in the instance, and then check whether computing resources are unevenly allocated.
    • Sample statement:
      select worker_id, count(1) as shard_count from hologres.hg_worker_info group by worker_id;
    • Sample result:
      -- Assume that the instance has one database.
       worker_id  | shard_count
      ------------+-------------
       bca90a     |      4
       ea405b     |      4
       tkn4vc     |      4
       bqw5cq     |      3
       mbbrf6     |      3
       hsx66f     |      1
      (6 rows)
    • Result interpretation:

      The instance has six workers, and the numbers of shards allocated to the workers differ. When you view the monitoring metrics in the Hologres console, you can find that the CPU utilization of the workers with fewer shards is lower than that of the other workers. This indicates that the computing resources of the instance are unevenly allocated.

    • Solution:

      Restart the instance so that shards are reallocated evenly among workers. If you do not restart the instance, shards are preferentially allocated to the idler workers the next time a worker fails over.

  • Cause 2: Computing resources are unevenly allocated due to data skew.
    If business data is severely skewed, the data is concentrated on a few fixed shards. When you query the data, the workers that host these shards do most of the work, which causes CPU load imbalance among workers. You can use the hologres.hg_worker_info and hologres.hg_table_properties system views to query the allocation of shards among workers and, based on the worker_id that corresponds to the skewed data in the table, check whether computing resources are unevenly allocated due to data skew. To do so, perform the following steps:
    1. Query the data skew.
      Execute the following SQL statement to check whether the table contains skewed data. If the count value of a shard is much larger than that of other shards, data skew exists.
      select hg_shard_id,count(1) from <table_name> group by hg_shard_id;
      
      -- Sample result: The count value of shard 39 is larger than that of other shards. Data skew exists.
      hg_shard_id | count
      -------------+--------
                53 |  29130
                65 |  28628
                66 |  26970
                70 |  28767
                77 |  28753
                24 |  30310
                15 |  29550
                39 | 164983
    2. Query the corresponding worker_id based on the ID of the shard specified by the hg_shard_id field.
      In the previous step, you find out the skewed shard. You can use the hologres.hg_worker_info and hologres.hg_table_properties system views to query worker_id corresponding to the shard specified by the hg_shard_id field. The following statement provides an example:
      SELECT DISTINCT b.table_name, a.worker_id, a.table_group_name, a.shard_id
      FROM hologres.hg_worker_info a
      JOIN (SELECT property_value, table_name
            FROM hologres.hg_table_properties
            WHERE property_key = 'table_group') b
      ON a.table_group_name = b.property_value AND b.table_name = '<tablename>' AND a.shard_id = <shard_id>;
      
      -- Sample result
      table_name  | worker_id  | table_group_name  | shard_id
      ------------+------------+-------------------+------------------
       table03    | bca90a00ef | db2_tg_default    | 39
      According to worker_id in the query result and the CPU Utilization (%) metric of Worker Nodes, if the CPU utilization of the worker is much higher than that of other workers, computing resources are unevenly allocated due to data skew.
    Solution:
    1. Set a proper distribution key to distribute data evenly among shards, as shown in the sketch after this list. For more information, see Optimize the performance of internal table queries.
    2. If business data is inherently skewed, you can split the table into multiple tables. For example, in a table of orders placed during live streaming, the gross merchandise volume (GMV) may differ significantly among streamers.
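    The following statements provide a minimal sketch of setting a distribution key when you create a table. The table name orders_live and its columns are hypothetical; replace them with your own table and choose a high-cardinality, evenly distributed column as the distribution key.
      BEGIN;
      CREATE TABLE orders_live (
          order_id    BIGINT,
          streamer_id BIGINT,
          gmv         NUMERIC
      );
      -- The distribution key determines the shard to which each row is written.
      -- A high-cardinality, evenly distributed column such as order_id spreads data
      -- evenly among shards; a skewed column such as streamer_id may not.
      CALL set_table_property('orders_live', 'distribution_key', 'order_id');
      COMMIT;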
  • Cause 3: The shard count is not set properly.
    We recommend that you set the shard count to a multiple of the number of workers to balance the loads among workers. If the shard count is not set properly, the loads among workers may be unbalanced. You can use the hologres.hg_worker_info system view to check whether the shard count in table groups is set properly in the current database.
    • Sample statement:
      select table_group_name, worker_id, 
      count(1) as shard_count from hologres.hg_worker_info 
      group by table_group_name, worker_id order by table_group_name desc;
    • Sample result:
       table_group_name | worker_id  | shard_count
      ------------------+------------+-------------
       tg2              | ea405b4a9c |           1
       tg2              | bca90a00ef |           2
       tg1              | ea405b4a9c |           5
       tg1              | bca90a00ef |           6
       db2_tg_default   | bca90a00ef |           4
       db2_tg_default   | ea405b4a9c |           4
       db2_tg_internal  | bca90a00ef |           1
      (7 rows)
    • Result interpretation (assume that the instance has two workers):
      • The tg2 table group has three shards. One worker is assigned one shard less than the other. If the performance does not meet expectations, we recommend that you balance the shard count among workers or scale up the instance.
      • The tg1 table group has 11 shards. One worker is assigned one shard less than the other. If the performance does not meet expectations, we recommend that you balance the shard count among workers or scale up the instance.
      • The default table group db2_tg_default contains eight shards. The shards are evenly allocated to workers.
    • Solution:

      If load imbalance among workers is caused by an inappropriate shard count, evaluate the shard count based on your business requirements and set it to a multiple of the number of workers.
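      The following statement is a sketch that finds table groups whose shard count is not a multiple of the number of workers in the current database. The worker count is derived from the distinct worker_id values in hologres.hg_worker_info, which assumes that every worker already holds at least one shard of the current database.
      -- For each table group, compare the shard count with the number of workers.
      -- A remainder of 0 means the shard count is a multiple of the worker count.
      select table_group_name,
             count(1) as shard_count,
             (select count(distinct worker_id) from hologres.hg_worker_info) as worker_count,
             count(1) % (select count(distinct worker_id) from hologres.hg_worker_info) as remainder
      from hologres.hg_worker_info
      group by table_group_name;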