This topic describes the basic principles, scenarios, recommended policies, and FAQ about specifying table groups and shard counts in Hologres to optimize query performance.

Usage notes

Hologres allows you to specify table groups and set shard counts as required. This makes Hologres more flexible and easier to use than similar services, because you can customize schemas for your specific business scenarios. With a good understanding of your own business, you can make full use of Hologres to achieve high performance. When you specify a table group, we recommend that you follow these rules:
  • Do not create a table group unless necessary.
  • If the data amount is large, create an independent table group with a large shard count.
  • Create an independent table group with a small shard count for a group of tables with a small data amount. This reduces the query startup overhead and the number of small files.
  • Whenever possible, add the tables involved in local join operations to the same table group. A join between tables that reside in different table groups cannot be optimized as a local join.
  • The number of shards must be proportional to the number of compute nodes to ensure that the shards can be evenly distributed in the compute nodes. Otherwise, compute nodes that have excessive shards may have heavy loads, and therefore have unstable performance.
The appropriate table group settings depend heavily on the data amount, query characteristics, available resources, and usage focus of your business. For example, if your Hologres instance is used only to accelerate queries of MaxCompute data mapped to foreign tables, table group settings that suit internal tables do not apply.

Scenarios

By default, a database has one table group in a Hologres instance, and table groups for different instance types are configured with different shard counts. For more information, see Instance types. The default shard counts are optimal values that have been verified by large-scale experiments and can meet the requirements in most scenarios. If you have no special requirements, we recommend that you do not create a table group or change the shard count.

To perform the best practices in this topic, you must have a good understanding of the implementation and architecture of Hologres and be experienced in programming. These best practices are applicable to the following scenarios of modifying the table group and shard count:

  1. If a large amount of data needs to be written, if the query startup overhead is disproportionately large for a small amount of data, or if the query shuffle overhead is too large, we recommend that you create a table group. The query startup overhead is positively correlated with the shard count.
  2. You can greatly improve join efficiency and performance by setting the join key as the distribution key for two or more tables, adding them to the same table group, and performing local join operations within that table group.
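The local join setup described above can be sketched as follows. The table and column names (orders, users, uid) are hypothetical; both tables set the join key uid as the distribution key and stay in the same (default) table group, so the join can run shard-locally. The set_table_property call follows the Hologres convention of running in the same transaction as CREATE TABLE; verify against your instance version.

```sql
BEGIN;
CREATE TABLE orders (
    uid bigint,
    order_id bigint,
    amount numeric
);
-- Distribute orders by the join key.
CALL set_table_property('orders', 'distribution_key', 'uid');
CREATE TABLE users (
    uid bigint,
    name text
);
-- Distribute users by the same key so the join needs no shuffle.
CALL set_table_property('users', 'distribution_key', 'uid');
COMMIT;

-- The join key matches the distribution key of both tables,
-- so Hologres can execute this as a local join per shard.
SELECT o.order_id, u.name
FROM orders o
JOIN users u ON o.uid = u.uid;
```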

Recommended instance types

In practice, you typically estimate the data amount and then want to determine a suitable range for the shard count. The most suitable shard count depends not only on the amount of data to be stored, but also on the actual access frequency, the actual amount of data accessed, the computing load type (such as point queries or analysis), the write throughput, and the number of tables in the table group. Therefore, no single exact value can be given for the most suitable shard count. The following table describes the recommended shard counts and instance types for specific data amount ranges. You can select suitable settings based on your estimated data amount and the recommended configurations.
Data amount                    | Recommended shard count | Recommended instance type
Less than 40 million rows      | 10 to 20                | More than 32 cores
40 million to 0.4 billion rows | 20 to 40                | More than 64 cores
0.4 billion to 4 billion rows  | 40 to 80                | More than 128 cores
4 billion to 40 billion rows   | 80 to 240               | More than 256 cores. You may need to create a table group.
40 billion to 400 billion rows | 160 to 400              | More than 512 cores. You may need to create multiple table groups.
Note The recommended shard counts and instance types that correspond to specific data amount ranges in the preceding table are not the only criteria. A table whose data amount is small can also be added to a table group that has a large shard count. A table whose data amount is large can also be added to a table group that has a single shard. You must select an appropriate shard count based on your business scenario to implement high concurrency, high computing efficiency, and high data concentration, and prevent unnecessary shuffle overheads.
You can determine whether to create a table group and how to select a table group for your tables based on the best practices in the following plans:
  • Plan 1: Use the default table group
    If your use of Hologres meets the following conditions, we recommend that you use the default table group directly. After you upgrade or downgrade the specifications of a Hologres instance, the shard count of the default table group does not change. You can execute the following statement to query the shard count:
    SELECT * FROM hologres.hg_table_group_properties;
    
    -- Sample result 
     tablegroup_name | property_key  | property_value
    -----------------+---------------+----------------
     test_tg_default | is_default_tg | 1
     test_tg_default | shard_count   | 40
     test_tg_default | tg_version    | 1
     test_tg_default | table_num     | 1
    (4 rows)
    • Data amount

      If the shard count of the default table group meets the requirement of the data amount, you can use the default table group to directly create tables.

    • Overall size

      The total data amount of all tables is controllable and predictable. The usage mode does not significantly change.

    • Local join

      You need to perform efficient local join operations on tables in the default table group.

  • Plan 2: Create a table group

    If the default table group cannot meet your requirements, you may need multiple table groups. Typically, multiple table groups may be required in your instance under the following conditions:

    • Data amount

      The shard count of the existing table group is inappropriate for the estimated data amount in the current table. If the shard count is large and the data amount is small, excessive small files exist and more I/O overheads are consumed. If the shard count is small and the data amount is large, the concurrency of queries is reduced. This is the most common reason why you need multiple table groups.

    • Independent load

      Existing table groups contain a large number of tables, and data needs to be simultaneously written to most of the tables. Consequently, the load of your instance is high. In addition, the table to be created requires high query and write throughput. In this case, multiple table groups can make data write and query of this table independent of other tables to some extent. For more information, see topics about how to isolate computing resources. Alternatively, if you troubleshoot the issue and determine that the existing table groups cannot meet the write and query requirements, multiple table groups are also required.

    • Table correlation

      If a set of tables in your business shares a distinct data write or query pattern, has or will have local join requirements, and has little or no correlation with tables in existing table groups, you can create an independent table group for that set of tables. Local joins can be performed only on tables that reside in the same table group and use the join key as the distribution key. In other words, create a separate table group for a set of tables that are strongly correlated with each other but have little correlation, and a low probability of local joins, with tables in existing table groups.

    • Instance resource scaling

      If your instance has been scaled in or out more than five times, the original shard count may no longer meet the requirements. In this case, you can replace the default table group. The number of shards must be greater than the number of compute nodes and less than 60% of the total number of CPU cores.

  • Plan 3: Add tables to multiple table groups

    If you need to plan multiple table groups, we recommend that you plan the roles and significance of the table groups and the table group to which each table belongs before stress testing and production. You can consider the following factors during planning:

    • Data amount

      The shard count is determined by the amount of data to be stored in a table. Specifically, a table group that has a large shard count is appropriate for large tables, and a table group that has a small shard count is appropriate for small and medium tables.

    • Required write performance

      The shard count is positively correlated with data write performance. The write capability of a single shard has an upper limit, so more shards allow higher write concurrency and throughput. If you need to write data to a table at a high records-per-second (RPS) rate, a larger shard count may be required. At 100% utilization of a single CPU core, a single shard in Hologres writes data at an RPS of 3,000 to 5,000 (for records of 1 KB). However, each shard must also serve read operations such as queries, so the CPU utilization available for writes cannot reach 100%. A shard that uses 1/3 of a CPU core writes data at an RPS of about 1,000 (1 KB per record). You can estimate the required shard count based on your target RPS. For example, to write data at an RPS of 60,000 with 1 KB records, the shard count must be greater than 60. You can then fine-tune the shard count.

    • Load of each table group

      When you create a table group, you must consider the number of tables to be added to this table group. If a large number of tables will be added to this table group in the future, and most of the tables are frequently queried, a small shard count may fail to support highly concurrent queries.
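As a concrete sketch of creating a table group and placing a table in it, the following statements use the hg_create_table_group procedure and the table_group property of set_table_property, as documented for recent Hologres versions. The names tg_events and events and the shard count of 40 are hypothetical examples; verify the exact signatures and valid shard counts against your instance version.

```sql
-- Create a table group with 40 shards (name and count are examples only).
CALL hg_create_table_group('tg_events', 40);

-- Create a table and assign it to the new table group.
-- set_table_property must run in the same transaction as CREATE TABLE.
BEGIN;
CREATE TABLE events (
    event_id bigint,
    uid bigint,
    event_time timestamptz
);
CALL set_table_property('events', 'table_group', 'tg_events');
CALL set_table_property('events', 'distribution_key', 'uid');
COMMIT;
```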

FAQ

  • I have an instance that has 512 cores and use it to perform online analytical processing (OLAP) on a real-time event table. The table contains about 20 billion to 40 billion rows. In this case, how do I specify the table group and shard count?

    Only a single computing load is involved, so one table group is sufficient. The default shard count for an instance that has 512 cores is 160. If the event table contains a large number of columns, such as hundreds of columns, you can moderately increase the shard count to improve OLAP concurrency. For example, change the shard count of the default table group in the database to 200 or more to store the event table.

  • I have an instance that has 256 cores and a large number of column-oriented tables, and use the instance to perform fast OLAP in milliseconds. Each table contains tens of millions of rows. I want to group multiple fields, and query details based on specific conditions. How do I specify the table group and shard count?

    Only a single computing load is involved, so one table group is sufficient. The default shard count for an instance that has 256 cores is 120. For tables that contain tens of millions of rows, we recommend 10 to 20 shards. For aggregation operations such as grouping in particular, more shards incur more shuffle overhead, which can prevent millisecond-level analysis. Therefore, the default table group may not meet your requirements. To achieve a better result, change the shard count of the default table group in the database to a value between 16 and 40 based on your situation, and run a stress test.

  • How do I check whether a slow query is caused by an inappropriate shard count?
    An inappropriate shard count may be either too large or too small.
    • If the shard count is too large, the query startup overhead or shuffle overhead is high. The startup overhead can be found in the start query cost line of the EXPLAIN ANALYZE output, and the shuffle overhead can be determined from the Max_GetNext_Time value of Redistribution Motion in the same output. In Hologres V0.10 or later, you can also view these overheads for historical queries in slow query logs.
    • If the shard count is too small, the CPU utilization cannot reach 100% during long-running computation, the data scan overhead is high because of insufficient concurrency, or the data write performance is poor. The scan overhead can be determined from the Max_GetNext_Time value of Scan Node in the EXPLAIN ANALYZE output. At 100% utilization of a single CPU core, a single shard in Hologres writes data at an RPS of 3,000 to 5,000. You can compare your actual RPS with this range to assess whether the shard count is too small.
  • Are queries per second (QPS) not high enough in point query scenarios due to insufficient shards?

    First, determine whether another cause exists: for example, analytical queries rather than point queries are performed, no indexes are used, shards are not split, or the CPU utilization has reached 100%. If troubleshooting rules out these causes, a single SQL statement already achieves its best performance, and the QPS still does not meet your requirement, you can increase the shard count to raise the backend concurrency of point queries.

  • How do I troubleshoot data skew of a shard?
    Hologres provides an internal field named hg_shard_id, which indicates the ID of the shard where each row resides. You can execute the following SQL statement to check whether data is skewed to specific shards:
    SELECT hg_shard_id, COUNT(1) FROM tbl1
    GROUP BY hg_shard_id ORDER BY COUNT(1) DESC;
    If the amount of data in a shard is significantly larger than that in other shards, data skew exists, and you may need to adjust the distribution key.
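To condense this check into a single number, the per-shard counts can be compared against their average. The following is a sketch over the hypothetical table tbl1 from the statement above; a ratio far above 1 suggests data skew.

```sql
-- Ratio of the largest shard's row count to the average shard's row count.
SELECT max(cnt)::float8 / avg(cnt) AS skew_ratio
FROM (
    SELECT hg_shard_id, count(1) AS cnt
    FROM tbl1
    GROUP BY hg_shard_id
) AS per_shard;
```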