Hologres is an all-in-one distributed real-time data warehouse engine that has high performance and separates computing from storage. It stores data on shards that reside in the underlying storage system. This topic describes table groups and shard counts in Hologres.

In Hologres, a database contains one or more table groups. Each table group contains one or more tables, and each table can belong to only one table group. A table group corresponds to a group of shards, which store the data of the tables in the table group and serve queries on that data. The number of shards in a table group is called the shard count. This number cannot be changed after the table group is created.

You can improve data storage and computing efficiency by specifying an appropriate table group and an appropriate shard count. Inappropriate settings degrade performance and prevent the instance from reaching optimal performance.

Table group

The data in a table is stored in a fixed group of shards. When a table is created, a group of shards is allocated to store the data in the table. This group of shards is called a table group. When data is written to the table, the data is distributed to specific shards based on the distribution key. The table keeps using the originally allocated shards until the table group is deleted or the table is moved to another table group. Comply with the following rules when you use table groups:
  • Data of multiple tables can be stored in the same group of shards, namely, the same table group.
  • A table group must contain one or more tables. If no table exists in a table group, it is automatically deleted.
  • A table can belong to only one table group. To move a table to another table group, you must recreate the table or explicitly call the function that is used to move a table to another table group.
A table group is a logical storage concept specific to Hologres. PostgreSQL does not have this concept. A table group differs from a tablespace in PostgreSQL: a tablespace identifies the storage location of database objects and is similar to a directory, whereas a table group represents a group of underlying logical shards.
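
The relationship between a table group, its fixed set of shards, and the distribution key can be sketched in a few lines of Python. This is a toy model, not Hologres code: the `TableGroup` class, the `shard_for` function, and the use of CRC32 as the hash are all assumptions made for illustration, because this topic does not specify how Hologres hashes distribution keys.

```python
import zlib


def shard_for(distribution_key: str, shard_count: int) -> int:
    """Map a distribution-key value to a shard index in [0, shard_count).

    CRC32 is an illustrative stand-in; the real hash function is not
    described in this topic.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return zlib.crc32(distribution_key.encode("utf-8")) % shard_count


class TableGroup:
    """Hypothetical model: a named group of shards shared by several tables."""

    def __init__(self, name: str, shard_count: int) -> None:
        self.name = name
        self.shard_count = shard_count  # fixed when the group is created
        self.tables = set()

    def add_table(self, table_name: str) -> None:
        self.tables.add(table_name)

    def route(self, table_name: str, distribution_key: str) -> int:
        """Return the shard index that stores the row with this key."""
        if table_name not in self.tables:
            raise KeyError(f"{table_name} is not in table group {self.name}")
        return shard_for(distribution_key, self.shard_count)


tg = TableGroup("tg_demo", shard_count=5)
tg.add_table("orders")
tg.add_table("customers")

# Rows from different tables in the same group that share a key value
# are routed to the same shard in this model.
same_shard = tg.route("orders", "customer_42") == tg.route("customers", "customer_42")
```

In this model, two tables in the same table group route equal key values to the same shard, which is the property that makes co-locating related tables useful.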

The following figures illustrate table groups.

  • Difference between table group and schema: Schema is a standard database concept, whereas table group is not a standard database concept, but a logical storage concept. Tables in different schemas can belong to the same table group. In other words, the same group of shards is used to store data in these tables at the underlying layer.
  • Relationship between different table groups: For example, Table Group 1 has five shards, and Table Group X has two shards. Shards are not shared between the two table groups, and each shard has a unique ID in an instance.
  • The following figure shows how computing and storage work at the underlying layer of Hologres. In Hologres, computing and storage are separated. Data is stored in distributed storage rather than on compute workers. In Hologres, each compute worker has multiple actors, each of which uniquely corresponds to a shard for data storage. Actors are responsible for reading, writing, and managing data in the corresponding shards in a one-to-one manner. A table group can be considered as a group of actors or a group of shards due to the one-to-one correspondence between actors and shards.
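
The example of Table Group 1 (five shards) and Table Group X (two shards) can be modeled as follows. This is a minimal sketch under stated assumptions: the `Instance` class is hypothetical, and numbering shard IDs sequentially from 0 is an illustrative choice, since this topic only states that IDs are unique within an instance.

```python
from itertools import count


class Instance:
    """Hypothetical model of an instance that hands out unique shard IDs."""

    def __init__(self) -> None:
        self._next_shard_id = count(0)  # assumption: sequential IDs from 0
        self.table_groups = {}

    def create_table_group(self, name: str, shard_count: int) -> list:
        """Allocate shard_count new shard IDs that no other group uses."""
        if name in self.table_groups:
            raise ValueError(f"table group {name} already exists")
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        shard_ids = [next(self._next_shard_id) for _ in range(shard_count)]
        self.table_groups[name] = shard_ids
        return shard_ids


instance = Instance()
tg1_shards = instance.create_table_group("table_group_1", 5)
tgx_shards = instance.create_table_group("table_group_x", 2)

# No shard is shared between the two table groups.
shared = set(tg1_shards) & set(tgx_shards)
```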

Shard count

The shard count is an important property of a table group. You must specify the shard count when you create a table group, and you cannot change it afterward. To use a different shard count, you must create another table group.

If a table group has a large shard count, data can be written, queried, and analyzed at higher concurrency. Therefore, increasing the shard count can speed up data writes, queries, and analysis to some extent. However, a larger shard count does not necessarily bring better performance, because more shards require more communication, more computing resources, and more memory. If resources are insufficient or only a small amount of data is queried, increasing the shard count may reduce performance instead.

In Hologres, the lower limit of the shard count is 1. If a table holds only hundreds or thousands of records, you can set the shard count to 1. In principle, the upper limit of the shard count is the total number of computing cores of your instance. This ensures that each shard can occupy at least one core for computing. The number of computing cores is about 60% of the total number of cores of the instance, because some resources are used by processes such as frontend request processing, foreign table queries, cluster management, and metadata management. If the shard count exceeds the number of computing cores, some shards cannot be allocated CPU resources at all times during a query. This may cause long-tail latency and failover overheads.
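
The 60% rule of thumb above can be turned into a quick sanity check. The following Python sketch is illustrative only: the function names are hypothetical, and rounding down to a whole core is an assumption. The figure is approximate, so the recommended values published for each instance size in this topic do not follow it exactly and should take precedence.

```python
def estimated_computing_cores(total_cores: int) -> int:
    """Estimate computing cores as about 60% of all cores, rounded down.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return total_cores * 60 // 100


def exceeds_computing_cores(shard_count: int, total_cores: int) -> bool:
    """Return True if some shards could be left without a dedicated core."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return shard_count > estimated_computing_cores(total_cores)


cores_128 = estimated_computing_cores(128)          # 128 * 60 // 100
too_many = exceeds_computing_cores(200, 128)        # 200 shards on 128 cores
single_shard_ok = not exceeds_computing_cores(1, 128)
```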

Like the shard count, the number of table groups must be set to an appropriate value. A shard occupies memory regardless of whether it is in use, because it stores information such as memory tables and schemas. If data is written to the tables, the shard occupies more memory. The total number of shards in your instance increases with the number of table groups, which increases memory usage. In addition, if you want to perform local join operations on multiple tables that are related to each other, these tables must be added to the same table group.
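
Both points in the preceding paragraph, the growth of the total shard count and the local join requirement, are easy to check in a small model. The sketch below is a hedged illustration: the dictionaries, table group names, and helper functions are hypothetical and do not correspond to any Hologres API.

```python
def total_shards(shard_counts_by_group: dict) -> int:
    """Sum the shard counts of all table groups in an instance."""
    return sum(shard_counts_by_group.values())


def can_local_join(table_to_group: dict, tables: list) -> bool:
    """Return True only if every listed table is in the same table group."""
    groups = {table_to_group[t] for t in tables}
    return len(groups) == 1


# Hypothetical layout: two table groups in one instance.
shard_counts = {"tg_default": 20, "tg_small": 4}
placement = {
    "orders": "tg_default",
    "customers": "tg_default",
    "dim_region": "tg_small",
}

instance_total = total_shards(shard_counts)                       # 20 + 4
joinable = can_local_join(placement, ["orders", "customers"])
not_joinable = can_local_join(placement, ["orders", "dim_region"])
```

Every additional table group adds its shards to the instance total, so the memory cost grows even if the new group holds little data.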

From the perspective of disk storage, the more shards a table has, the more scattered its data is, the more likely small files are to appear, and the more files there are. If an instance has large numbers of tables and shards, the number of files becomes very large. Consequently, queries and failovers incur more overhead, query I/O increases, and recovery takes longer.

Default shard count

A default table group is created for each database in a Hologres instance. When you create the first table in a new database without specifying the table group to which the table is to be added, Hologres automatically creates a default table group named {DBname}_tg_default. After the default table group is created, newly created tables are added to it if no table group is specified.
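
The fallback rule can be expressed as a small function. This is only a sketch of the behavior described above: the function name `resolve_table_group` and the example database name `sales` are hypothetical, and only the `{DBname}_tg_default` naming pattern comes from this topic.

```python
from typing import Optional


def resolve_table_group(db_name: str, requested: Optional[str] = None) -> str:
    """Return the table group a new table would be added to.

    If no table group is specified, fall back to the default table
    group, which is named {DBname}_tg_default.
    """
    if requested is not None:
        return requested
    return f"{db_name}_tg_default"


implicit_group = resolve_table_group("sales")
explicit_group = resolve_table_group("sales", "tg_reporting")
```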

The following table describes the default shard counts of the default table group. These default shard counts are optimal values that have been verified by large-scale experiments and can handle different types of queries of different data amount ranges in most scenarios.
Number of cores of the instance    Default shard count    Recommended maximum shard count (implemented by multiple table groups)
32                                 20                     20
64                                 40                     40
96                                 60                     80
128                                80                     100
160                                80                     120
192                                80                     160
256                                120                    220
400                                160                    360
512                                160                    460
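
For scripting or capacity checks, the values in the preceding table can be kept in a lookup structure. The Python sketch below copies the numbers from the table as they appear in this topic. The names `SHARD_DEFAULTS`, `default_shard_count`, and `recommended_max_shards` are hypothetical, and raising an error for instance sizes that are not listed is an assumption, because this topic does not say how unlisted sizes are handled.

```python
# cores -> (default shard count, recommended maximum shard count)
SHARD_DEFAULTS = {
    32: (20, 20),
    64: (40, 40),
    96: (60, 80),
    128: (80, 100),
    160: (80, 120),
    192: (80, 160),
    256: (120, 220),
    400: (160, 360),
    512: (160, 460),
}


def _lookup(instance_cores: int) -> tuple:
    """Return the table row for an instance size, or fail loudly."""
    try:
        return SHARD_DEFAULTS[instance_cores]
    except KeyError:
        raise ValueError(f"no entry for {instance_cores} cores") from None


def default_shard_count(instance_cores: int) -> int:
    """Shard count of the default table group for a listed instance size."""
    return _lookup(instance_cores)[0]


def recommended_max_shards(instance_cores: int) -> int:
    """Recommended maximum total shards across all table groups."""
    return _lookup(instance_cores)[1]


default_for_128 = default_shard_count(128)
max_for_128 = recommended_max_shards(128)
```

For a listed size, the difference between the two values is the headroom available for additional table groups before the recommended maximum is reached.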

Note: When an instance is upgraded or downgraded, the default table group of an existing database and the shard count of that table group do not change. When you create a database after the change, the shard count of its default table group is the default shard count for the new instance type, as shown in the preceding table. Practice shows that if the instance is scaled up by less than five times, you do not need to change the default shard count of an existing database unless otherwise required.