This topic describes the basic principles and policies for specifying the table group and shard count in Hologres for optimized query performance.

Background

Hologres is a distributed real-time data warehouse engine with high performance that separates computing from storage. It stores data on shards located in the underlying storage system.

In Hologres, a database contains zero or more table groups. Each table group contains multiple tables, and each table can belong to only one table group. A table group matches a group of shards, which are responsible for storing data to and querying data in the tables in this table group. The number of shards for a table group is called the shard count. This number cannot be changed once a table group is created.

You can improve the data storage and computing efficiency by specifying the appropriate table group and shard count. Inapproprtiate settings for the table group and shard count will result in degraded performance, which cannot be adjusted to optimal.

Scenarios

To perform this best practice, you must have a good understanding of the implementation and architecture of Hologres and be experienced in programming. The applicable scenarios are as follows:

  1. The default table group in Hologres is suitable for 90% of common scenarios. If you have no special requirements or are unclear about the benefits and risks of a new table group, we recommend that you do not create a new one.
  2. If you have a large amount of data to be written or the query startup overhead for a small amount of data is too large, we recommend that you create a new table group. The query startup overhead is positively related to the shard count.
  3. By adding two or more tables with the join key being the distribution key to a table group and performing local join operations in the table group, you can greatly improve the join efficiency, thereby producing optimal performance.

Concepts

The data in a table is stored in a fixed set of shards. When data is written to a Hologres table, the data is distributed to specific shards based on the distribution key. When you create a table in Hologres, a fixed set of shards are assigned to this table. This set of shards is called a table group, which is fixed and cannot be changed.

Data of multiple tables can be stored in the same set of shards. A table group must contain one or more tables. If no table exists in a table group, it is automatically deleted. A table can belong to only one table group. You cannot move a table from one table group to another.

Table group is a logical storage concept specific to Hologres. PostgreSQL does not have this concept. Table group differs from tablespace in PostgreSQL in that a tablespace uniquely identifies the storage location of database objects, whereas a table group represents a set of underlying logical shards. A tablespace is similar to a directory.

Figure 1 shows table groups and schemas in an instance. Schema is a standard database concept, whereas table group is a logical storage concept. Tables with different schemas can locate in the same table group. That is, data in these tables is stored in the same set of shards at the underlying layer.

Figure 1. Figure 1 Table groups and schemas
Figure 1 Table groups and schemas

Figure 2 shows two table groups with five shards and two shards, respectively. Shards cannot be shared between table groups, and each shard has a unique ID in an instance.

Figure 2. Figure 2 Table groups with five shards and two shards
Figure 2 Table groups with five shards and two shards

Figure 3 shows how computing and storage work at the underlying layer of Hologres. Hologres uses a compute-storage separation architecture. Data is stored in distributed storage instead of on compute workers. In Hologres, each compute worker has several actors, each of which uniquely maps a shard for data storage. Actors are responsible for reading, writing, and managing data in the mapping shards in a one-to-one manner. Because of this one-to-one mapping between actors and shards, a table group can be considered as a set of actors or a set of shards.

Figure 3. Figure 3 One-to-one mapping between worker actors and table group shards
Figure 3 One-to-one mapping between worker actors and table group shards

The shard count is an important attribute, indicating the number of shards assigned to a table group. The shard count is fixed and cannot be changed after a table group is created. If a table group has a large number of shards, data can be written, queried, and analyzed at a high concurrency. Therefore, increasing the shard count can speed up data writing, query, and analysis to a certain extent. However, a larger shard count does not necessarily bring better performance. More communications and computing resources and a larger memory space are required for increased shards. From this perspective, increasing the shard count may lead to the opposite effect if resources are insufficient or only a small amount of data is queried.

Hologres does not create a default table group for a database immediately after the database is created. The default table group named {DBNAME}_tg_default is created only when you create a table and do not specify a table group for this table. After the default table group is created, newly created tables are added to it if no table group is specified. The number of shards in the default table group equals the total number of CPU cores in the compute nodes, that is, 20 times the number of workers. This is an empirical value produced based on experience, which is suitable for tables with diversified data amounts.

Similar to the shard count, the number of table groups must be set to an appropriate value. A shard occupies certain memory space, regardless of whether it is in use. The total number of shards increases with the number of table groups, which results in larger memory space. In addition, if you want to perform local join operations on multiple tables that are related with each other, these tables must locate in the same table group.

Specify the table group when creating a table

Is a new table group required? Which table group is most appropriate for my table? How many shards are appropriate for a new table group? The following figure shows the logic among the three questions.

Figure 4. Figure 4 Specify the table group and shard count in three steps
Figure 4 Specify the table group and shard count in three steps
  1. Is a new table group required?

    When you create a table, you must determine whether to add the table to an existing or a new table group.

    In most cases, you can add the new table to a table group existing in the database. If existing table groups do not meet your needs, you can create a new one so long as resources are sufficient.

    Consider the following factors when creating a table group:

    • Independent load: If all existing table groups contain a large number of tables and are heavily loaded and the new table to be created requires high query and writing throughput, you can create a new table group in this case. In this way, data writing and query of this table is independent from other tables to a certain extent.
    • Table correlation: If a set of tables have a unique data writing and query mode, have or will have local join requirements, and have few or no correlation with tables in existing table groups, a table group can be created for them. Local join can be performed only on tables in the same table group. In other words, you can create a table group for a set of tables having a strong correlation with each other but low correlation with other tables and having a low probability of local join with other tables.
    • Data amount: If the estimated data amount of the table to be created is small, the table should be added to a table group with a small shard count. Otherwise, it should be added to a table group with a large shard count.
    • Instance resource scaling: If the instance has been scaled in because the number of shards is too large, or the instance has been scaled out because the number of shards is too small, you can create a new table group.
      Note We recommend that you do not create a new table group in the following case:

      You want to perform local join operations on the new table and multiple tables in an existing table group at high efficiency. We recommend that multiple tables with frequent join operations be added to the same table group. If you cannot add the new table to the same table group as the other tables, add this table to the table group with the highest join efficiency.

  2. Which table group is most appropriate for my table?

    If you determine not to create a new table group, you should select a table group with optimal performance for your new table. Consider the following factors when selecting a table group:

    • Load: If multiple table groups have similar numbers of shards, we recommend that you select the table group with the lowest load. Here, load indicates the number of tables that have been added to a table group. In the output of the following command, the value of the table_num parameter indicates the number of existing tables.
      select * from hologres.hg_table_group_properties ;
      
       tablegroup_name | property_key  | property_value
      -----------------+---------------+----------------
       test_tg_default | is_default_tg | 1
       test_tg_default | shard_count   | 40
       test_tg_default | tg_version    | 1
       test_tg_default | table_num     | 1
      (4 rows)
    • Data amount: A table with a large data amount should be added to a table group with more shards. When resources are sufficient, data can be written to this table at a higher requests per second (RPS) and be queried with a higher concurrency. Tables with a small data amount should be added to a table group with fewer shards. This reduces the startup overhead caused by concurrent queries, lowers the overall latency, and improves the query efficiency.
    • Local join: If frequent local join operations are required, your table should be added to the same table group as the tables involved in the local join operations.
  3. How many shards are appropriate for a new table group?

    If you determine to create a new table group, you should set an appropriate shard count. Consider the following factors when setting the shard count:

    • Data amount: The shard count is determined by the amount of data to be stored in the table.
    • Instance resources: Set the shard count based on the number of workers in the instance that can be used to analyze the internal table in Hologres. Assume that 40 CPU cores, that is, the total CPU cores of all workers, are assigned to analyze your internal table. We recommend that you set the shard count to 40 for your table with a moderate data amount. This is an empirical value produced based on experience. If the data amount is large, you can double the shard count to 80.
      1. Theoretical maximum static shards: Static shards refer to the shards that have been assigned to an instance and are not involved in data computing. A memory space of 32 MB is reserved for each actor to ensure that the corresponding shard can be loaded. The total memory of a worker is the number of CPU cores of the worker multiplied by 4 GB, and 30% of the total memory is available for assignment. Therefore, the theoretical maximum number of static shards is obtained by dividing the memory space available for assignment by 32 MB reserved for each shard. Based on the experience, we can calculate the theoretical maximum number of static shards by using the following formula: Theoretical maximum number of static shards = Number of workers × Number of CPU cores per worker × 4 GB per CPU core × 30% × 1,024 MB/32 MB = Number of workers × 20 × 38.4 = Number of workers × 768.819
      2. Suitable active shards: Active shards refer to the shards that have been assigned to an instance and are involved in data writing and computing. Each shard requires at least one CPU core to process data. If the number of active shards of an instance (not a database) exceeds the total number of CPU cores of all workers, these active shards will compete for computing resources, degrading Hologres performance sometimes. This is why we recommend that the number of shards equal to the total number of CPU cores of all workers. This value is also the shard count of the default table group, which is suitable for most cases. Unless otherwise specified, you can add your table to the default table group.

        The following table lists the mapping between the number of CPU cores and the default number of workers.

        Number of CPU cores Number of workers
        32 1
        64 1
        128 2
        256 5
        512 11
        > 512 Consult technical support.
    • Required write performance: The shard count has a positive correlation with data write performance. The writing capability of a single shard is fixed. Therefore, more shards indicate higher writing concurrency and throughput. If you need to write data to a table at a high RPS, a larger shard count is required. When computing resources are sufficient, a single shard in Hologres provides an RPS of 5,000 to 10,000. Assume that you want to write data at an RPS of 600,000, the shard count must be in the range of 60 to 120, which can be fine-tuned.
    • Local join: If the query speed for local join operations is slow and fails to be optimized to achieve the optimal performance, the shard count might be improperly set. In this case, you can adjust the shard count based on the data amount.
    • Load of each table group: When you create a table group, you must consider the number of tables to be added to this table group now and in the future. If a large number of tables will be added to this table group in the future and most tables are queried frequently, a small shard count may fail to support highly concurrent queries.

Procedure

This section describes operations related to table groups in Hologres.

  1. Create a table named tb1 and a table group with 60 shards. The tb1 table is added to this table group by default.
    BEGIN;
    CREATE TABLE tb1 (
      a int not null,
      b text,
      c bigint
      ...
    );
    call set_table_property('tbl', 'shard_count', '60'); 
    call set_table_property(...
    ...
    COMMIT;
    Note The call set_table_property('tbl', 'shard_count', '60'); command creates a table group for each run, rather than adding a table to the created table group with 60 shards. Therefore, do not run this command repeatedly.
  2. Add a table to an existing table group.
    The following code uses the colocate_with property to add the tb2 table to the table group where the tb1 table locates.
    BEGIN;
    CREATE TABLE tb2 (
      a int not null,
      b text,
      c bigint
      ...
    );
    call set_table_property('tb2', 'colocate_with', 'tb1'); 
    call set_table_property(...
    ...
    COMMIT;
    820
  3. View the tables and the table groups to which the tables are added.
    select * from hologres.hg_table_properties where property_key = 'table_group';
    
     table_id | table_namespace | table_name | property_key |              property_value
    ----------+-----------------+------------+--------------+-------------------------------------------
            0 | public          | tb1        | table_group  | test_tg_default
            0 | public          | tb2        | table_group  | test_tg_default
            0 | public          | tb3        | table_group  | test_925c1ce6_2260_48c5_ba95_5c2705771c2d
    (3 rows)
  4. View information about all table groups.
    select * from hologres.hg_table_group_properties ;
    
                  tablegroup_name              | property_key  | property_value
    -------------------------------------------+---------------+----------------
     test_tg_default                           | is_default_tg | 1
     test_tg_default                           | shard_count   | 40
     test_tg_default                           | tg_version    | 4
     test_tg_default                           | table_num     | 2
     test_925c1ce6_2260_48c5_ba95_5c2705771c2d | is_default_tg | 0
     test_925c1ce6_2260_48c5_ba95_5c2705771c2d | shard_count   | 60
     test_925c1ce6_2260_48c5_ba95_5c2705771c2d | tg_version    | 1
     test_925c1ce6_2260_48c5_ba95_5c2705771c2d | table_num     | 1
    (8 rows)
  5. View information about the table group where a specific table locates.
    select t.table_id, t.table_namespace, t.table_name, g.* from hologres.hg_table_properties t join hologres.hg_table_group_properties g on (t.property_value = g.tablegroup_name) where t.table_name = 'tb1';
    
    
     table_id | table_namespace | table_name | tablegroup_name | property_key  | property_value
    ----------+-----------------+------------+-----------------+---------------+----------------
            0 | public          | tb1        | test_tg_default | is_default_tg | 1
            0 | public          | tb1        | test_tg_default | shard_count   | 40
            0 | public          | tb1        | test_tg_default | tg_version    | 4
            0 | public          | tb1        | test_tg_default | table_num     | 2
    (4 rows)
  6. Change the shard count of the default table group.
    If shards in the default table group no longer meet your business needs, run the following command to change the shard count:
    call hg_update_database_property('shard_count', '60'); # Change the shard count to 60.

    After you run this command, a new table group with 60 shards is created. The original default table group is no longer the default one, but the properties, shards, and tables added to it remain unchanged. Newly created tables are added to the new default table group, if not otherwise specified.

  7. Delete a table group.

    In Hologres, you do not need to manually delete a table group. Hologres counts the number of tables in a table group. When you delete all tables from a table group, the table group is automatically deleted.

Recommended practice

In summary, table group is an advanced feature of Hologres. It is more suitable for users who have a deep understanding of Hologres and rich development experience.

You can determine whether to create a table group and how to select a table group for your tables based on the following basic principles. In addition, the colocate_with method is optimized to simplify operations.

  1. Basic principles

    You can determine whether to create a table group for your business based on the following principles:

    • Do not create a new table group if you have no special needs.
    • Create an independent table group with a large shard count if the data amount is large.
    • Create an independent table group with a small shard count for a group of tables with small data amount. This reduces the query startup overhead.
    • Try best to add the tables involved in local join operations to the same table group.
  2. Simplified colocate_with method

    If a project has multiple table groups, the colocate_with relationship between tables may become complex in the long run. In this case, we recommend that you create a pivot table to simplify the colocate_with method.

    BEGIN;
    CREATE TABLE tg100 (a int);                             # Create a pivot table.
    call set_table_property('tg100', 'shard_count', '100'); # Create a table group with 100 shards.
    COMMIT;
    
    BEGIN;
    CREATE TABLE tbx (
      a int not null,
      b text,
      c bigint
      ...
    );
    call set_table_property('tbx', 'colocate_with', 'tg100'); # Add tables to the table group with 100 shards.
    COMMIT;
    Figure 5. Comparison between colocate_with relationships with and without a pivot table
    Comparison between colocate_with relationships with and without a pivot table

Conclusion

This best practice describes how to specify the appropriate table group and shard count when creating a table. The settings deeply rely on the data amount, query characteristics, available resources, and focus of usage of your business. For example, if your Hologres instance is used only to accelerate query of MaxCompute data mapped to foreign tables, table group suitable for internal tables cannot be used in this scenario.

Hologres allows users to specify table groups and set the shard count as required. Compared with similar services, Hologres is more flexible and easy to use because users can customize schemas based on their specific business scenarios. With good understanding of their own business, users can make full use of Hologres to achieve high performance.