This topic describes the best practices for using wide tables to perform tag computing in Hologres.

Background information

In the offline data warehouse model, user tag data is stored in multiple theme- and dimension-oriented tables. This is helpful to build a tag system and maintain and manage data. However, if such a data model is used in online profile analysis to organize tag data, multiple tag tables must be joined to filter tags, which is too costly for database services.

Solution

In the wide table solution, stable property tables are aggregated into wide tables offline and JOIN operations on multiple tables are converted into operations on a single wide table. If new tags are required, columns are added to the wide table for these tags. This solution is suited for the following scenarios:
  • Scenarios in which less than 1,000 tags are used.
  • Scenarios in which data is infrequently updated.
When data is stored in wide tables, the AND, OR, and NOT operations on filter conditions of multiple columns are automatically processed by the optimization mechanism of column-oriented storage, which is more efficient than join operations. In addition, Hologres supports column-oriented storage, which prevents I/O operations from increasing. Traditional database modeling and development applications can be used in this solution.

Use example

In the following example, a wide table is used for profile analysis. SQL statements are executed to query men ([gender = Male]) in the Zhejiang province ([province = Zhejiang]) whose marriage status is married in the dws_userbase table. We recommend that you set proper indexes for the table based on the query mode to improve query performance. For more information, see CREATE TABLE.
-- The wide table.
BEGIN;
CREATE TABLE dws_userbase
(
  uid text not null primary key,
  province text,
  gender text,
  married text
  ...             -- Other property columns.
);
call set_table_property('dws_userbase', 'distribution_key', 'uid');
call set_table_property('dws_userbase', 'bitmap_columns', 'province,gender,married');
END;

-- Query based on basic properties.
SELECT count(distinct uid) as cnt,
       married
FROM dws_userbase ub
WHERE province = 'Zhejiang' and gender = 'Male'
GROUP BY married;