The data warehouse service (DWS) layer is driven by analyzed subjects during data modeling. Based on the metric requirements of upper-layer applications and products, the DWS layer creates fact tables to aggregate common metrics. A table at the DWS layer usually corresponds to a derived metric.

Design fact tables at the DWS layer

The DWS layer aggregates data at the original granularity. At the DWS layer, data is aggregated based on analyzed subjects for data modeling. This tutorial aims to analyze the total sales of a category (such as kitchenware) of the last day in each province, the names of products whose sales rank top 10 in this category, and the purchasing power distribution (per capita consumption) of customers in each province. To achieve this, you can aggregate data from perspectives such as the product, category, and buyer in successful transactions of the last day. When aggregating data, note that:
  • Data aggregation does not change the dimensions and measures of the original schema. Data is aggregated based on the original star schema. To ensure that you can obtain query results that are consistent with those obtained based on the original schema, the dimensions and measures after data aggregation must be consistent with those of the original schema.
  • Data aggregation improves query performance, but also increases the difficulty in maintaining extract-transform-load (ETL) jobs. If the parent category of a subcategory changes, the previously existing data that is aggregated into an aggregate table needs to be adjusted accordingly.
In addition, you must observe the following rules when designing fact tables at the DWS layer:
  • Consider whether aggregated data can be provided for third parties. You can check whether the data aggregated based on a certain dimension is frequently used in data analysis. If so, you need to aggregate detail data into an aggregate table.
  • Avoid aggregating data of different data domains. A data domain is an abstract entity that aggregates data at a higher level. Data domains are categorized by business process. For example, transactions belong to the transaction domain, and new and modified products belong to the product domain.
  • Specify the statistical period. The statistical period of data must be specified in each table name. For example, _1d indicates the last day, _td indicates as of today, and _nd indicates the last N days.

Follow DWS layer design specifications

The naming convention for fact tables at the DWS layer is dws_{Abbreviated business unit/pub}_{Abbreviated data domain}_{Abbreviated statistic granularity}[_{Abbreviated custom table tag}]_{Abbreviated statistical period}.
  • For the abbreviated statistical period, offline computation requires three tables by default, where _1d indicates the last day, _nd indicates the last N days, and _td indicates as of today. If the _nd table needs to be split due to excessive fields, you can split it and use each split table to store the data of only one statistical period. For example, a table is split based on the last seven days (_1w). A split table cannot store the data of multiple statistical periods.
  • Hourly tables are identified by _hh, regardless of whether the data is updated on a daily or hourly basis.
  • Minutely tables are identified by _mm, regardless of whether the data is updated on a daily or hourly basis.
For example:
  • dws_asale_trd_byr_subpay_1d: aggregates the data on transactions paid in installments at the buyer granularity for the e-commerce business of Company A in the last day.
  • dws_asale_trd_byr_subpay_td: aggregates the data on transactions paid in installments at the buyer granularity for the e-commerce business of Company A as of today.
  • dws_asale_trd_byr_cod_nd: aggregates the data on transactions paid on delivery at the buyer granularity for the e-commerce business of Company A in the last N days.
  • dws_asale_itm_slr_td: aggregates the data on product inventory at the seller granularity for the e-commerce business of Company A as of today.
  • dws_asale_itm_slr_hh: aggregates the data on products at the seller granularity for the e-commerce business of Company A on an hourly basis.
  • dws_asale_itm_slr_mm: aggregates the data on products at the seller granularity for the e-commerce business of Company A on a minutely basis.

Create tables

To meet business requirements, you can use the following statements to create tables at the DWS layer:
CREATE TABLE IF NOT EXISTS dws_asale_trd_byr_ord_1d
(
    buyer_id                BIGINT COMMENT 'Buyer ID',
    buyer_nick              STRING COMMENT 'Buyer nickname',
    mord_prov               STRING COMMENT 'Recipient's province',
    cate_id                 BIGINT COMMENT 'Product category ID',
    cate_name               STRING COMMENT 'Product category name',
    confirm_paid_amt_sum_1d DOUBLE COMMENT 'Total revenue of received and confirmed products in the orders of the last day'
)
COMMENT 'Fact table of all transactions at the buyer granularity in the last day'
PARTITIONED BY (ds         STRING COMMENT 'Partition field: YYYYMMDD')
LIFECYCLE 36000;

CREATE TABLE IF NOT EXISTS dws_asale_trd_itm_ord_1d
(
    item_id                 BIGINT COMMENT 'Product ID',
    item_title               STRING COMMENT 'Product name',
    cate_id                 BIGINT COMMENT 'Product category ID',
    cate_name               STRING COMMENT 'Product category name',
    mord_prov               STRING COMMENT 'Recipient's province',
    confirm_paid_amt_sum_1d DOUBLE COMMENT 'Total revenue of received and confirmed products in the orders of the last day'
)
COMMENT 'Fact table of all transactions at the product granularity in the last day'
PARTITIONED BY (ds         STRING COMMENT 'Partition field: YYYYMMDD')
LIFECYCLE 36000;