The data warehouse detail (DWD) layer is driven by business processes during data modeling. It creates detailed fact tables at the finest granularity based on each specific business process. In combination with the data usage habits of an enterprise, you can duplicate some key attribute fields of dimensions in detailed fact tables to create wide tables.

As the core of dimensional modeling for a data warehouse, fact tables at the data warehouse service (DWS) and DWD layers need to be designed based on the business process. Fact tables express each business process by using the dimensions referenced by and the measures related to the business process. A measure is generally numeric data and represented as a fact in a logical fact table. Fact attributes are the descriptive information in a logical fact table. A fact table uses a foreign key in fact attributes to associate with the corresponding dimension.

The statistic granularity of a fact table determines the business details expressed by a data entry in the table. Generally, you can determine a statistic granularity to express one of the following meanings: the degree of details expressed by a combination of dimension attributes and the specific business meaning.

To measure a business process, a fact is generally a decimal value of the integer or floating-point type. Facts are categorized into three types: additive facts, semi-additive facts, and non-additive facts.
  • Additive facts can be aggregated based on any dimension associated with fact tables.
  • Semi-additive facts can be aggregated based only on specific dimensions, but not all dimensions. For example, inventory can be aggregated by location or product, but the sum of monthly inventory in a year is useless.
  • Non-additive facts, such as facts of the ratio type, cannot be aggregated. Non-additive facts can be decomposed into additive components for aggregation.

Fact tables have a finer statistic granularity than dimension tables, and therefore rows can be added faster to fact tables. Dimension attributes can be stored in fact tables. The stored dimension columns are called degenerate dimensions, which can speed up queries. Similar to other dimensions stored in dimension tables, degenerate dimensions can be used to filter and aggregate data for fact tables.

Fact tables at the DWD layer can be divided into three types: transaction fact tables, periodic snapshot fact tables, and accumulating snapshot fact tables. For more information, see Data warehouse construction guide. A transaction fact table describes a business process and tracks measurement events at a point in space or time. It stores the most atomic data, and is also known as an atomic fact table. A periodic snapshot fact table records facts at regular and predictable intervals. An accumulating snapshot fact table describes the key step events from the beginning to the end of a business process to cover the entire lifecycle of the process. It usually has multiple date fields to record key time points. As the accumulating snapshot fact table changes over the lifecycle of the process, its data entries are also modified.

Design fact tables at the DWD layer

When designing fact tables at the DWD layer, you need to note the following principles:
  • Generally, associate a fact table at the DWD layer with only one dimension.
  • Include as many facts about a business process as possible in a fact table.
  • Select only the facts that are relevant to a business process.
  • Decompose non-additive facts into additive components.
  • Declare the statistic granularity before selecting dimensions and facts for a business process.
  • Include facts that measure data at only one granularity in a fact table.
  • Use the same unit for a fact in different fact tables.
  • Handle null values with caution.
  • Use degenerate dimensions to improve the usability of fact tables.

For example, the transaction business process and its measures are defined during measurement. To design detailed fact tables, you need to design models for business processes. The design process can be divided into four steps: select a business process, specify the statistic granularity, select dimensions, and specify facts (measures). The statistic granularity is the semantic description of business activities without dimensions. When creating detailed fact tables, you need to develop the data at the DWD layer based on existing tables, and be clear about the granularity of the data stored in the created tables.

Follow DWD layer design specifications

The naming convention for fact tables at the DWD layer is dwd_{Abbreviated business unit/pub}_{Abbreviated data domain}_{Abbreviated business process}[_{Abbreviated custom table tag}] _{Incremental or full partition identifier}. In the naming convention, pub indicates that data comes from multiple business units. The incremental or full partition identifier is described as follows: i indicates incremental and f indicates full. For example, the dwd_asale_trd_ordcrt_trip_di fact table records air ticket orders for the e-commerce business of Company A and updates data in incremental mode on a daily basis. The dwd_asale_itm_item_df fact table records product snapshots for the e-commerce business of Company A and updates data in full mode on a daily basis.

In this tutorial, the DWD layer consists of the following three tables:
  • Fact table of transactional product information: dwd_asale_trd_itm_di
  • Fact table of transactional member information: ods_asale_trd_mbr_di
  • Fact table of transactional order information: dwd_asale_trd_ord_di

Create tables

This tutorial uses degenerate dimensions to improve query efficiency. You can use the following statements to create tables:
CREATE TABLE IF NOT EXISTS dwd_asale_trd_itm_di
(
    item_id              BIGINT COMMENT 'Product ID',
    item_title           STRING COMMENT 'Product name',
    item_price           DOUBLE COMMENT 'Product price',
    item_stuff_status    BIGINT COMMENT 'Product status: 0: new; 1: idle; 2: second-hand',
    item_prov            STRING COMMENT 'Province of the product',
    item_city            STRING COMMENT 'City of the product',
    cate_id              BIGINT COMMENT 'Product category ID',
    cate_name            STRING COMMENT 'Product category name',
    commodity_id         BIGINT COMMENT 'Commodity ID',
    commodity_name       STRING COMMENT 'Commodity name',
    buyer_id             BIGINT COMMENT 'Buyer ID',
)
COMMENT 'Fact table of transactional product information'
PARTITIONED BY (ds     STRING COMMENT 'Date')
LIFECYCLE 400;

CREATE TABLE IF NOT EXISTS ods_asale_trd_mbr_di
(
    order_id         BIGINT COMMENT 'Order ID',
    bc_type          STRING COMMENT 'Business category',
    buyer_id         BIGINT COMMENT 'Buyer ID',
    buyer_nick       STRING COMMENT 'Buyer nickname',
    buyer_star_id    BIGINT COMMENT 'Buyer star ID',
    seller_id        BIGINT COMMENT 'Seller ID',
    seller_nick      STRING COMMENT 'Seller nickname',
    seller_star_id   BIGINT COMMENT 'Seller star ID',
    shop_id          BIGINT COMMENT 'Shop ID',
    shop_name        STRING COMMENT 'Shop name'
)
COMMENT 'Fact table of transactional member information'
PARTITIONED BY (ds     STRING COMMENT 'Date')
LIFECYCLE 400;

CREATE TABLE IF NOT EXISTS dwd_asale_trd_ord_di
(
    order_id              BIGINT COMMENT 'Order ID',
    pay_order_id          BIGINT COMMENT 'Payment order ID',
    pay_status            BIGINT COMMENT 'Payment status: 1: unpaid; 2: paid; 3: refunded',
    succ_time             STRING COMMENT 'Transaction end time',
    item_id               BIGINT COMMENT 'Product ID',
    item_quantity         BIGINT COMMENT 'Quantity of purchased products',
    confirm_paid_amt      DOUBLE COMMENT 'Total revenue of received and confirmed products',
    logistics_id          BIGINT COMMENT 'Logistics order ID',
    mord_prov             STRING COMMENT 'Recipient's province',
    mord_city             STRING COMMENT 'Recipient's city',
    mord_lgt_shipping     BIGINT COMMENT 'Shipping method: 1: surface mail; 2: express delivery; 3: EMS',
    mord_address          STRING COMMENT 'Recipient's address',
    mord_mobile_phone     STRING COMMENT 'Recipient's mobile number',
    mord_fullname         STRING COMMENT 'Recipient name',
    buyer_nick            STRING COMMENT 'Buyer nickname',
    buyer_id              BIGINT COMMENT 'Buyer ID'
)
COMMENT 'Fact table of transactional order information'
PARTITIONED BY (ds       STRING COMMENT 'Date')
LIFECYCLE 400;