The dimension data (DIM) layer stores conformed dimension tables — wide tables that represent the people, places, and things your business measures (such as products, locations, and customers). It is built on the concepts of dimensional modeling.
Dimensions must be conformed across your enterprise: the same dimension defined in one business process carries the same meaning in every other. For example, a province code dimension delivers the same information regardless of which business process queries it.
Define dimensions
When you build a bus matrix and divide data domains, define dimensions by analyzing your business processes.
This tutorial uses the e-commerce marketing business unit of Company A. In the transaction data domain, it focuses on the receipt confirmation business process (where a transaction is successful). Each product is assumed to be purchased and received in the same place.
From the product perspective, the following dimensions apply:
-
Product ID
-
Product name
-
Product price
-
Product status: 0: new; 1: idle; 2: second-hand
-
Product category ID
-
Product category name
-
Commodity ID
-
Commodity name
-
Product sales status: 0: normal; 1: deleted; 2: taken off the shelf; 3: never put on the shelf
-
City of the product
-
Province of the product
From the place of receipt perspective, the following dimensions apply:
-
City code
-
City name
-
Province code
-
Province name
Design dimension tables
After defining dimensions, supplement data to generate dimension tables. Apply these constraints to every dimension table:
| Constraint | Value | Reason |
|---|---|---|
| Maximum table size | 10 million rows | Larger tables degrade join performance |
| Join method | MAPJOIN | Optimized for dimension-to-fact joins in MaxCompute |
| Update frequency | Infrequent | Frequent updates increase pipeline complexity |
Choose a dimension table structure
When designing a dimension table, evaluate the four factors below to decide how to structure and split it:
| Factor | When it applies | Recommended approach |
|---|---|---|
| Data stability | Dimension data rarely expires but may update at any time (e.g., member records) | Create a partition to store full snapshots. If some records never change, separate them into a historical table (partitioned by expiration time) and a daily table (currently valid records). |
| Vertical splitting | The table has many unused attribute columns, or too many attributes slow queries | Split into multiple dimension tables by attribute group |
| Horizontal splitting | Some rows are structurally distinct from the rest | Split into separate tables or add multi-level partitions |
| Query time limits | Core dimension tables have strict query time SLAs | Keep the table lean; enforce row count and column count limits |
Design steps
Follow these four steps to design a dimension table:
-
Define and conform dimensions. Establish dimensions and make sure they are conformed — consistent and reusable across all business processes in your enterprise.
-
Identify the primary dimension table. The primary dimension table is an ODS table that synchronizes directly from a source business system. For example, the
s_auctiontable records product information synchronized from the front-end product center system. This is the primary dimension table for the product dimension in this tutorial. -
Identify related dimension tables. The data warehouse aggregates data from multiple source systems. Analyze the business logic to find which tables associate with your primary dimension table. For the product dimension, related dimensions include category, seller, and shop.
-
Specify dimension attributes in two phases:
-
Phase 1: Select or derive dimension attributes from the primary dimension table.
-
Phase 2: Select or derive dimension attributes from related dimension tables — for example, from the category, seller, and shop dimension tables.
When specifying attributes, apply these principles:
-
Generate as many meaningful dimension attributes as possible.
-
Write descriptive labels that make attribute values self-explanatory.
-
Distinguish numeric attributes (e.g., price) from facts (e.g., revenue). Numeric attributes describe the dimension; facts belong in fact tables.
-
Identify common attributes shared across multiple dimensions and promote them to conformed dimensions.
-
Follow DIM layer naming conventions
Name dimension tables using this pattern:
dim_{Abbreviated business unit|pub}_{Abbreviated dimension}[_{Abbreviated custom tag}]
Use pub when a dimension is not specific to any business unit and is shared across the enterprise (e.g., the time dimension or regional information).
The following table maps business concepts to their DIM layer table names:
| Business concept | Dimension type | Table name |
|---|---|---|
| Regional information (shared) | Geographic, pub | dim_pub_area |
| Full product catalog — Company A e-commerce marketing | Product | dim_asale_itm |
Create dimension tables
The following DDL statements create the two dimension tables used in this tutorial.
CREATE TABLE IF NOT EXISTS dim_asale_itm
(
item_id BIGINT COMMENT 'Product ID',
item_title STRING COMMENT 'Product name',
item_price DOUBLE COMMENT 'Product price',
item_stuff_status BIGINT COMMENT 'Product status: 0: new; 1: idle; 2: second-hand',
cate_id BIGINT COMMENT 'Product category ID',
cate_name STRING COMMENT 'Product category name',
commodity_id BIGINT COMMENT 'Commodity ID',
commodity_name STRING COMMENT 'Commodity name',
item_status BIGINT COMMENT 'Product sales status: 0: normal; 1: deleted; 2: taken off the shelf; 3: never put on the shelf',
city STRING COMMENT 'City of the product',
prov STRING COMMENT 'Province of the product'
)
COMMENT 'Table of full products'
PARTITIONED BY (ds STRING COMMENT 'Date: yyyymmdd');
CREATE TABLE IF NOT EXISTS dim_pub_area
(
city_code STRING COMMENT 'City code',
city_name STRING COMMENT 'City name',
prov_code STRING COMMENT 'Province code',
prov_name STRING COMMENT 'Province name'
)
COMMENT 'Table of common regional information'
PARTITIONED BY (ds STRING COMMENT 'Date: yyyymmdd')
LIFECYCLE 3600;
dim_pub_areais a shared (pub) dimension table. TheLIFECYCLE 3600setting retains data for 3,600 days. Adjust this value based on your data retention requirements.