In the Alibaba data system, we recommend that you divide a data warehouse into three layers from bottom to top: operational data store (ODS) layer, common data model (CDM) layer, and application data service (ADS) layer.
- The ODS layer stores raw data in the data warehouse. Its data structure is largely consistent with that of the source system. The ODS layer serves as the data staging area of the data warehouse: it imports base data into MaxCompute and records historical changes to that data.
- The CDM layer, which is also called the general data model layer, consists of the dimension data (DIM), data warehouse detail (DWD), and data warehouse service (DWS) layers. The CDM layer processes and integrates the data of the ODS layer to define conformed dimensions, create reusable detailed fact tables for analysis and statistics, and aggregate common metrics.
- The DIM layer defines conformed dimensions for an enterprise based on the concepts of dimensional modeling. It reduces the risk of inconsistent statistical criteria and algorithms. Tables at the DIM layer are also called logical dimension tables. Generally, each dimension corresponds to one logical dimension table.
- The DWS layer is driven by analysis subjects during data modeling. Based on the metric requirements of upper-layer applications and products, the DWS layer creates summary fact tables that aggregate common metrics and builds the physical data model as wide tables. It defines statistical metrics according to uniform naming conventions and statistical criteria, provides common metrics to the upper layer, and generates aggregate wide tables and detailed fact tables. Tables at the DWS layer are also called logical aggregate tables and are used to store derived metrics.
- The DWD layer is driven by business processes during data modeling. It creates detailed fact tables at the finest granularity for each specific business process. Depending on how the enterprise uses its data, you can redundantly store some key dimension attributes in detailed fact tables to create wide tables. Tables at the DWD layer are also called logical fact tables.
- The ADS layer stores personalized statistical metrics for data products. It processes data from the CDM and ODS layers.
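The layering rules above can be expressed as a lineage check. The following Python sketch is illustrative only and rests on assumptions the text does not state: each table is assumed to carry a hypothetical `<layer>_` name prefix (for example, `dwd_trade_order_di`), and tables inside the CDM layer (DIM, DWD, and DWS) are assumed to be allowed to read from one another. The remaining rules follow the descriptions above: the ODS layer is loaded from source systems, the CDM layer processes ODS data, and the ADS layer processes CDM and ODS data, so no layer reads from the ADS layer.

```python
from enum import Enum
from typing import Dict, List, Set


class Layer(Enum):
    ODS = "ods"
    DIM = "dim"
    DWD = "dwd"
    DWS = "dws"
    ADS = "ads"


# DIM, DWD, and DWS together form the CDM layer.
CDM_LAYERS: Set[Layer] = {Layer.DIM, Layer.DWD, Layer.DWS}

# Upstream layers that each layer may read from.
# Assumption: CDM tables may read from each other (e.g. DWS built on DWD and DIM).
ALLOWED_UPSTREAM: Dict[Layer, Set[Layer]] = {
    Layer.ODS: set(),  # loaded directly from source systems, not from other layers
    Layer.DIM: {Layer.ODS} | CDM_LAYERS,
    Layer.DWD: {Layer.ODS} | CDM_LAYERS,
    Layer.DWS: {Layer.ODS} | CDM_LAYERS,
    Layer.ADS: {Layer.ODS} | CDM_LAYERS,
}


def layer_of(table_name: str) -> Layer:
    """Infer the layer from a hypothetical '<layer>_' prefix; raises ValueError if unknown."""
    return Layer(table_name.split("_", 1)[0].lower())


def check_dependencies(table: str, upstreams: List[str]) -> List[str]:
    """Return one message per disallowed upstream table; an empty list means the lineage is valid."""
    target = layer_of(table)
    violations = []
    for upstream in upstreams:
        source = layer_of(upstream)
        if source not in ALLOWED_UPSTREAM[target]:
            violations.append(
                f"{table} ({target.name}) must not read from {upstream} ({source.name})"
            )
    return violations


if __name__ == "__main__":
    # A DWS aggregate built from a DWD fact table and a DIM table: allowed.
    print(check_dependencies("dws_sales_by_region_1d", ["dwd_trade_order_di", "dim_region"]))
    # A DWD table reading from an ADS table: reported as a violation.
    print(check_dependencies("dwd_trade_order_di", ["ads_sales_dashboard"]))
```

Keeping the rules in a single table makes it easy to tighten them later, for example by removing `Layer.ODS` from the DWS entry if your team requires that aggregates always be built on DWD facts.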
At the ODS layer, data is classified into three areas: the data staging area, the offline data area, and the quasi-real-time data area.
In this tutorial, DataWorks collects data from the transaction data system and synchronizes it to the ODS layer of the data warehouse. The data warehouse then processes the data to create wide fact tables and aggregates the data by dimensions such as product and region.
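The tutorial flow described in the preceding paragraph (synchronize transaction data to the ODS layer, build a wide fact table, then aggregate by dimensions such as product and region) can be sketched in plain Python. This is an in-memory illustration of the modeling steps, not how DataWorks or MaxCompute execute them; the record layouts, field names, and sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical ODS-layer data synchronized from the transaction system.
ods_orders = [
    {"order_id": 1, "product_id": "p1", "region_id": "r1", "amount": 100.0},
    {"order_id": 2, "product_id": "p1", "region_id": "r2", "amount": 50.0},
    {"order_id": 3, "product_id": "p2", "region_id": "r1", "amount": 30.0},
    {"order_id": 4, "product_id": "p1", "region_id": "r1", "amount": 20.0},
]
ods_products = {
    "p1": {"product_name": "phone", "category": "electronics"},
    "p2": {"product_name": "novel", "category": "books"},
}
ods_regions = {"r1": {"region_name": "East"}, "r2": {"region_name": "North"}}


def build_wide_fact(
    orders: List[dict], products: Dict[str, dict], regions: Dict[str, dict]
) -> List[dict]:
    """DWD-style step: keep one row per order (finest grain) and copy key dimension attributes into it."""
    wide = []
    for order in orders:
        row = dict(order)
        row.update(products[order["product_id"]])
        row.update(regions[order["region_id"]])
        wide.append(row)
    return wide


def aggregate(rows: List[dict], keys: List[str]) -> Dict[Tuple, dict]:
    """DWS-style step: count orders and sum amounts for each combination of the given dimension attributes."""
    totals: Dict[Tuple, dict] = defaultdict(lambda: {"order_cnt": 0, "total_amount": 0.0})
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group]["order_cnt"] += 1
        totals[group]["total_amount"] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    dwd_orders = build_wide_fact(ods_orders, ods_products, ods_regions)
    print(aggregate(dwd_orders, ["product_name"]))
    print(aggregate(dwd_orders, ["region_name"]))
```

Copying `product_name` and `region_name` into each order row mirrors the wide-table approach described for the DWD layer: the aggregation step can group by dimension attributes without joining the dimension data again.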
The following figure shows the overall data flow. MaxCompute performs the extract-transform-load (ETL) processing from the ODS layer to the ADS layer, and the processed data is then synchronized to the downstream storage systems. Data at the ODS and DWD layers is stored in data middleware so that downstream applications can subscribe to and use it. Data at the DWS and ADS layers is usually stored in online storage systems, where downstream applications call the relevant interfaces to use it.
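The storage targets described above can be summarized as a mapping from layer to sink. The Python sketch below groups processed tables by the kind of system they would be synchronized to. The `<layer>_` table-name prefix and the table names are hypothetical, and the `Sink` values only label the two kinds of systems the text mentions; they do not correspond to any real product API. DIM tables are deliberately left unmapped because the text does not say where they are stored.

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, Optional


class Sink(Enum):
    DATA_MIDDLEWARE = "data middleware"  # downstream applications subscribe to the data
    ONLINE_STORAGE = "online storage"    # downstream applications query the data


# Layer-to-sink mapping taken from the data-flow description; DIM is not mapped.
SINK_BY_LAYER: Dict[str, Sink] = {
    "ods": Sink.DATA_MIDDLEWARE,
    "dwd": Sink.DATA_MIDDLEWARE,
    "dws": Sink.ONLINE_STORAGE,
    "ads": Sink.ONLINE_STORAGE,
}


def sink_for(table_name: str) -> Optional[Sink]:
    """Look up the sink from a hypothetical '<layer>_' prefix; None if the layer is unmapped."""
    return SINK_BY_LAYER.get(table_name.split("_", 1)[0].lower())


def plan_sync(tables: List[str]) -> Dict[Sink, List[str]]:
    """Group tables by target sink, preserving input order and skipping unmapped layers."""
    plan: Dict[Sink, List[str]] = defaultdict(list)
    for table in tables:
        sink = sink_for(table)
        if sink is not None:
            plan[sink].append(table)
    return dict(plan)


if __name__ == "__main__":
    print(plan_sync([
        "ods_orders", "dim_region", "dwd_trade_order_di",
        "dws_sales_by_region_1d", "ads_sales_dashboard",
    ]))
```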