After a data warehouse is layered, you need to specify how the layers use one another's data.
Principles for using data at different layers
The application data service (ADS) layer preferentially uses data from the common data model (CDM) layer of the data warehouse. If the required data already exists at the CDM layer, the ADS layer must not bypass the CDM layer and reprocess data from the operational data store (ODS) layer. To provide data services for the other layers, the CDM layer needs to actively collect data construction requirements from the ADS layer and consolidate common data. The ADS layer, in turn, needs to cooperate with the CDM layer in the ongoing construction of common data. ODS data must be used with care to avoid unnecessary data replication and redundant subsets. The principles for using data at different layers are as follows:
- Jobs at the ADS layer cannot directly use data from the ODS layer. If the required ODS data has not yet been processed and integrated at the CDM layer, the ADS layer can access it only through views at the CDM layer. These views must be encapsulated in periodically triggered nodes so that they can be maintained and managed.
- We recommend that jobs at the CDM layer have no more than 10 levels of table dependencies.
- Generally, a computation job that refreshes data should produce only one output table.
- If multiple refresh jobs write to one output table, with each job inserting its results into a different partition, you need to create a virtual node in DataWorks. This virtual node depends on the refresh and output of all of these jobs. Generally, descendant jobs depend on this virtual node.
- At the CDM layer, the data warehouse service (DWS) layer preferentially uses data from the data warehouse detail (DWD) layer. The DWS layer computes data by aggregating metrics. When it does so, it preferentially uses coarse-grained data that has already been produced, which avoids computing directly over large amounts of DWD data.
- At the CDM layer, accumulating snapshot fact tables preferentially use the data of transaction fact tables to ensure the consistency of data output.
- At the CDM layer, the DWS layer needs to be optimized so that the ADS layer does not use or depend on DWD data excessively.
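
Several of the principles above can be checked mechanically against job metadata. The following Python sketch is illustrative only and is not part of DataWorks: the `Job` class, the `check_jobs` function, and the table-name prefixes (`ods_`, `dwd_`, `dws_`, `ads_`) are assumptions made for this example. It also assumes one possible reading of "10 levels of table dependencies", namely the longest chain of CDM tables that ends at a job's output table.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed naming convention: the table-name prefix identifies the layer.
PREFIX_TO_LAYER = {"ods": "ODS", "dwd": "CDM", "dws": "CDM", "ads": "ADS"}


@dataclass(frozen=True)
class Job:
    """Hypothetical job description; not a DataWorks API object."""
    name: str
    layer: str                      # layer the job runs at: "CDM" or "ADS"
    inputs: Tuple[str, ...] = ()    # tables or views the job reads
    outputs: Tuple[str, ...] = ()   # tables the job writes


def table_layer(table: str) -> str:
    """Derive the layer of a table from its prefix (assumed convention)."""
    prefix = table.split("_", 1)[0].lower()
    return PREFIX_TO_LAYER.get(prefix, "UNKNOWN")


def check_jobs(jobs: List[Job], max_cdm_depth: int = 10) -> List[str]:
    """Return a list of human-readable rule violations."""
    producer: Dict[str, Job] = {t: j for j in jobs for t in j.outputs}

    def cdm_depth(table: str, seen: Tuple[str, ...] = ()) -> int:
        # Length of the longest chain of CDM tables ending at `table`.
        # `seen` guards against cycles in malformed metadata.
        if table_layer(table) != "CDM" or table in seen:
            return 0
        job = producer.get(table)
        upstream = job.inputs if job else ()
        deepest = max((cdm_depth(t, seen + (table,)) for t in upstream), default=0)
        return 1 + deepest

    violations: List[str] = []
    for job in jobs:
        # ADS jobs must not read ODS tables directly.
        if job.layer == "ADS":
            for table in job.inputs:
                if table_layer(table) == "ODS":
                    violations.append(
                        f"{job.name}: ADS job reads ODS table {table} directly")
        # A job should generally produce only one output table.
        if len(job.outputs) > 1:
            violations.append(
                f"{job.name}: writes {len(job.outputs)} output tables")
        # CDM jobs should stay within the recommended dependency depth.
        if job.layer == "CDM":
            depth = max((cdm_depth(t) for t in job.outputs), default=0)
            if depth > max_cdm_depth:
                violations.append(
                    f"{job.name}: CDM dependency depth {depth} exceeds {max_cdm_depth}")
    return violations


# Example metadata (all names are made up for illustration).
example_jobs = [
    Job("load_dwd_order", "CDM", ("ods_order",), ("dwd_order",)),
    Job("agg_dws_order", "CDM", ("dwd_order",), ("dws_order_1d",)),
    Job("report_ok", "ADS", ("dws_order_1d",), ("ads_order_report",)),
    Job("report_bad", "ADS", ("ods_order",), ("ads_raw", "ads_raw_bak")),
]

for message in check_jobs(example_jobs):
    print(message)
```

In this example, only `report_bad` is flagged: it reads an ODS table directly and writes two output tables. A check of this kind could run before jobs are published, but the exact thresholds and naming rules would need to match your own warehouse conventions.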