DataWorks: Create a data layer

Last Updated: Oct 11, 2023

Data layering is used to design the structure of data models by dividing data into layers based on a comprehensive analysis of business scenarios, data scenarios, and system scenarios. Each data layer serves a specific purpose. Data layering helps you organize, manage, and maintain data efficiently. This topic describes how to create and manage data layers.

Background information

A data warehouse is a collection of various types of data, such as logs, database data, text data, and external data. In data modeling, the logical structure of a data warehouse is built from data layers, data domains, business processes, data marts, and subject areas. Data domains and business processes are used to build data models at the common layer. Data marts and subject areas are used to build data models for specific business applications at the application layer.

Before raw data is stored in a data warehouse, it is cleansed and filtered at a data layer. This optimizes data queries and improves the efficiency of obtaining, calculating, and analyzing data. Data layers also associate data of different dimensions to support multidimensional analysis and decision-making.

Plan data layers

You must design and plan data layers based on your business requirements and a comprehensive analysis of business scenarios, data scenarios, and system scenarios.

By default, a data warehouse is divided into the following layers: operational data store (ODS), dimension (DIM), data warehouse detail (DWD), data warehouse summary (DWS), and application data service (ADS).

  • ODS

    This layer receives and processes raw data that is to be stored in the data warehouse. A table at the ODS layer has the same structure as the source table that stores the raw data. The ODS layer serves as the staging area of the data warehouse. The following operations are performed on raw data at the ODS layer:

    • Synchronize incremental or full structured raw data to the data warehouse.

    • Structure unstructured raw data, such as logs, and store the outputs in MaxCompute.

    • Record changes in raw data or cleanse raw data based on your business requirements.

    The name of a table at the ODS layer must start with ods, and the time to live (TTL) of the table must be 366 days.

  • DWD

    At this layer, data models are built based on the business activities of an enterprise. Based on the characteristics of a specific business activity, you can create fact tables at the finest level of granularity. Based on how the enterprise uses data, you can add redundant copies of key dimension attributes to fact tables to build wide tables. Minimize joins between fact tables and dimension tables to improve the usability of fact tables.

  • DWS

    At this layer, data models are built based on specific subject objects that you want to analyze. You can create a general aggregate table based on the metric requirements of upper-layer applications and products.

    Some general dimensions, such as time, IP address, and ID, can be abstracted from ODS-layer data based on a preliminary classification and summary of user behavior. You can use these dimensions to compute statistics, such as the number of products that users purchase from different logon IP addresses in each time period. At the DWS layer, you can build multi-granularity aggregate tables on top of general aggregate tables to improve calculation efficiency. For example, evaluating user behavior over the last 7, 30, or 90 days is much faster when the calculation starts from these aggregate tables instead of detail data.

  • ADS

    This layer stores the metric data of products and is used to generate various reports. For example, an e-commerce enterprise can use the ADS layer to store the sales volume and sales ranking of each type of ball sports goods in Hangzhou from June 9 to June 19.

  • DIM

    At this layer, data models are built based on dimensions. You can define dimensions, determine primary keys, add dimension attributes, and associate different dimensions. This keeps data consistent across analyses and reduces the risk that different analyses use inconsistent calculation criteria and algorithms.
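The layers above can be illustrated with a small in-memory pipeline. The following Python sketch is illustrative only: the sample rows, the field names (`user_id`, `ip`, `item_id`, `qty`, `dt`), and all functions are hypothetical, and plain Python stands in for the SQL jobs that would normally run in MaxCompute. It shows ODS rows kept in their source form, DIM tables keyed by primary key, a DWD wide fact table, DWS daily aggregates with a multi-day rollup, and an ADS ranking similar to the Hangzhou example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample data. ODS rows keep the structure of the source order table.
ods_orders = [
    {"order_id": 1, "user_id": "u1", "ip": "10.0.0.1", "item_id": "i1", "qty": 2, "dt": date(2023, 6, 9)},
    {"order_id": 2, "user_id": "u1", "ip": "10.0.0.2", "item_id": "i2", "qty": 1, "dt": date(2023, 6, 10)},
    {"order_id": 3, "user_id": "u2", "ip": "10.0.0.3", "item_id": "i1", "qty": 3, "dt": date(2023, 6, 15)},
    {"order_id": 4, "user_id": "u3", "ip": "10.0.0.4", "item_id": "i2", "qty": 5, "dt": date(2023, 6, 18)},
    {"order_id": 5, "user_id": "u2", "ip": "10.0.0.3", "item_id": "i2", "qty": 1, "dt": date(2023, 6, 20)},
]

# DIM: dimension tables keyed by their primary keys, so each key maps to exactly one row.
dim_user = {"u1": {"city": "Hangzhou"}, "u2": {"city": "Hangzhou"}, "u3": {"city": "Shanghai"}}
dim_item = {"i1": {"category": "basketball"}, "i2": {"category": "football"}}


def build_dwd(ods_rows, users, items):
    """DWD: one fact row per order (finest granularity), widened with redundant
    copies of key dimension attributes so that later layers need fewer joins."""
    return [
        {**row,
         "city": users[row["user_id"]]["city"],
         "category": items[row["item_id"]]["category"]}
        for row in ods_rows
    ]


def aggregate(rows, keys):
    """DWS: general aggregate table that sums qty per combination of `keys`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["qty"]
    return dict(totals)


def rollup_window(daily_by_user_ip, end, days):
    """Multi-granularity aggregate: per-user totals over the last `days` days,
    computed from the daily aggregate instead of rescanning detail rows."""
    start = end - timedelta(days=days - 1)
    totals = defaultdict(int)
    for (dt, user_id, _ip), qty in daily_by_user_ip.items():
        if start <= dt <= end:
            totals[user_id] += qty
    return dict(totals)


def build_ads_ranking(daily_by_city_category, city, start, end):
    """ADS: sales volume and rank of each goods category in one city over a date range."""
    sales = defaultdict(int)
    for (dt, row_city, category), qty in daily_by_city_category.items():
        if row_city == city and start <= dt <= end:
            sales[category] += qty
    ranked = sorted(sales.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, category, qty) for rank, (category, qty) in enumerate(ranked, start=1)]


dwd_orders = build_dwd(ods_orders, dim_user, dim_item)
dws_user_ip_1d = aggregate(dwd_orders, ("dt", "user_id", "ip"))
dws_city_category_1d = aggregate(dwd_orders, ("dt", "city", "category"))
```

For example, `build_ads_ranking(dws_city_category_1d, "Hangzhou", date(2023, 6, 9), date(2023, 6, 19))` returns `[(1, "basketball", 5), (2, "football", 1)]`. The rollup reads only the small daily aggregate, which is the efficiency gain that the DWS description refers to.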

The following figures show the two display modes for data layers: Tiled Display and Hierarchy Display.

[Figure: Data warehouse layers]

  • Tiled Display: Data layers are displayed in tiled mode.

  • Hierarchy Display: DataWorks provides the following data layer categories: Data Import Layer, Common Layer, Application Layer, and Others. You can create a data layer and add it to a data layer category.

    • Data Import Layer: Ingests basic data, such as database data, logs, and messages. A data layer of this category can store only ODS tables.

    • Common Layer: Processes and integrates common data to define unified dimensions, build reusable detailed fact tables for data analysis and statistics collection, and aggregate common metrics. A data layer of this category can store fact tables, dimension tables, aggregate tables, dimensions, and combinations of dimension tables and dimensions.

    • Application Layer: Reorganizes the data that is processed and integrated at the common layer to meet your business requirements. A data layer of this category can store application tables, dimension tables, dimensions, and combinations of dimension tables and dimensions.

    • Others: Contains the data layers that are automatically created by the system and the custom data layers that are created by users.

      • System data layer: To move a system data layer from Others to another data layer category, contact Alibaba Cloud technical support.

      • Custom data layer: To move a custom data layer from Others to another data layer category, modify the data layer.

      Note

      Supported model types vary by data layer category. Before you add a model table to a data layer in the Others category, change the category of the data layer to one that supports the model type of the table.

    Note

    You can change the value of the Category parameter only once after you configure it. Select a suitable data layer category based on your business requirements.
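The category rules above amount to a lookup table plus a one-time change. The following Python sketch is a hypothetical model, not a DataWorks API: the `DataLayer` class and its fields are illustrative, and treating Others as a category that holds no model tables is an assumption drawn from the note above.

```python
from dataclasses import dataclass

# Table types that each data layer category can store, as listed above.
# Assumption: Others stores no model tables until the layer is moved to
# another category (the note above implies this but does not state it).
SUPPORTED_TABLE_TYPES = {
    "Data Import Layer": {"ODS Table"},
    "Common Layer": {"Fact Table", "Dimension Table", "Aggregate Table",
                     "Dimension", "Dimension Table and Dimension"},
    "Application Layer": {"Application Table", "Dimension Table",
                          "Dimension", "Dimension Table and Dimension"},
    "Others": set(),
}


@dataclass
class DataLayer:
    """Hypothetical in-memory model of a data layer; not a DataWorks API."""
    abbreviation: str
    category: str = "Others"
    is_system: bool = False
    category_changes: int = 0

    def change_category(self, new_category: str) -> None:
        if new_category not in SUPPORTED_TABLE_TYPES:
            raise ValueError(f"unknown category: {new_category}")
        if self.is_system:
            # System data layers are moved out of Others by technical support.
            raise PermissionError("contact Alibaba Cloud technical support")
        if self.category_changes >= 1:
            # The Category parameter can be changed only once.
            raise PermissionError("category has already been changed once")
        self.category = new_category
        self.category_changes += 1

    def can_store(self, table_type: str) -> bool:
        """Return True if a table of `table_type` may be added to this layer."""
        return table_type in SUPPORTED_TABLE_TYPES[self.category]
```

A custom layer such as `DataLayer("tmp")` starts in Others, can be moved once (for example, to Common Layer), and can then accept fact tables but not ODS tables.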

Create a data layer

By default, the system creates the following layers for you: ODS, DIM, DWD, DWS, and ADS. These layers can meet your business requirements in most scenarios. If you have special requirements, you can perform the steps described in this section to create a data layer.

Sample scenario for a special requirement: You want to abstract a temporary (TMP) layer to store temporary tables and specify standards and verification rules for this layer, such as table naming conventions and TTL, so that all tables created at this layer conform to them.
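A minimal sketch of the kind of rule such a layer enforces, assuming that a naming convention is expressed as a regular expression and that TTL is an exact number of days. The `LayerStandard` class and the TMP values (`tmp_` prefix, 7-day TTL) are hypothetical; only the ODS rule (names start with ods, TTL of 366 days) comes from this topic. In DataWorks, such rules are configured through a data layer checker rather than code like this.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerStandard:
    """Hypothetical per-layer rules: a table naming pattern and a required TTL."""
    layer: str
    name_pattern: str  # regular expression that the whole table name must match
    ttl_days: int

    def violations(self, table_name: str, ttl_days: int) -> list[str]:
        """Return a list of rule violations; an empty list means the table conforms."""
        problems = []
        if re.fullmatch(self.name_pattern, table_name) is None:
            problems.append(f"{table_name}: name does not match {self.name_pattern!r}")
        if ttl_days != self.ttl_days:
            problems.append(f"{table_name}: TTL is {ttl_days} days, expected {self.ttl_days}")
        return problems


# ODS rule from this topic: names start with "ods" and the TTL is 366 days.
ODS_STANDARD = LayerStandard("ODS", r"ods.*", 366)
# Hypothetical TMP rule: "tmp_" prefix, lowercase letters, digits, underscores, 7-day TTL.
TMP_STANDARD = LayerStandard("TMP", r"tmp_[a-z0-9_]+", 7)
```

For example, `TMP_STANDARD.violations("orders_tmp", 30)` reports two problems: the name lacks the `tmp_` prefix and the TTL differs from the required value.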

  1. Go to the Data Layer page.

    1. Log on to the DataWorks console. In the left-side navigation pane, choose Data Modeling and Development > Data Modeling. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Modeling.

    2. In the top navigation bar of the Data Modeling page, click Data Warehouse Planning. The Data Layer page appears.

  2. Create a data layer.

    1. Click Create. In the Create Data Layer dialog box, configure the parameters.

      [Figure: Create Data Layer dialog box]

      The dialog box contains the following parameters:

      • Abbreviation: The abbreviation of the data layer name. The abbreviation uniquely identifies the data layer.

      • Name: The name of the data layer.

      • Display Name: The display name of the data layer.

      • Owner: The owner of the data layer. The default value is the current logon account.

      • Category: The data layer category to which the data layer belongs. Valid values: Data Import Layer, Common Layer, Application Layer, and Others. The value of this parameter determines which values are available for the Model Type parameter. For the purpose of each category and the tables that it can store, see the description of Hierarchy Display in the Plan data layers section.

        Note

        You can change the value of the Category parameter only once after you configure it. Select a suitable data layer category based on your business requirements.

      • Model Type: The type of model table that you can store at the data layer. The available values depend on the value of the Category parameter.

        • ODS Table: available only if Category is set to Data Import Layer.

        • Fact Table: available only if Category is set to Common Layer.

        • Application Table: available only if Category is set to Application Layer.

        • Aggregate Table: available only if Category is set to Common Layer.

        • Dimension Table, Dimension, and Dimension Table and Dimension: available only if Category is set to Common Layer or Application Layer.

        Note

        • After you configure the Model Type parameter, you cannot change its value. Proceed with caution.

        • If the name or display name of a data layer contains the keyword dim, DIM, Dim, or dimension, DataWorks sets the Model Type parameter to Dimension Table and Dimension by default.

      • Description: The description of the data layer. A clear description helps users select the appropriate data layer for specific business data. The description can be up to 2,048 characters in length.

  3. Click Confirm.
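The constraints on the Create Data Layer dialog box can be approximated in code. The following Python sketch is hypothetical (the form class and the function names are not a DataWorks API). It checks abbreviation uniqueness and the 2,048-character description limit, and applies the documented dim, DIM, Dim, or dimension keyword rule for the default Model Type. The exact matching behavior in DataWorks may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Keywords listed in the Model Type note.
DIM_KEYWORDS = ("dim", "DIM", "Dim", "dimension")
MAX_DESCRIPTION_LENGTH = 2048


@dataclass
class CreateDataLayerForm:
    """Hypothetical representation of the Create Data Layer dialog box."""
    abbreviation: str
    name: str
    display_name: str
    owner: str
    category: str
    model_type: Optional[str] = None
    description: str = ""


def default_model_type(name: str, display_name: str) -> Optional[str]:
    """Return Dimension Table and Dimension if a keyword appears in either name."""
    if any(k in name or k in display_name for k in DIM_KEYWORDS):
        return "Dimension Table and Dimension"
    return None


def resolve_model_type(form: CreateDataLayerForm) -> Optional[str]:
    # Assumption: the keyword rule only pre-fills Model Type; an explicit choice is kept.
    return form.model_type or default_model_type(form.name, form.display_name)


def validate(form: CreateDataLayerForm, existing_abbreviations: Set[str]) -> List[str]:
    """Return a list of error messages; an empty list means the form passes these checks."""
    errors = []
    # An abbreviation uniquely identifies a data layer.
    if form.abbreviation in existing_abbreviations:
        errors.append(f"abbreviation {form.abbreviation!r} is already in use")
    if len(form.description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    return errors
```

For example, a form whose name is `dim_ext` resolves to the Dimension Table and Dimension model type unless another model type is chosen explicitly, and a form that reuses the abbreviation of a default layer such as `dws` fails validation.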

What to do next

After you create a data layer, you must add a data layer checker to specify the naming conventions of tables at the data layer. For more information, see Configure and use a checker at a data layer.