When business data grows rapidly, it becomes difficult to manage large volumes of complex data that follow different data standards. DataWorks Data Modeling structures and manages this large, disordered, and complex data, and helps enterprises gain more value from their business data.
- Browse model details: All roles such as Visitor, Workspace Manager, Model Developer, and Project Owner in a DataWorks workspace can browse the details of a data model. For more information about roles in a DataWorks workspace, see Manage permissions on workspace-level services.
- Edit model information: Only the Workspace Manager, Development, O&M, and Model Developer roles can edit model information. You can assign one of these roles to a user if you want to allow the user to edit model information. For more information about how to assign a role to a user, see Manage permissions on workspace-level services.
- Publish a data model: Only the Workspace Manager and O&M roles can publish a data model. You can assign one of these roles to a user if you want to allow the user to publish a data model. For more information about how to assign a role to a user, see Manage permissions on workspace-level services.
In DataWorks Data Modeling, you can plan and design a data warehouse, formulate and summarize data standards, perform dimensional modeling, and define data metrics. You can use Data Modeling to materialize the dimension tables, fact tables, and aggregate tables generated from data modeling into compute engines and use the materialized tables for further processing.
- Data Warehouse Planning
You can design data layers, business categories, subject areas, and business processes on the Data Warehouse Planning page.
- Data Layer
You can design data layers in a data warehouse based on business scenarios and data scenarios. By default, DataWorks creates the following common layers for you:
- Operational data store (ODS)
- Data warehouse detail (DWD)
- Data warehouse summary (DWS)
- Application data service (ADS)
- Dimension data (DIM)
- Business Category
If your business is complex and different business categories need to share the same data domain, you can categorize the business to suit your business requirements. This helps you quickly locate business data during model design and application. For more information about how to create a business category, see Business category.
- Data Domain
A data domain is a high-level data classification standard. It is a collection of business processes that are abstracted, refined, and combined. A data domain is the first data grouping entry for business personnel. It helps business personnel quickly locate the desired business data from large amounts of data.
Data domains are often used for business analysis and can serve as analysis domains, such as procurement, supply chain, human resources, and e-commerce. We recommend that data domains be managed and configured in a unified manner by an experienced organization or team, such as data architects or a model design team. Data domain designers must have a deep understanding of the enterprise business and be able to fully express their interpretation and abstraction of the business. For more information about how to plan and create a data domain in DataWorks, see Data domain.
- Business Process
A business process describes a business activity, such as adding commodities to the shopping cart, placing an order, or paying for an order. Business processes are typically used in business effect analysis, such as funnel analysis of commodity purchases. You can break down a commodity purchase into the following business processes: browsing commodities, adding commodities to the shopping cart, placing an order, paying for the order, and confirming the receipt of a commodity. Use the number of orders as a metric for each business process and perform funnel analysis on the metric. For more information about how to create a business process in DataWorks, see Business process.
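The funnel analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not a DataWorks feature: the step names follow the business processes listed above, while the counts and the `funnel_conversion` helper are hypothetical.

```python
# Hypothetical counts per business process (illustrative values only).
funnel_steps = [
    ("browse commodities", 1000),
    ("add to shopping cart", 400),
    ("place order", 200),
    ("pay for order", 150),
    ("confirm receipt", 120),
]


def funnel_conversion(steps):
    """Return (step name, count, conversion rate from the previous step)."""
    rows = []
    previous = None
    for name, count in steps:
        if previous is None:
            rate = 1.0  # the first step is the funnel entry
        elif previous == 0:
            rate = 0.0  # avoid dividing by zero when a step has no traffic
        else:
            rate = count / previous
        rows.append((name, count, rate))
        previous = count
    return rows


for name, count, rate in funnel_conversion(funnel_steps):
    print(f"{name:<22} {count:>5} {rate:7.1%}")
```

Each rate is measured against the preceding business process, which makes it easy to see the step at which most potential orders are lost.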
- Data Standard
DataWorks Data Modeling allows you to plan and formulate data standards before data modeling, or summarize data standards based on the business conditions during data modeling. The lookup table, measurement unit, data standard, and naming dictionary are standardized to ensure consistent data processing during subsequent modeling and application.
For example, a registration table and a logon table are created. The registration table contains a member ID column named user_id. The logon table also contains a member ID column, but it is named userid. You can create a unified data standard for the member ID columns. For example, you can specify a lookup table for data processing, the attribute requirements for each field (such as the data type, length, and default value), and the measurement unit of the data. The data standard can be directly applied to the member ID fields during data modeling, which ensures that all member ID fields follow the same standard. For more information about how to create a data standard in DataWorks, see Data standard.
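The member ID example can be pictured as one standard definition that several differently named columns are bound to. The following Python sketch is only an illustration under stated assumptions: the `DataStandard` class, the `column_bindings` mapping, the `validate` helper, and the chosen type and length are hypothetical and do not represent how DataWorks stores data standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStandard:
    """Hypothetical field-level standard: type, length, and default value."""
    name: str
    data_type: str
    max_length: int
    default: str


# One unified standard for member IDs (the values are assumptions).
MEMBER_ID = DataStandard(name="member_id", data_type="string", max_length=32, default="")

# Both columns carry a member ID under different names and share the standard.
column_bindings = {
    ("registration", "user_id"): MEMBER_ID,
    ("logon", "userid"): MEMBER_ID,
}


def validate(table, column, value):
    """Return a list of violations of the bound standard for one value."""
    standard = column_bindings[(table, column)]
    problems = []
    if not isinstance(value, str):
        problems.append(f"{table}.{column}: expected {standard.data_type}")
    elif len(value) > standard.max_length:
        problems.append(f"{table}.{column}: longer than {standard.max_length}")
    return problems


print(validate("registration", "user_id", "u1001"))
print(validate("logon", "userid", 1001))
```

Because both columns point at the same `MEMBER_ID` object, a change to the standard applies to every bound field at once.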
- Dimensional Modeling
DataWorks Data Modeling follows the dimensional modeling approach. When you use the dimensional modeling feature to design data models in a data warehouse, take note of the following points:
- Dimension Table
Extract all the dimensions that may exist in each data domain, and store the dimensions and their attributes in dimension tables. For example, when you analyze e-commerce business data, possible dimensions, with their attributes in parentheses, include order (order ID, order creation time, buyer ID, and seller ID), user (gender and birthdate), and commodity (commodity ID, commodity name, and commodity listing time). In this case, you can create the following dimension tables: order dimension table, user dimension table, and commodity dimension table. The attributes of each dimension are used as the fields in the dimension table. You can deploy the dimension tables in a data warehouse and perform extract, transform, and load (ETL) operations to store dimension data in the format defined in the dimension table. This allows business personnel to access the data for subsequent data analysis.
- Fact Table
Sort and analyze data that is generated in each business process, and store the data in fact tables as fields. For example, you can create a fact table for the business process of placing an order, and record the following information as fields in the fact table: order ID, order creation time, commodity ID, number of commodities, and sales amount. You can deploy the fact tables in a data warehouse and perform ETL operations to summarize and store data in the format defined in the fact table. This allows business personnel to access the data for subsequent data analysis.
- Aggregate Table
Summarize and analyze fact data and dimension data based on business data analysis and data layering, and create an aggregate table. This way, you can directly access data in the aggregate table for subsequent data analysis, without the need to access the data in fact tables and dimension tables.
- Reverse Modeling
Reverse modeling applies models that were generated by other modeling tools to DataWorks Dimensional Modeling. For example, if you generated a model by using another modeling tool and you want to use DataWorks Data Modeling for subsequent modeling, you can use the reverse modeling feature of DataWorks. This feature eliminates the need to model the data again and helps you quickly apply existing models to DataWorks Dimensional Modeling.
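To make the relationship between dimension tables, fact tables, and aggregate tables concrete, the following sketch builds a tiny model in an in-memory SQLite database by using Python's standard `sqlite3` module. It is an illustration only: the table names, column names, and sample rows are assumptions based on the e-commerce example above, and SQLite stands in for whichever compute engine the model is materialized into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per user or commodity, attributes as fields.
CREATE TABLE dim_user (user_id TEXT PRIMARY KEY, gender TEXT, birthdate TEXT);
CREATE TABLE dim_commodity (commodity_id TEXT PRIMARY KEY, commodity_name TEXT);

-- Fact table for the "place an order" business process.
CREATE TABLE fact_order (
    order_id TEXT PRIMARY KEY,
    order_created_at TEXT,
    buyer_id TEXT REFERENCES dim_user(user_id),
    commodity_id TEXT REFERENCES dim_commodity(commodity_id),
    commodity_count INTEGER,
    sales_amount REAL
);

-- Hypothetical sample data.
INSERT INTO dim_user VALUES ('u1', 'F', '1990-01-01'), ('u2', 'M', '1985-06-15');
INSERT INTO dim_commodity VALUES ('c1', 'stroller'), ('c2', 'kettle');
INSERT INTO fact_order VALUES
    ('o1', '2024-01-01', 'u1', 'c1', 1, 200.0),
    ('o2', '2024-01-02', 'u2', 'c1', 2, 400.0),
    ('o3', '2024-01-02', 'u1', 'c2', 1, 30.0);

-- Aggregate table: pre-summarized facts joined with a dimension.
CREATE TABLE agg_sales_by_commodity AS
SELECT c.commodity_name,
       SUM(f.commodity_count) AS total_count,
       SUM(f.sales_amount) AS total_sales
FROM fact_order AS f
JOIN dim_commodity AS c ON c.commodity_id = f.commodity_id
GROUP BY c.commodity_name;
""")

# Analysts read the aggregate table directly, without joining facts and dimensions.
rows = conn.execute(
    "SELECT commodity_name, total_count, total_sales "
    "FROM agg_sales_by_commodity ORDER BY commodity_name"
).fetchall()
print(rows)
```

The final query touches only the aggregate table, which mirrors the point that summarized data can be analyzed without accessing the underlying fact and dimension tables.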
- Data Metric
DataWorks Data Modeling provides the Data Metric feature, which allows you to establish a unified metric system. A metric system consists of Atomic Metric, Modifier, Period, and Derived Metric.
- Atomic Metric: a measurement used for a business process, such as the payment amount of an order.
- Modifier: limits the scope of business for which a specific metric is calculated. For example, limit the payment amount metric to maternity and infant products only.
- Period: specifies the time range or point in time at which a metric is calculated. For example, set the period for the payment amount metric to the last seven days.
- Derived Metric: consists of an atomic metric, a period, and one or more modifiers. For example, calculate the payment amount of the maternity and infant products in the last seven days.
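The composition of a derived metric from an atomic metric, a period, and modifiers can be sketched as follows. This is a hypothetical Python illustration of the concept, not the DataWorks implementation: the `Order` class, the `derived_metric` function, the category label, and the sample orders are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Order:
    paid_on: date
    category: str
    payment_amount: float


def payment_amount(order):
    """Atomic metric: the payment amount of an order."""
    return order.payment_amount


def is_maternity_infant(order):
    """Modifier: keep only maternity and infant products."""
    return order.category == "maternity_infant"


def derived_metric(orders, atomic, period_days, modifiers, today):
    """Sum the atomic metric over orders inside the period that pass every modifier."""
    start = today - timedelta(days=period_days)  # exclusive lower bound
    return sum(
        atomic(order)
        for order in orders
        if start < order.paid_on <= today and all(m(order) for m in modifiers)
    )


today = date(2024, 1, 10)
orders = [
    Order(date(2024, 1, 9), "maternity_infant", 100.0),
    Order(date(2024, 1, 5), "maternity_infant", 50.0),
    Order(date(2024, 1, 1), "maternity_infant", 70.0),  # outside the last seven days
    Order(date(2024, 1, 8), "electronics", 300.0),      # excluded by the modifier
]

# Derived metric: payment amount of maternity and infant products in the last seven days.
result = derived_metric(orders, payment_amount, 7, [is_maternity_infant], today)
print(result)
```

Keeping the atomic metric, period, and modifiers as separate inputs means the same atomic metric can be reused to define other derived metrics by swapping the period or the modifier list.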
Importance of data modeling
- Standardize management of massive data
Larger enterprises have more complex data structures. How to manage and store data in a structured and orderly manner is a challenge that every large enterprise faces.
- Break information barriers by interconnecting business data
If the data of each business or department in an enterprise is isolated from one another, the decision-makers cannot clearly or fully understand the data. How to break data silos between departments or business domains is a big challenge for business data management.
- Integrate data standards to achieve unified and flexible data interconnection
Inconsistent descriptions of the same data result in duplicate data, incorrect calculation results, and difficulties in business data management. How to formulate a unified data standard without changing the original system architecture and realize flexible interconnection between upstream and downstream business is one of the core focuses of standardized management.
- Maximize data value to maximize profit
Make the most of all types of enterprise data to maximize its value and deliver more efficient data services for the enterprise.