The following figure shows the overall process of building a MaxCompute data warehouse.
Before reading this tutorial, familiarize yourself with the following terms:
- Business unit: the division of business at a higher level than data domains. It is applicable to large business systems.
- Dimension: the entity object that serves as an environment for measures. Dimensional modeling, proposed by Ralph Kimball, encourages you to build data models based on business analysis and decision-making requirements. A dimension is a collection of business attributes. It helps you monitor your business. For example, when analyzing a transaction process, you can describe the transaction environment in various dimensions, such as the buyer, seller, product, and time.
- Attribute (dimension attribute): a column that makes up a dimension. Dimension attributes are crucial to data usability. They are the primary source of query constraints, groupings, and report labels.
- Measure: the business performance measurement. In dimensional modeling, a measure is described as a fact and a dimension as an environment. Dimensions are the diverse environments required to analyze facts. A measure is generally numeric and is represented as a fact in a logical fact table.
- Metric: the measurement indicator. Metrics are categorized into atomic metrics and derived metrics.
  - An atomic metric is a measure in a business process. It is indivisible as defined by the business and has a specific meaning. It reflects the clear statistical criteria and computational logic of the business. For example, the payment amount can be regarded as an atomic metric.
  - A derived metric is an atomic metric within a business scope constrained by a time period and a modifier.
- Business filter: the business scope for statistics, which is used to select the data that complies with business rules. A business filter is similar to a condition in the WHERE clause of an SQL statement, excluding the time interval.
- Statistical period: the time range for statistics, for example, the last day or the last 30 days. A statistical period is similar to the time condition specified by the WHERE clause in SQL statements.
- Statistic granularity: the object or perspective of statistical analysis. It defines the level of data aggregation and acts as the grouping condition for an aggregation, similar to the columns listed in the GROUP BY clause of an SQL statement. A statistic granularity is a combination of dimensions that specifies a statistical range. For example, if a derived metric measures the turnover of a seller in a province, the statistic granularity is the combination of the seller and region dimensions. If you collect statistics on all the data in a table without grouping, the statistic granularity is the entire table. When specifying a statistic granularity, consider the relationship between your business and its dimensions. The statistic granularity often serves as the modifier of a derived metric.
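
The SQL analogies in the terms above can be combined in a single query. The following is a minimal sketch rather than MaxCompute-specific code: it uses Python's built-in `sqlite3` module as a stand-in SQL engine, and the table `fact_orders`, its columns, the fixed date range, and the sample rows are all hypothetical. It computes a derived metric (online payment amount in the last 30 days, per seller and province) from the atomic metric "payment amount".

```python
import sqlite3

# In-memory database used only for illustration (not MaxCompute).
conn = sqlite3.connect(":memory:")

# Hypothetical fact table: pay_amount is the measure; the other
# columns are dimension attributes describing its environment.
conn.execute("""
    CREATE TABLE fact_orders (
        order_id    INTEGER PRIMARY KEY,
        seller_id   TEXT,
        province    TEXT,
        pay_channel TEXT,
        pay_date    TEXT,
        pay_amount  REAL
    )
""")

# Hypothetical sample rows.
rows = [
    (1, "s1", "Zhejiang", "online",  "2024-01-30", 100.0),
    (2, "s1", "Zhejiang", "online",  "2024-01-31", 50.0),
    (3, "s1", "Zhejiang", "offline", "2024-01-31", 70.0),   # fails the business filter
    (4, "s2", "Jiangsu",  "online",  "2024-01-31", 20.0),
    (5, "s1", "Zhejiang", "online",  "2023-12-01", 999.0),  # outside the statistical period
]
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?, ?)", rows)

# Derived metric = atomic metric (SUM of pay_amount)
#   + business filter + statistical period + statistic granularity.
query = """
    SELECT seller_id, province, SUM(pay_amount) AS pay_amount_30d
    FROM fact_orders
    WHERE pay_channel = 'online'                          -- business filter
      AND pay_date BETWEEN '2024-01-02' AND '2024-01-31'  -- statistical period (30 days)
    GROUP BY seller_id, province                          -- statistic granularity
    ORDER BY seller_id
"""
result = conn.execute(query).fetchall()
print(result)
```

Removing the GROUP BY clause (and the grouped columns from the SELECT list) would aggregate over every matching row, which corresponds to the case where the statistic granularity is the entire table.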