This tutorial provides a comprehensive guide to building a basic offline data warehouse with Dataphin, from the initial stages of resource preparation and data warehouse planning through to the later stages of operations and maintenance, data backfill, and result analysis. It aims to give you a quick, clear understanding of the Dataphin offline data warehouse construction process.
The basic process of constructing an offline data warehouse with Dataphin is outlined as follows:
| Main Process | Description | Operation Guide |
| --- | --- | --- |
| Preparations | Prepare the necessary cloud resources: set up an Alibaba Cloud account, activate and configure Dataphin and MaxCompute, and prepare the data sources. | |
| Data Warehouse Planning | The planning phase serves as the blueprint for data construction. It includes creating data blocks, subject areas, compute sources, data sources, and projects, and adding project members. | |
| Data Integration | Incorporate the prepared data sources into the project. | |
| Specification Definition | Define statistical metric standards, including business objects, business activities, atomic metrics, business filters, and derived metrics, and complete their configuration in Dataphin's data development module (sketched in SQL after this table). | |
| Specification Modeling | Use Dataphin's specification modeling function to map the source data and construct the model based on the previously defined specifications. | |
| Data Development | Following the statistical metric standards from Specification Definition and the models from Specification Modeling, develop the logical dimension tables, logical fact tables, atomic metrics, business filters, and derived metrics (see the dimension-table sketch after this table). | |
| O&M and Data Backfill | Backfill data for tasks, including pipeline tasks, logical dimension tables, logical fact tables, and metrics, to ensure they are up to date. | |
| Data Verification | Verify the accuracy of the data by running ad hoc queries (see the query sketch after this table). | |
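To make the metric vocabulary in Specification Definition concrete, the sketch below maps it onto plain SQL. This is illustrative only: the table `dwd_order_fact` and its fields are hypothetical, and in Dataphin these objects are configured in the product UI rather than written by hand.

```sql
-- Hypothetical logical fact table (partitioned by ds):
--   dwd_order_fact(order_id, customer_id, order_amount, order_status)

-- Atomic metric  : a basic aggregation over a business activity, e.g. SUM(order_amount).
-- Business filter: a reusable condition on that activity, e.g. order_status = 'paid'.
-- Derived metric : atomic metric + business filter + statistical period, e.g.
--                  "paid order amount for the business date":
SELECT SUM(order_amount) AS paid_order_amt_1d
FROM   dwd_order_fact
WHERE  order_status = 'paid'
  AND  ds = '${bizdate}';   -- ${bizdate}: the scheduling business-date parameter
```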
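Similarly, a logical dimension table in Data Development consolidates the attributes of one business object from one or more source tables. Expressed as conventional SQL (again with hypothetical names; Dataphin generates the physical tables from the logical model):

```sql
-- Hypothetical sources and target (all partitioned by ds):
--   ods_customer(customer_id, customer_name, gender)
--   ods_customer_level(customer_id, level_name)
--   dim_customer(customer_id, customer_name, gender, level_name)
INSERT OVERWRITE TABLE dim_customer PARTITION (ds = '${bizdate}')
SELECT c.customer_id,
       c.customer_name,
       c.gender,
       l.level_name
FROM   ods_customer c
LEFT JOIN ods_customer_level l
       ON  c.customer_id = l.customer_id
       AND l.ds = '${bizdate}'
WHERE  c.ds = '${bizdate}';
```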
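Data Verification usually amounts to a few ad hoc queries that reconcile the modeled output with its source. A minimal sketch, assuming a hypothetical source table `ods_order` feeding the fact table above:

```sql
-- Row-count reconciliation: the logical fact table should carry every source order.
SELECT 'ods_order'      AS table_name, COUNT(*) AS row_cnt
FROM   ods_order       WHERE ds = '${bizdate}'
UNION ALL
SELECT 'dwd_order_fact' AS table_name, COUNT(*) AS row_cnt
FROM   dwd_order_fact  WHERE ds = '${bizdate}';

-- Spot-check a derived metric against a direct aggregation of the source.
SELECT SUM(order_amount) AS paid_order_amt
FROM   ods_order
WHERE  order_status = 'paid'
  AND  ds = '${bizdate}';
```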