
Dataphin: Offline data warehouse construction flow

Last Updated: Jan 21, 2025

This tutorial provides a comprehensive guide to building a basic offline data warehouse with Dataphin, from the initial stages of resource preparation and data warehouse planning through to operations and maintenance, data backfill, and result analysis. It is intended to give you a quick, clear understanding of the Dataphin offline data warehouse construction process.

The basic process of constructing an offline data warehouse with Dataphin is outlined as follows:

| Main Process | Description | Operation Guide |
| --- | --- | --- |
| Preparations | Prepare the necessary cloud resources: create an Alibaba Cloud account, activate and configure Dataphin and MaxCompute, and prepare the data sources. | Step 1: Preparations |
| Data Warehouse Planning | The planning phase serves as the blueprint for data construction. It includes creating data blocks, subject areas, computing sources, data sources, and projects, and adding project members. | Step 2: Data Warehouse Planning |
| Data Integration | Incorporate the prepared data sources into the project. | Step 3: Introduce Data |
| Specification Definition | Define statistical metric standards and complete their configuration in Dataphin's data development module, including business objects, activities, atomic metrics, business filters, and derived metrics. | Step 4: Specification Definition |
| Specification Modeling | Use Dataphin's specification modeling function to map the source data and build the model based on the previously defined specifications. | Step 5: Specification Modeling |
| Data Development | Following the statistical metric standards from Specification Definition and the models from Specification Modeling, develop the modeled data: logical dimension tables, logical fact tables, atomic metrics, business filters, and derived metrics. | Step 1: Develop Logical Dimension Tables<br>Step 2: Develop Logical Fact Tables<br>Step 3: Develop Atomic Metrics<br>Step 4: Develop Business Filters<br>Step 5: Develop Derived Metrics |
| Operations and Maintenance Data Backfill | Refresh data for tasks, including pipeline tasks, logical dimension tables, logical fact tables, and metrics, to ensure they are up to date. | Step 7: Operations and Maintenance Data Backfill |
| Data Verification | Verify the accuracy of the data by performing ad hoc queries (see the illustrative query sketch after this table). | Step 8: Data Verification |
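As a concrete illustration of the final verification step, the sketch below runs an ad hoc query against MaxCompute using the PyODPS SDK. The project name, endpoint, fact table name `dws_trade_order_1d`, measure column, and partition value are hypothetical placeholders for this tutorial's environment; the same SQL can also be run interactively as an ad hoc query in Dataphin.

```python
import os

# PyODPS is the Python SDK for MaxCompute (pip install pyodps).
from odps import ODPS

# Hypothetical project and endpoint; substitute the values from the
# MaxCompute compute source configured in Dataphin.
o = ODPS(
    access_id=os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
    secret_access_key=os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
    project="my_dataphin_project",
    endpoint="https://service.odps.aliyun.com/api",
)

# Ad hoc check on a backfilled partition: count rows and sum a measure in a
# developed fact table (hypothetical name) for one business date, then
# compare the result with the source system.
sql = """
SELECT COUNT(*)        AS order_cnt,
       SUM(pay_amount) AS total_pay_amount
FROM   dws_trade_order_1d
WHERE  ds = '20250120'
"""

with o.execute_sql(sql).open_reader() as reader:
    for record in reader:
        print(record["order_cnt"], record["total_pay_amount"])
```

If the totals match the source system for a few sample business dates, the backfilled data can reasonably be considered consistent; larger discrepancies usually point back to the data integration or data development steps.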