Data warehouse planning is the first and most critical step in building a data middle platform with Dataphin. Before you start data development, you must define data blocks, projects, data sources, compute sources, and statistical periods. This tutorial walks you through creating a data block, a compute source, a project, and a data source.
Background information
-
Data block: A namespace within the logical space, segmented by business characteristics. This tutorial uses dataphin_tutorial as the namespace.
-
Compute source: Provides the computing and storage resources required for data processing.
-
Project: The primary organizational unit in Dataphin, providing multi-user isolation and access control. This tutorial uses dataphin_tutorial as the project name.
-
Data source: The system that holds the raw business data. Dataphin reads the data from it and writes it into the data warehouse.
Step 1: Create a data block
-
In the top menu bar on the Dataphin home page, select Planning > Data Architecture.
-
On the Business Unit page, click + New Data Block.
-
In the New Data Block dialog box, select Basic Mode in the Production Development Type step and click Next.
-
In the Block Definition section, fill in the Basic Information and Business Information for the block.
Parameter
Description
Block English Name
Enter dataphin_tutorial.
Block Name
Enter Getting Started Tutorial.
Description (Optional)
Enter a brief description. For example, Getting Started Tutorial.
Icon
Select an icon.
Block Architect
Select one or more members responsible for block settings, including basic information updates, business information updates, and unit management.
Business Owner (Optional)
The member responsible for the stability of the block's data in business use. Select as needed.
Resource Owner (Optional)
The member responsible for the production quality of the block's data. Select as needed.
-
Click Next to configure the Logical Table Naming Convention. The system suggests a logical table name based on this convention when you create a new logical table. You can modify it later. The default settings are sufficient for this tutorial.
Note: After the data block is created, you can still change its naming convention under R&D Specification > Table Specification > Logical Table Naming Convention. Changing a table prefix affects all subtypes within the same logical table category. For example, changing the prefix of the dimension logical table category affects ordinary dimension logical tables, hierarchical dimension logical tables, and the other dimension subtypes.
-
Click Confirm to finalize the creation of the data block.
For more information on configuring each parameter of the data block, see Create a data block.
Step 2: Create a MaxCompute compute source
-
In the top menu bar on the Dataphin home page, select Planning > Compute Source.
-
On the Compute Source page, click Add Compute Source and select MaxCompute Compute Source.
-
On the Create MaxCompute Compute Source page, enter the required parameters.
Parameter
Description
Compute Source Type
Select MaxCompute.
Endpoint
The default endpoint matches the Dataphin instance compute engine and cannot be modified.
AccessKey ID
Obtain the AccessKey ID from the User Information Management page.
AccessKey Secret
Obtain the AccessKey Secret from the User Information Management page.
MaxCompute Project
Enter dataphin_tutorial, the name of the MaxCompute (formerly ODPS) project that you created on Alibaba Cloud.
External Project
Do not select this option.
Compute Source Name
Enter dataphin_tutorial.
Description
Enter a brief description. For example, Offline compute source for the dataphin_tutorial project.
-
Click Validate And Submit to finalize the creation of the compute source.
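The AccessKey pair is a credential, so it is safer not to paste it into scripts or shared notes. The minimal Python sketch below is not a Dataphin API; it only gathers the values from the table above into one record and reads the AccessKey pair from environment variables. The `ComputeSourceConfig` type, the `load_compute_source_config` helper, and the variable names `ALIBABA_CLOUD_ACCESS_KEY_ID` and `ALIBABA_CLOUD_ACCESS_KEY_SECRET` are assumptions for illustration. The endpoint is omitted because Dataphin fixes it to match the instance compute engine.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class ComputeSourceConfig:
    """Hypothetical record of the Create MaxCompute Compute Source fields used in this tutorial."""

    name: str
    maxcompute_project: str
    access_key_id: str
    # repr=False keeps the secret out of printed output and logs.
    access_key_secret: str = field(repr=False)
    external_project: bool = False
    description: str = ""


def load_compute_source_config(env: Mapping[str, str] = os.environ) -> ComputeSourceConfig:
    """Build the tutorial's compute source settings, reading the AccessKey pair from `env`."""
    # Assumed variable names; substitute whatever your own secret store provides.
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "")
    if not key_id or not key_secret:
        raise ValueError("both the AccessKey ID and the AccessKey Secret must be set")
    return ComputeSourceConfig(
        name="dataphin_tutorial",
        maxcompute_project="dataphin_tutorial",  # MaxCompute project created on Alibaba Cloud
        access_key_id=key_id,
        access_key_secret=key_secret,
        external_project=False,  # the tutorial leaves External Project unselected
        description="Offline compute source for the dataphin_tutorial project.",
    )
```

For example, `print(load_compute_source_config())` shows the name, project, and AccessKey ID but never the secret, and it fails early with a `ValueError` if either half of the AccessKey pair is missing.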
For more information on configuring each parameter of the compute source, see Create a MaxCompute compute source.
Step 3: Create a project
-
In the top menu bar on the Dataphin home page, select Planning > Project.
-
On the Project Management page, click Create General Project. In the Production Development Type step, select Basic Mode and then click Next.
-
In the Project Definition step, configure the affiliated block, basic information, business information, security settings, and more settings.
Parameter
Description
Business Unit
Select dataphin_tutorial.
Project English Name
Enter dataphin_tutorial.
Project Name
Enter dataphin_tutorial.
Compute Source Type
-
Offline Engine: Enable Offline Engine and select MaxCompute.
-
MAXC: Select dataphin_tutorial, the MaxCompute compute source that you created in Step 2.
-
Project Default Resource Group: The default resource group for scheduling instances generated by tasks in this project. You can customize the resource group for individual tasks during task configuration. Only associated and available resource groups can be selected. You can choose Tenant Default Resource Group or create a new resource group. For details on creating a new resource group, see Create a custom resource group.
Space Type
Select General Layer.
Note:
-
Space type options include Application Layer, Intermediate Layer, Source Layer, and General Layer.
-
Application Layer (ADS): Defines personalized and diversified data metrics for different business scenarios.
-
Intermediate Layer (CDM): Processed, cleansed, and summarized data.
-
Source Layer (ODS): Raw business data after processing and summarization in the STG layer.
-
General Layer: Used for general-purpose tasks or for developing multiple types of tasks simultaneously.
Security Settings
Use default configuration.
More Settings
Use default configuration.
-
Click Confirm to complete the creation of the project.
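The layered space types describe the conventional direction in which data moves through the warehouse: from the STG staging area into the Source Layer (ODS), then the Intermediate Layer (CDM), then the Application Layer (ADS). The Python sketch below is only a conceptual model of that ordering. The `Layer` enum and the `flows_forward` and `pipeline` helpers are hypothetical, and the sketch does not describe any dependency rules that Dataphin itself enforces between projects.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Hypothetical model of the layered space types; a higher value is further downstream."""

    STG = 0  # staging area that raw business data passes through before ODS
    ODS = 1  # Source Layer: raw business data
    CDM = 2  # Intermediate Layer: processed, cleansed, and summarized data
    ADS = 3  # Application Layer: metrics for specific business scenarios


def flows_forward(source: Layer, target: Layer) -> bool:
    """Return True if writing from `source` into `target` moves data downstream."""
    return target > source


def pipeline(start: Layer = Layer.ODS, end: Layer = Layer.ADS) -> list[Layer]:
    """List the layers that data passes through from `start` to `end`, inclusive."""
    return [layer for layer in Layer if start <= layer <= end]
```

With the defaults, `pipeline()` returns `[Layer.ODS, Layer.CDM, Layer.ADS]`. The General Layer has no position in this model because it is meant for general-purpose tasks or for developing several types of tasks at once.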
For more information on configuring each parameter of the project, see Create a general-purpose project.
Step 4: Create a source data source
-
In the top menu bar on the Dataphin home page, select Management Center > Datasource Management.
-
On the Datasource page, click + New Data Source. In the Relational Data Source area, select MySQL.
-
In the Create MySQL Data Source dialog box, configure the data source information.
Parameter
Description
Datasource Name
Enter dataphin_tutorial.
Version
Select MySQL8.
Data Source Description (Optional)
Enter a brief description of the data source. For example, Source data for the getting started tutorial.
Datasource Config
Select Production Data Source.
Tag (Optional)
Leave this field empty (default).
JDBC URL
Enter a URL in the format jdbc:mysql://host:port/dataphin. You can find the host:port information on the product page of your MySQL instance.
Important:
-
Before you configure the connection URL, make sure that the IP address of Dataphin is added to the allow list of the database. Otherwise, the connection may fail.
-
When you use an internal endpoint, make sure that the database and the Dataphin instance are in the same region.
Username, Password
Enter dataphin for both the username and the password.
SSL Encryption
If your MySQL data source uses SSL encryption, enable this option, upload the Truststore Certificate, and enter the Truststore Certificate Password so that data is encrypted in transit.
Advanced Settings
Use default configuration.
-
Click Test Connection.
-
After the connection test succeeds, click Confirm to create the data source.
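If Test Connection fails, two quick checks are whether the JDBC URL is well formed and whether host:port accepts TCP connections at all. The Python sketch below uses only the standard library: it builds and parses a URL in the jdbc:mysql://host:port/dataphin format and attempts a plain TCP connection. It is a hypothetical diagnostic and not a model of what Test Connection does internally. A successful connection from your own machine does not prove that the Dataphin IP address is on the database allow list, and it does not check the username, password, or SSL settings.

```python
import socket
from urllib.parse import urlsplit


def build_jdbc_url(host: str, port: int, database: str = "dataphin") -> str:
    """Assemble a MySQL JDBC URL in the jdbc:mysql://host:port/database form."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def parse_jdbc_url(url: str) -> tuple[str, int, str]:
    """Split jdbc:mysql://host:port/database into (host, port, database)."""
    prefix = "jdbc:"
    if not url.startswith(prefix + "mysql://"):
        raise ValueError(f"not a MySQL JDBC URL: {url!r}")
    # Drop the "jdbc:" prefix so urlsplit sees an ordinary mysql://host:port/db URL.
    parts = urlsplit(url[len(prefix):])
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"host and port are required: {url!r}")
    return parts.hostname, parts.port, parts.path.lstrip("/")


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `host, port, _ = parse_jdbc_url(url)` followed by `port_reachable(host, port)` separates a malformed URL (a `ValueError`) from a network or allow-list problem (`False`).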
For more information on configuring each parameter of the data source, see Create a MySQL data source.