Planning the data warehouse is the first step in building a data mid-end with Dataphin and a critical part of the top-level design of the data architecture. Finalize the data warehouse plan before you start data development. The plan defines data blocks, projects, data sources, compute sources, and statistical periods. This topic walks through the planning steps used in this tutorial.
Background information
Data block: A data block is a key element of the logical space, serving as a namespace that is segmented based on business characteristics. The namespace dataphin_tutorial is used as an example in this tutorial.
Compute source: Provides the necessary computing and storage resources for data processing tasks.
Project: A project in Dataphin serves as the primary organizational unit, offering a boundary for multi-user isolation and access control. The project name dataphin_tutorial is used as an example in this tutorial.
Data source: The system from which raw business data is read before it is written into the data warehouse.
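The relationships among these four concepts can be sketched as a small data model. This is only an illustration, not a Dataphin API: the class and field names are hypothetical, and the values are the example names used in this tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBlock:
    """Namespace segmented by business characteristics."""
    english_name: str


@dataclass(frozen=True)
class ComputeSource:
    """Computing and storage resources for data processing tasks."""
    name: str
    engine: str


@dataclass(frozen=True)
class Project:
    """Unit of multi-user isolation and access control."""
    english_name: str
    block: DataBlock
    compute_source: ComputeSource


@dataclass(frozen=True)
class DataSource:
    """System that raw business data is read from."""
    name: str
    kind: str


# Example values taken from this tutorial.
block = DataBlock(english_name="dataphin_tutorial")
compute = ComputeSource(name="dataphin_tutorial", engine="MaxCompute")
project = Project(english_name="dataphin_tutorial", block=block, compute_source=compute)
source = DataSource(name="dataphin_tutorial", kind="MySQL")

print(
    f"{source.kind} source -> project {project.english_name} "
    f"(block {project.block.english_name}, engine {project.compute_source.engine})"
)
```

The model reflects the order of the steps that follow: the data block and compute source are created first because the project refers to both.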
Step 1: Create a data block
Navigate to the top menu bar on the Dataphin home page and select Planning > Data Architecture.
On the Business Unit page, click + New Data Block.
In the New Data Block dialog box, select Basic Mode in the Production Development Type step and click Next.
In the Block Definition section, fill in the Basic Information and Business Information for the block.
Block English Name: Enter dataphin_tutorial.
Block Name: Enter Getting Started Tutorial.
Description (Optional): Enter a brief description. For example, Getting Started Tutorial.
Icon: Select an icon.
Block Architect: Select one or more members responsible for block information settings, including basic information updates, business information updates, and unit management.
Business Owner (Optional): Responsible for the stability of business use of block data. You can select as needed.
Resource Owner (Optional): Responsible for ensuring the production quality of block data. You can select as needed.
Click Next to configure the Logical Table Naming Convention. When you create a logical table, the system suggests a name based on this convention, which you can then modify. For this tutorial, the default settings are sufficient.
Note: Once the data block is created, you can modify the R&D Specification > Table Specification > Logical Table Naming Convention within the data block. Changing the table prefix will impact all subtypes within the same logical table category. For instance, altering the dimension logical table category will affect ordinary dimension logical tables, hierarchical dimension logical tables, and other subtypes.
Click Confirm to finalize the creation of the data block.
For more information on configuring each parameter of the data block, see Create a data block.
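The effect described in the note on the Logical Table Naming Convention, where one prefix change reaches every subtype in a category, can be illustrated with a short sketch. The prefixes `dim_` and `d_`, the dictionary names, and the function `suggest_table_name` are all hypothetical; the actual defaults and suggestion logic live in Dataphin and are not given in this tutorial.

```python
# Hypothetical prefix per logical table category (not Dataphin's real defaults).
CATEGORY_PREFIX = {"dimension": "dim_"}

# Subtypes named in the note, each mapped to the category it belongs to.
SUBTYPE_CATEGORY = {
    "ordinary_dimension": "dimension",
    "hierarchical_dimension": "dimension",
}


def suggest_table_name(subtype: str, base_name: str, prefixes: dict = CATEGORY_PREFIX) -> str:
    """Suggest a name by looking up the prefix of the subtype's category."""
    category = SUBTYPE_CATEGORY[subtype]
    return prefixes[category] + base_name


# With the original prefix, both subtypes share "dim_".
print(suggest_table_name("ordinary_dimension", "customer"))
print(suggest_table_name("hierarchical_dimension", "region"))

# Changing the category prefix changes the suggestion for every subtype in it.
changed = {"dimension": "d_"}
print(suggest_table_name("ordinary_dimension", "customer", changed))
print(suggest_table_name("hierarchical_dimension", "region", changed))
```

Because the prefix is stored per category rather than per subtype, there is no way in this sketch to change it for ordinary dimension tables alone, which mirrors the behavior the note warns about.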
Step 2: Create a MaxCompute compute source
In the top menu bar on the Dataphin home page, select Planning > Compute Source.
On the Compute Source page, click Add Compute Source and select MaxCompute Compute Source.
On the Create MaxCompute Compute Source page, enter the required parameters.
Compute Source Type: Select MaxCompute.
Endpoint: The default endpoint is that of the Dataphin instance compute engine and cannot be modified.
AccessKey ID: The AccessKey ID can be obtained from the User Information Management page.
AccessKey Secret: The AccessKey Secret can be obtained from the User Information Management page.
MaxCompute Project: Enter dataphin_tutorial (the name of the MaxCompute (ODPS) project created on Alibaba Cloud).
External Project: Do not select this option.
Compute Source Name: Enter dataphin_tutorial.
Description: Enter Offline compute source for the dataphin_tutorial project.
Click Validate And Submit to finalize the creation of the compute source.
For more information on configuring each parameter of the compute source, see Create a MaxCompute compute source.
Step 3: Create a project
In the top menu bar on the Dataphin home page, select Planning > Project.
On the Project Management page, click Create General Project. In the Production Development Type step, select Basic Mode and then click Next.
In the Project Definition step, configure the affiliated block, basic information, business information, security settings, and more settings.
Business Unit: Select dataphin_tutorial.
Project English Name: Enter dataphin_tutorial.
Project Name: Enter dataphin_tutorial.
Compute Source Type
Offline Engine: Enable Offline Engine and select MaxCompute.
MAXC: Select the MaxCompute compute source dataphin_tutorial created in Step 2.
Project Default Resource Group: The default resource group used to schedule instances generated by tasks in this project. You can change the resource group for an individual task during task configuration. Only associated and available resource groups can be selected. You can choose Tenant Default Resource Group or create a new resource group. For details on creating a new resource group, see Create a custom resource group.
Space Type: Select General Layer.
Note: Space type options include Application Layer, Intermediate Layer, Source Layer, and General Layer.
Application Layer (ADS): Defines personalized and diversified data metrics that can be applied to different scenarios based on business needs.
Intermediate Layer (CDM): Data that has been processed, scrubbed, and summarized.
Source Layer (ODS): The raw data of the business system formed after data processing and summarization in the STG layer.
General Layer: Generally used for general tasks or for developing multiple types of tasks simultaneously.
Security Settings: Use default configuration.
More Settings: Use default configuration.
Click Confirm to complete the creation of the project.
For more information on configuring each parameter of the project, see Create a general project.
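Because the project refers to the data block and compute source created in the earlier steps, it can help to check the planned values before opening the dialog box. The sketch below is a local checklist only and does not call Dataphin. The dictionary keys, the function `check_project_definition`, and the `GENERAL` placeholder value are assumptions; the abbreviations ADS, CDM, and ODS come from the space type note in this step.

```python
from enum import Enum
from typing import Dict, List, Set


class SpaceType(Enum):
    APPLICATION = "ADS"
    INTERMEDIATE = "CDM"
    SOURCE = "ODS"
    GENERAL = "GENERAL"  # the tutorial gives no abbreviation; placeholder value


def check_project_definition(
    definition: Dict[str, str],
    blocks: Set[str],
    compute_sources: Set[str],
) -> List[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems: List[str] = []
    if definition.get("business_unit") not in blocks:
        problems.append("business_unit does not match an existing data block")
    if definition.get("compute_source") not in compute_sources:
        problems.append("compute_source does not match an existing compute source")
    if definition.get("space_type") not in SpaceType.__members__:
        problems.append("space_type must be one of " + ", ".join(SpaceType.__members__))
    return problems


# Planned values for this tutorial.
definition = {
    "business_unit": "dataphin_tutorial",
    "compute_source": "dataphin_tutorial",
    "space_type": "GENERAL",
}
print(check_project_definition(definition, {"dataphin_tutorial"}, {"dataphin_tutorial"}))
```

An empty list means every reference resolves. Running the same check before Step 1 or Step 2 is complete would report the missing block or compute source.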
Step 4: Create a source data source
In the top menu bar on the Dataphin home page, select Management Center > Datasource Management.
On the Datasource page, click + New Data Source. In the Relational Data Source area, select MySQL.
In the Create Mysql Data Source dialog box, configure the data source information.
Datasource Name: Enter dataphin_tutorial.
Version: Select MySQL8.
Data Source Description (Optional): Provide a brief description of the data source. For example, Source data for the getting started tutorial.
Datasource Config: Select "production" Data Source.
Tag (Optional): Leave blank (the default).
JDBC URL: Enter the URL in the format jdbc:mysql://host:port/dataphin. The host:port information can be viewed on the MySQL instance product page.
Important: Make sure that the IP address of Dataphin is added to the database allow list for the connection URL you use; otherwise, the connection may fail. When using an intranet address, ensure that the database and the Dataphin instance are in the same region.
Username, Password: Enter dataphin.
SSL Encryption: If your MySQL data source is configured with SSL encryption, you can select Enable, upload the Truststore Certificate, and enter the Truststore Certificate Password for encrypted transmission.
Advanced Settings: Use default configuration.
Click Test Connection.
After the connectivity test is successful, click Confirm to finalize the creation of the data source.
For more information on configuring each parameter of the data source, see Create a MySQL data source.
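If Test Connection fails, a malformed JDBC URL is one common cause. The sketch below builds and validates a URL in the jdbc:mysql://host:port/dataphin format used in this step, and includes an optional TCP reachability check. The function names and the host `db.example.com` are hypothetical, the pattern deliberately accepts only the simple host:port/database form (no query parameters or IPv6 literals), and nothing here is part of Dataphin.

```python
import re
import socket

# Accepts only the simple form jdbc:mysql://host:port/database.
JDBC_MYSQL_PATTERN = re.compile(
    r"^jdbc:mysql://(?P<host>[^:/\s]+):(?P<port>\d{1,5})/(?P<database>[^/?\s]+)$"
)


def build_jdbc_url(host: str, port: int, database: str = "dataphin") -> str:
    """Assemble a URL in the format shown in the JDBC URL parameter."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def parse_jdbc_url(url: str) -> dict:
    """Split a URL into host, port, and database, or raise ValueError."""
    match = JDBC_MYSQL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a jdbc:mysql://host:port/database URL: {url!r}")
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {"host": match.group("host"), "port": port, "database": match.group("database")}


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """TCP-level check only; it does not log in or speak the MySQL protocol."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


url = build_jdbc_url("db.example.com", 3306)  # hypothetical host
print(url)
print(parse_jdbc_url(url))
```

Note that `is_reachable` tests the path from the machine where the script runs, not from Dataphin. A successful result locally does not show that the Dataphin IP address is on the database allow list, so Test Connection in the dialog box remains the authoritative check.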