A logical fact table is generated after you add attributes to a business activity (process). A logical fact table includes primary key, measure, and fact attribute fields. The primary key defines the logic for extracting data from the source table into the logical fact table. Measures are numeric fields that indicate the size, quantity, or degree of some aspect of the activity. This topic describes how to create and configure a logical fact table.
Limits
If you have not purchased the Data Standard module, you cannot set the Data Standard field in the table.
If you have not purchased the Asset Security module, you cannot set the Data Classification and Data Categorization fields in the table.
If you have not purchased the Asset Quality module, you cannot apply Unique or Not Null validation to primary key fields.
Prerequisites
A business activity object has been created. For specific operations, see Create and Manage Business Entities.
Procedure
Step 1: Create a Logical Fact Table
On the Dataphin home page, in the top menu bar, choose Development -> Data Development.
Follow the steps below to open the Create Logical Fact Table dialog box:
Select the project (in Dev-Prod mode, also select the environment), click Logical Fact Table, and then click the create icon.
On the Create Logical Fact Table page, configure the parameters.
Parameter
Description
Business Activity
Select the business activity for which you want to create a logical fact table. Business activities are divided into three types: Business Event, Business Snapshot, and Business Process.
Table Type
The type of logical table is determined by the type of business activity.
If Business Activity is selected as Business Event, the Table Type is Event Logical Fact Table.
If Business Activity is selected as Business Snapshot, the Table Type is Snapshot Logical Fact Table.
If Business Activity is selected as Business Process, the Table Type is Process Logical Fact Table.
Business Unit
Defaults to the name of the data block where the business activity is located and cannot be modified.
Data Domain
Defaults to the name of the subject area where the business activity is located and cannot be modified.
Compute Engine
If the Dataphin instance uses a Hadoop compute engine, you can select the compute engine for the tenant: Hive, Impala, or Spark.
Important: The corresponding compute engine must be enabled before it can be selected. For more information, see Create a Hadoop Compute Source.
The compute engines have the following limitations:
Hive: Cannot read source tables stored in Kudu format.
Impala: Can read source tables stored in Kudu format but does not currently support storing logical tables in Kudu. If no source tables are stored in Kudu format, Impala is not recommended.
Spark: Cannot read source tables stored in Kudu format.
Data Latency
Data latency is used to define the scheduling type of the subsequent logical table task in the production environment. The data latency of the logical fact table supports selecting Offline T+1 (day task), Offline T+h (hour task), and Offline T+m (minute task).
Note: The ArgoDB and StarRocks compute engines support only Offline T+1 (day task).
Logical Table Name
Enter the logical table name, which must be within 100 characters. After you select a business activity, the system automatically fills in the logical table name. The fill rule is
<Data Block Name>.fct_<Business Object Code>_<Data Latency>
Important: Only letters, numbers, and underscores (_) are allowed, and the first character must be a letter. Names are case-insensitive; uppercase letters are automatically converted to lowercase.
label_ is a system-reserved prefix; a table name cannot start with label_.
For AnalyticDB for PostgreSQL, the total length of the table name must be within 50 characters.
The format of the logical table name varies with the data latency; the <Data Latency> suffix differs as follows:
di: T+1 latency, daily incremental. Stores the incremental data of the current day every day.
hi: T+h latency, hourly incremental. Stores the incremental data of the current hour every hour.
thi: T+h latency, hourly incremental. Stores the incremental data from 00:00 to the current hour every hour.
mi: T+m latency, minute incremental. Stores the incremental data of the last 15 minutes in each 15-minute partition.
tmi: T+m latency, minute incremental. Stores the incremental data from 00:00 to the current minute every 15 minutes.
df/da: T+1 latency, daily full. Stores the full data up to the current day every day.
hf: T+h latency, hourly full. Stores the full data up to the current hour every hour.
mf: T+m latency, minute full. Stores the full data up to the last 15 minutes every 15 minutes.
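For example, assuming a data block named trade and a business object code order (both hypothetical names), the generated table names would look like the following:

```
trade.fct_order_di   -- di: daily incremental (T+1)
trade.fct_order_hi   -- hi: hourly incremental (T+h)
trade.fct_order_df   -- df: daily full (T+1)
```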
Chinese Name
The naming rule for custom names is:
Contains Chinese characters, letters, numbers, underscores (_), or hyphens (-).
Cannot exceed 64 characters.
Description
Provide a brief description of the logical fact table, within 1000 characters.
Click OK.
Step 2: Configure the Logical Fact Table Structure
On the Table Structure page, configure the parameters.
Area
Description
①Table Field Positioning
Click to locate the field list's Basic Configuration or Asset Administration configuration.
①Search and Filter Area
You can search for the required fields by the table field name.
Click the filter icon to filter by Data Type, Field Category, Whether Associated Dimension, Associated Dimension, Field Constraint, and Data Classification.
②List Operation
Reorder Columns: Before adjusting the field order, ensure that no downstream task uses select * to query this logical table, to avoid data errors.
Introduce Fields: Add new fields to the table, either through a Create Table Statement or by selecting from an existing From Table. For detailed instructions, see Introduce Fields to Logical Fact Table.
Add Fields: Supports adding Data Fields and Partition Fields. You can edit the Name, Description, Data Type, Column Type, Associated Dimension, Recommend Standards, Field Constraint, Data Class, Data Sensitivity Level, and Remarks information according to the business situation.
Note: The MaxCompute engine supports at most six levels of partition fields. The ArgoDB and StarRocks compute engines do not support adding partition fields.
③Field List
The field list displays detailed information of fields such as Serial Number, Column Name, Description, Data Type, Column Type, Associated Dimension, Recommend Standards, Field Constraint, Data Class, Data Sensitivity Level, and Remarks.
Serial Number: The serial number of the table field. Each time a field is added, it increments by 1.
Field Name: The name of the table field. You can enter the field name or Chinese keywords, and it will automatically match the standard preset field name.
Description: The description information of the table field, limited to 512 characters.
Data Type: Supports string, bigint, double, timestamp, decimal, Text, Numeric, Datetime Variable, and Other data types.
Column Type: Supports setting as Primary Key, Partition, Attribute.
Note: Only one primary key field is allowed. Only fields of type string, varchar, bigint, int, tinyint, and smallint can be used as partition fields.
Associated Dimension: For detailed instructions, refer to Add Associated Dimension.
Recommend Standards: Select the field standard for the field. To create a standard, see Create and Manage Data Standards.
Field Constraint: Select the field constraint for the field. Supports Unique and Not Null constraints.
Data Class: Select the data classification for the field. To create a data classification, see Create Data Classification.
Data Sensitivity Level: After selecting the data classification, the system will automatically detect the data level.
Remarks: Fill in the remarks information of the field. Limited to 2048 characters.
You can also delete a field in the operation column.
Note: Field deletion cannot be undone.
④Batch Operation
You can batch select table fields to perform the following operations.
Delete: Click the delete icon to batch delete the selected data fields.
Data Type: Click the edit icon to batch modify the data types of the selected fields.
Column Type: Click the edit icon to batch modify the column types of the selected fields.
Recommend Names: Click the recommend icon. The system segments the Description content of each field, matches it against the created root words, and recommends field names. You can replace a selected field name with the modified value in the Root Naming dialog box.
Note: If a recommended field name does not meet your requirements, you can modify it in the Modified Field Name input box. Click Reset to reset the Modified Field Name to the root word matched by the system.
Recommend Standards: Click the recommend icon. The system recommends field standards based on the field names. You can apply the recommended field standard to a field in the Field Standard dialog box.
Field Constraint: Click the edit icon to batch set field constraints.
Important: Sub-dimension logical tables do not support setting field constraints.
Click Save And Next.
Introduce Fields to Logical Fact Table
From Table
On the configuration page of the logical fact table created above, click From Table.
In the From Table dialog box, select the Source Table and then select the fields you need to add.
Parameter
Description
Source Table
Supports selecting all physical tables (non-Dataphin automatically generated physical tables), logical tables, and views (non-parameterized views) with read permissions under the current tenant.
For how to obtain read permissions for physical tables, see Apply, Renew, and Return Table Permissions.
Field List
Select the fields you need to add.
Note: You can switch the source table to select fields from multiple source tables.
Selected Fields
The added fields will be added to the selected fields. You can perform the Delete operation on the selected fields.
Click Add to introduce fields from a physical table to the logical fact table.
After adding to the New Field area, you can edit the field's name, data type, field category, and associated dimension according to the business situation.
From Create Table Statement
The source tables available for a measure field depend on whether the logical fact table has a primary key defined. If a primary key is defined, measure fields can be selected only from the main source table. If no primary key is defined, measure fields can be selected from the main source table or other data tables.
On the configuration page of the logical fact table created above, click From Create Table Statement.
In the From Create Table Statement dialog box, fill in the create table statement and click Parse SQL.
Select the fields you need in the field list and click Add to create new fields for the logical fact table.
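For reference, a create table statement like the following can be pasted and parsed (the table and field names here are hypothetical):

```sql
-- Hypothetical DDL; Parse SQL extracts the field names, types, and comments
CREATE TABLE ods_order_payment (
    order_id    STRING         COMMENT 'Order ID',
    pay_amount  DECIMAL(16,2)  COMMENT 'Payment amount',
    pay_time    TIMESTAMP      COMMENT 'Payment time'
);
```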
Add Associated Dimension
The foreign key fields in the fact attributes of the logical fact table are associated with dimensions.
On the configuration page of the logical fact table, click the edit icon to open the Edit Model Relationship dialog box.
In the Edit Model Relationship dialog box, configure the parameters.
Area
Parameter
Description
Null Value Replacement
If the foreign key field of the logical fact table cannot be associated with the dimension table, the system will set the Default Value to -110.
Associated Dimension
Associated Entity, Logical Dimension Table
Select the Associated Entity and Logical Dimension Table you have created.
Edit Association Logic
Association Logic
Select the fields in the logical fact table and configure the relationship between the fields of the logical fact table and the primary key of the dimension logical table that needs to be associated.
Dimension Table Version Policy
Select the version (partition) of the associated dimension table. By default, the same cycle partition version as the main table is used.
Missing Association Policy
The missing association policy is used to define the computing logic of fields that exist in the main source table (left table) but do not exist in the dimension logical table (right table), that is, fields that cannot be associated. You can choose Retain Original Missing Association Data and Use Default Value For Missing Association Data:
Retain Original Missing Association Data: When creating derived metrics, retain the original data of the left table.
Use Default Value For Missing Association Data: For records that cannot be associated with the dimension table, apply the default value of -110.
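Conceptually, the two policies correspond to the following SQL sketches (table and field names are hypothetical, and the actual generated SQL may differ):

```sql
-- Retain Original Missing Association Data: a plain left join keeps the
-- original rows of the fact (left) table even when no dimension record matches.
SELECT f.order_id, f.customer_id, d.customer_name
FROM fct_order f
LEFT JOIN dim_customer d
  ON f.customer_id = d.customer_id;

-- Use Default Value For Missing Association Data: unmatched keys are
-- replaced with the default value -110.
SELECT f.order_id,
       COALESCE(d.customer_id, '-110') AS customer_id,
       d.customer_name
FROM fct_order f
LEFT JOIN dim_customer d
  ON f.customer_id = d.customer_id;
```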
Edit Dimension Role
Role English Name, Role Name
A dimension role is an alias for a dimension: when the same dimension is referenced multiple times, each reference must be given a distinct name, which serves as its role. Define the Role English Name and Role Name:
The Role English Name is prefixed with dim by default. The naming rules for the custom part are:
Contains letters, numbers, or underscores (_).
Cannot exceed 64 characters.
The naming rules for the Role Name are:
Contains Chinese characters, numbers, letters, underscores (_), or hyphens (-).
Cannot exceed 64 characters.
Click OK.
Step 3: Configure the Logical Fact Table Computation Logic
On the Computation Logic page, configure the mapping relationship between the source data and the primary key of the logical fact table.
Click Source Configuration to enter the Source Configuration dialog box and configure the source parameters in the Source Configuration dialog box.
Note: It is recommended not to set additional filters on the event time in filter conditions or custom SQL.
If the current logical table has no primary key, only one source can be set. If there are multiple source tables, use custom SQL to join the tables in advance.
Parameter
Description
Source Type
Supports three types of sources: Physical Table, SQL, and Logical Table.
Source Table Type Description:
Source Table with Primary Key: Logical tables with primary keys allow configuring multiple sources, with the first one fixed as the main source. The total amount of data (number of rows) in the logical table is determined by this source.
Source Table without Primary Key: Logical tables without primary keys only allow configuring one source. If there are multiple source tables, please use custom SQL to complete the association of multiple tables in advance.
Note: If you need to configure multiple source objects, click Add Source Object to add them.
Source Object
Select Physical Table: Supports selecting all physical tables (non-Dataphin automatically generated physical tables) and physical views (non-parameterized views) with read permissions under the current tenant.
For how to obtain read permissions for physical tables, see Apply, Renew, and Return Table Permissions.
Select SQL: Click the edit icon and enter the SQL statement in the edit box, for example:
select id, name from project_name_dev.table_name1 t1 join project_name2_dev.table_name2 t2 on t1.id = t2.id
Select Logical Table: Supports selecting all logical tables with read permissions under the current tenant.
For how to obtain read permissions for logical tables, see Apply, Renew, and Return Table Permissions.
Important: Using a logical table as the data source for another logical table increases the complexity of the computation logic and the difficulty of operation and maintenance. Choose carefully.
Object Alias
Customize the alias of the source table. For example: t1, t2.
Object Description
Please enter the description information of the object. Limited to 1000 characters.
Filter Condition
Click the edit icon and enter the filter condition in the edit box, for example:
ds=${bizdate} and condition1=value1
Filter conditions cannot be set for custom SQL sources.
Foreign Key Field
Select the field in the source object that corresponds to the primary key of the logical table, so that the source can be associated with the primary key.
Delete
The main source does not support deletion.
For logical tables without primary keys, deleting the source will clear the computation logic of the fields.
Click OK to complete the source configuration.
After completing the source configuration, drag the source fields into the computation logic.
You can also click the Map By Name button to batch place the data source fields into the computation logic of the logical table fields with the same name.
Click the edit icon to edit the computation logic expression in the edit box (expressions do not support aggregate functions such as sum, count, and min), for example:
Example 1:
substr(t1.column2, 3, 10)
Example 2:
case when t1.column2 != '1' then 'Y' else 'N' end
Example 3:
t1.column2 + t2.column1
Click the verify icon at the bottom to check the validity of the expression.
Click the Preview SQL button to view the computation logic SQL.
Click Save And Next.
Step 4: Configure Constraints
The system generates quality rules for the current logical table within the quality module, based on field constraints. Here, you can define the rule strength for the field, which includes both hard rules and soft rules. For more information, see Data Table Quality Rules.
Click Save And Next.
Step 5: Configure the Logical Fact Table Scheduling Configuration
On the Scheduling & Parameter Configuration page, configure the Data Latency, Scheduling Properties, Scheduling Dependencies, Scheduling Parameters, Running Configuration of the logical fact table.
Parameter
Description
Data Latency
When data latency is enabled, the system will automatically rerun the data for the specified logical table within the Maximum Latency Days period. For detailed instructions, see Configure Logical Table Data Latency.
Scheduling Properties
This section outlines how to define the scheduling method for a logical fact table in a production environment. It allows you to set the scheduling type, frequency, logic, and execution parameters for the table. For detailed instructions, see: Configure Logical Table Scheduling Properties.
Upstream Dependency
This section outlines how to define the nodes within a logical table for a scheduling task. Dataphin systematically executes these nodes according to the configured scheduling dependencies, ensuring the timely and effective production of business data. For detailed instructions, see: Configure Logical Table Upstream Dependency.
Parameter Configuration
Parameter configuration involves assigning values to variables within the code to ensure that when the node is scheduled, these parameter variables are automatically substituted with their respective values. On the scheduling parameter configuration page, you can choose to Ignore or Convert To Global Variable for managing parameters. For more information, see: Logical Table Parameter Configuration.
Running Configuration
Configure the task-level running timeout and rerun policy for this logical fact table to match your business scenario. This approach prevents resource waste from prolonged computing task occupation and enhances the reliability of task operations. For detailed instructions, see Logical Table Running Configuration.
Resource Configuration
You can set up a scheduling resource group for the current logical table task. During scheduling, the task will utilize the resource quota of the assigned group. For configuration instructions, see Logical Table Resource Configuration.
Step 6: Save and Submit the Logical Table
After completing the logical fact table configuration, click Save and Submit. The system verifies the configuration of the Table Structure, Computation Logic, Scheduling Dependencies, and Running Parameters. You can check the configuration information and fix the items that did not pass according to the Check Results.
After all the check results pass, fill in the submission remarks and click Confirm and Submit.
When you submit a task, Dataphin conducts a lineage analysis and performs submission checks. For more information, see Specification Modeling Task Submission Instructions.
What to do next
In Dev-Prod mode, the project requires you to publish the logical table to the production environment. For detailed instructions, see Manage Publishing Tasks.
Once the logical table has been deployed to the production environment, you can manage and monitor its associated tasks within the Operation Center. For detailed instructions, see Scheduling center.