All Products
Search
Document Center

Dataphin:Create a Logical Fact Table

Last Updated:Jan 23, 2025

A logical fact table is generated after adding multiple attributes to a business activity (process). The logical fact table includes primary keys, measures, and fact attribute fields. The primary key is used to define the data extraction logic from the source table for the logical fact table. Measures are numeric fields that indicate the size, quantity, or degree of a certain aspect. This topic describes how to create and configure a logical fact table.

Limits

  • If you have not purchased the Data Standard module, setting the Data Standard field in the table is not supported.

  • If you have not purchased the Asset Security module, setting the Data Classification and Data Categorization fields in the table is not supported.

  • If you have not purchased the Asset Quality module, implementing Unique and Not Null validation for primary key fields is not supported.

Prerequisites

The creation of the business activity object is completed. For specific operations, see Create and Manage Business Entities.

Procedure

Step 1: Create a Logical Fact Table

  1. On the Dataphin home page, click the top menu bar Development -> Data Development.

  2. Follow the instructions below to enter the Create Logical Fact Table dialog box.

    Select Project (Dev-Prod mode requires selecting the environment) -> click Logical Fact Table -> click image icon.

    image

  3. On the Create Logical Fact Table page, configure the parameters.

    Parameter

    Description

    Business Activity

    Select the business activity for which you want to create a logical fact table. Business activities are divided into three types: Business Event, Business Snapshot, and Business Process.

    Table Type

    The type of logical table is determined by the type of business activity.

    • If Business Activity is selected as Business Event, the Table Type is Event Logical Fact Table.

    • If Business Activity is selected as Business Snapshot, the Table Type is Snapshot Logical Fact Table.

    • If Business Activity is selected as Business Process, the Table Type is Process Logical Fact Table.

    Business Unit

    Defaults to the name of the data block where the business activity is located and cannot be modified.

    Data Domain

    Defaults to the name of the subject area where the business activity is located and cannot be modified.

    Compute Engine

    Set the Dataphin instance to support selecting the compute engine for tenants with Hadoop compute engines, including Hive, Impala, and Spark.

    Important

    The corresponding compute engine needs to be enabled before it can be selected. For more information, see Create a Hadoop Compute Source.

    The compute engine has the following limitations, as detailed below:

    • Hive: Cannot read source tables stored in Kudu format.

    • Impala: Can read source tables stored in Kudu format but does not currently support storing logical tables in Kudu. If there are no source tables in Kudu format, it is not recommended to use.

    • Spark: Spark cannot read source tables stored in Kudu format.

    Data Latency

    Data latency is used to define the scheduling type of the subsequent logical table task in the production environment. The data latency of the logical fact table supports selecting Offline T+1 (day task), Offline T+h (hour task), and Offline T+m (minute task).

    Note

    ArgoDB and StarRocks compute engines only support offline T+1 (day table).

    Logical Table Name

    Enter the logical table name, which must be within 100 characters. After selecting a business object, the system will automatically fill in the logical table name. The filling rule is <Data Block Name>.fct_<Business Object Code>_<Data Latency>.

    Important
    • Only letters, numbers, and underscores (_) are allowed, and the first character must be a letter. It is case-insensitive, and when uppercase letters are entered, the system will automatically convert them to lowercase.

    • label_ is a reserved prefix in the system and cannot start with label_.

    • The total length of the table name for AnalyticDB for PostgreSQL must be within 50 characters.

    The format of the logical table name varies with different data latencies, and the display information of <Data Latency> is different:

    • di: T+1 latency, daily incremental, storing the incremental data of the current day every day.

    • hi: T+h latency, hourly incremental, storing the incremental data of the current hour every hour.

    • thi: T+h latency, hourly incremental, storing the incremental data from 0:00 to the current hour every hour.

    • mi: T+m latency, minute incremental, storing the incremental data of the last 15 minutes every 15 minutes partition.

    • tmi: T+m latency, minute incremental, storing the incremental data from 0:00 to the current minute every 15 minutes.

    • df/da: T+1 latency, daily full, storing the full data up to the current day every day.

    • hf: T+h latency, hourly full, storing the full data up to the current hour every hourevery hour.

    • mf: T+m latency, minute full, storing the full data up to the last 15 minutes every 15 minutes.

    Chinese Name

    The naming rule for custom names is:

    • Contains Chinese characters, letters, numbers, underscores (_), or hyphens (-).

    • Cannot exceed 64 characters.

    Description

    Provide a brief description of the logical fact table, within 1000 characters.

  4. Click OK.

Step 2: Configure the Logical Fact Table Structure

  1. On the Table Structure page, configure the parameters.

    image

    Area

    Description

    Table Field Positioning

    Click to locate the field list's Basic Configuration or Asset Administration configuration.

    Search and Filter Area

    You can search for the required fields by the table field name.

    Click image..png to filter based on Data Type, Field Category, Whether Associated Dimension, Associated Dimension, Field Constraint, Data Classification conditions.

    List Operation

    • Reorder Columns: Before adjusting the field order, ensure that no downstream uses select * to query this logical table to avoid data errors.

    • Introduce Fields: This feature allows you to add new fields to a table, either through the Create Table Statement or by selecting from an existing From Table. For detailed instructions, see Introduce Fields to Logical Fact Table.

    • Add Fields: Supports adding Data Fields and Partition Fields. You can edit the Name, Description, Data Type, Column Type, Associated Dimension, Recommend Standards, Field Constraint, Data Class, Data Sensitivity Level, and Remarks information according to the business situation.

      Note

      The MaxCompute engine supports creating no more than 6-level partition fields.

      ArgoDB and StarRocks compute engines do not support adding partition fields.

    Field List

    The field list displays detailed information of fields such as Serial Number, Column Name, Description, Data Type, Column Type, Associated Dimension, Recommend Standards, Field Constraint, Data Class, Data Sensitivity Level, and Remarks.

    • Serial Number: The serial number of the table field. Each time a field is added, it increments by 1.

    • Field Name: The name of the table field. You can enter the field name or Chinese keywords, and it will automatically match the standard preset field name.

    • Description: The description information of the table field, limited to 512 characters.

    • Data Type: Supports string, bigint, double, timestamp, decimal, Text, Numeric, Datetime Variable, and Other data types.

    • Column Type: Supports setting as Primary Key, Partition, Attribute.

      Note
      • Only one primary key field is allowed.

      • Only fields of type string, varchar, bigint, int, tinyint, and smallint are supported as partition fields.

    • Associated Dimension: For detailed instructions, refer to Add Associated Dimension.

    • Recommend Standards: Select the field standard for the field. To create a standard, see Create and Manage Data Standards.

    • Field Constraint: Select the field constraint for the field. Supports Unique and Not Null constraints.

    • Data Class: Select the data classification for the field. To create a data classification, see Create Data Classification.

    • Data Sensitivity Level: After selecting the data classification, the system will automatically detect the data level.

    • Remarks: Fill in the remarks information of the field. Limited to 2048 characters.

    At the same time, you can perform the Delete operation on the field in the operation column.

    Note

    Field deletion cannot be undone.

    Batch Operation

    You can batch select table fields to perform the following operations.

    • Delete: Click image icon to batch delete the selected data fields.

    • Data Type: Click image icon to batch modify the selected data types.

    • Column Type: Click image icon to batch modify the selected field categories.

    • Recommend Names: Click image icon, and the system will perform word segmentation on the Description content of the field and match the created root words to recommend field names. You can replace the selected field name with the modified value in the Root Naming dialog box. As shown in the following figure:

      Note
      • If the recommended field names do not meet the requirements, you can modify them in the Modified Field Name input box.

      • Click Reset to reset the Modified Field Name to the system's hit root word.

    • Recommend Standards: Click image icon, and the system will recommend field standards based on the field name. You can set the field to the recommended field standard in the Field Standard dialog box.

    • Field Constraint: Click image icon to batch set field constraints.

      Important

      Sub-dimension logical tables do not support setting field constraints.

  2. Click Save And Next.

Introduce Fields to Logical Fact Table

  • From Table

    1. On the configuration page of the logical fact table created above, click From Table.

    2. In the From Table dialog box, select the Source Table and then select the fields you need to add.

      Parameter

      Description

      Source Table

      Supports selecting all physical tables (non-Dataphin automatically generated physical tables), logical tables, and views (non-parameterized views) with read permissions under the current tenant.

      For how to obtain read permissions for physical tables, see Apply, Renew, and Return Table Permissions.

      Field List

      Select the fields you need to add.

      Note

      You can select fields from multiple source tables by switching the source table method.

      Selected Fields

      The added fields will be added to the selected fields. You can perform the Delete operation on the selected fields.

    3. Click Add to introduce fields from a physical table to the logical fact table.

      After adding to the New Field area, you can edit the field's name, data type, field category, and associated dimension according to the business situation.

  • From Create Table Statement

    Depending on whether the logical fact table has defined a primary key, select the source table for the measure field. If the logical fact table has already defined a primary key, the measure field can only be selected from the main source table. If the logical fact table does not have a primary key defined, the measure field can be selected from the main source table and other data tables.

    1. On the configuration page of the logical fact table created above, click From Create Table Statement.

    2. In the From Create Table Statement dialog box, fill in the create table statement and click Parse SQL.

    3. Select the fields you need in the field list and click Add to create new fields for the logical fact table.

Add Associated Dimension

The foreign key fields in the fact attributes of the logical fact table are associated with dimensions.

  1. On the configuration page of the logical fact table, click gagag icon to enter the edit model relationship dialog box.

  2. In the Edit Model Relationship dialog box, configure the parameters.

    Area

    Parameter

    Description

    Null Value Replacement

    If the foreign key field of the logical fact table cannot be associated with the dimension table, the system will set the Default Value to -110.

    Associated Dimension

    Associated Entity,Logical Dimension Table

    Select the Associated Entity and Logical Dimension Table you have created.

    Edit Association Logic

    Association Logic

    Select the fields in the logical fact table and configure the relationship between the fields of the logical fact table and the primary key of the dimension logical table that needs to be associated.

    Dimension Table Version Policy

    Select the version (partition) of the associated dimension table. By default, the same cycle partition version as the main table is used.

    Missing Association Policy

    The missing association policy is used to define the computing logic of fields that exist in the main source table (left table) but do not exist in the dimension logical table (right table), that is, fields that cannot be associated. You can choose Retain Original Missing Association Data and Use Default Value For Missing Association Data:

    • Retain Original Missing Association Data: When creating derived metrics, retain the original data of the left table.

    • Use Default Value For Missing Association Data: When fields in the dimension table lack an association with the logical fact table, apply the default value of -110.

    Edit Dimension Role

    Role English Name, Role Name

    The dimension role functions as a dimension alias. When referencing the same dimension multiple times, it cannot be defined with the same name, serving as a role-playing effect. Define the Role English Name and Role Name:

    • The prefix of Role English Name defaults to dim, and the naming rule for the custom part is:

      • Contains letters, numbers, or underscores (_).

      • Cannot exceed 64 characters.

    • The naming rule for Role Name is:

      • Contains Chinese characters, numbers, letters, underscores (_), or hyphens (-).

      • Cannot exceed 64 characters.

  3. Click OK.

Step 3: Configure the Logical Fact Table Computation Logic

  1. On the Computation Logic page, configure the mapping relationship between the source data and the primary key of the logical fact table.

  2. Click Source Configuration to enter the Source Configuration dialog box and configure the source parameters in the Source Configuration dialog box.

    Note
    • It is recommended not to set additional filters on event time in filter conditions or custom SQL.

    • The current logical table has no primary key and can only set one source. If there are multiple source tables, please use custom SQL to complete the association of multiple tables in advance.

    Parameter

    Description

    Source Type

    Supports three types of sources: Physical Table, SQL, and Logical Table.

    Source Table Type Description:

    • Source Table with Primary Key: Logical tables with primary keys allow configuring multiple sources, with the first one fixed as the main source. The total amount of data (number of rows) in the logical table is determined by this source.

    • Source Table without Primary Key: Logical tables without primary keys only allow configuring one source. If there are multiple source tables, please use custom SQL to complete the association of multiple tables in advance.

      Note

      If you need to configure multiple source objects, you can click Add Source Object to add new ones.

    Source Object

    • Select Physical Table: Supports selecting all physical tables (non-Dataphin automatically generated physical tables) and physical views (non-parameterized views) with read permissions under the current tenant.

      For how to obtain read permissions for physical tables, see Apply, Renew, and Return Table Permissions.

    • Select SQL: Click image icon and enter content in the edit box, for example:

      select id, name
      from project_name_dev.table_name1 t1
      join
      project_name2_dev.table_name2 t2
      on t1.id = t2.id
    • Select Logical Table: Supports selecting all logical tables with read permissions under the current tenant.

      For how to obtain read permissions for physical tables, see Apply, Renew, and Return Table Permissions.

      Important

      Using a logical table as the data source for another logical table increases the complexity of the computation logic and the difficulty of operation and maintenance. Please choose carefully.

    Object Alias

    Customize the alias of the source table. For example: t1, t2.

    Object Description

    Please enter the description information of the object. Limited to 1000 characters.

    Filter Condition

    Click image icon and enter content in the edit box, for example:

    ds=${bizdate} and condition1=value1.

    Custom SQL has no filter conditions.

    Foreign Key Field

    The field in the source object corresponding to the primary key of the logical table can be associated with the primary key.

    Delete

    The main source does not support deletion.

    For logical tables without primary keys, deleting the source will clear the computation logic of the fields.

  3. Click OK to complete the source configuration.

  4. After completing the source configuration, drag the source fields into the computation logic.

    You can also click the Map By Name button to batch place the data source fields into the computation logic of the logical table fields with the same name.

    Click image icon, and you can edit the computation logic expression in the edit box (expressions do not support aggregate functions: sum, count, min, etc.), for example:

    • Example 1: substr(t1.column2, 3, 10)

    • Example 2: case when t1.column2 != '1' then 'Y' else 'N' end

    • Example 3: t1.column2 + t2.column1

  5. Click the bottom image..png icon to verify the validity of the expression.

    Click the Preview SQL button to view the computation logic SQL.

  6. Click Save And Next.

Step 4: Configure Constraints

  1. The system generates quality rules for the current logical table within the quality module, based on field constraints. Here, you can define the rule strength for the field, which includes both hard rules and soft rules. For more information, see Data Table Quality Rules.

  2. Click Save And Next

Step 5: Configure the Logical Fact Table Scheduling Configuration

  1. On the Scheduling & Parameter Configuration page, configure the Data Latency, Scheduling Properties, Scheduling Dependencies, Scheduling Parameters, Running Configuration of the logical fact table.

    Parameter

    Description

    Data Latency

    When data latency is enabled, the system will automatically rerun the data for the specified logical table within the Maximum Latency Days period. For detailed instructions, see Configure Logical Table Data Latency.

    Scheduling Properties

    This section outlines how to define the scheduling method for a logical fact table in a production environment. It allows you to set the scheduling type, frequency, logic, and execution parameters for the table. For detailed instructions, see: Configure Logical Table Scheduling Properties.

    Upstream Dependency

    This section outlines how to define the nodes within a logical table for a scheduling task. Dataphin systematically executes these nodes according to the configured scheduling dependencies, ensuring the timely and effective production of business data. For detailed instructions, see: Configure Logical Table Upstream Dependency.

    Parameter Configuration

    Parameter configuration involves assigning values to variables within the code to ensure that when the node is scheduled, these parameter variables are automatically substituted with their respective values. On the scheduling parameter configuration page, you can choose to Ignore or Convert To Global Variable for managing parameters. For more information, see: Logical Table Parameter Configuration.

    Running Configuration

    Configure the task-level running timeout and rerun policy for this logical fact table to match your business scenario. This approach prevents resource waste from prolonged computing task occupation and enhances the reliability of task operations. For detailed instructions, see Logical Table Running Configuration.

    Resource Configuration

    You can set up a scheduling resource group for the current logical table task. During scheduling, the task will utilize the resource quota of the assigned group. For configuration instructions, see Logical Table Resource Configuration.

Step 6: Save and Submit the Logical Table

  1. After completing the logical fact table configuration, click Save and Submit. The system will perform configuration verification on Table Structure, Computation Logic, Scheduling Dependencies, and Running Parameters. You can check the configuration information and complete the configuration according to the Check Results that did not pass.

  2. After all the check results pass, fill in the submission remarks and click Confirm and Submit.

    When you submit a task, Dataphin conducts a lineage analysis and performs submission checks. For more information, see Specification Modeling Task Submission Instructions.

What to do next

  • In Dev-Prod mode, the project requires you to publish the logical table to the production environment. For detailed instructions, see Manage Publishing Tasks.

  • Once the logical table has been deployed to the production environment, you can manage and monitor its associated tasks within the Operation Center. For detailed instructions, see Scheduling center.