All Products
Search
Document Center

Dataphin:Create a General Logical Dimension Table

Last Updated:Jan 21, 2025

A general logical dimension table is used to describe an entity object and includes various descriptions of the entity object. For example, a member general logical dimension table includes data such as member name, member ID, and member email. This topic describes how to create and configure a general logical dimension table.

Limits

  • If you have not purchased the Data Standard module, setting the Data Standard field in the table is not supported.

  • If you have not purchased the Asset Security module, setting the Data Grading and Data Classification fields in the table is not supported.

  • If you have not purchased the Asset Quality module, implementing Unique and Not Null validation for primary key fields is not supported.

Prerequisites

The creation of a business entity is completed. For more information, see create and manage business entities.

Procedure

Step 1: Create a General Logical Dimension Table

  1. In the top menu bar on the Dataphin home page, select Development > Data Development.

  2. Select Project in the top menu bar (Dev-Prod mode also requires selecting Environment).

  3. In the left-side navigation pane, select Standard Modeling > Logical Dimension Table.

  4. In the list of logical dimension tables on the right, click the image new icon.

  5. In the New Logical Dimension Table dialog box, configure the following parameters.

    Parameter

    Description

    Business Object

    Select General Object.

    Table Type

    When the business object is General Object, the table type is General Logical Dimension Table and cannot be modified.

    Business Unit

    Defaults to the data section associated with the project and cannot be modified.

    Data Domain

    Defaults to the subject area where the business object is located and cannot be modified.

    Compute Engine

    Set the Dataphin instance to support the selection of compute engines, including Hive, Impala, and Spark, for tenants of the Hadoop compute engine.

    Important
    • The corresponding compute engine must be enabled before it can be selected. For more information, see create a Hadoop compute source.

    • When the compute engine is TDH 6.x or TDH 9.3.x, this option is not supported.

    There are the following restrictions on the compute engine, detailed as follows:

    • Hive: Cannot read source tables stored in Kudu format.

    • Impala: Can read source tables stored in Kudu format, but does not currently support storing logical tables in Kudu. If there are no source tables in Kudu format, it is not recommended to use.

    • Spark: Spark cannot read source tables stored in Kudu format.

    Data Timeliness

    Data timeliness is used to define the data timeliness of the subsequent logical dimension table. The data timeliness of a general logical dimension table supports selecting T+1 (daily table), T+h (hourly table), and T+m (minute table).

    Note

    ArgoDB and StarRocks compute engines only support offline T+1 (daily table).

    Logical Table Name

    Enter the logical table name. The total length of the table name must be within 100 characters. After selecting a business object, the system will automatically fill in the logical table name. The fill rule is <data section name>.dim_<business object code>_<data timeliness>.

    Important
    • Only letters, numbers, and underscores (_) are allowed, and the first character must be a letter. Case is not sensitive. When entering uppercase letters, the system will automatically convert them to lowercase.

    • label_ is a reserved prefix in the system and cannot start with label_.

    • The total length of the table name for AnalyticDB for PostgreSQL must be within 50 characters.

    According to the selected data timeliness above, the <data timeliness> displays different information, detailed as follows:

    • df: T+1 timeliness, daily full data, stores full data up to the current day every day.

    • hf: T+h timeliness, hourly full data, stores full data up to the current hour every hour.

    • mf: T+m timeliness, minute full data, stores full data up to the last 15 minutes every 15 minutes.

    Chinese Name

    The naming rules are as follows:

    • Cannot exceed 128 characters.

    • Any character is supported.

    Description

    Enter a brief description of the logical dimension table, within 1000 characters.

  6. Click OK to complete the creation of the general logical dimension table.

Step 2: Configure General Logical Dimension Table Field Information

  1. On the Table Structure configuration page, configure the table fields, data types, field categories, and other structural information of the current general logical dimension table.

    image..png

    Note

    Click the image icon next to the logical table name to view its basic information.

    Block

    Description

    Field Positioning

    Click to quickly navigate to the Basic Configuration or Asset Administration sections of the field list.

    Search and Filter

    Search for fields by name as needed.

    Click image..png to apply filters such as Data Type, Field Category, Associated Dimension, Field Constraint, and Data Classification.

    Field List Operations

    • Reorder Columns: Before adjusting the field order, ensure that downstream dependencies do not use select * to query this logical table to avoid data errors.

    • Introduce Fields: Add new fields to the table. Supports introduction from Create Table Statement and From Table. For specific operations, see introduce fields to a logical dimension table.

    • Add Fields: Add Data Fields and Partition Fields. Edit the Name, Description, Data Type, Column Type, Associated Dimension, Recommend Standards, Field Constraint, Data Class, Data Sensitivity Level, and Remarks as per business requirements.

      Note

      The MaxCompute engine supports creating partition fields with no more than 6 levels.

      The ArgoDB and StarRocks compute engines do not support adding partition fields.

    Field List

    The field list displays detailed information about fields such as Serial Number, Column Name, Description, Data Type, Column Type, Associated Dimension, Recommend Standards, Field Constraint, Data Class, Data Sensitivity Level, and Remarks.

    • Serial Number: The serial number of the table field. Each new field increases by +1.

    • Field Name: The name of the table field. You can enter the field name or Chinese keywords, and it will automatically match the standard preset field name.

    • Description: The description information of the table field, limited to 512 characters.

    • Data Type: Supports string, bigint, double, timestamp, decimal, Text, Numeric, Datetime Variable, and Other data types.

    • Column Type: Supports setting as Primary Key, Partition, Attribute.

      Note
      • Only one primary key field is allowed.

      • Only fields of type string, varchar, bigint, int, tinyint, and smallint are supported as partition fields.

    • Associated Dimension: For specific operations, see Add Associated Dimension.

    • Recommend Standards: Select the field standard for the field. To create a standard, see Create and manage data standards.

    • Field Constraint: Select the field constraint for the field. Supports Unique and Not Null constraints.

    • Data Class: Select the data classification for the field. To create a data classification, see create data classification.

    • Data Sensitivity Level: After selecting the data classification, the system will automatically detect the data level.

    • Remarks: Enter the remarks information for the field. Limited to 2048 characters.

    At the same time, you can perform Delete operations on fields in the operation column.

    Note
    • Fields cannot be revoked after deletion.

    • The primary key and system partition fields of the logical dimension table do not support deletion.

    Batch Operations

    You can batch select table fields and perform the following operations.

    • Delete: Click the image icon to batch delete the selected data fields.

    • Data Type: Click the image icon to batch modify the selected data types.

    • Column Type: Click the ```html image icon to batch modify the selected field categories.

    • Recommend Names: Click the image icon, and the system will segment the Description content of the field and match the created word roots to recommend field names. You can replace the selected field names with modified values in the Word Root Naming dialog box. As shown in the following figure:

      Note
      • If the recommended field names do not meet the requirements, you can modify them in the Modified Field Name input box.

      • Click Reset to reset the Modified Field Name to the system's hit word root.

    • Recommend Standards: Click the image icon, and the system will recommend field standards based on the field name. You can set the field to the recommended field standard in the Field Standard dialog box.

    • Field Constraint: Click the image icon to batch set field constraints.

      Important

      Sub-dimension logical tables do not support setting field constraints.

  2. Click Save And Next.

Introduce Fields to a Logical Dimension Table

  • Introduce from Table

    1. On the configuration page of the created logical dimension table, click Introduce From Table.

    2. In the Introduce From Table dialog box, select the Source Table and then select the fields you need to add.

      Parameter

      Description

      Source Table

      Supports selecting all physical tables (non-Dataphin automatically generated physical tables), logical tables, and views (non-parameterized views) with read permissions under the current tenant.

      For how to obtain read permissions for physical tables, see apply, renew, and return table permissions.

      Field List

      Select the fields you need to add.

      Note

      You can select fields from multiple source tables by switching the source table method.

      Selected Fields

      The added fields will be added to the selected fields. You can perform Delete operations on the selected fields.

    3. Click Add to introduce fields from a physical table to the logical dimension table.

      After adding to the New Field area, you can edit the field's name, data type, field category, and associated dimension according to the business situation.

  • Introduce from Create Table Statement

    1. On the configuration page of the created logical dimension table, click Introduce From Create Table Statement.

    2. In the Introduce From Create Table Statement dialog box, fill in the create table statement and click Parse SQL.

    3. Select the fields you need in the field list and click Add to create new fields for the logical dimension table.

Add Associated Dimension

  1. On the configuration page of the logical dimension table, click the Associated Dimension column gagag icon to enter the Edit Model Relationship dialog box.

  2. In the Edit Model Relationship dialog box, configure the parameters.

    Block

    Parameter

    Description

    Null Value Replacement

    If the primary table (current logical dimension table) cannot be associated with the required logical dimension table, Dataphin automatically fills the associated field with a value of -110.

    Logical Dimension Table

    Associated Entity,Logical Dimension Table

    Select the Associated Entity and Logical Dimension Table you have created.

    Edit Association Logic

    Association Logic

    By default, it displays the fields that need to be associated with the dimension and the primary key of the associated logical dimension table, which cannot be modified.

    Dimension Table Version Policy

    Define the partition of the primary table (current logical dimension table) and the associated logical dimension table, and by default, use the same scheduling cycle as the current logical dimension table. More details are as follows:

    • Use The Same Cycle Dimension (the Primary Table And Dimension Table Use The Same Cycle Partition): The primary table and the associated logical dimension table use the same cycle time partition during computation.

      For example, if the business date is 20220101, you need to query the data of the ds=20220101 partition of the primary table, and the time partition of the associated logical dimension table is also 2020101, then you need to select to use the same cycle dimension table.

    • Use The Latest Dimension Table (the Dimension Table Uses The Latest Partition): During data computation, the latest partition of the latest associated logical dimension table is used.

      For example, a product category is often adjusted. 10 days ago, it was a mobile phone category, and today it is an appliance category. If the business needs to rerun the data from 10 days ago according to the appliance category, the dimension table version policy needs to be selected as Use The Latest Dimension Table (the Dimension Table Uses The Latest Partition).

    Missing Association Policy

    The missing association policy is used to define the calculation logic for fields that exist in the source primary table (left table) but do not exist in the logical dimension table (right table), that is, fields that cannotbe associated. You can choose Retain Original Missing Association Data and Use Default Value For Missing Association Data:

    • Retain Original Missing Association Data: When creating derived metrics, retain the original data of the left table.

    • Use Default Value For Missing Association Data: If the fields in the primary table are not associated with the logical dimension table, use the default value -110.

    Edit Dimension Role

    Role English Name, Role Name

    The dimension role serves as a dimension alias. When the same dimension is referenced multiple times, it cannot be redefined, playing a role-playing effect. You need to define the Role English Name and Role Name:

    • The prefix of the Role English Name defaults to dim, and the naming rules for the custom part are:

      • Contains letters, numbers, or underscores (_).

      • Cannot exceed 64 characters.

    • The naming rules for the Role Name are:

      • Contains Chinese, numbers, letters, underscores (_), or hyphens (-).

      • Cannot exceed 64 characters.

  3. Click OK.

Step 3: Configure General Logical Dimension Table Computation Logic

  1. The Computation Logic configuration page is used to configure the mapping relationship between the source data and the primary key of the logical dimension table.

  2. Click Source Configuration to enter the Source Configuration dialog box and click the +add Source Object button to configure source parameters.

    Note

    It is recommended not to set additional filters on event time in filter conditions or custom SQL.

    Parameter

    Description

    Source Type

    Supports three source types: Physical Table, SQL, and Logical Table.

    Source Table Type Description:

    • Source Table with Primary Key: Logical tables with primary keys allow multiple sources to be configured, with the first one fixed as the main source. The total amount of data (number of rows) in the logical table is determined by this source.

    • Source Table without Primary Key: Logical tables without primary keys only allow one source to be configured. If there are multiple tables as sources, use custom SQL to complete the association of multiple tables in advance.

      Note

      If you need to configure multiple source objects, you can click Add Source Object to add new ones.

    Source Object

    • Select Physical Table: Supports selecting all physical tables (non-Dataphin automatically generated physical tables) and physical views (non-parameterized views) with read permissions under the current tenant.

      For how to obtain read permissions for physical tables, see apply, renew, and return table permissions.

    • Select SQL: Click the image icon and enter content in the edit box, for example:

      select id, name
      from project_name_dev.table_name1 t1
      join
      project_name2_dev.table_name2 t2
      on t1.id = t2.id
    • Select Logical Table: Supports selecting all logical tables with read permissions under the current tenant.

      For how to obtain read permissions for physical tables, see apply, renew, and return table permissions.

      Important

      Using a logical table as the data source for another logical table increases the complexity of the computation logic and the difficulty of operation and maintenance.

    Object Alias

    Customize the alias of the source table. For example: t1, t2.

    Object Description

    Please enter the description information of the object. Limited to 1000 characters.

    Filter Condition

    Custom SQL filter conditions.

    Click the image icon and enter content in the edit box, for example:

    ds=${bizdate} and condition1=value1.

    Associated Field

    The field in the source object that corresponds to the primary key of the logical table and can be associated with the primary key for equality.

    Delete

    The main source does not support deletion.

    For logical tables without primary keys, deleting the source will clear the calculation logic of the fields.

  3. Click OK to complete the source configuration.

  4. After completing the source configuration, drag the Source Field into the computation logic. You can also click the Map By Name button to batch place the data source fields into the computation logic of the logical table fields with the same name.

    Click the image icon, and you can edit the computation logic expression in the edit box (expressions do not support aggregate functions: sum, count, min, etc.), for example:

    • Example 1: substr(t1.column2, 3, 10)

    • Example 2: case when t1.column2 != '1' then 'Y' else 'N' end

    • Example 3: t1.column2 + t2.column1

  5. After completing the computation logic configuration, click the image..png icon at the bottom to verify the validity of the expression.

    Click the Preview SQL button to view the computation logic SQL.

  6. Click Save And Next.

Step 4: Configure Constraints

  1. Based on field constraints, the system will create quality rules for the current logical table in the quality module. You can set the Rule Strength of the field here, including Hard Rule and Soft Rule. For more information, see data table quality rules.

    Note

    Logical table field constraints can only configure rule strength here and cannot be edited in the quality module.

  2. Click Save And Next.

Step 5: Configure General Logical Dimension Table Scheduling

  1. On the Scheduling & Parameter Configuration page, configure the Data Latency, Scheduling Attributes, Scheduling Dependencies, Scheduling Parameters, Running Configuration of the logical dimension table.

    Parameter

    Description

    Data Latency

    When data latency is enabled, the system will automatically rerun all data of this logical table within the Maximum Latency Days cycle. For detailed instructions, see configure logical table data latency.

    Scheduling Attributes

    Used to define the scheduling method of the logical dimension table in the production environment. You can configure the scheduling type, scheduling cycle, scheduling logic, and execution of the logical dimension table through scheduling attributes. For detailed instructions, see configure logical table scheduling attributes.

    Upstream Dependency

    Used to define the node of the logical table in the scheduling task. Dataphin orderly runs the nodes in the business process through the configuration results of the scheduling dependencies of each node to ensure the effective and timely output of business data. For detailed instructions, see configure logical table upstream dependency.

    Parameter Configuration

    Parameter configuration is used to assign values to variables used in the code so that when the node is scheduled, the parameter variables can be automatically replaced with the corresponding variable values. On the scheduling parameter configuration page, you can perform Ignore or Convert To Global Variable operations on parameter configuration. For detailed instructions, see logical table parameter configuration.

    Running Configuration

    You can configure task-level runtime timeout and rerun policies for task failures for the logical dimension table according to the business scenario to prevent resource waste caused by long-term resource occupation of computing tasks while improving the reliability of computing task operation. For detailed instructions, see logical table running configuration.

    Resource Configuration

    You can configure the scheduling resource group for the current logical table task. The logical table task will occupy the resource quota of this resource group during scheduling. For configuration instructions, see logical table resource configuration.

  2. Click Save And Submit.

Step 6: Save and Submit the Logical Table

  1. After completing the configuration of the general logical dimension table, click Save And Submit.

    The system will perform configuration verification on Table Structure, Computation Logic, Schedule Dependency, and Runtime Parameter. You can check the configuration information and complete the configuration according to the Check Results that did not pass.

  2. After all check results pass, fill in the submission remarks and click Confirm And Submit.

    During submission, Dataphin will perform lineage analysis and submission checks for the task. For more information, see specification modeling task submission instructions.

What to do next

If the project mode is Dev-Prod, you need to publish the logical table to the production environment. For specific operations, see manage release tasks.

After the logical table is published to the production environment, you can view and maintain the logical table tasks in the operation center. For specific operations, see Operation Center.