All Products
Search
Document Center

Dataphin:Create a hierarchical dimension logical table

Last Updated:Jan 21, 2025

A hierarchical dimension logical table organizes data with inherent structure relationships, such as time-based data including days, weeks, months, quarters, and years. This topic explains the process of creating and configuring a hierarchical dimension logical table.

Hierarchical dimension table introduction

A hierarchical dimension table corresponds to a structured object and possesses the following attributes:

  • Limited data volume.

  • The data (rows) maintain a hierarchical relationship within a business context. For instance, administrative regions, organizational structures, and product categories. Countries, provinces, and cities are distinct records, yet they form a hierarchy such as country->province->city. Similarly, in an organizational structure, the hierarchy might be business group->business unit->department->sub-department, and in product categories, it could be food->snacks->dried fruits, etc.

Case study

Data pattern

Hierarchical data in business systems typically follows this pattern:

Field name

Field description

id

Current level ID.

name

Name of the current level.

parent_id

Parent level ID.

level

Level.

Data example

Consider the region data table from a business system.

id

name

parent_id

level

1

China

Null

0

21

East China Region

1

1

212

Zhejiang Province

21

2

2121

Hangzhou

212

3

21213

Yuhang District

2121

4

The hierarchical dimension table created by the Dataphin region business object.

Logical table name

Business object

Source table

Child field

Parent field

Name field

Number of levels

dim_region

region

${xxx}.region

id

parent_id

name

5

Prerequisites

The hierarchical business object has been successfully established. For additional details, see Create and manage business entities.

Procedure

  1. Navigate to the Dataphin home page, and from the top menu bar, select Development > Data Development.

  2. From the top menu bar, choose Project and, if in Dev-Prod mode, also select Environment.

  3. In the left-side navigation pane, click Standard Modeling > Logical Dimension Table.

  4. In the logical dimension table list on the right, click image to create a new table.

  5. In the New Logical Dimension Table dialog box, fill in the required parameters.

    Parameter

    Description

    Business Object

    Select the Hierarchical Object, such as the region business object.

    Table Type

    If the business object is a Hierarchical Object, the table type is automatically set to Hierarchical Dimension Logical Table and cannot be changed.

    Data Section

    Defaults to the data section associated with the project and is not editable.

    Subject Area

    Defaults to the subject area of the business object and cannot be modified.

    Compute Engine

    Configure the compute engine for the Dataphin instance, which may include Hive, Impala, and Spark, for tenants with the Hadoop compute engine enabled.

    Important
    • The compute engine must be enabled before selection. For more information, see Create a Hadoop compute source.

    • If the compute engine is either TDH 6.x or TDH 9.3.x, this option is not supported.

    The compute engine is subject to the following limitations:

    • Hive: Unable to read source tables that are stored in Kudu format.

    • Impala: Capable of reading source tables in Kudu format, but cannot store logical tables using Kudu. It is not recommended for use without Kudu-formatted source tables.

    • Spark: Incapable of reading source tables stored in Kudu format.

    Data Timeliness

    Defines the scheduling type for the hierarchical dimension logical table task in production. The default setting is T+1 (daily task).

    Logical Table Name

    Dataphin automatically generates the logical table name. The default naming convention is <data section name>.dim_<business object name>.

    Chinese Name

    Enter the Chinese name for the dimension logical table, such as the regional dimension table.

    Naming conventions include:

    • No more than 128 characters.

    • Supports any character set.

    Description

    Provide a concise description of the hierarchical dimension logical table within 1000 characters.

  6. Click OK.

  7. On the Computation Logic page, set up the mapping between the source data and the primary key of the dimension logical table.

    Parameter

    Description

    Column Name

    By default, it uses the business process name. You can modify this to a different field name, such as region.

    The naming conventions are as follows:

    • Must not exceed 128 characters.

    • Can include any character.

    Data Type

    Choose the data type for the primary key of the hierarchy logical table, such as bigint.

    Description

    Provide a description for the primary key field of the hierarchy logical table, for instance, region_primary_key ID.

    Source Table

    Select the source of the hierarchical dimension data, such as the region data table.

    Number Of Levels

    Set the number of levels for the hierarchical dimension, for example, 5 levels.

    Generate Leaf Dimension Table

    Decide whether to create a leaf dimension table, which is the most detailed level in the hierarchy. For instance, a product category leaf dimension table might contain details like product ID, name, price, inventory, and sales quantity.

    Child Field, Parent Field

    Identify the fields in the source table that represent child and parent levels, such as the id (child field) and parent_id (parent field) in the region table.

    Column Name

    Choose the field in the source table that corresponds to the name of the hierarchical dimension table field, such as the name field in the region table.

    Root Node Definition

    Define the top node of the hierarchy. Set SQL conditions for root level detection based on the main source table, such as level = 0 or parent_id is null for the root node China in the region table.

    Data Filter Condition

    Optional. Set filter conditions for the source table to exclude dirty data or logically deleted records.

    Note

    Conditions for data update and retrieval periods must be based on the main source table fields, such as time partitions.

  8. Click Save And Next.

  9. On the Schedule & Parameter Configuration page, configure the Scheduling Properties, Scheduling Dependencies, Scheduling Parameters, and Runtime Configuration of the dimension logical table. The configuration descriptions are as follows:

    • Scheduling Properties: These are used to set up the scheduling approach for the dimension logical table in a production setting. You can specify the scheduling type, period, logic, and execution for the dimension logical table via its scheduling properties. For configuration instructions, see Configure logical table scheduling properties .

    • Scheduling Dependencies: This feature is designed to define the nodes within a logical table's scheduling task. Dataphin systematically executes these nodes according to the scheduling dependencies set for each, ensuring business data is efficiently and promptly produced. For configuration instructions, see Configure logical table upstream dependencies.

    • Runtime Parameters: Parameter configuration involves assigning values to variables within the code, enabling automatic substitution of these variables with their respective values during node scheduling. On the parameter configuration page for scheduling, you have the option to either Ignore or Convert To Global Variable. For detailed configuration guidelines, see Logical table parameter configuration.

    • Runtime Configuration: Task-level runtime timeout and rerun policies can be tailored for the dimension logical table to match your business needs. This approach helps avoid the unnecessary consumption of resources due to prolonged compute tasks and enhances the reliability of task execution. For instructions on how to configure these settings, see Compute task runtime configuration .

    • Resource Configuration: You can set up the schedule resource group for the current logical table task, which will utilize the resource quota of the designated group. For configuration instructions, see Logical Table Resource Configuration.

  10. Once you've completed the configuration for the hierarchical dimension logical table, click Save And Submit.

    Upon submission, Dataphin will conduct lineage analysis and perform checks on the task. For more information, see the standard modeling task submission instructions

What to do next

In a Dev-Prod project mode, you must publish the logical table to the production environment. For detailed instructions, see Manage publishing tasks.

Once the logical table has been deployed to the production environment, you can manage and monitor the logical table tasks within the Operation Center. For detailed instructions, see Operation Center.