A hierarchical dimension logical table organizes data with inherent structure relationships, such as time-based data including days, weeks, months, quarters, and years. This topic explains the process of creating and configuring a hierarchical dimension logical table.
Hierarchical dimension table introduction
A hierarchical dimension table corresponds to a structured object and possesses the following attributes:
Limited data volume.
The data (rows) maintain a hierarchical relationship within a business context. For instance, administrative regions, organizational structures, and product categories. Countries, provinces, and cities are distinct records, yet they form a hierarchy such as country->province->city. Similarly, in an organizational structure, the hierarchy might be business group->business unit->department->sub-department, and in product categories, it could be food->snacks->dried fruits, etc.
Case study
Data pattern
Hierarchical data in business systems typically follows this pattern:
Field name | Field description |
id | Current level ID. |
name | Name of the current level. |
parent_id | Parent level ID. |
level | Level. |
Data example
Consider the region
data table from a business system.
id | name | parent_id | level |
1 | China | Null | 0 |
21 | East China Region | 1 | 1 |
212 | Zhejiang Province | 21 | 2 |
2121 | Hangzhou | 212 | 3 |
21213 | Yuhang District | 2121 | 4 |
The hierarchical dimension table created by the Dataphin region
business object.
Logical table name | Business object | Source table | Child field | Parent field | Name field | Number of levels |
dim_region | region | ${xxx}.region | id | parent_id | name | 5 |
Prerequisites
The hierarchical business object has been successfully established. For additional details, see Create and manage business entities.
Procedure
Navigate to the Dataphin home page, and from the top menu bar, select Development > Data Development.
From the top menu bar, choose Project and, if in Dev-Prod mode, also select Environment.
In the left-side navigation pane, click Standard Modeling > Logical Dimension Table.
In the logical dimension table list on the right, click
to create a new table.
In the New Logical Dimension Table dialog box, fill in the required parameters.
Parameter
Description
Business Object
Select the Hierarchical Object, such as the
region
business object.Table Type
If the business object is a Hierarchical Object, the table type is automatically set to Hierarchical Dimension Logical Table and cannot be changed.
Data Section
Defaults to the data section associated with the project and is not editable.
Subject Area
Defaults to the subject area of the business object and cannot be modified.
Compute Engine
Configure the compute engine for the Dataphin instance, which may include Hive, Impala, and Spark, for tenants with the Hadoop compute engine enabled.
ImportantThe compute engine must be enabled before selection. For more information, see Create a Hadoop compute source.
If the compute engine is either TDH 6.x or TDH 9.3.x, this option is not supported.
The compute engine is subject to the following limitations:
Hive: Unable to read source tables that are stored in Kudu format.
Impala: Capable of reading source tables in Kudu format, but cannot store logical tables using Kudu. It is not recommended for use without Kudu-formatted source tables.
Spark: Incapable of reading source tables stored in Kudu format.
Data Timeliness
Defines the scheduling type for the hierarchical dimension logical table task in production. The default setting is T+1 (daily task).
Logical Table Name
Dataphin automatically generates the logical table name. The default naming convention is
<data section name>.dim_<business object name>
.Chinese Name
Enter the Chinese name for the dimension logical table, such as the regional dimension table.
Naming conventions include:
No more than 128 characters.
Supports any character set.
Description
Provide a concise description of the hierarchical dimension logical table within 1000 characters.
Click OK.
On the Computation Logic page, set up the mapping between the source data and the primary key of the dimension logical table.
Parameter
Description
Column Name
By default, it uses the business process name. You can modify this to a different field name, such as
region
.The naming conventions are as follows:
Must not exceed 128 characters.
Can include any character.
Data Type
Choose the data type for the primary key of the hierarchy logical table, such as
bigint
.Description
Provide a description for the primary key field of the hierarchy logical table, for instance,
region_primary_key
ID.Source Table
Select the source of the hierarchical dimension data, such as the
region
data table.Number Of Levels
Set the number of levels for the hierarchical dimension, for example,
5
levels.Generate Leaf Dimension Table
Decide whether to create a leaf dimension table, which is the most detailed level in the hierarchy. For instance, a product category leaf dimension table might contain details like product ID, name, price, inventory, and sales quantity.
Child Field, Parent Field
Identify the fields in the source table that represent child and parent levels, such as the
id
(child field) andparent_id
(parent field) in theregion
table.Column Name
Choose the field in the source table that corresponds to the name of the hierarchical dimension table field, such as the
name
field in theregion
table.Root Node Definition
Define the top node of the hierarchy. Set SQL conditions for root level detection based on the main source table, such as
level = 0
orparent_id is null
for the root node China in theregion
table.Data Filter Condition
Optional. Set filter conditions for the source table to exclude dirty data or logically deleted records.
NoteConditions for data update and retrieval periods must be based on the main source table fields, such as time partitions.
Click Save And Next.
On the Schedule & Parameter Configuration page, configure the Scheduling Properties, Scheduling Dependencies, Scheduling Parameters, and Runtime Configuration of the dimension logical table. The configuration descriptions are as follows:
Scheduling Properties: These are used to set up the scheduling approach for the dimension logical table in a production setting. You can specify the scheduling type, period, logic, and execution for the dimension logical table via its scheduling properties. For configuration instructions, see Configure logical table scheduling properties .
Scheduling Dependencies: This feature is designed to define the nodes within a logical table's scheduling task. Dataphin systematically executes these nodes according to the scheduling dependencies set for each, ensuring business data is efficiently and promptly produced. For configuration instructions, see Configure logical table upstream dependencies.
Runtime Parameters: Parameter configuration involves assigning values to variables within the code, enabling automatic substitution of these variables with their respective values during node scheduling. On the parameter configuration page for scheduling, you have the option to either Ignore or Convert To Global Variable. For detailed configuration guidelines, see Logical table parameter configuration.
Runtime Configuration: Task-level runtime timeout and rerun policies can be tailored for the dimension logical table to match your business needs. This approach helps avoid the unnecessary consumption of resources due to prolonged compute tasks and enhances the reliability of task execution. For instructions on how to configure these settings, see Compute task runtime configuration .
Resource Configuration: You can set up the schedule resource group for the current logical table task, which will utilize the resource quota of the designated group. For configuration instructions, see Logical Table Resource Configuration.
Once you've completed the configuration for the hierarchical dimension logical table, click Save And Submit.
Upon submission, Dataphin will conduct lineage analysis and perform checks on the task. For more information, see the standard modeling task submission instructions
What to do next
In a Dev-Prod project mode, you must publish the logical table to the production environment. For detailed instructions, see Manage publishing tasks.
Once the logical table has been deployed to the production environment, you can manage and monitor the logical table tasks within the Operation Center. For detailed instructions, see Operation Center.