All Products
Search
Document Center

Dataphin:Logical table materialized configuration dimensions and facts

Last Updated:Jan 21, 2025

Materialized configuration is utilized to set up the lifecycle, partition fields, and custom parameters for logical dimension tables and logical fact tables. It enables manual specification of the number of materialized tables and field distribution within these tables, thereby enhancing data query performance and resource efficiency in the auto-generated model. This topic outlines the steps to configure materialized parameters for logical dimension tables and logical fact tables.

Prerequisites

A logical dimension table or a logical fact table with a primary key must be in place.

Procedure

  1. For more information, see the referenced document, and navigate to the Data Development page.

  2. On the Data Development page, use the instructions in the figure below to access the Materialization Configuration panel.

    If you are in the Dev or Basic (non-Data_distill) project, and it is your data development space, there is no need to select the project space.

    物化配置

  3. In the Materialization Configuration panel, set the parameters as follows:

    Area

    Description

    Partition And Lifecycle

    • Derived Metric Partition: Applicable only to periodic snapshot fact tables, accumulating snapshot fact tables, and standard logical dimension tables.

    • Maximum Partition (MAX_PT): The maximum partition maintains the most recent full snapshot data. Using the maximum partition for derived metric calculations can reduce lifecycle and conserve storage resources.

    • Data Timestamp (bizdate): Calculating derived metrics with the data timestamp partition ensures strong data consistency, although historical partitions must be kept for these calculations.

    • Lifecycle: Define the storage lifecycle, with a maximum of 36500 days. Values at or above this threshold are automatically converted to permanent storage (shown as 36500 days).

      The system offers quick selections for storage lifecycles, including 7, 14, 30, and 365 days.

    Distribution Key

    For StarRocks compute engines, configure the distribution key. Opt for fields with high cardinality and frequent use in query conditions, ideally not exceeding three fields. The sequence of distribution keys impacts storage and query efficiency.

    Note

    If the logical table has a primary key, it will automatically serve as the default distribution key (Duplicate Key).

    Custom Materialization

    Enable this to tailor the materialization configuration, including the number of materialized tables and the field distribution within them.

    Note

    Note: With custom materialization enabled, any new field addition requires manual specification in the materialized table.

    Upon enabling custom materialization configuration, proceed with the following steps to define the materialization policy:

    1. (Optional) In the materialization configuration dialog, select the Initialization Policy for a swift setup.

      Quick initialization options include the following policies, but manual adjustments are also possible:

      • Single Materialized Table: Consolidate all fields into one materialized table.

      • Balanced Row Policy: Strike a balance between production time and query performance by segmenting the materialized table based on production intervals.

      • Production Line Policy: Follow the system's default initialization policy.

      Important

      Caution: Changing the initialization policy will reset the configurations below. Please proceed carefully.

    2. Click the 移动 icon to add the fields to be assigned to the expanded materialized table, then click OK.

      • Utilize the Filter to swiftly select the desired materialized table.

      • To add to a new materialized table, click Add Materialized Table.

      Following configuration:

      • The Materialization Results display the Assigned Fields, Number Of Materialized Tables, and Fields To Be Assigned.

      • View the configuration details by clicking View Materialization Configuration.

      • Compare the production environment materialization with the current editing policy by clicking Production Environment Materialization Comparison.

        • The Materialization Cost comparison highlights the differences in the Number Of Materialized Tables, Number Of Redundant Storage Fields, and Number Of Downstream Query Associations, indicating the absolute change value between the Current Policy and the Production Line Policy.

          Note

          The term Redundant Storage Fields refers to the count of fields stored in more than one materialized table; Downstream Query Associations indicates the number of associations based on the logical table's direct downstream references.

        • The Field Production Time comparison marks the absolute change in each field's production time under the Current Policy versus the Production Line Policy.

          Note

          The production line's field production time is the average over the last 7 days, while the current policy's time is the calculated value.

    Advanced Materialization Configuration

    Define custom parameters that align with the attributes of the compute engine currently linked to the Dataphin system. For further details, see the referenced document.

  4. To finalize, click OK.