All Products
Search
Document Center

Hologres:Best practices for development with Hologres

Last Updated:Mar 12, 2026

This topic describes the development standards for Hologres. Follow these standards to understand development requirements and prevent operational errors.

Data domain standards

  • Data warehouse layers

    A data warehouse is typically built in layers. The main layers are listed below. The common data model (CDM) layer includes the data warehouse detail (DWD), data warehouse service (DWS), and dimension (DIM) layers. In Hologres, you can use schemas to isolate different layers.

    • Operation data store (ODS): The operational data layer.

    • Common data model (CDM): The public dimension model layer.

      • Data warehouse detail (DWD): The detail data layer.

      • DWS (Data Warehouse Summary): Data warehouse summary.

      • Dimension (DIM): The dimension data layer.

    • Application data service (ADS): The application data layer.

    You can choose the appropriate level of granularity based on your business complexity. For example, if your enterprise has multiple business units (BUs), you can use the BU abbreviation as the schema prefix.

    create schema ${bu}_ads;
    create schema ${bu}_ads_dev;
    create schema ${bu}_dwd;
    create schema ${bu}_dwd_dev;
    create schema ${bu}_dws;
    create schema ${bu}_dws_dev;
    create schema ${bu}_dim;
    create schema ${bu}_dim_dev;
    create schema ${bu}_ods;
    create schema ${bu}_ods_dev;
  • Data domain abbreviations

    Define shared codes for different data domains to establish a company-wide standard. The following table provides some examples.

    Data domain name

    Data Domain Abbreviations: Examples

    Transaction domain

    trd

    Item domain

    itm

    Log domain

    log

    Member and store domain

    mbr

    Supply, sales, and inventory management domain

    dst

    Sales and customer service domain

    crm

    Credit and risk control domain

    rsk

    Tools and services domain

    tls

    Logistics and express delivery domain

    lgt

Naming conventions

  • Job naming conventions

    The naming rules differ for internal tasks and sync tasks. The rules are as follows.

    • Internal SQL tasks (non-sync tasks): holo_{target_table_name}. This format distinguishes them from external table tasks.

    • Data import to Hologres: {source}2holo_{target_table_name}.

    • Data export from Hologres: holo2{target}_{target_table_name}.

  • Table naming convention

    Layer name

    Naming rule for tables in this layer

    Example

    DWD

    ${bu}_dwd.data_domain_business_process_[custom_root]_suffix

    taobao_dwd.trd_ord_flow

    DWS

    ${bu}_dws.data_domain_data_granularity_abbreviation_business_process_[{custom_root}]_statistical_period

    taobao_dws.trd_all_dtr, taobao_dws.log_slr_pv_dtr

    DIM

    ${bu}_dim.{dimension_definition}[_{custom_root}]

    taobao_cdm.dim_itm

    ADS

    ${bu}_ads.business_domain_dimension_[{custom_root}]_{refresh_period_identifier}

    Note

    The refresh period identifiers are as follows.

    • d: Refreshed daily.

    • r: Refreshed in real time.

    • h: Refreshed in near real-time.

    taobao_ads.trd_cate_d

  • Table Group naming convention

    If your business requires multiple Table Groups, name them using the following format: ${bu}_{data_warehouse_layer_name}_{business_definition}_tg.

  • View naming convention

    The following section describes the naming rules and provides an example for persistent views.

    • Rules

      • DWS: ${bu}_dws.data_domain_data_granularity_abbreviation_business_process_[{custom_root}]_statistical_period_v.

      • ADS: ${bu}_ads.business_domain_dimension_[{custom_root}]_{refresh_period_identifier}_v.

    • Example

      taobao_dws.trd_byr_itm_ord_cm_v
  • External table naming conventions

    Add the ext suffix to the original MaxCompute table name. The following code provides an example.

    taobao_dim.camp_ext
  • Temporary table naming convention

    Add the tmp prefix and a numeric suffix to the original table name. The following code provides an example.

    taobao_dim.tmp_camp_01
  • Common abbreviations

    Statistical period

    Abbreviation

    Last day

    1d

    Recent days

    nd

    Cumulative

    td

    Calendar week

    cw

    Calendar month

    cm

    Cumulative to date

    dtr

    Cumulative to the current hour

    dhr

Table development standards

  • Internal table standards

    Before you create a table, determine the names for the table and its fields according to the data model standards. You must also confirm the table's lifecycle based on your requirements and add complete comments for the table and its fields. The relevant standards are as follows.

    • Strict standards (Publication is not allowed if these standards are not met):

      • The output table and its fields must include a comment. This rule applies to all data development scenarios on the platform. The table's comment must be concise and clear.

      • The table creation statement must specify the table lifecycle (time_to_live_in_seconds).

      • The table creation statement must specify a distribution key (distribute_key). The principles for selecting a distribution key are as follows.

        Select a field that is sufficiently distributed and is most frequently used in JOIN or GROUP BY operations. For example, for a buyer-item table, you can set user_id and item_id as the distribution key. However, if user_id is the key that is most frequently used for joins, set the distribution key to user_id only, not to both user_id and item_id.

      • Tables that are joined in queries must be created in the same Table Group.

      • Use the same name and data type for the same entity ID across all fact tables and dimension tables. For example, if the user ID in a transaction table is user_id, it must also be user_id in the dimension table, not uid. Keep the data types consistent to reduce data type conversions.

      • By default, use ds as the partition field for all physical tables.

    • Recommended standards:

      • The table creation statement should specify at least one of the following properties: bitmap_columns, segment_key, or cluster_key.

      • If the cardinality of a field is unclear, do not set the dictionary_encoding_columns (dictionary index) table property. You can run the following statement to set the property to empty.

        call set_table_property('table_name', 'dictionary_encoding_columns','')
      • For the orientation (data storage format) table property, the column format is recommended. You can also set this property to row.

        Note

        Do not use the row format unless you can ensure that all queries on the table always specify all primary key columns using the equals or in operator. If this property is not set, the column storage format is used by default.

      • The bitmap_columns (bit encoding) table property enables bitmap to quickly filter data within storage files.

        • Set bitmap_columns to the fields that are used in filter conditions. By default, all TEXT fields are included.

        • Do not set fields with high cardinality, such as user_id, as bitmap_columns. We recommend that you set metrics such as activity ID as bitmap_columns.

      • The event_time_column table property must be used for fields that are related to real-time writes, such as an event timestamp.

      • The clustering_key (cluster index) table property sorts data based on the specified cluster index. Creating a cluster index can accelerate range and filter queries on the index columns. You can set only one cluster index. This property is suitable for range filtering, such as when you bucket GMV.

  • MaxCompute foreign table standards

    Hologres supports accelerated queries on MaxCompute through foreign tables, which can simplify the data synchronization process. To improve computing performance, do not join internal tables with foreign tables unless it is necessary. To better manage and maintain foreign tables, you must follow these standards.

    • Strict standard: You must strictly follow the foreign table naming convention by adding the ext suffix to the original MaxCompute table name.

    • Recommended standards:

      • Preserve the DDL of the table schema and place it under version control.

      • Do not join internal tables with foreign tables. Instead, synchronize data from the foreign table to an internal table.

  • View standards

    • Strict standard: You must strictly follow the view naming convention.

    • Recommended standards:

      • Enable task scheduling to ensure the integrity of subsequent development job dependency chains.

      • Create separate views for different granularities to avoid excessive computation for complex requests.

        For example, you can create four separate views for cw, cm, nd, and 1d. If you have different clients, you can create views for pc, wap, and app. If you use different collection methods, you can separate them into ut and non-ut.

  • Lifecycle (internal tables only) standards

    Data warehouse layer

    Corresponding lifecycle rule description

    DWD

    For daily incremental details, the recommended retention period is no more than 2 years.

    DWS

    For daily incremental details, the recommended retention period is no more than 2 years.

    DIM

    For large dimension tables, permanent retention is recommended after extreme storage modeling. For small dimension tables, keep the lifecycle consistent with the MaxCompute table.

    The standard for distinguishing between large and small dimension tables: a single partition cannot exceed 1 TB.

    Recommended standard:

    For partitioned tables, write real-time task data into the current day's partition. Set an appropriate TTL based on the data warehouse layer. Do not write updated historical data into partitions that have already exceeded their TTL.

  • Table Group standards (optional)

    Each database has a default Table Group and shard count. You can create new Table Groups or modify the shard count based on your business needs to achieve better performance. The recommended standards are as follows.

    • Do not create new Table Groups unless it is necessary.

    • For tables with a large data volume, you can create a separate Table Group with a larger shard count.

    • For many tables with small data volumes, you can create a Table Group with a smaller shard count.

    • Tables that need to be joined in queries must be placed in the same Table Group.

Field development standards

  • Field type standards

    You must create field types in strict accordance with the following requirements.

    Field/Field suffix

    Field comment

    Example

    Abbreviations

    user_id

    Auto-incrementing member ID

    user_id=232442843

    int8

    item_id

    Item ID

    item_id=63283278784383

    int8

    member_id

    Registered member ID

    member_id=b2b-dsajk2343821b

    TEXT

    *amt*

    Amount type

    pay_ord_amt_1d_001=923.23

    NUMERIC

    *fee*

    Fee type

    post_fee=923.23

    NUMERIC

    *cnt*

    Quantity type

    pay_ord_byr_cnt_1d_001=923

    int4/int8

    is_*

    Boolean type

    is_pm=Y/is_pm=true

    TEXT/BOOL

    ds

    Partition

    ds=20210120

    YYYYMMDD

  • Basic data type reference

    Hologres data types are compatible with PostgreSQL data types, but Hologres supports only a subset of them. For more information about field types and their mappings with MaxCompute, see Data type summary.

  • Currency unit and precision

    The currency unit is USD. Unless otherwise specified in the model, do not round any amount-related data. This practice prevents inconsistencies in summary calculations that use different statistical methods.

SQL standards

  • Strict standards:

    • In the outermost layer of an SQL statement and within subqueries, do not use select * for fields that do not require calculation. All operations must explicitly specify column names.

    • In WHERE clauses, you must handle null fields and empty strings using the Coalesce function.

  • Recommended standards:

    • Frequently use a count distinct field as the distribution_key. For combinations of multiple count distinct operations, you must rewrite the statement manually.

      select count(distinct userid)
           , count(distinct case when stat_date = '20201111' then userid end) 
      from t group by cate_id;
      
      --Rewrite as follows
      select count(1), sum(c) from 
      (
        select userid
             , cate_id
             , cast(count(case when stat_date = '20201111' then 1 end) > 0) as c 
        from t 
        group by cate_id, userid
      ) t1 
      group by cate_id;
    • For offline scheduling tasks, perform an analyze table operation on partitioned tables.

    • For long-term usage scenarios, use ATTACH/DETACH operations for batch operations on historical partitions to avoid drastic fluctuations in data metrics.