All Products
Search
Document Center

Hologres:FAQ about partitioned tables

Last Updated:Mar 26, 2026

This article addresses data corruption issues — missing data and duplicate primary keys — that affect logical partitioned tables in Hologres V4.0.0 through V4.0.8.

Am I affected?

This bug affects your instance only if all three conditions are true:

  • Your Hologres instance version is V4.0.0 to V4.0.8.

  • The table is a logical partitioned table with column-oriented or hybrid row-columnar storage.

  • The table has an explicitly defined clustering key.

If all three apply, continue to the sections below.

What's happening

The multi-partition flush optimization in V4.0.0–V4.0.8 has a bug in its clustering index implementation. The bug corrupts AliORC data files (data files of column-oriented tables), causing two symptoms:

  • Missing data: Queries fail to locate existing records because the clustering index points to the wrong location.

  • Duplicate primary keys: During updates, Hologres cannot find the existing record and inserts a new one instead, violating the primary key constraint. This surfaces as a duplicate key value violates unique constraint error.

After upgrading to V4.0.9 or later, Hologres detects corrupt index files at query time and returns:

clustered_index size [x] incorrect.

Solution

Step 1: Upgrade your instance

Upgrade to V4.0.9 or later. V4.0.9 prevents queries from silently reading corrupt data — affected queries now fail with the error above, making it safe to identify and repair the corrupted tables.

Step 2: Find tables with corrupted indexes

Run the following SQL against each database. It generates one SELECT ... GROUP BY ... HAVING COUNT(1) > 1 statement per partition.

WITH logical_partition_tables AS (
    SELECT DISTINCT
        p1.table_namespace,
        p1.table_name
    FROM hologres.hg_table_properties p1
    WHERE p1.property_key = 'logical_partition_columns'
      AND EXISTS (
          SELECT 1
          FROM hologres.hg_table_properties p2
          WHERE p2.table_namespace = p1.table_namespace
            AND p2.table_name = p1.table_name
            AND p2.property_key = 'orientation'
            AND p2.property_value IN ('column', 'row,column', 'column,row')
      )
),
tables_with_pk AS (
    SELECT
        l.table_namespace,
        l.table_name,
        p.property_value AS pk_expr
    FROM logical_partition_tables l
    JOIN hologres.hg_table_properties p
      ON l.table_namespace = p.table_namespace
     AND l.table_name = p.table_name
    WHERE p.property_key = 'primary_key'
),
partition_cols AS (
    SELECT
        l.table_namespace,
        l.table_name,
        p.property_value AS part_expr
    FROM logical_partition_tables l
    JOIN hologres.hg_table_properties p
      ON l.table_namespace = p.table_namespace
     AND l.table_name = p.table_name
    WHERE p.property_key = 'logical_partition_columns'
),
all_partitions AS (
    SELECT
        t.table_namespace,
        t.table_name,
        t.pk_expr,
        pc.part_expr,
        part.partition
    FROM tables_with_pk t
    JOIN partition_cols pc
      ON t.table_namespace = pc.table_namespace
     AND t.table_name = pc.table_name
    CROSS JOIN LATERAL (
        SELECT partition
        FROM hologres.hg_list_logical_partition(
            (t.table_namespace || '.' || t.table_name)::TEXT
        )
    ) AS part
),
split_kv AS (
    SELECT
        table_namespace,
        table_name,
        pk_expr,
        partition,
        TRIM(SPLIT_PART(kv, '=', 1)) AS col_name,
        TRIM(SPLIT_PART(kv, '=', 2)) AS col_val
    FROM all_partitions,
    UNNEST(STRING_TO_ARRAY(partition, '/')) AS kv
),
final_where AS (
    SELECT
        table_namespace,
        table_name,
        pk_expr,
        partition,
        STRING_AGG(
            FORMAT('%I = %L', col_name, col_val),
            ' AND '
            ORDER BY col_name
        ) AS where_clause
    FROM split_kv
    GROUP BY table_namespace, table_name, pk_expr, partition
)
SELECT
    FORMAT(
        '-- Check PK duplicates in %I.%I, partition: %s' || E'\n' ||
        'SELECT %s, COUNT(1) FROM %I.%I WHERE %s GROUP BY %s HAVING COUNT(1) > 1;',
        table_namespace, table_name, partition,
        pk_expr,
        table_namespace, table_name,
        where_clause,
        pk_expr
    ) AS detection_sql
FROM final_where
ORDER BY table_namespace, table_name, partition;

This query returns one check statement per partition. Sample output:

image.png

Run the generated statements. Two outcomes are possible:

  • No error: The index is intact for that partition.

  • clustere_index size mismatch error: The index is corrupted. Run a full compaction on the affected table to rebuild the index:

    SELECT hologres.hg_full_compact_table('schema.table_name','max_file_size_mb=1, reclaim_deleted_data_space = false');

After compaction, re-run the check statements for the repaired table. If they complete without error, the index is correct.

Step 3: Find and remove duplicate primary keys

Run the following SQL to generate duplicate-key check statements for each partition.

WITH logical_partition_tables AS (
    SELECT DISTINCT
        p1.table_namespace,
        p1.table_name
    FROM hologres.hg_table_properties p1
    WHERE p1.property_key = 'logical_partition_columns'
      AND EXISTS (
          SELECT 1
          FROM hologres.hg_table_properties p2
          WHERE p2.table_namespace = p1.table_namespace
            AND p2.table_name = p1.table_name
            AND p2.property_key = 'orientation'
            AND p2.property_value IN ('column', 'row,column')
      )
),
tables_with_pk AS (
    SELECT
        l.table_namespace,
        l.table_name,
        p.property_value AS pk_expr
    FROM logical_partition_tables l
    JOIN hologres.hg_table_properties p
      ON l.table_namespace = p.table_namespace
     AND l.table_name = p.table_name
    WHERE p.property_key = 'primary_key'
),
partition_cols AS (
    SELECT
        l.table_namespace,
        l.table_name,
        p.property_value AS part_expr
    FROM logical_partition_tables l
    JOIN hologres.hg_table_properties p
      ON l.table_namespace = p.table_namespace
     AND l.table_name = p.table_name
    WHERE p.property_key = 'logical_partition_columns'
),
all_partitions AS (
    SELECT
        t.table_namespace,
        t.table_name,
        t.pk_expr,
        pc.part_expr,
        part.partition
    FROM tables_with_pk t
    JOIN partition_cols pc
      ON t.table_namespace = pc.table_namespace
     AND t.table_name = pc.table_name
    CROSS JOIN LATERAL (
        SELECT partition
        FROM hologres.hg_list_logical_partition(
            (t.table_namespace || '.' || t.table_name)::TEXT
        )
    ) AS part
),
split_kv AS (
    SELECT
        table_namespace,
        table_name,
        pk_expr,
        partition,
        TRIM(SPLIT_PART(kv, '=', 1)) AS col_name,
        TRIM(SPLIT_PART(kv, '=', 2)) AS col_val
    FROM all_partitions,
    UNNEST(STRING_TO_ARRAY(partition, '/')) AS kv
),
final_where AS (
    SELECT
        table_namespace,
        table_name,
        pk_expr,
        partition,
        STRING_AGG(
            FORMAT('%I = %L', col_name, col_val),
            ' AND '
            ORDER BY col_name
        ) AS where_clause
    FROM split_kv
    GROUP BY table_namespace, table_name, pk_expr, partition
)
SELECT
    FORMAT(
        '-- Check PK duplicates in %I.%I, partition: %s' || E'\n' ||
        'SELECT %s, COUNT(1) FROM %I.%I WHERE %s GROUP BY %s HAVING COUNT(1) > 1;',
        table_namespace, table_name, partition,
        pk_expr,
        table_namespace, table_name,
        where_clause,
        pk_expr
    ) AS detection_sql
FROM final_where
ORDER BY table_namespace, table_name, partition;

Run the generated statements. Sample output:

image.png

Two outcomes are possible:

  • No rows returned: No duplicate primary keys exist in that partition.

  • Rows returned: Duplicate primary keys are present. Run a full compaction followed by the deduplication procedure:

    -- Step 1: compact the table
    select hologres.hg_full_compact_table('schema.table_name','max_file_size_mb=1');
    -- Step 2: remove duplicate primary keys
    call public.hg_remove_duplicated_pk('schema.table_name','max_file_size_mb=1');

    If step 2 returns a Query exceed memory limit error, the data volume is too large to process the whole table at once. Remove duplicates partition by partition. The SQL comments in the generated statements from the previous step identify which partitions contain duplicate data.

    call public.hg_remove_duplicated_pk('schema.table_name_1', 'dt_int=1');
    call public.hg_remove_duplicated_pk('schema.table_name_2', 'dt_text=''A''');

After deduplication, re-run the check statements for the affected table. No rows returned means the duplicates are resolved.

Step 4: Compact tables without a clustering key (optional)

Tables with logical partitions but no clustering key are not directly affected by the index corruption bug, but they can still experience duplicate primary keys and cluster index size mismatch errors during real-time write-backs.

Run the following to generate full compaction commands for these tables:

SELECT
    FORMAT(
       'SELECT hologres.hg_full_compact_table(%L, ''max_file_size_mb=1''); -- Table: %I.%I (has logical partition but NO clustering_key)',
      table_namespace || '.' || table_name,
      table_namespace,
      table_name
    ) AS compact_sql
FROM (
  SELECT DISTINCT
  p1.table_namespace,
  p1.table_name
  FROM hologres.hg_table_properties p1
  WHERE p1.property_key = 'logical_partition_columns'
  AND EXISTS (
    SELECT 1
    FROM hologres.hg_table_properties p2
    WHERE p2.table_namespace = p1.table_namespace
    AND p2.table_name = p1.table_name
    AND p2.property_key = 'orientation'
    AND p2.property_value IN ('column', 'row,column')
  )
) AS tables_missing_ck
ORDER BY table_namespace, table_name;

Sample output:

image.png

Run the generated compaction commands during off-peak hours.