All Products
Search
Document Center

Hologres:Near real-time import via Stage

Last Updated:Mar 11, 2026

Hologres V4.1 and later supports near real-time data import using the internal temporary storage Stage. This approach balances throughput and resource usage while delivering minute-level latency. This topic covers Stage capabilities, management operations—such as create, delete, and query—along with usage patterns—including writing to Stage and loading data from Stage into internal tables—and Arrow data type mapping.

Overview

Hologres provides two primary data import modes, each suited to different business scenarios:

  • Fixed Plan Real-time Streaming Write: Designed for streaming write workloads. End-to-end latency is low—typically in milliseconds—making it suitable for applications with strict real-time requirements. However, this mode incurs relatively high write resource overhead.

  • Bulkload Offline Batch Import: Optimized for high-throughput, large-scale data loading, such as T+1 offline data warehouse synchronization. End-to-end latency is high, so it only satisfies offline business needs.

If your business requires minute-level near real-time processing, offline batch import cannot meet timeliness requirements. While real-time streaming write can satisfy those requirements, it introduces significant write resource overhead. Starting with Hologres V4.1, near real-time data import is implemented using the internal temporary storage Stage. This approach effectively balances timeliness and resource efficiency.

Stage Introduction

Stage is Hologres’s internal high-performance temporary storage layer. Its core attributes include the following:

  • Internal hosting: Managed automatically by Hologres. No external storage configuration or operation is required.

  • Structured cache: Organizes data according to the target table’s schema, ensuring type and format consistency.

  • Transaction safety: Supports ACID semantics to guarantee reliability and consistency during writes.

  • Automatic merge commit: Connectors such as Spark automatically write data from temporary storage to main storage in optimized batches, improving efficiency.

Implement near real-time import using Stage with the following flow:

  • Write external data to Stage.

  • Batch the data stored in Stage and write it to internal tables.

  • Data written to internal tables becomes immediately visible for queries.

Stage Advantages

  • Balances latency and throughput: Data becomes queryable within minutes after ingestion—significantly faster than offline batch import—and supports much higher throughput capacity than real-time streaming write.

  • Simplifies the data link: Write data directly to Hologres’s internal Stage without dependencies on external systems. Integration is supported through standard SQL COPY and INSERT syntax, along with official connectors.

  • Improves resource efficiency: Stage automatically merges small files, enhancing subsequent read and write performance. It supports Serverless Computing and independent virtual warehouses to isolate Stage write resources, thereby improving system stability.

  • Supports a wide range of scenarios: Near real-time write to Stage is supported by the Spark Connector, Flink Connector, and Holo Client.

Conditions

Near real-time import into Hologres via Stage supports only the Arrow format. For the data type mapping between Arrow format and Hologres, see data type mapping.

Permissions

  1. SPM or SLPM:

    • Create Stage: You must have writer or higher permissions, or be a Superuser.

    • Write, delete, or read Stage: You must be the creator of the corresponding Stage and have writer or higher permissions, or be a Superuser.

  2. Standard PostgreSQL Authorization Model:

    • Create Stage: You must have the pg_operate_internal_stages role permission, or be a Superuser.

    • Write, delete, or read Stage: You must be the creator of the corresponding Stage and have the pg_operate_internal_stages role permission, or be a Superuser.

Grant the pg_operate_internal_stages role permission to a user using the following command:

-- Grant Stage operation permission to the specified user
-- Replace variables: <user_name> with the actual username (such as a RAM sub-account name)
GRANT pg_operate_internal_stages TO "<user_name>";

Manage Stage

Create Stage

Command syntax:

-- Create an Internal Stage
-- Replace variables:
--   <internal_stage_name>: Stage name, required, maximum 128 characters
--   <group_name>: Group name, optional, defaults to internal_stage_name
--   <ttl_in_seconds>: TTL (seconds), optional, defaults to 7200, maximum 864000
CALL HOLOGRES.HG_CREATE_INTERNAL_STAGE(
  '<internal_stage_name>', 
  ['<group_name>'],
  ['<ttl_in_seconds>']
);

Parameter description:

Parameter name

Required

Description

internal_stage_name

Required

Internal Stage name. Supports letters (case-sensitive), numbers, underscores, hyphens. Maximum length is 128 characters.

group_name

Optional

The group to which the Stage belongs, used for data classification. Defaults to internal_stage_name.

ttl_in_seconds

Optional

Stage TTL, in seconds. It starts from last_modified_time (viewable from the system table hologres.hg_internal_stages). Defaults to 7200 (2 hours), maximum 864000 (10 days). The system automatically cleans it asynchronously after its lifecycle.

Delete Stage

Command syntax:

-- Delete the specified Internal Stage
-- Replace variables: <internal_stage_name> with the Stage name to delete
CALL HOLOGRES.HG_DROP_INTERNAL_STAGE(
    '<internal_stage_name>'
);

internal_stage_name: Required. The name of the internal Stage.

Query Stage Status

Query Stage status using the system view hologres.hg_internal_stages:

-- Query all Stages or filter by name
-- Replace variables: If <internal_stage_name> is replaced with a Stage name, query only that Stage; omit WHERE to query all.
SELECT * FROM hologres.hg_internal_stages 
[WHERE stage_name = '<internal_stage_name>'];

The view includes the following fields:

Field Name

Meaning

stage_name

Stage name

group_name

Stage group

ttl_in_seconds

Stage TTL (seconds)

create_time

Stage creation time

create_user

User who created the Stage

create_application_name

Create an application for the Stage

create_session_id

Session ID that created the Stage

last_modified_time

Stage last modified time

stage_bytes

Stage storage size (Byte)

file_count

Number of Stage files

Query Stage Files

Query Stage files using the system view hologres.hg_internal_stage_files:

-- Query the list of files under a Stage, filterable by fuzzy match on name
-- Replace variables: <internal_stage_name> is the Stage name; <pattern%> is a fuzzy match for the file name (such as 'batch_%')
SELECT * FROM hologres.hg_internal_stage_files 
[WHERE stage_name = '<internal_stage_name>'] 
  [AND file_name like '<pattern%>'];

The view includes the following fields:

Field Name

Meaning

stage_name

Stage name

file_name

File name

file_size

File storage size (Byte)

last_modified_time

File last modified time

is_complete

Is the file written successfully? True means the write succeeded. False means it is being written or the write failed.

Delete Stage Files

Delete Stage files using the system function hologres.hg_remove_internal_stage_file:

-- Delete the specified file
-- Replace variables: <stage_name> is the Stage name; <file_name> is the file name to delete
SELECT hologres.hg_remove_internal_stage_file ('<stage_name>', '<file_name>');

-- Batch delete Stage files
-- Replace variables: <stage_name> is the Stage name; <glob_pattern> is a file name wildcard (such as '*.arrow'); <pattern%> is a fuzzy match for the file name in WHERE
SELECT
    stage_name,
    file_name,
    hologres.hg_remove_internal_stage_file (stage_name, file_name) AS hg_remove_internal_stage_file
FROM
    hologres.hg_list_internal_stage_files ('<stage_name>',['<glob_pattern>']) 
[WHERE file_name like '<pattern%>'];

Use Stage

Client Write to Stage

Command syntax:

-- Write client data stream to the specified file in Stage
-- Replace variables: <internal_stage_name> is the name of the created Stage; <file_name> is the file name to write (supports letters, numbers, underscores, hyphens, periods, maximum 128 characters)
COPY EXTERNAL_FILES(
  path = 'internal_stage://<internal_stage_name>/<file_name>'
) FROM STDIN;

When writing to Stage using the COPY command, you cannot configure the WITH parameter. Instead, define the Stage file path using the path parameter. Valid characters include letters (case-sensitive), numbers, underscores, hyphens, and periods. The maximum length is 128 characters.

Read from Stage and Write to Internal Tables

Command syntax:

-- Read Arrow files from Stage and write to internal tables
-- Replace variables:
--   <table_name>: Target internal table name
--   <col_name>: Column name (optional, corresponds one-to-one with target table columns when specified)
--   <internal_stage_name>: Stage name. Specify multiple names separated by commas to read from multiple Stages.
--   <col_type>: Column type (optional, use only when specifying column types in the AS clause)
INSERT INTO <table_name> [ ( <col_name> [ , <col_name> ... ] ) ]
SELECT *
FROM EXTERNAL_FILES(
  path = 'internal_stage://<internal_stage_name>, internal_stage://<internal_stage_name>',
  format = arrow
)
[AS ( <col_name> <col_type>[ , <col_name> <col_type> ... ] )];

Reading from Stage supports only Arrow format files.

Usage Examples

Write to Stage Using Holo-client

This example demonstrates how to perform near real-time writes to Stage using Holo-client.

Maven dependency:

<dependency>
  <groupId>com.alibaba.hologres</groupId>
  <artifactId>holo-client</artifactId>
  <version>2.7.0</version>
</dependency>

Example code:

import com.alibaba.hologres.client.HoloClient;
import com.alibaba.hologres.client.HoloConfig;
import com.alibaba.hologres.client.Put;
import com.alibaba.hologres.client.copy.CopyUtil;
import com.alibaba.hologres.client.copy.in.CopyInStageWrapper;
import com.alibaba.hologres.client.copy.in.arrow.RecordArrowWriter;
import com.alibaba.hologres.client.model.OnConflictAction;
import com.alibaba.hologres.client.model.TableSchema;

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CopyStageDemo {

    public static void main(String[] args) throws Exception {
        // Note: The jdbcUrl must use the jdbc:hologres protocol
        String jdbcUrl = "jdbc:hologres://host:port/db";
        String username = "";
        String password = "";

        /*
        CREATE TABLE copy_stage_demo (id INT NOT NULL, name TEXT NOT NULL, address TEXT, PRIMARY KEY(id));
        */
        String tableName = "copy_stage_demo";

        HoloConfig config = new HoloConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setUsername(username);
        config.setPassword(password);
        config.setRegion("local");

        // Create a temporary Stage name
        String stageName = "temp_stage_" + System.currentTimeMillis();

        try (Connection conn = DriverManager.getConnection(jdbcUrl, username, password);
             HoloClient client = new HoloClient(config)) {

            // Create an internal Stage
            String createStageSql =
                    "call hologres.hg_create_internal_stage('"
                            + stageName
                            + "', 'default_group', 7200);";
            try (java.sql.Statement stmt = conn.createStatement()) {
                stmt.execute(createStageSql);
            }

            // Get table schema
            TableSchema schema = client.getTableSchema(tableName);

            // Define columns to write
            List<String> columns = new ArrayList<>();
            columns.add("id");
            columns.add("name");
            columns.add("address");

            // Use RecordArrowWriter and CopyInStageWrapper to write data to Stage
            try (RecordArrowWriter arrowWriter =
                         new RecordArrowWriter(
                                 schema,
                                 columns,
                                 8192 // maxBatchSize, 1024 rows of data form an Arrow RecordBatch
                         );
                 CopyInStageWrapper<com.alibaba.hologres.client.model.Record> copyInStage =
                         new CopyInStageWrapper<>(
                                 config,
                                 stageName,
                                 "data_file", // File name prefix
                                 arrowWriter,
                                 64 * 1024 * 1024 // fileSizeLimit, each file size 64 MB
                         )) {

                // Write 10 records
                for (int i = 0; i < 10; ++i) {
                    Put put = new Put(schema);
                    // Keep consistent with CopyInStageWrapper's columns
                    put.setObject("id", i);
                    put.setObject("name", "name" + i);
                    put.setObject("address", "address" + i);

                    copyInStage.putRecord(put.getRecord());
                }
                // Call close before the program ends to ensure data is fully written
                // The demo uses try-with-resources, no need to manually close
                // copyInStage.close();
            }

            // Generate an INSERT statement to write from Stage to the target table
            String insertSql =
                    CopyUtil.buildInsertTableSelectFromStageSql(
                            schema,
                            columns,
                            Collections.singletonList(stageName),
                            OnConflictAction.INSERT_OR_UPDATE);

            try (java.sql.Statement stmt = conn.createStatement()) {
                stmt.execute(insertSql);
            }
            // Verify write results
            try (java.sql.Statement stmt = conn.createStatement()) {
                try (java.sql.ResultSet rs = stmt.executeQuery("select * from " + tableName)) {
                    while (rs.next()) {
                        System.out.println(
                                "id: "
                                        + rs.getInt(1)
                                        + ", name: "
                                        + rs.getString(2)
                                        + ", address: "
                                        + rs.getString(3));
                    }
                }
            }
        } finally {
            // Clean up the temporary Stage. If not cleaned, it will be automatically cleaned based on TTL.
            try (Connection conn = DriverManager.getConnection(jdbcUrl, username, password);
                 java.sql.Statement stmt = conn.createStatement()) {
                String dropStageSql = "call hologres.hg_drop_internal_stage('" + stageName + "');";
                stmt.execute(dropStageSql);
            } catch (Exception e) {
                System.err.println("Error cleaning up Stage: " + e.getMessage());
            }
        }
    }
}

Data Type Mapping

During near real-time import via Stage, the data type mapping between Arrow format and Hologres is as follows. Holo-client automatically converts data types during writes.

Hologres Data Type

Arrow Data Type

SMALLINT

SMALLINT

INT

INT

BIGINT

BIGINT

BOOLEAN

UINT8、BIT

REAL(FLOAT4)

FLOAT4

DOUBLE PRECISION(FLOAT8)

FLOAT8

DATE

DateDay

TIMETZ

FixedSizeBinary

TIME

TimeMicro

TIMESTAMP WITHOUT TIME ZONE

TimeStampMicro

TIMESTAMP WITH TIME ZONE

DateMilli

TEXT

VarChar

CHAR(n)

VARCHAR(n)

JSON

JSONB

BYTEA

VarBINARY

roaringbitmap

NUMERIC(m,n)

DECIMAL(m,n)

ARRAY (supports INT, BIGINT, FLOAT, BOOLEAN, DOUBLE, STRING)

ARRAY<type>