Hologres V4.1 and later supports near real-time data import using the internal temporary storage Stage. This approach balances throughput and resource usage while delivering minute-level latency. This topic covers Stage capabilities, management operations—such as create, delete, and query—along with usage patterns—including writing to Stage and loading data from Stage into internal tables—and Arrow data type mapping.
Overview
Hologres provides two primary data import modes, each suited to different business scenarios:
Fixed Plan Real-time Streaming Write: Designed for streaming write workloads. End-to-end latency is low—typically in milliseconds—making it suitable for applications with strict real-time requirements. However, this mode incurs relatively high write resource overhead.
Bulkload Offline Batch Import: Optimized for high-throughput, large-scale data loading, such as T+1 offline data warehouse synchronization. End-to-end latency is high, so it only satisfies offline business needs.
If your business requires minute-level near real-time processing, offline batch import cannot meet timeliness requirements. While real-time streaming write can satisfy those requirements, it introduces significant write resource overhead. Starting with Hologres V4.1, near real-time data import is implemented using the internal temporary storage Stage. This approach effectively balances timeliness and resource efficiency.
Stage Introduction
Stage is Hologres’s internal high-performance temporary storage layer. Its core attributes include the following:
Internal hosting: Managed automatically by Hologres. No external storage configuration or operation is required.
Structured cache: Organizes data according to the target table’s schema, ensuring type and format consistency.
Transaction safety: Supports ACID semantics to guarantee reliability and consistency during writes.
Automatic merge commit: Connectors such as Spark automatically write data from temporary storage to main storage in optimized batches, improving efficiency.
Implement near real-time import using Stage with the following flow:
Write external data to Stage.
Batch the data stored in Stage and write it to internal tables.
Data written to internal tables becomes immediately visible for queries.
Stage Advantages
Balances latency and throughput: Data becomes queryable within minutes after ingestion—significantly faster than offline batch import—and supports much higher throughput capacity than real-time streaming write.
Simplifies the data link: Write data directly to Hologres’s internal Stage without dependencies on external systems. Integration is supported through standard SQL COPY and INSERT syntax, along with official connectors.
Improves resource efficiency: Stage automatically merges small files, enhancing subsequent read and write performance. It supports Serverless Computing and independent virtual warehouses to isolate Stage write resources, thereby improving system stability.
Supports a wide range of scenarios: Near real-time write to Stage is supported by the Spark Connector, Flink Connector, and Holo Client.
Conditions
Near real-time import into Hologres via Stage supports only the Arrow format. For the data type mapping between Arrow format and Hologres, see data type mapping.
Permissions
SPM or SLPM:
Create Stage: You must have writer or higher permissions, or be a Superuser.
Write, delete, or read Stage: You must be the creator of the corresponding Stage and have writer or higher permissions, or be a Superuser.
Standard PostgreSQL Authorization Model:
Create Stage: You must have the
pg_operate_internal_stagesrole permission, or be a Superuser.Write, delete, or read Stage: You must be the creator of the corresponding Stage and have the
pg_operate_internal_stagesrole permission, or be a Superuser.
Grant the pg_operate_internal_stages role permission to a user using the following command:
-- Grant Stage operation permission to the specified user
-- Replace variables: <user_name> with the actual username (such as a RAM sub-account name)
GRANT pg_operate_internal_stages TO "<user_name>";Manage Stage
Create Stage
Command syntax:
-- Create an Internal Stage
-- Replace variables:
-- <internal_stage_name>: Stage name, required, maximum 128 characters
-- <group_name>: Group name, optional, defaults to internal_stage_name
-- <ttl_in_seconds>: TTL (seconds), optional, defaults to 7200, maximum 864000
CALL HOLOGRES.HG_CREATE_INTERNAL_STAGE(
'<internal_stage_name>',
['<group_name>'],
['<ttl_in_seconds>']
);Parameter description:
Parameter name | Required | Description |
internal_stage_name | Required | Internal Stage name. Supports letters (case-sensitive), numbers, underscores, hyphens. Maximum length is 128 characters. |
group_name | Optional | The group to which the Stage belongs, used for data classification. Defaults to internal_stage_name. |
ttl_in_seconds | Optional | Stage TTL, in seconds. It starts from last_modified_time (viewable from the system table |
Delete Stage
Command syntax:
-- Delete the specified Internal Stage
-- Replace variables: <internal_stage_name> with the Stage name to delete
CALL HOLOGRES.HG_DROP_INTERNAL_STAGE(
'<internal_stage_name>'
);internal_stage_name: Required. The name of the internal Stage.
Query Stage Status
Query Stage status using the system view hologres.hg_internal_stages:
-- Query all Stages or filter by name
-- Replace variables: If <internal_stage_name> is replaced with a Stage name, query only that Stage; omit WHERE to query all.
SELECT * FROM hologres.hg_internal_stages
[WHERE stage_name = '<internal_stage_name>'];The view includes the following fields:
Field Name | Meaning |
stage_name | Stage name |
group_name | Stage group |
ttl_in_seconds | Stage TTL (seconds) |
create_time | Stage creation time |
create_user | User who created the Stage |
create_application_name | Create an application for the Stage |
create_session_id | Session ID that created the Stage |
last_modified_time | Stage last modified time |
stage_bytes | Stage storage size (Byte) |
file_count | Number of Stage files |
Query Stage Files
Query Stage files using the system view hologres.hg_internal_stage_files:
-- Query the list of files under a Stage, filterable by fuzzy match on name
-- Replace variables: <internal_stage_name> is the Stage name; <pattern%> is a fuzzy match for the file name (such as 'batch_%')
SELECT * FROM hologres.hg_internal_stage_files
[WHERE stage_name = '<internal_stage_name>']
[AND file_name like '<pattern%>'];The view includes the following fields:
Field Name | Meaning |
stage_name | Stage name |
file_name | File name |
file_size | File storage size (Byte) |
last_modified_time | File last modified time |
is_complete | Is the file written successfully? True means the write succeeded. False means it is being written or the write failed. |
Delete Stage Files
Delete Stage files using the system function hologres.hg_remove_internal_stage_file:
-- Delete the specified file
-- Replace variables: <stage_name> is the Stage name; <file_name> is the file name to delete
SELECT hologres.hg_remove_internal_stage_file ('<stage_name>', '<file_name>');
-- Batch delete Stage files
-- Replace variables: <stage_name> is the Stage name; <glob_pattern> is a file name wildcard (such as '*.arrow'); <pattern%> is a fuzzy match for the file name in WHERE
SELECT
stage_name,
file_name,
hologres.hg_remove_internal_stage_file (stage_name, file_name) AS hg_remove_internal_stage_file
FROM
hologres.hg_list_internal_stage_files ('<stage_name>',['<glob_pattern>'])
[WHERE file_name like '<pattern%>'];Use Stage
Client Write to Stage
Command syntax:
-- Write client data stream to the specified file in Stage
-- Replace variables: <internal_stage_name> is the name of the created Stage; <file_name> is the file name to write (supports letters, numbers, underscores, hyphens, periods, maximum 128 characters)
COPY EXTERNAL_FILES(
path = 'internal_stage://<internal_stage_name>/<file_name>'
) FROM STDIN;When writing to Stage using the COPY command, you cannot configure the WITH parameter. Instead, define the Stage file path using the path parameter. Valid characters include letters (case-sensitive), numbers, underscores, hyphens, and periods. The maximum length is 128 characters.
Read from Stage and Write to Internal Tables
Command syntax:
-- Read Arrow files from Stage and write to internal tables
-- Replace variables:
-- <table_name>: Target internal table name
-- <col_name>: Column name (optional, corresponds one-to-one with target table columns when specified)
-- <internal_stage_name>: Stage name. Specify multiple names separated by commas to read from multiple Stages.
-- <col_type>: Column type (optional, use only when specifying column types in the AS clause)
INSERT INTO <table_name> [ ( <col_name> [ , <col_name> ... ] ) ]
SELECT *
FROM EXTERNAL_FILES(
path = 'internal_stage://<internal_stage_name>, internal_stage://<internal_stage_name>',
format = arrow
)
[AS ( <col_name> <col_type>[ , <col_name> <col_type> ... ] )];Reading from Stage supports only Arrow format files.
Usage Examples
Write to Stage Using Holo-client
This example demonstrates how to perform near real-time writes to Stage using Holo-client.
Maven dependency:
<dependency>
<groupId>com.alibaba.hologres</groupId>
<artifactId>holo-client</artifactId>
<version>2.7.0</version>
</dependency>Example code:
import com.alibaba.hologres.client.HoloClient;
import com.alibaba.hologres.client.HoloConfig;
import com.alibaba.hologres.client.Put;
import com.alibaba.hologres.client.copy.CopyUtil;
import com.alibaba.hologres.client.copy.in.CopyInStageWrapper;
import com.alibaba.hologres.client.copy.in.arrow.RecordArrowWriter;
import com.alibaba.hologres.client.model.OnConflictAction;
import com.alibaba.hologres.client.model.TableSchema;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class CopyStageDemo {
public static void main(String[] args) throws Exception {
// Note: The jdbcUrl must use the jdbc:hologres protocol
String jdbcUrl = "jdbc:hologres://host:port/db";
String username = "";
String password = "";
/*
CREATE TABLE copy_stage_demo (id INT NOT NULL, name TEXT NOT NULL, address TEXT, PRIMARY KEY(id));
*/
String tableName = "copy_stage_demo";
HoloConfig config = new HoloConfig();
config.setJdbcUrl(jdbcUrl);
config.setUsername(username);
config.setPassword(password);
config.setRegion("local");
// Create a temporary Stage name
String stageName = "temp_stage_" + System.currentTimeMillis();
try (Connection conn = DriverManager.getConnection(jdbcUrl, username, password);
HoloClient client = new HoloClient(config)) {
// Create an internal Stage
String createStageSql =
"call hologres.hg_create_internal_stage('"
+ stageName
+ "', 'default_group', 7200);";
try (java.sql.Statement stmt = conn.createStatement()) {
stmt.execute(createStageSql);
}
// Get table schema
TableSchema schema = client.getTableSchema(tableName);
// Define columns to write
List<String> columns = new ArrayList<>();
columns.add("id");
columns.add("name");
columns.add("address");
// Use RecordArrowWriter and CopyInStageWrapper to write data to Stage
try (RecordArrowWriter arrowWriter =
new RecordArrowWriter(
schema,
columns,
8192 // maxBatchSize, 1024 rows of data form an Arrow RecordBatch
);
CopyInStageWrapper<com.alibaba.hologres.client.model.Record> copyInStage =
new CopyInStageWrapper<>(
config,
stageName,
"data_file", // File name prefix
arrowWriter,
64 * 1024 * 1024 // fileSizeLimit, each file size 64 MB
)) {
// Write 10 records
for (int i = 0; i < 10; ++i) {
Put put = new Put(schema);
// Keep consistent with CopyInStageWrapper's columns
put.setObject("id", i);
put.setObject("name", "name" + i);
put.setObject("address", "address" + i);
copyInStage.putRecord(put.getRecord());
}
// Call close before the program ends to ensure data is fully written
// The demo uses try-with-resources, no need to manually close
// copyInStage.close();
}
// Generate an INSERT statement to write from Stage to the target table
String insertSql =
CopyUtil.buildInsertTableSelectFromStageSql(
schema,
columns,
Collections.singletonList(stageName),
OnConflictAction.INSERT_OR_UPDATE);
try (java.sql.Statement stmt = conn.createStatement()) {
stmt.execute(insertSql);
}
// Verify write results
try (java.sql.Statement stmt = conn.createStatement()) {
try (java.sql.ResultSet rs = stmt.executeQuery("select * from " + tableName)) {
while (rs.next()) {
System.out.println(
"id: "
+ rs.getInt(1)
+ ", name: "
+ rs.getString(2)
+ ", address: "
+ rs.getString(3));
}
}
}
} finally {
// Clean up the temporary Stage. If not cleaned, it will be automatically cleaned based on TTL.
try (Connection conn = DriverManager.getConnection(jdbcUrl, username, password);
java.sql.Statement stmt = conn.createStatement()) {
String dropStageSql = "call hologres.hg_drop_internal_stage('" + stageName + "');";
stmt.execute(dropStageSql);
} catch (Exception e) {
System.err.println("Error cleaning up Stage: " + e.getMessage());
}
}
}
}Data Type Mapping
During near real-time import via Stage, the data type mapping between Arrow format and Hologres is as follows. Holo-client automatically converts data types during writes.
Hologres Data Type | Arrow Data Type |
SMALLINT | SMALLINT |
INT | INT |
BIGINT | BIGINT |
BOOLEAN | UINT8、BIT |
REAL(FLOAT4) | FLOAT4 |
DOUBLE PRECISION(FLOAT8) | FLOAT8 |
DATE | DateDay |
TIMETZ | FixedSizeBinary |
TIME | TimeMicro |
TIMESTAMP WITHOUT TIME ZONE | TimeStampMicro |
TIMESTAMP WITH TIME ZONE | DateMilli |
TEXT | VarChar |
CHAR(n) | — |
VARCHAR(n) | — |
JSON | — |
JSONB | — |
BYTEA | VarBINARY |
roaringbitmap | — |
NUMERIC(m,n) | DECIMAL(m,n) |
ARRAY (supports INT, BIGINT, FLOAT, BOOLEAN, DOUBLE, STRING) | ARRAY<type> |