Object tables map files stored in OSS into a queryable Hologres table—similar to an external table, but purpose-built for unstructured data such as images, PDFs, audio, and video. Hologres V4.0 and later support this feature.
With object tables, you can:
-
Query OSS file metadata using standard SQL without moving data out of OSS.
-
Process unstructured data end-to-end by combining object tables with Dynamic Tables and AI Functions—no external embedding services required.
-
Incrementally detect file changes to avoid reprocessing unchanged files and reduce compute costs.
Prerequisites
Before you begin, ensure that you have:
-
A Hologres instance running V4.0 or later. To upgrade, see Upgrade an instance
-
An OSS bucket containing the files you want to access
-
A RAM role with read access to the OSS bucket. To find the role ARN (Alibaba Cloud Resource Name), go to the Resource Access Management (RAM) console, click Identity Management > Role, and then click the role name
Limitations
-
Only classic network endpoints are supported.
-
Only manual refresh is supported. There is no automatic refresh.
-
By default, object tables scan only the top-level directory of the specified path. Subdirectories are not included. To scan subdirectories recursively, run:
NoteRecursive scans consume more resources than top-level scans. Set
pathto the most specific directory that contains your target files.ALTER DATABASE <db_name> SET hg_experimental_enable_oss_meta_recursive = on;
Create an object table
Object tables have a fixed schema—you do not specify columns when creating one.
Syntax:
CREATE OBJECT TABLE [IF NOT EXISTS] [schema_name.]<table_name>
WITH (
-- Required
path = '<oss_path>',
oss_endpoint = '<oss_endpoint>',
role_arn = '<role_arn>',
-- Optional
[orientation = 'column | row | row,column',]
[table_group = '<table_group_name>',]
[distribution_key = '<column_name>[,...]',]
[clustering_key = '<column_name>[:asc][,...]',]
[storage_mode = 'hot | cold',]
[event_time_column = '<column_name>[,...]',]
[bitmap_columns = '<column_name>[,...]',]
[dictionary_encoding_columns = '<column_name>[,...]',]
[time_to_live_in_seconds = '<non_negative_integer>']
);
After creating an object table, refresh it once to load data:
REFRESH OBJECT TABLE [schema_name.]<table_name>;
Required parameters
| Parameter | Description | Example |
|---|---|---|
path |
The OSS directory path containing your files. The object table reads metadata from files in this directory. | oss://my-bucket/my-dir |
oss_endpoint |
The OSS classic network endpoint for the bucket region. For endpoints by region, see Regions and endpoints. | oss-ap-southeast-1-internal.aliyuncs.com |
role_arn |
The ARN of a RAM role with read access to the OSS bucket. | acs:ram::role-id:role/role-name |
Optional parameters
| Parameter | Default | Description |
|---|---|---|
orientation |
column |
Storage format. Supported values: column, row, row,column. See Table storage formats. |
table_group |
Default table group | The table group and shard count for the object table. |
distribution_key |
object_uri |
The distribution key. |
clustering_key |
object_uri |
The clustering key. |
storage_mode |
hot |
Storage mode. Supported values: hot (hot storage), cold (cold storage). See Data tiering. |
event_time_column |
last_modified_at |
The event time column (segment key). |
bitmap_columns |
object_uri:auto,etag:auto |
The bitmap index columns. |
dictionary_encoding_columns |
object_uri:auto,etag:auto |
Dictionary encoding columns. |
time_to_live_in_seconds |
3153600000 |
Data lifecycle for the table in seconds. |
Query an object table
After the initial refresh, query an object table like any standard table:
SELECT * FROM <object_table_name>;
Object tables have the following fixed columns:
| Column | Type | Description |
|---|---|---|
object_uri |
TEXT | The exact OSS file path. |
etag |
TEXT | A unique identifier for the file content, assigned when the object is created. |
file |
FILE | A JSON-like structure containing detailed file metadata. Parse it using JSON and JSONB types. Fields include: object_uri, etag, size, last_modified_at, owner_name, object_source (always OSS), oss_endpoint, object_table_id, and role_arn. |
metadata |
JSON | Custom metadata attached to the file. |
To view the DDL or storage properties of an object table, use either of the following:
-- View DDL
SELECT hg_dump_script('[schema_name.]<object_table_name>');
-- View storage properties
SELECT * FROM hologres.hg_table_properties WHERE table_name = '<object_table_name>';
Refresh an object table
Refresh an object table to sync file metadata from OSS. Only manual refresh is supported.
REFRESH OBJECT TABLE [schema_name.]<table_name>;
Refreshing consumes compute resources. To minimize cost, set path to the most specific directory that contains your target files.
Manage an object table
Rename or move an object table
-- Rename
ALTER OBJECT TABLE [IF EXISTS] <table_name> RENAME TO <new_name>;
-- Move to a different schema
ALTER OBJECT TABLE [IF EXISTS] [schema_name.]<table_name> SET SCHEMA <new_schema>;
Delete an object table
-- Moves to the recycle bin if it is enabled; otherwise, permanently deletes
DROP OBJECT TABLE [IF EXISTS] <table_name>;
-- Permanently deletes without moving to the recycle bin
DROP OBJECT TABLE [IF EXISTS] <table_name> FORCE;
If the database recycle bin is enabled, dropping an object table moves it to the recycle bin automatically. The object table can be recovered using the recycle bin recovery command, and it remains an object table after recovery. For details, see Recycle bin.
The FORCE option permanently deletes the object table. Recovery is not possible.
Best practices
Object tables work with Dynamic Tables, AI Functions, vector search, and full-text search to automate processing, searching, and analysis of unstructured data. For details, see: