All Products
Search
Document Center

Hologres:EXTERNAL_FILES Function

Last Updated:Feb 06, 2026

Starting with Hologres V4.1, you can use the EXTERNAL_FILES function to directly query, import, and export structured data files stored in Object Storage Service (OSS) without creating foreign tables. This topic describes prerequisites, syntax, parameters, examples, and limits.

Overview

The EXTERNAL_FILES function enables you to query, import, and export structured data files in OSS without creating foreign tables. It supports the following scenarios:

  • Query data: Query CSV, Parquet, and ORC files in OSS.

  • Import data: Import data from OSS into a Hologres internal table.

  • Export data: Export data from a Hologres table to OSS.

Prerequisites

Version requirement

Your Hologres instance must run version V4.1 or later. If your instance runs an earlier version, upgrade it. For more information, see Instance upgrades.

Permission configuration

Only RAM users or Alibaba Cloud accounts can use EXTERNAL_FILES. Custom accounts are not supported. Before using EXTERNAL_FILES, complete the following steps. EXTERNAL_FILES accesses OSS through a RAM role. First, create a RAM role and grant it to the Hologres service. Then, grant OSS permissions to that role.

Create a role

  • Log on to the RAM console at RAM Roles. Go to the Roles page and click Create Role.

  • In the dialog box, set Trusted Entity Type to Cloud Service and Trusted Entity to Real-time Data Warehouse Hologres.

  • Enter a role name and complete the creation.

Configure OSS access permissions

Grant OSS permissions to the RAM role:

  • Read data: Grant the AliyunOSSReadOnlyAccess permission.

  • Write data: Grant the AliyunOSSFullAccess permission.

Grant the GrantAssumeRole permission to the SQL user

If the user who runs the SQL is not an Alibaba Cloud account, first grant them the GrantAssumeRole permission:

  • In the RAM console, create a permission policy. Go to Policies > Create Policy > Script Editor. Paste the following JSON. Replace Role_ARN with the same role_arn used in EXTERNAL_FILES. To find the Role ARN, see FAQ about RAM roles and STS tokens.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "hologram:GrantAssumeRole",
      "Resource": "<RoleARN>"
    }
  ]
}

Then attach this policy to the user who runs the query. For a RAM user, go to Users > Permissions > Add Permissions. For a RAM role, go to Roles > Permissions > Add Permissions.

Syntax

Query data

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://bucket/path',
    format = 'csv|parquet|orc',
    oss_endpoint = 'oss_endpoint',
    role_arn = 'acs:ram::xxx:role/xxx'
    [, other parameters...]
) [AS (col1 type1, col2 type2, ...)]

Import data

INSERT INTO target_table 
SELECT * FROM EXTERNAL_FILES(
    path = 'oss://bucket/path',
    format = 'csv|parquet|orc',
    oss_endpoint = 'oss_endpoint',
    role_arn = 'acs:ram::xxx:role/xxx'
    [, other parameters...]
) [AS (col1 type1, col2 type2, ...)]

Export data

INSERT INTO EXTERNAL_FILES(
    path = 'oss://bucket/path',
    format = 'csv',
    oss_endpoint = 'oss_endpoint',
    role_arn = 'acs:ram::xxx:role/xxx'
    [, other parameters...]
) SELECT * FROM source_table;

Parameters

Common parameters

Parameter

Description

Required

Example

path

The file path. Supports directories and files. You can specify multiple paths separated by commas. Supports the ? wildcard (matches any single character) and the * wildcard (matches zero or more characters).

Yes

oss://bucket/dir/, oss://bucket/dir/*.csv

format

The file format. For queries, supports csv, parquet, and orc. For exports, supports only csv.

Yes

csv

oss_endpoint

The OSS classic network endpoint. For more information, see Regions and endpoints.

No

oss-cn-hangzhou-internal.aliyuncs.com

role_arn

The ARN of the Alibaba Cloud RAM role.

No

acs:ram::xxx:role/xxx

Read parameters

Parameter Name

Description

Required

Example

schema_deduce_file_num

The maximum number of files to scan when inferring the schema. Default: 5.

No

10

schema_deduce_file_order

The order in which files are scanned for schema inference. Default: latest_first.

No

latest_first, earliest_first, random

csv_skip_header

Whether to skip the first row of a CSV file as the table header. Default: false.

No

true

csv_delimiter

The column delimiter for CSV files. Default: comma.

No

|

Write parameters

Parameter name

Description

Required

Example

target_file_size_mb

The size of each output file in MB. Default: 10.

No

500

single_file

Whether to write all data to a single file. Default: false.

No

true

Examples

Example 1: Query a CSV file

Query a CSV file in OSS and infer the schema automatically:

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/data/',
    format = 'csv',
    csv_skip_header = 'true',
    csv_delimiter = ',',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Query a CSV file and specify the schema manually:

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/data/',
    format = 'csv',
    csv_skip_header = 'true',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
) AS (id int, name text, amount decimal(10,2));

Example 2: Query a Parquet file

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/parquet_data/',
    format = 'parquet',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Example 3: Query an ORC file

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/orc_data/',
    format = 'orc',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Example 4: Import data into a Hologres table

-- Create the target table
CREATE TABLE orders (
    order_id int,
    customer_name text,
    amount decimal(10,2),
    PRIMARY KEY(order_id)
);

-- Import data from OSS
INSERT INTO orders 
SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/orders/',
    format = 'csv',
    csv_skip_header = 'true',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
) AS (order_id int, customer_name text, amount decimal(10,2));

Example 5: Export data to OSS

Export data to multiple files:

INSERT INTO EXTERNAL_FILES(
    path = 'oss://mybucket/export/',
    format = 'csv',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role',
    target_file_size_mb = '100'
) SELECT * FROM orders;

Export data to a single file:

INSERT INTO EXTERNAL_FILES(
    path = 'oss://mybucket/export/',
    format = 'csv',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role',
    single_file = 'true'
) SELECT * FROM orders;

Example 6: Use a serverless resource group

SET hg_computing_resource = 'serverless';

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/data/',
    format = 'csv',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Schema inference rules

Automatic schema inference

If you do not specify a schema using the AS clause, Hologres infers the schema automatically:

  • Parquet/ORC: Infer from file metadata.

  • CSV with headers: Infer from the first row (header) and data content.

  • CSV without headers: All files must have identical schemas.

Hologres scans the number of files specified by schema_deduce_file_num and takes the union of all schemas.

Schema inference behavior

Scenario

Action

A column in the schema does not exist in the file

Fill with NULL

A column in the file does not exist in the schema

Ignore the column

Column types do not match but are convertible

Automatic Type Conversion

Column types do not match and are not convertible

Return NULL

Type mapping

When EXTERNAL_FILES reads ORC or Parquet files, it maps file types to PostgreSQL types. The mappings are listed below. Unsupported types are noted under each table.

ORC type mapping

ORC type

PostgreSQL type

BOOLEAN

BOOLEAN

TINYINT / SMALLINT

SMALLINT

INT

INTEGER

BIGINT

BIGINT

FLOAT

REAL

DOUBLE

DOUBLE PRECISION

DECIMAL(p, s)

DECIMAL(p, s)

STRING

TEXT

VARCHAR(n)

VARCHAR(n)

CHAR(n)

CHAR(n)

BINARY

BYTEA

DATE

DATE

TIMESTAMP

TIMESTAMP WITHOUT TIME ZONE

TIMESTAMP WITH LOCAL TIMEZONE

TIMESTAMP WITH TIME ZONE

LIST

Array (pg_type[])

Note

ORC types UNION, STRUCT, and MAP are not supported.

Parquet type mapping

Parquet physical type

Parquet logical type

PostgreSQL type

BOOLEAN

BOOLEAN

INT32

INTEGER

INT32

DATE

DATE

INT32

DECIMAL(p, s)

DECIMAL(p, s)

INT64

BIGINT

INT64

TIMESTAMP_MILLIS / MICROS

TIMESTAMP / TIMESTAMPTZ

INT64

DECIMAL(p, s)

DECIMAL(p, s)

FLOAT

REAL

DOUBLE

DOUBLE PRECISION

BYTE_ARRAY

BYTEA

BYTE_ARRAY

STRING

TEXT

BYTE_ARRAY

JSON / BSON

JSONB

BYTE_ARRAY

ENUM

TEXT

FIXED_LEN_BYTE_ARRAY

DECIMAL(p, s)

DECIMAL(p, s)

FIXED_LEN_BYTE_ARRAY

UUID

UUID

LIST

LIST

Array (pg_type[])

Note

Parquet types STRUCT and MAP are not supported.

Limits

  • Export format: Only CSV is supported for exports.

  • Recursive search: Subdirectories are not searched recursively.

  • Unsupported types: ORC types UNION, STRUCT, and MAP. Parquet types STRUCT and MAP.

FAQ

Q: I get a permission error when exporting data. What should I do?

A: Make sure the RAM role has the AliyunOSSFullAccess permission.

Q: How do I control the size of exported files?

A: Use the target_file_size_mb parameter to set the size of each file. For finer control, adjust the hg_experimental_query_batch_size parameter.