All Products
Search
Document Center

Hologres:EXTERNAL_FILES

Last Updated:Mar 26, 2026

Starting with Hologres V4.1, you can use the EXTERNAL_FILES function to directly query, import, and export structured data files stored in Object Storage Service (OSS) without creating foreign tables. This topic describes prerequisites, syntax, parameters, examples, and limits.

Overview

EXTERNAL_FILES lets you run SQL queries directly against CSV, Parquet, and ORC files in OSS — no foreign table required. Use it for three scenarios:

  • Query data: Run SELECT statements against CSV, Parquet, and ORC files in OSS.

  • Import data: Load data from OSS files into a Hologres internal table.

  • Export data: Write data from a Hologres table to OSS as CSV files.

Prerequisites

Before you begin, ensure that you have:

  • A Hologres instance running V4.1 or later. To upgrade, see Instance upgrades.

  • An Alibaba Cloud account or RAM user. Custom accounts are not supported.

Permission configuration

EXTERNAL_FILES accesses OSS through a RAM role. Complete the following steps before running EXTERNAL_FILES.

Step 1: Create a RAM role

  1. Log on to the RAM console and go to RAM Roles.

  2. Click Create Role.

  3. In the dialog box, set Trusted Entity Type to Cloud Service and Trusted Entity to Real-time Data Warehouse Hologres.

  4. Enter a role name and complete the creation.

Step 2: Grant OSS permissions to the role

Attach an OSS policy to the RAM role based on the operations you need:

  • Read data: Attach AliyunOSSReadOnlyAccess.

  • Write data: Attach AliyunOSSFullAccess.

Step 3: Grant the GrantAssumeRole permission to RAM users

Skip this step if the user running the SQL is an Alibaba Cloud account. For RAM users, grant them the GrantAssumeRole permission as follows:

  1. In the RAM console, go to Policies and click Create Policy.

  2. Switch to Script Editor and paste the following JSON. Replace <RoleARN> with the ARN of the role you created. To find the ARN, see FAQ about RAM roles and STS tokens.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "hologram:GrantAssumeRole",
          "Resource": "<RoleARN>"
        }
      ]
    }
  3. Attach this policy to the user:

    • For a RAM user: go to Users > Permissions > Add Permissions.

    • For a RAM role: go to Roles > Permissions > Add Permissions.

Syntax

Query data

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://bucket/path',
    format = 'csv|parquet|orc',
    oss_endpoint = 'oss_endpoint',
    role_arn = 'acs:ram::xxx:role/xxx'
    [, other parameters...]
) [AS (col1 type1, col2 type2, ...)]

Import data

INSERT INTO target_table
SELECT * FROM EXTERNAL_FILES(
    path = 'oss://bucket/path',
    format = 'csv|parquet|orc',
    oss_endpoint = 'oss_endpoint',
    role_arn = 'acs:ram::xxx:role/xxx'
    [, other parameters...]
) [AS (col1 type1, col2 type2, ...)]

Export data

INSERT INTO EXTERNAL_FILES(
    path = 'oss://bucket/path',
    format = 'csv',
    oss_endpoint = 'oss_endpoint',
    role_arn = 'acs:ram::xxx:role/xxx'
    [, other parameters...]
) SELECT * FROM source_table;

Parameters

Common parameters

Parameter Type Description Required Default Example
path STRING The OSS file path. Supports directories, individual files, and multiple comma-separated paths. Supports wildcards — see table below. Yes oss://bucket/dir/, oss://bucket/dir/*.csv
format STRING The file format. Supported formats for queries: csv, parquet, orc. Exports support csv only. Yes csv
oss_endpoint STRING The OSS classic network endpoint. For endpoint values, see Regions and endpoints. No oss-cn-hangzhou-internal.aliyuncs.com
role_arn STRING The ARN of the RAM role used to access OSS. No acs:ram::xxx:role/xxx

Wildcards supported in `path`

Wildcard Description Example
* Matches zero or more characters oss://bucket/dir/*.csv
? Matches any single character oss://bucket/dir/file?.csv

Read parameters

Parameter Type Description Required Default
schema_deduce_file_num INT Maximum number of files to scan when inferring the schema. No 5
schema_deduce_file_order STRING Order in which files are scanned for schema inference. Valid values: latest_first, earliest_first, random. No latest_first
csv_skip_header BOOLEAN Specifies whether to treat the first row of a CSV file as a header and skip it during reads. No false
csv_delimiter STRING The column delimiter for CSV files. No , (comma)

Write parameters

Parameter Type Description Required Default
target_file_size_mb INT The target size of each output file, in MB. No 10
single_file BOOLEAN Specifies whether to write all exported data into a single file. No false

Examples

Query a CSV file

Query a CSV file with automatic schema inference:

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/data/',
    format = 'csv',
    csv_skip_header = 'true',
    csv_delimiter = ',',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Query a CSV file with a manually specified schema:

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/data/',
    format = 'csv',
    csv_skip_header = 'true',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
) AS (id int, name text, amount decimal(10,2));

Query a Parquet file

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/parquet_data/',
    format = 'parquet',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Query an ORC file

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/orc_data/',
    format = 'orc',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Import data into a Hologres table

-- Create the target table
CREATE TABLE orders (
    order_id int,
    customer_name text,
    amount decimal(10,2),
    PRIMARY KEY(order_id)
);

-- Import data from OSS
INSERT INTO orders
SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/orders/',
    format = 'csv',
    csv_skip_header = 'true',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
) AS (order_id int, customer_name text, amount decimal(10,2));

Export data to OSS

Export to multiple files with a size limit per file:

INSERT INTO EXTERNAL_FILES(
    path = 'oss://mybucket/export/',
    format = 'csv',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role',
    target_file_size_mb = '100'
) SELECT * FROM orders;

Export all data to a single file:

INSERT INTO EXTERNAL_FILES(
    path = 'oss://mybucket/export/',
    format = 'csv',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role',
    single_file = 'true'
) SELECT * FROM orders;

Use a serverless resource group

SET hg_computing_resource = 'serverless';

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/data/',
    format = 'csv',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);

Schema inference

Automatic inference

If you omit the AS clause, Hologres infers the schema automatically:

  • Parquet and ORC: Schema is inferred from file metadata.

  • CSV with a header row: Schema is inferred from the header and data content. Set csv_skip_header = 'true' so the header row is not read as data.

  • CSV without a header row: All files in the path must have identical column structures.

Hologres scans up to schema_deduce_file_num files (default: 5) and takes the union of all discovered schemas.

Schema mismatch behavior

When the inferred or specified schema does not exactly match the file contents, Hologres handles mismatches as follows:

Scenario Behavior
Column exists in schema but not in the file Filled with NULL
Column exists in the file but not in schema Ignored
Column types differ but are convertible Automatic type conversion
Column types differ and are not convertible Returns NULL

Example: querying files with mismatched columns

Suppose you have two CSV files in the same path:

orders_jan.csv — 3 columns: order_id, customer_name, amount

orders_feb.csv — 4 columns: order_id, customer_name, amount, region

When you query the path without specifying a schema, Hologres takes the union of both schemas. Rows from orders_jan.csv return NULL in the region column:

SELECT * FROM EXTERNAL_FILES(
    path = 'oss://mybucket/orders/',
    format = 'csv',
    csv_skip_header = 'true',
    oss_endpoint = 'oss-cn-hangzhou-internal.aliyuncs.com',
    role_arn = 'acs:ram::123456789:role/hologres-role'
);
order_id customer_name amount region
1001 Alice 99.00 NULL
1002 Bob 149.00 NULL
2001 Carol 79.00 us-west
2002 Dave 199.00 eu-central

To avoid unexpected NULLs, specify the schema explicitly using the AS clause, or ensure all files share the same column structure.

Type mapping

ORC type mapping

ORC type PostgreSQL type
BOOLEAN BOOLEAN
TINYINT / SMALLINT SMALLINT
INT INTEGER
BIGINT BIGINT
FLOAT REAL
DOUBLE DOUBLE PRECISION
DECIMAL(p, s) DECIMAL(p, s)
STRING TEXT
VARCHAR(n) VARCHAR(n)
CHAR(n) CHAR(n)
BINARY BYTEA
DATE DATE
TIMESTAMP TIMESTAMP WITHOUT TIME ZONE
TIMESTAMP WITH LOCAL TIMEZONE TIMESTAMP WITH TIME ZONE
LIST Array (pg_type[])
ORC types UNION, STRUCT, and MAP are not supported.

Parquet type mapping

Parquet physical type Parquet logical type PostgreSQL type
BOOLEAN BOOLEAN
INT32 INTEGER
INT32 DATE DATE
INT32 DECIMAL(p, s) DECIMAL(p, s)
INT64 BIGINT
INT64 TIMESTAMP_MILLIS / MICROS TIMESTAMP / TIMESTAMPTZ
INT64 DECIMAL(p, s) DECIMAL(p, s)
FLOAT REAL
DOUBLE DOUBLE PRECISION
BYTE_ARRAY BYTEA
BYTE_ARRAY STRING TEXT
BYTE_ARRAY JSON / BSON JSONB
BYTE_ARRAY ENUM TEXT
FIXED_LEN_BYTE_ARRAY DECIMAL(p, s) DECIMAL(p, s)
FIXED_LEN_BYTE_ARRAY UUID UUID
LIST LIST Array (pg_type[])
Parquet types STRUCT and MAP are not supported.

Limitations

  • Export format: Only CSV is supported for exports.

  • Recursive directory search: Subdirectories are not searched recursively. Use wildcards in the path parameter to match files across a flat directory.

  • Unsupported ORC types: UNION, STRUCT, and MAP.

  • Unsupported Parquet types: STRUCT and MAP.

  • Supported accounts: Only RAM users and Alibaba Cloud accounts. Custom accounts are not supported.

FAQ

I get a permission error when exporting data. What should I do?

Make sure the RAM role has the AliyunOSSFullAccess permission. Read operations only require AliyunOSSReadOnlyAccess, but write operations (exports) require full access.

How do I control the number and size of exported files?

Use target_file_size_mb to set the target size for each output file (default: 10 MB). For finer-grained control over the number of rows per file, adjust the hg_experimental_query_batch_size parameter.