All Products
Search
Document Center

MaxCompute:OBJECT TABLE definition

Last Updated:Mar 26, 2026

An Object Table maps an OSS directory to a queryable metadata table inside MaxCompute. It lets the SQL engine read, filter, and process unstructured data — images, videos, PDFs, log files — at the same scale as structured data, without moving any files out of OSS.

With an Object Table, you can:

  • Query file metadata with SQL — list all objects in an OSS directory and filter by name, size, type, or last-modified time using standard SELECT statements.

  • Read object content at scale — use the built-in GET_DATA_FROM_OSS function inside SQL queries to stream object content into distributed compute workers for large-scale processing.

  • Keep metadata fresh automatically — cache OSS metadata inside MaxCompute and configure manual, periodic, or scheduled refreshes to stay in sync with changes in OSS.

  • Process unstructured data with custom logic — upload custom images to build UDFs (user-defined functions) that run alongside the SQL engine and write results to internal or foreign tables. Object Tables will also support writing unstructured results back to OSS in a future release.

  • Use the Maxframe engine — Object Tables support the Maxframe engine for the Python ecosystem.

Creating an Object Table does not copy or modify your OSS data. MaxCompute caches only the object metadata. The actual files remain in OSS.

Prerequisites

Before you begin, make sure you have:

  • A MaxCompute project with the Schema feature enabled. See Enable the Schema feature.

  • The MaxCompute 2.0 data type system enabled in your project.

  • Completed the one-click authorization for AliyunODPSDefaultRole, which grants MaxCompute access to OSS via a Security Token Service (STS) token. One-click authorization is only available when the ProjectOwner of the MaxCompute project is the same Alibaba Cloud account that owns the OSS bucket.

Limitations

  • Object Tables do not support partitions.

  • Both manual and periodic refreshes are full refreshes.

Billing

Metadata storage: MaxCompute charges storage fees for the metadata cached in an Object Table. See Storage fees. OSS charges separately for the actual data stored and accessed in buckets. See OSS storage fees.

Refresh tasks: The inputsize for each file scanned during a refresh is based on its metadata size, not the actual file size. Refresh costs scale with the number of files, not their size. See SQL billing for foreign tables.

Compute: Running queries against an Object Table generates compute fees.

Create an Object Table

Syntax

CREATE OBJECT TABLE [IF NOT EXISTS] <objecttable_name>
WITH SERDEPROPERTIES ('<key>' = '<value>')
LOCATION '<location>'
[TBLPROPERTIES ('<key>' = '<value>')]
[COMMENT '<comment>'];
You do not need to define columns. MaxCompute provides the metadata columns automatically.

Parameters

Parameter Required Description
objecttable_name Yes The table name.
SERDEPROPERTIES Yes Specifies the RAM role for authentication. The key is odps.properties.rolearn and the value is the role ARN in the format acs:ram::<uid>:role/aliyunodpsdefaultrole. If omitted, AliyunODPSDefaultRole under the current Alibaba Cloud account is used. To get the account UID, see View RAM user information. For STS mode authorization details, see Grant permissions to a common RAM role in STS mode for OSS.
location Yes The OSS path to map. Format: oss://<oss_endpoint>/<bucket_name>/<oss_directory_name>/. The Object Table indexes all objects in that directory. Use an internal network endpoint, not a public one — using a public endpoint causes connection errors. To get the internal endpoint: log on to the OSS console, on the Buckets page click the target Bucket Name to open the Objects page, then in the Port section of the Overview page obtain the Endpoint for Access from ECS over the Classic Network (internal network).
metadata.cache.mode No Refresh trigger mode. See Choose a refresh mode. Default: manual.
metadata.staleness.seconds No Refresh interval in seconds for periodic mode. Range: 1–604800 (1 second to 1 week). Not a strict guarantee; the scheduler uses this as a target. Required when metadata.cache.mode is periodic.
metadata.crontab.expression No Cron expression for crontab mode. For example, 0 0 14 * * ? runs a refresh daily at 2:00 PM. Required when metadata.cache.mode is crontab.
comment No A comment for the table.

Example

SET odps.namespace.schema=true;

CREATE OBJECT TABLE ot_demo_day
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/';

View Object Table properties

DESC <object_table_name>;

Example:

DESC ot_demo_day;

The output lists the table's owner, project, Schema, timestamps, and its native metadata columns:

+------------------------------------------------------------------------------------+
| Owner:                    ALIYUN$****@test.aliyunid.com                            |
| Project:                  test_objecttable                                         |
| Schema:                   default                                                  |
| TableComment:                                                                      |
+------------------------------------------------------------------------------------+
| CreateTime:               2024-09-02 20:01:56                                      |
| LastDDLTime:              2024-09-02 20:01:56                                      |
| LastModifiedTime:         2024-09-02 20:01:56                                      |
+------------------------------------------------------------------------------------+
| InternalTable: YES      | Size: 0                                                  |
+------------------------------------------------------------------------------------+
| Native Columns:                                                                    |
+------------------------------------------------------------------------------------+
| Field              | Type          | Label | Comment                               |
+------------------------------------------------------------------------------------+
| key                | varchar(2048) |       | The name of the object.               |
| size               | bigint        |       | The size of the object in bytes.      |
| type               | varchar(32)   |       | Object type: Normal, Multipart, Appendable, or Symlink. |
| last_modified      | timestamp     |       | The last modified time of the object. |
| storage_class      | varchar(32)   |       | The storage class of the object.      |
| etag               | varchar(64)   |       | The ETag of the object.               |
| restore_info       | varchar(256)  |       | Restoration status from cold storage. |
| owner_id           | bigint        |       | The ID of the bucket owner.           |
| owner_display_name | varchar(256)  |       | The display name of the bucket owner. |
+------------------------------------------------------------------------------------+

The native columns provide the following metadata:

Column Type Nullable Description
key VARCHAR(2048) — native length limit: 1,023 characters No Relative path of the object within the Object Table. See OSS object naming conventions.
size BIGINT No Object size in bytes.
type VARCHAR(32) No OSS object type: Normal, Multipart, Appendable, or Symlink.
last_modified TIMESTAMP_NTZ No Time the object was last modified in OSS.
storage_class VARCHAR(32) No OSS storage class. See Storage class.
etag VARCHAR(64) No Entity tag generated when the object was created. Identifies whether object content changed between updates, but is not a unique identifier.
restore_info VARCHAR(256) Yes Restoration status for archived objects. Populated only while a restore is in progress.
owner_id BIGINT Yes ID of the object owner.
owner_display_name VARCHAR(256) Yes Display name of the object owner.

View the DDL statement of an Object Table

SHOW CREATE TABLE <object_table_name>;

Example:

SHOW CREATE TABLE ot_demo_day;

Sample output:

CREATE OBJECT TABLE IF NOT EXISTS yunqi_object_****.`default`.ot_demo_day
WITH SERDEPROPERTIES (
  'serialization.format'='1',
  'odps.properties.rolearn'='acs:ram::139699392458****:role/aliyunodpsdefaultrole')
LOCATION
  'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/'
TBLPROPERTIES (
  'last_modified_time'='1731478307',
  'transient_lastDdlTime'='1731478307',
  'metadata.cache.mode'='manual',
  'metadata.staleness.seconds'='3600');

Refresh Object Table metadata

MaxCompute caches OSS object metadata and queries run against the cache. Refresh the cache before running queries to make sure the metadata reflects the current state of your OSS directory.

Choose a refresh mode

Select a refresh mode when you create the table or leave it as the default (manual).

Mode Trigger Use when Required parameter
manual (default) ALTER TABLE ... REFRESH METADATA Changes are infrequent or you need full control over timing
periodic Automatic, on a fixed interval Files change frequently and you want low-maintenance upkeep metadata.staleness.seconds
crontab Automatic, on a cron schedule You need refreshes at specific times of day or week metadata.crontab.expression

Manual refresh

Run the following command to trigger a full metadata refresh:

ALTER TABLE <objecttable_name> REFRESH METADATA;

Example:

ALTER TABLE ot_demo_day REFRESH METADATA;

Periodic refresh

Set metadata.cache.mode to periodic and specify metadata.staleness.seconds when creating the table. The scheduler triggers a full refresh at approximately the interval you specify.

SET odps.namespace.schema=true;
SET odps.sql.type.system.odps2 = true;

CREATE OBJECT TABLE ot_demo_day
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/'
TBLPROPERTIES (
  'metadata.cache.mode' = 'periodic',
  'metadata.staleness.seconds' = '3600'
);

metadata.staleness.seconds accepts values between 1 and 604800 (1 second to 1 week). The value is not a strict guarantee — the scheduler uses it as a target.

Scheduled refresh

Set metadata.cache.mode to crontab and specify metadata.crontab.expression when creating the table. Use a standard cron expression to define the exact schedule.

SET odps.namespace.schema=true;
SET odps.sql.type.system.odps2 = true;

CREATE OBJECT TABLE ot_demo_day
WITH SERDEPROPERTIES (
  'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-region-internal.aliyuncs.com/odps-external-****/ottest/'
TBLPROPERTIES (
  'metadata.cache.mode' = 'crontab',
  'metadata.crontab.expression' = '0 0 14 * * ?'
);

The cron expression 0 0 14 * * ? runs a refresh at 2:00 PM every day: 0 (second), 0 (minute), 14 (hour), * (any day of month), * (any month), ? (day-of-week — mutually exclusive with day-of-month).

View refresh task history

Check the status of past refresh tasks with:

SHOW refresh task history FOR object TABLE <object_table_name>;

Example and output:

-- View refresh task history for ot_demo_day04.
SHOW refresh task history FOR object TABLE ot_demo_day04;

ID = 20260105*******f
+---------------------------------------------------------------------------------------------------+
| Project:                  test_project                                                            |
| Schema:                   default                                                                 |
| Task:                     ***                                                                     |
+---------------------------------------------------------------------------------------------------+
| History:                                                                                          |
+---------------------------------------------------------------------------------------------------+
| InstanceId                       | CreateTime             | EndTime                | Status       |
+---------------------------------------------------------------------------------------------------+
| 20260105******************ks     | 2026-01-05 14:12:00    | 2026-01-05 14:12:04    | Terminated   |
| 20260105******************y3     | 2026-01-05 14:10:00    | 2026-01-05 14:10:03    | Terminated   |
+---------------------------------------------------------------------------------------------------+

OK

The output contains one row per task: InstanceId, CreateTime, EndTime, and Status. If a task shows Failed, run wait <InstanceId>; and check the Logview output for details.

Query an Object Table

Run a standard SELECT statement to query object metadata:

SELECT * FROM <object_table_name>;

Example — preview the first five records:

SELECT * FROM ot_demo_day LIMIT 5;

You can apply any SQL operation to the metadata: aggregations, joins, window functions, ORDER BY, LIMIT, and condition pushdown filters.

Read object content

Querying an Object Table works on cached metadata only — no object content is read from OSS. To process the actual content of objects, use the built-in GET_DATA_FROM_OSS function inside your SQL queries.

GET_DATA_FROM_OSS

GET_DATA_FROM_OSS reads part or all of an object and returns it as a binary value. All examples below use the three-tier path project.schema.object_table to identify the Object Table.

BINARY GET_DATA_FROM_OSS (
  STRING <full_object_table_name>,
  STRING <key>
  [, BIGINT <offset>]
  [, BIGINT <length>]
  [, STRING <object_not_found_policy>]
)
Parameter Required Type Default Description
full_object_table_name Yes STRING Full three-tier path to the Object Table: project.schema.object_table. Used to generate an STS token for OSS access when RAM role (RoleARN) authentication is configured.
key Yes STRING The key value from the Object Table row — the relative path of the object to read.
offset No BIGINT 0 Byte offset to start reading from. Must be >= 0.
length No BIGINT -1 (no limit) Number of bytes to read.
object_not_found_policy No STRING OUTPUT_NULL What to do when a cached key no longer exists in OSS. OUTPUT_NULL: return NULL without an error. THROW_EXCEPTION: raise an error and stop the task. WARN_AND_NULL: return NULL and log a warning — if this happens for many objects, overall task performance may degrade.
GET_DATA_FROM_OSS returns binary data. Wrap it in STRING() to get a string result.

Examples

Cast the result to STRING:

SELECT STRING(
  GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL')
)
FROM ot_demo_day;

Equivalent calls — all of the following read the full content of each object with default settings (offset=0, length=-1, policy=OUTPUT_NULL):

-- Full explicit form
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL') FROM ot_demo_day;

-- Equivalent shorthand forms
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key) FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0) FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1) FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 'OUTPUT_NULL') FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, 'OUTPUT_NULL') FROM ot_demo_day;

Performance considerations

When a SQL query downloads OSS object content via GET_DATA_FROM_OSS, the default MaxCompute chunking strategy (based on row count and record bytes) is not optimal. Chunks are sized by metadata volume, not actual object size — a single large file can end up in the same chunk as hundreds of small files, causing severe data skew and long-tail problems.

Example of the problem:

Object Size
a0000.jpg – a1023.jpg (1,024 files) 10 MB each
b.avi 10 GB

With two workers and row-based splitting, the engine might create split1 (a0000.jpga0511.jpg, 5 GB download) and split2 (a0512.jpga1023.jpg + b.avi, 15 GB download). The 3x imbalance causes a long tail.

Object Table chunking by size: MaxCompute automatically chunks Object Tables by actual object size instead. With a 10 GB split unit, split1 gets all 1,024 JPEG files (10 GB total) and split2 gets b.avi (10 GB) — both workers do equal work.

The default split unit is 1 GB. Adjust it to match your workload:

-- Split by GB (default: 1 GB)
SET odps.sql.object.table.split.unit.gb = 1;
SELECT GET_DATA_FROM_OSS('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...;

-- Split by MB (higher priority than GB)
SET odps.sql.object.table.split.unit.mb = 512;
SELECT GET_DATA_FROM_OSS('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...;

-- Split by KB (highest priority)
SET odps.sql.object.table.split.unit.kb = 512;
SELECT GET_DATA_FROM_OSS('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...;

Priority order: KB > MB > GB.

To disable size-based chunking entirely, set:

SET odps.sql.object.table.split.by.object.size.enabled = false;
Disabling size-based chunking causes the query to run as two separate jobs. The first job performs pre-processing and is visible in Logview.

More operations

Clear Object Table metadata

TRUNCATE TABLE removes all cached metadata but keeps the table definition:

TRUNCATE TABLE <object_table_name>;

Example:

TRUNCATE TABLE ot_demo_day;

Rename an Object Table

ALTER TABLE <object_table_name> RENAME TO <new_object_table_name>;

Example:

ALTER TABLE ot_demo_day RENAME TO new_ot_demo_day;

Delete an Object Table

Deleting an Object Table removes the cached metadata and stops storage charges. The actual OSS data is not affected. Recreate the table at any time to resume using the data.

DROP TABLE [IF EXISTS] <object_table_name>;

Example:

DROP TABLE IF EXISTS ot_demo_day;

FAQ

Connection refused error when refreshing metadata

You see an error like:

ODPS-0010000:System internal error -
ActionHandler job failed with failinfo storage service worker error occured:
common/io/oss/oss_file_system_cppsdk.cpp(919):
OSSRequestException: Status: -50, RequestId: ,
ErrorCode: ClientError:-50, Message: E_HTTP_ERROR_CONN_REFUSED

The location parameter was set to a public OSS endpoint. Object Tables require an internal network endpoint. Update the location to use the internal endpoint — see the location parameter description in Create an Object Table.

Periodic refresh does not trigger on schedule

Check that the location parameter uses an OSS internal network domain name, not a public endpoint. See the location parameter description in Create an Object Table.

Periodic refresh task fails with an internal error

You see:

FAILED: ODPS-0010000:System internal error - ActionHandler job failed with failinfo
storage service worker error occured: common/io/oss/oss_file_system_cppsdk.cpp(877):
OSSRequestException: Status: -50, RequestId: , ErrorCode: ClientError:-50, Message:

This is a platform-side error. Submit a ticket to contact MaxCompute technical support.