An Object Table maps an OSS directory to a queryable metadata table inside MaxCompute. It lets the SQL engine read, filter, and process unstructured data — images, videos, PDFs, log files — at the same scale as structured data, without moving any files out of OSS.
With an Object Table, you can:
-
Query file metadata with SQL — list all objects in an OSS directory and filter by name, size, type, or last-modified time using standard SELECT statements.
-
Read object content at scale — use the built-in
GET_DATA_FROM_OSSfunction inside SQL queries to stream object content into distributed compute workers for large-scale processing. -
Keep metadata fresh automatically — cache OSS metadata inside MaxCompute and configure manual, periodic, or scheduled refreshes to stay in sync with changes in OSS.
-
Process unstructured data with custom logic — upload custom images to build UDFs (user-defined functions) that run alongside the SQL engine and write results to internal or foreign tables. Object Tables will also support writing unstructured results back to OSS in a future release.
-
Use the Maxframe engine — Object Tables support the Maxframe engine for the Python ecosystem.
Creating an Object Table does not copy or modify your OSS data. MaxCompute caches only the object metadata. The actual files remain in OSS.
Prerequisites
Before you begin, make sure you have:
-
A MaxCompute project with the Schema feature enabled. See Enable the Schema feature.
-
The MaxCompute 2.0 data type system enabled in your project.
-
Completed the one-click authorization for
AliyunODPSDefaultRole, which grants MaxCompute access to OSS via a Security Token Service (STS) token. One-click authorization is only available when the ProjectOwner of the MaxCompute project is the same Alibaba Cloud account that owns the OSS bucket.
Limitations
-
Object Tables do not support partitions.
-
Both manual and periodic refreshes are full refreshes.
Billing
Metadata storage: MaxCompute charges storage fees for the metadata cached in an Object Table. See Storage fees. OSS charges separately for the actual data stored and accessed in buckets. See OSS storage fees.
Refresh tasks: The inputsize for each file scanned during a refresh is based on its metadata size, not the actual file size. Refresh costs scale with the number of files, not their size. See SQL billing for foreign tables.
Compute: Running queries against an Object Table generates compute fees.
-
Pay-as-you-go: Metadata queries are billed as internal table queries. See Standard SQL billing. Queries that read OSS object content via
GET_DATA_FROM_OSSare billed as foreign table queries. See SQL billing for foreign tables. -
Subscription: Subscription computing resources are consumed. See Compute fees (Subscription).
Create an Object Table
Syntax
CREATE OBJECT TABLE [IF NOT EXISTS] <objecttable_name>
WITH SERDEPROPERTIES ('<key>' = '<value>')
LOCATION '<location>'
[TBLPROPERTIES ('<key>' = '<value>')]
[COMMENT '<comment>'];
You do not need to define columns. MaxCompute provides the metadata columns automatically.
Parameters
| Parameter | Required | Description |
|---|---|---|
objecttable_name |
Yes | The table name. |
SERDEPROPERTIES |
Yes | Specifies the RAM role for authentication. The key is odps.properties.rolearn and the value is the role ARN in the format acs:ram::<uid>:role/aliyunodpsdefaultrole. If omitted, AliyunODPSDefaultRole under the current Alibaba Cloud account is used. To get the account UID, see View RAM user information. For STS mode authorization details, see Grant permissions to a common RAM role in STS mode for OSS. |
location |
Yes | The OSS path to map. Format: oss://<oss_endpoint>/<bucket_name>/<oss_directory_name>/. The Object Table indexes all objects in that directory. Use an internal network endpoint, not a public one — using a public endpoint causes connection errors. To get the internal endpoint: log on to the OSS console, on the Buckets page click the target Bucket Name to open the Objects page, then in the Port section of the Overview page obtain the Endpoint for Access from ECS over the Classic Network (internal network). |
metadata.cache.mode |
No | Refresh trigger mode. See Choose a refresh mode. Default: manual. |
metadata.staleness.seconds |
No | Refresh interval in seconds for periodic mode. Range: 1–604800 (1 second to 1 week). Not a strict guarantee; the scheduler uses this as a target. Required when metadata.cache.mode is periodic. |
metadata.crontab.expression |
No | Cron expression for crontab mode. For example, 0 0 14 * * ? runs a refresh daily at 2:00 PM. Required when metadata.cache.mode is crontab. |
comment |
No | A comment for the table. |
Example
SET odps.namespace.schema=true;
CREATE OBJECT TABLE ot_demo_day
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/';
View Object Table properties
DESC <object_table_name>;
Example:
DESC ot_demo_day;
The output lists the table's owner, project, Schema, timestamps, and its native metadata columns:
+------------------------------------------------------------------------------------+
| Owner: ALIYUN$****@test.aliyunid.com |
| Project: test_objecttable |
| Schema: default |
| TableComment: |
+------------------------------------------------------------------------------------+
| CreateTime: 2024-09-02 20:01:56 |
| LastDDLTime: 2024-09-02 20:01:56 |
| LastModifiedTime: 2024-09-02 20:01:56 |
+------------------------------------------------------------------------------------+
| InternalTable: YES | Size: 0 |
+------------------------------------------------------------------------------------+
| Native Columns: |
+------------------------------------------------------------------------------------+
| Field | Type | Label | Comment |
+------------------------------------------------------------------------------------+
| key | varchar(2048) | | The name of the object. |
| size | bigint | | The size of the object in bytes. |
| type | varchar(32) | | Object type: Normal, Multipart, Appendable, or Symlink. |
| last_modified | timestamp | | The last modified time of the object. |
| storage_class | varchar(32) | | The storage class of the object. |
| etag | varchar(64) | | The ETag of the object. |
| restore_info | varchar(256) | | Restoration status from cold storage. |
| owner_id | bigint | | The ID of the bucket owner. |
| owner_display_name | varchar(256) | | The display name of the bucket owner. |
+------------------------------------------------------------------------------------+
The native columns provide the following metadata:
| Column | Type | Nullable | Description |
|---|---|---|---|
key |
VARCHAR(2048) — native length limit: 1,023 characters | No | Relative path of the object within the Object Table. See OSS object naming conventions. |
size |
BIGINT | No | Object size in bytes. |
type |
VARCHAR(32) | No | OSS object type: Normal, Multipart, Appendable, or Symlink. |
last_modified |
TIMESTAMP_NTZ | No | Time the object was last modified in OSS. |
storage_class |
VARCHAR(32) | No | OSS storage class. See Storage class. |
etag |
VARCHAR(64) | No | Entity tag generated when the object was created. Identifies whether object content changed between updates, but is not a unique identifier. |
restore_info |
VARCHAR(256) | Yes | Restoration status for archived objects. Populated only while a restore is in progress. |
owner_id |
BIGINT | Yes | ID of the object owner. |
owner_display_name |
VARCHAR(256) | Yes | Display name of the object owner. |
View the DDL statement of an Object Table
SHOW CREATE TABLE <object_table_name>;
Example:
SHOW CREATE TABLE ot_demo_day;
Sample output:
CREATE OBJECT TABLE IF NOT EXISTS yunqi_object_****.`default`.ot_demo_day
WITH SERDEPROPERTIES (
'serialization.format'='1',
'odps.properties.rolearn'='acs:ram::139699392458****:role/aliyunodpsdefaultrole')
LOCATION
'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/'
TBLPROPERTIES (
'last_modified_time'='1731478307',
'transient_lastDdlTime'='1731478307',
'metadata.cache.mode'='manual',
'metadata.staleness.seconds'='3600');
Refresh Object Table metadata
MaxCompute caches OSS object metadata and queries run against the cache. Refresh the cache before running queries to make sure the metadata reflects the current state of your OSS directory.
Choose a refresh mode
Select a refresh mode when you create the table or leave it as the default (manual).
| Mode | Trigger | Use when | Required parameter |
|---|---|---|---|
manual (default) |
ALTER TABLE ... REFRESH METADATA |
Changes are infrequent or you need full control over timing | — |
periodic |
Automatic, on a fixed interval | Files change frequently and you want low-maintenance upkeep | metadata.staleness.seconds |
crontab |
Automatic, on a cron schedule | You need refreshes at specific times of day or week | metadata.crontab.expression |
Manual refresh
Run the following command to trigger a full metadata refresh:
ALTER TABLE <objecttable_name> REFRESH METADATA;
Example:
ALTER TABLE ot_demo_day REFRESH METADATA;
Periodic refresh
Set metadata.cache.mode to periodic and specify metadata.staleness.seconds when creating the table. The scheduler triggers a full refresh at approximately the interval you specify.
SET odps.namespace.schema=true;
SET odps.sql.type.system.odps2 = true;
CREATE OBJECT TABLE ot_demo_day
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/'
TBLPROPERTIES (
'metadata.cache.mode' = 'periodic',
'metadata.staleness.seconds' = '3600'
);
metadata.staleness.seconds accepts values between 1 and 604800 (1 second to 1 week). The value is not a strict guarantee — the scheduler uses it as a target.
Scheduled refresh
Set metadata.cache.mode to crontab and specify metadata.crontab.expression when creating the table. Use a standard cron expression to define the exact schedule.
SET odps.namespace.schema=true;
SET odps.sql.type.system.odps2 = true;
CREATE OBJECT TABLE ot_demo_day
WITH SERDEPROPERTIES (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-region-internal.aliyuncs.com/odps-external-****/ottest/'
TBLPROPERTIES (
'metadata.cache.mode' = 'crontab',
'metadata.crontab.expression' = '0 0 14 * * ?'
);
The cron expression 0 0 14 * * ? runs a refresh at 2:00 PM every day: 0 (second), 0 (minute), 14 (hour), * (any day of month), * (any month), ? (day-of-week — mutually exclusive with day-of-month).
View refresh task history
Check the status of past refresh tasks with:
SHOW refresh task history FOR object TABLE <object_table_name>;
Example and output:
-- View refresh task history for ot_demo_day04.
SHOW refresh task history FOR object TABLE ot_demo_day04;
ID = 20260105*******f
+---------------------------------------------------------------------------------------------------+
| Project: test_project |
| Schema: default |
| Task: *** |
+---------------------------------------------------------------------------------------------------+
| History: |
+---------------------------------------------------------------------------------------------------+
| InstanceId | CreateTime | EndTime | Status |
+---------------------------------------------------------------------------------------------------+
| 20260105******************ks | 2026-01-05 14:12:00 | 2026-01-05 14:12:04 | Terminated |
| 20260105******************y3 | 2026-01-05 14:10:00 | 2026-01-05 14:10:03 | Terminated |
+---------------------------------------------------------------------------------------------------+
OK
The output contains one row per task: InstanceId, CreateTime, EndTime, and Status. If a task shows Failed, run wait <InstanceId>; and check the Logview output for details.
Query an Object Table
Run a standard SELECT statement to query object metadata:
SELECT * FROM <object_table_name>;
Example — preview the first five records:
SELECT * FROM ot_demo_day LIMIT 5;
You can apply any SQL operation to the metadata: aggregations, joins, window functions, ORDER BY, LIMIT, and condition pushdown filters.
Read object content
Querying an Object Table works on cached metadata only — no object content is read from OSS. To process the actual content of objects, use the built-in GET_DATA_FROM_OSS function inside your SQL queries.
GET_DATA_FROM_OSS
GET_DATA_FROM_OSS reads part or all of an object and returns it as a binary value. All examples below use the three-tier path project.schema.object_table to identify the Object Table.
BINARY GET_DATA_FROM_OSS (
STRING <full_object_table_name>,
STRING <key>
[, BIGINT <offset>]
[, BIGINT <length>]
[, STRING <object_not_found_policy>]
)
| Parameter | Required | Type | Default | Description |
|---|---|---|---|---|
full_object_table_name |
Yes | STRING | — | Full three-tier path to the Object Table: project.schema.object_table. Used to generate an STS token for OSS access when RAM role (RoleARN) authentication is configured. |
key |
Yes | STRING | — | The key value from the Object Table row — the relative path of the object to read. |
offset |
No | BIGINT | 0 |
Byte offset to start reading from. Must be >= 0. |
length |
No | BIGINT | -1 (no limit) |
Number of bytes to read. |
object_not_found_policy |
No | STRING | OUTPUT_NULL |
What to do when a cached key no longer exists in OSS. OUTPUT_NULL: return NULL without an error. THROW_EXCEPTION: raise an error and stop the task. WARN_AND_NULL: return NULL and log a warning — if this happens for many objects, overall task performance may degrade. |
GET_DATA_FROM_OSSreturns binary data. Wrap it inSTRING()to get a string result.
Examples
Cast the result to STRING:
SELECT STRING(
GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL')
)
FROM ot_demo_day;
Equivalent calls — all of the following read the full content of each object with default settings (offset=0, length=-1, policy=OUTPUT_NULL):
-- Full explicit form
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL') FROM ot_demo_day;
-- Equivalent shorthand forms
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key) FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0) FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1) FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 'OUTPUT_NULL') FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, 'OUTPUT_NULL') FROM ot_demo_day;
Performance considerations
When a SQL query downloads OSS object content via GET_DATA_FROM_OSS, the default MaxCompute chunking strategy (based on row count and record bytes) is not optimal. Chunks are sized by metadata volume, not actual object size — a single large file can end up in the same chunk as hundreds of small files, causing severe data skew and long-tail problems.
Example of the problem:
| Object | Size |
|---|---|
| a0000.jpg – a1023.jpg (1,024 files) | 10 MB each |
| b.avi | 10 GB |
With two workers and row-based splitting, the engine might create split1 (a0000.jpg–a0511.jpg, 5 GB download) and split2 (a0512.jpg–a1023.jpg + b.avi, 15 GB download). The 3x imbalance causes a long tail.
Object Table chunking by size: MaxCompute automatically chunks Object Tables by actual object size instead. With a 10 GB split unit, split1 gets all 1,024 JPEG files (10 GB total) and split2 gets b.avi (10 GB) — both workers do equal work.
The default split unit is 1 GB. Adjust it to match your workload:
-- Split by GB (default: 1 GB)
SET odps.sql.object.table.split.unit.gb = 1;
SELECT GET_DATA_FROM_OSS('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...;
-- Split by MB (higher priority than GB)
SET odps.sql.object.table.split.unit.mb = 512;
SELECT GET_DATA_FROM_OSS('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...;
-- Split by KB (highest priority)
SET odps.sql.object.table.split.unit.kb = 512;
SELECT GET_DATA_FROM_OSS('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...;
Priority order: KB > MB > GB.
To disable size-based chunking entirely, set:
SET odps.sql.object.table.split.by.object.size.enabled = false;
Disabling size-based chunking causes the query to run as two separate jobs. The first job performs pre-processing and is visible in Logview.
More operations
Clear Object Table metadata
TRUNCATE TABLE removes all cached metadata but keeps the table definition:
TRUNCATE TABLE <object_table_name>;
Example:
TRUNCATE TABLE ot_demo_day;
Rename an Object Table
ALTER TABLE <object_table_name> RENAME TO <new_object_table_name>;
Example:
ALTER TABLE ot_demo_day RENAME TO new_ot_demo_day;
Delete an Object Table
Deleting an Object Table removes the cached metadata and stops storage charges. The actual OSS data is not affected. Recreate the table at any time to resume using the data.
DROP TABLE [IF EXISTS] <object_table_name>;
Example:
DROP TABLE IF EXISTS ot_demo_day;
FAQ
Connection refused error when refreshing metadata
You see an error like:
ODPS-0010000:System internal error -
ActionHandler job failed with failinfo storage service worker error occured:
common/io/oss/oss_file_system_cppsdk.cpp(919):
OSSRequestException: Status: -50, RequestId: ,
ErrorCode: ClientError:-50, Message: E_HTTP_ERROR_CONN_REFUSED
The location parameter was set to a public OSS endpoint. Object Tables require an internal network endpoint. Update the location to use the internal endpoint — see the location parameter description in Create an Object Table.
Periodic refresh does not trigger on schedule
Check that the location parameter uses an OSS internal network domain name, not a public endpoint. See the location parameter description in Create an Object Table.
Periodic refresh task fails with an internal error
You see:
FAILED: ODPS-0010000:System internal error - ActionHandler job failed with failinfo
storage service worker error occured: common/io/oss/oss_file_system_cppsdk.cpp(877):
OSSRequestException: Status: -50, RequestId: , ErrorCode: ClientError:-50, Message:
This is a platform-side error. Submit a ticket to contact MaxCompute technical support.