All Products
Search
Document Center

MaxCompute:OBJECT TABLE definition

Last Updated:Jan 05, 2026

MaxCompute introduced the Object Table feature. This feature allows the data warehouse compute engine to access unstructured data and its metadata in data lake storage. This topic describes the command syntax for Object Tables and provides examples.

Background information

Many AI processes require data warehouse developers who are familiar with data and business operations. These developers use the low-cost, large-scale computing power of big data platforms to pre-process data for large models or handle unstructured data. The processes and their results interact with data in the data warehouse or data lake.

Using SQL to process unstructured data presents the following challenges.

  • Big data SQL engines are unaware of object sizes when reading files from Object Storage Service (OSS). This makes it difficult to optimize execution plans, control concurrency, or automatically start the correct number of concurrent tasks. Additionally, filter conditions cannot be effectively pushed down. This prevents the full use of computing power when data skew occurs.

  • Reading metadata from Object Storage Service is inefficient. Each query requires remote access to the storage service, which causes high latency.

  • File lists from Object Storage Service (OSS) can only be retrieved serially in a single process within a user-defined table-valued function (UDTF). This results in poor data read performance.

  • The logic for handling permissions and network connections to storage services must be implemented within the user-defined function (UDF).

  • UDFs have limited capabilities for processing unstructured data. Traditional data warehouses lack a flexible and secure way to upload custom images. They also lack a secure runtime environment for UDFs. Additionally, remote invocations require concurrent integration with distributed computing services.

Features

MaxCompute introduced the Object Table feature. This feature allows the data warehouse compute engine to access unstructured data and its metadata in data lake storage. It provides the following capabilities:

  • Allows the engine to read OSS object metadata as a table.

  • Caches various types of OSS object metadata in a versioned manner based on the Metadata Table capability. The SQL engine can then use the Metadata Table to perform effective query optimizations, such as data filtering and condition pushdown.

  • Allows you to read the content of unstructured data files in multiple ways using built-in document functions.

  • The MaxCompute SQL engine uses Object Table metadata to perform concurrent chunking. This enables large-scale distributed computing to improve the efficiency of data reading and processing.

  • Supports uploading custom images to build UDFs and process unstructured data read by the engine.

  • Allows the engine to process unstructured data, generate structured data results, and write them to internal and foreign tables in the data warehouse. It will also support generating unstructured data results and writing them back to Object Storage Service through Object Tables.

  • Supports the Maxframe engine for the Python ecosystem.

Limits

  • The MaxCompute project must support the Schema feature. For more information, see Enable the Schema feature.

  • MaxCompute must support the 2.0 data type system.

  • Object Tables do not support partitions.

Billing

  • An Object Table is a collection of object metadata from OSS. Storage fees are charged for the metadata that is refreshed and stored in the Object Table. For more information, see Storage fees. Because the files on OSS are not stored within MaxCompute, MaxCompute does not charge storage fees. OSS charges for data storage and access. For more information, see OSS storage fees.

  • In a task that extracts and refreshes OSS metadata, the inputsize for each scanned file is a value related to the metadata, not the actual file size. Therefore, the total cost of a refresh task is related to the number of files, not their actual size on OSS. For more information, see SQL billing for foreign tables.

  • Compute fees are generated when you use Object Tables and their metadata to analyze, extract, and process unstructured data from OSS.

    • In pay-as-you-go mode, metadata analysis on an Object Table is billed as an internal table. For more information, see Standard SQL billing. Processing the content of unstructured data from OSS is billed as a foreign table. For more information, see SQL billing for foreign tables.

    • In subscription mode, subscription computing resources are used. For more information, see Compute fees (Subscription).

Create an OBJECT TABLE

Syntax

CREATE OBJECT TABLE [IF NOT EXISTS] <objecttable_name> 
WITH SERDEPROPERTIES ('<key>' = '<value>') 
LOCATION '<location>' 
[TBLPROPERTIES ('<key>' = '<value>')] 
[COMMENT '<comment>'] 
;
Note
  • Object Tables must be used in a project that supports Schema mode, and the Schema syntax switch must be enabled.

  • You do not need to define columns for an Object Table. The system provides the metadata columns.

Parameters

Parameter

Required

Description

objecttable_name

Required.

The table name.

SERDEPROPERTIES ('<key>'='<value>')

Required.

Specifies the RAM role for authentication. If you do not specify this parameter, the RAM role named AliyunODPSDefaultRole under the current Alibaba Cloud account is used to access OSS by default.

Example: 'odps.properties.rolearn'='acs:ram::uid:role/aliyunodpsdefaultrole'.

Before you use the role, make sure that you have completed the one-click authorization for AliyunODPSDefaultRole to grant the MaxCompute project direct access to the OSS resources of the current Alibaba Cloud account using a Security Token Service (STS) token.

Note

One-click authorization can be performed only when the ProjectOwner of the MaxCompute project is the OSS Alibaba Cloud account.

location

Required.

  • The path of the OSS objects that the Object Table maps to. The format is oss://<oss_endpoint>/<bucket_name>/<oss_directory_name>/. For example: oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/.

  • The Object Table extracts the metadata of the files in the directory.

  • To obtain the oss_endpoint:

    • Log on to the Object Storage Service (OSS) console.

    • On the Buckets page, click the target Bucket Name to open the Objects page.

    • In the Port section of the Overview page, obtain the Endpoint for Access from ECS over the Classic Network (internal network).

TBLPROPERTIES ('<key>'='<value>')

Optional.

  • metadata.cache.mode: The cache refresh mode.

    • manual: manual trigger (default)

    • periodic: periodic trigger. This mode must be used with the metadata.staleness.seconds refresh period parameter.

  • metadata.staleness.seconds: The refresh period. The value ranges from [1, 604800], which is 1 second to 1 week. This parameter is not a strict guarantee. The scheduler tries to execute the refresh based on this parameter.

comment

Optional.

The table comment.

Example

SET odps.namespace.schema=true; 
CREATE OBJECT TABLE ot_demo_day 
WITH serdeproperties ( 
 'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole') 
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/';

View OBJECT TABLE properties

Syntax

DESC <object_table_name>

Parameters

object_table_name: Required. The table name.

Example

DESC ot_demo_day; 

The following result is returned.

+------------------------------------------------------------------------------------+
| Owner:                    ALIYUN$****@test.aliyunid.com                        |
| Project:                  test_objecttable                                         |
| Schema:                   default                                                  |
| TableComment:                                                                      |
+------------------------------------------------------------------------------------+
| CreateTime:               2024-09-02 20:01:56                                      |
| LastDDLTime:              2024-09-02 20:01:56                                      |
| LastModifiedTime:         2024-09-02 20:01:56                                      |
+------------------------------------------------------------------------------------+
| InternalTable: YES      | Size: 0                                                  |
+------------------------------------------------------------------------------------+
| Native Columns:                                                                    |
+------------------------------------------------------------------------------------+
| Field           | Type       | Label | Comment                                     |
+------------------------------------------------------------------------------------+
| key             | varchar(2048) |       | The name of the object.                     |
| size            | bigint     |       | The size of the returned object in bytes.   |
| type            | varchar(32) |       | The type of the object and valid values: Normal, Multipart, Appendable, and Symlink. |
| last_modified   | timestamp  |       | The last modified time of the object.       |
| storage_class   | varchar(32) |       | The storage class of the object.            |
| etag            | varchar(64) |       | The entity tag (ETag). When an object is created, an ETag is created to identify the content of the object. |
| restore_info    | varchar(256) |       | The restoration status of the object.       |
| owner_id        | bigint     |       | The ID of the bucket owner.                 |
| owner_display_name | varchar(256) |       | The display name of the bucket owner.       |
+------------------------------------------------------------------------------------+

The following table describes some of the columns in the returned result.

Column Name

Type Description

NULL Allowed

Description

key

VARCHAR(2048)

The native length constraint is 1023.

For more information, see OSS object naming conventions and examples.

False

The relative path name of the object in the Object Table.

size

BIGINT

False

The size of the object in bytes.

type

VARCHAR(32)

False

The file type of the object in OSS: Normal, Multipart, Appendable, and Symlink.

last_modified

TIMESTAMP_NTZ

False

The time when the object data was last modified on OSS.

storage_class

VARCHAR(32)

False

The storage class of the object on OSS. For more information about specific types, see Storage classes.

etag

VARCHAR(64)

False

The ETag is an entity tag created when each object is generated. It is a signature used to identify whether the content of an object has changed between two updates, but it is not a unique identifier.

restore_info

VARCHAR(256)

True

Describes whether an object has been restored from cold storage. If an object is being restored, this field contains information about the process.

owner_id

BIGINT

True

The ID of the object owner.

owner_display_name

VARCHAR(256)

True

The display name of the object owner.

View the DDL statement of an OBJECT TABLE

Syntax

SHOW CREATE TABLE <object_table_name>;

Parameters

object_table_name: Required. The table name.

Example

SHOW CREATE TABLE ot_demo_day; 

The following result is returned.

CREATE OBJECT TABLE IF NOT EXISTS yunqi_object_****.`default`.ot_demo_day 
WITH SERDEPROPERTIES ( 
  'serialization.format'='1', 
  'odps.properties.rolearn'='acs:ram::139699392458****:role/aliyunodpsdefaultrole') 
LOCATION 
  'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/' 
TBLPROPERTIES ( 
  'last_modified_time'='1731478307', 
  'transient_lastDdlTime'='1731478307', 
  'metadata.cache.mode'='manual', 
  'metadata.staleness.seconds'='3600');

Refresh OBJECT TABLE metadata

The actual object data for an Object Table is stored in OSS. MaxCompute caches the metadata of these objects and performs queries and computations based on the cached metadata. Therefore, before you use an Object Table, you must refresh its cache. You can refresh the cache manually or configure periodic refreshes when you create the table.

Note

Both manual and periodic refreshes are full refreshes.

Manual refresh

Each refresh is a full metadata synchronization. You can control the timing and frequency of the refreshes.

  • Syntax

    ALTER TABLE <objecttable_name> REFRESH METADATA;
  • Parameters

    objecttable_name: Required. The table name.

  • Example

    ALTER TABLE ot_demo_day REFRESH METADATA;

Periodic refresh

If files in the OSS directory that is mapped to the Object Table change frequently, you can refresh the metadata periodically. To do this, specify the relevant parameters when you create the table. This reduces maintenance costs.

  • Syntax

    SET odps.namespace.schema=true;
    SET odps.sql.type.system.odps2 = true;
    
    CREATE OBJECT TABLE ot_demo_day 
    WITH serdeproperties (
     'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole'
    )
    location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/odps-external-****/ottest/'
    tblproperties (
      'metadata.cache.mode' = 'periodic',
      'metadata.staleness.seconds' = '3600'
    );
    
  • Parameters

    • metadata.staleness.seconds: The refresh period. For periodic mode, you must specify this parameter. The value must be within the range of [1, 604800], which is 1 second to 1 week. This parameter is not a strict guarantee. The scheduler attempts to execute the refresh based on this parameter.

    • metadata.cache.mode: The refresh trigger method. The following options are available.

      • periodic: The refresh is triggered periodically.

      • manual: The refresh is triggered manually (default). You can control the trigger timing.

View refresh tasks

You can use the following command to view the status of historical refresh tasks.

SHOW refresh task history FOR object TABLE <object_table_name>;
  • Parameters

    • <object_table_name> must be an Object Table.

    • Return values: The command returns the instance ID (InstanceId), start time (CreateTime), end time (EndTime), and status (Status) of each refresh task.

    • If the status is Failed, you can run wait InstanceId; and then print the log view to see the error details.

  • Example

    -- View the historical refresh tasks of the Object Table.
    SHOW refresh task history for object table ot_demo_day04;
    
    -- The following result is returned.
    ID = 20260105*******f
    +---------------------------------------------------------------------------------------------------+
    | Project:                  test_project                                                                              |
    | Schema:                   default                                                                                           |
    | Task:                     ***                        |
    +---------------------------------------------------------------------------------------------------+
    | History:                                                                                          |
    +---------------------------------------------------------------------------------------------------+
    | InstanceId                       | CreateTime             | EndTime                | Status       |
    +---------------------------------------------------------------------------------------------------+
    | 20260105******************ks     | 2026-01-05 14:12:00    | 2026-01-05 14:12:04    | Terminated   |
    | 20260105******************y3     | 2026-01-05 14:10:00    | 2026-01-05 14:10:03    | Terminated   |
    +---------------------------------------------------------------------------------------------------+
    
    OK

Query an OBJECT TABLE

An Object Table retrieves the metadata of files in an OSS directory. You can query the Object Table to view this metadata. You can also use SQL statements to perform computations on the metadata, such as filtering and matching. These computations include but are not limited to aggregations, joins, window functions, and `ORDER BY` and `LIMIT` clauses.

Syntax

SELECT * FROM <object_table_name>;

Parameters

object_table_name: Required. The table name.

Example

-- You can query the data uploaded to the specified OSS directory. If the data volume is large, you can view five records.
SELECT * FROM  ot_demo_day [limit 5];

Query object content for business computation

The query process performs computations only on the object metadata. It does not involve the actual content of the objects. However, the computation process needs to read the actual content of the objects. MaxCompute has a built-in download function that you can integrate into your computation process.

GET_DATA_FROM_OSS function syntax

The GET_DATA_FROM_OSS function reads part or all of an object's content and returns it in binary format.

BINARY GET_DATA_FROM_OSS (
  STRING <full_object_table_name>,
  STRING <key> 
  [, BIGINT <offset>]              
  [, BIGINT <length>] 
  [, STRING <object_not_found_policy>] 
)

Parameters

Parameter

Required

Data type

Description

Default value

full_object_table_name

Yes

STRING

The full path to the OBJECT TABLE in the three-tier model, including the Project and Schema names, such as project.schema.object_table.

If you use RoleARN for authentication when you create the table, this parameter helps automatically generate a Security Token Service (STS) token to access OSS.

None

key

Yes

STRING

The name of the accessed object in the Object Table. For more information, see the description of the `key` parameter in the response described in View OBJECT TABLE properties.

None

offset

No

BIGINT

The starting position from which to read the object content. The value must be greater than or equal to 0.

0. This means the read operation starts from the beginning of the object.

length

No

BIGINT

The number of bytes to read.

-1. This means the length is not limited.

object_not_found_policy

No

STRING

Specifies how MaxCompute returns the result of the function invocation if an object key exists in the cached data but the object no longer exists in OSS. Valid values:

  • OUTPUT_NULL: Does not throw an exception. The output is NULL.

  • THROW_EXCEPTION: Throws an exception and terminates the current task.

  • WARN_AND_NULL: Does not throw an exception and returns NULL. A log entry is added to the SQL runtime output to indicate that the object does not exist. If this inconsistency occurs for many objects, the performance of the overall task may be affected.

The default value is OUTPUT_NULL.

Example 1

To output the result of the GET_DATA_FROM_OSS function as the STRING type, nest it within the `STRING` function.

SELECT STRING(
  GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL')
) 
FROM ot_demo_day;

Example 2

Read all the content of the Object Table and return it as a binary value. The full path of the Object Table is <project_name>.default.ot_demo_day. The following code shows different parameter combinations.

-- The complete format.
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL') FROM ot_demo_day;

-- The following statements are equivalent to get_data_from_oss('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL').
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key) FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0) FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1) FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 'OUTPUT_NULL') FROM ot_demo_day;

SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, 'OUTPUT_NULL') FROM ot_demo_day;

Use query optimization to improve performance

By default, when you access a table in a MaxCompute SQL statement, MaxCompute creates evenly sized chunks based on the number of records, the size of the records in bytes, and the total computing resources available under your quota. This provides parallel computing capabilities for different shards, which effectively controls long-tail problems in parallel tasks and improves overall query performance.

This chunking method is not ideal for SQL statements that need to download the content of objects from an Object Table for in-memory computation. Performance is not optimal, and I/O operations can easily become a bottleneck, leading to long-tail problems. Consider the following case:

key

size

a0000.jpg

10 MB

a0001.jpg

10 MB

a0002.jpg

10 MB

...

...

a1022.jpg

10 MB

a1023.jpg

10 MB

b.avi

10 GB

Assume that only two worker resources are available. If chunks are created based on the number of rows or record bytes, as is done for standard internal tables, two split objects might be created: split1 ([a0000.jpg to a0511.jpg]) and split2 ([a0512.jpg to a1023.jpg, b.avi]). The actual data volume to download for split1 is 10 MB × 512 = 5 GB. The data volume for split2 is 5 GB + 10 GB = 15 GB. The computation load for split2 is significantly higher than for split1, which causes a severe long-tail problem.

A more logical approach is to create chunks based on the actual size of the objects when your SQL statement needs to download them. This helps control long-tail problems. If the logic for consuming OSS object data is more time-consuming, a more flexible chunking capability is needed. For the preceding case, you can create two splits: split1 ([a0000.jpg to a1023.jpg]) and split2 ([b.avi]). The download volume for both splits is 10 GB.

MaxCompute Object Tables provide this capability and create chunks based on the actual object size by default. The default value is 1 GB. You can adjust this value as needed, with granularity levels of KB, MB, or GB.

# The default value is 1 GB. You can adjust it.
SET odps.sql.object.table.split.unit.gb = 1;
SELECT get_data_from_oss('project.default.ot_demo_day', key) FROM ot_demo_da WHERE ...


# Alternatively, you can control it at the MB level. This has a higher priority than the GB parameter.
SET odps.sql.object.table.split.unit.mb = 1;
SELECT get_data_from_oss('project.default.ot_demo_day', key) FROM ot_demo_da WHERE ...


# Alternatively, you can control it at the KB level. This has a higher priority than the GB and MB parameters.
SET odps.sql.object.table.split.unit.kb = 1;
SELECT get_data_from_oss('project.default.ot_demo_day', key) FROM ot_demo_da WHERE ...

In some cases, you may want to disable this query optimization solution. This solution starts two jobs. The first job performs pre-processing, and you can view its information in Logview. MaxCompute Object Tables provide a way to do this:

SET odps.sql.object.table.split.by.object.size.enabled = false;
SELECT get_data_from_oss('project.default.ot_demo_day', key) FROM ot_demo_day WHERE ...

Delete an OBJECT TABLE

Object Tables cache your metadata, which consumes storage space and incurs storage costs. If you no longer need this cached data for your business, you can delete the Object Table. To use the data again later, you can recreate the Object Table.

Syntax

DROP TABLE [IF EXISTS] <object_table_name>; 

Parameters

object_table_name: Required. The table name.

Example

DROP TABLE IF EXISTS ot_demo_day;

FAQ

ODPS-0010000:System internal error

  • Symptom

    The following error message is returned:

    ODPS-0010000:System internal error - 
    ActionHandler job failed with failinfo	storage service worker error occured: 
    common/io/oss/oss_file_system_cppsdk.cpp(919): 
    OSSRequestException: Status: -50, RequestId: , 
    ErrorCode: ClientError:-50, Message: E_HTTP_ERROR_CONN_REFUSED
  • Cause

    A public endpoint for OSS was used when the Object Table was created.

  • Solution

    When you create an Object Table, the oss_endpoint in the location parameter must be an internal network address. For more information about how to obtain the address, see Parameters.

Periodic refresh fails

  • Symptom

You set parameters for a periodic refresh when you create an Object Table. However, the refresh does not occur when the refresh period is reached.

  • Solution

Check whether the value of the `location` parameter configured for the Object Table is an OSS internal network domain name. For more information about how to create an Object Table, see Parameters.

Periodic refresh task reports an error

  • Error message

    FAILED: ODPS-0010000:System internal error - ActionHandler job failed with failinfo storage service worker error occured: common/io/oss/oss_file_system_cppsdk.cpp(877): OSSRequestException: Status: -50, RequestId: , ErrorCode: ClientError:-50, Message:

  • Solution

    This is an internal error. You can submit a ticket to contact the MaxCompute technical support team.