GET_DATA_FROM_OSS reads all or part of an OSS object's content and returns it as a BINARY value.
MaxCompute's Object Table feature lets the compute engine access unstructured data and metadata stored in Object Storage Service (OSS). An Object Table stores metadata of OSS objects at a specific path. Use GET_DATA_FROM_OSS to dynamically load the binary content of a specified OSS object. For details about Object Tables, see OBJECT TABLE definition.
Syntax
BINARY GET_DATA_FROM_OSS (
STRING <full_object_table_name>,
STRING <key>
[, BIGINT <offset>]
[, BIGINT <length>]
[, STRING <object_not_found_policy>]
)Parameters
| Parameter | Required | Data type | Description | Default |
|---|---|---|---|---|
full_object_table_name | Yes | STRING | The full path to the OBJECT TABLE in the three-tier model, including the Project and Schema names, for example, project.schema.object_table. If you used RoleARN authentication when creating the table, this parameter triggers automatic generation of a Security Token Service (STS) token to access OSS. | None |
key | Yes | STRING | The object key (name) of the OSS object in the Object Table. For the exact value, see the key field in View OBJECT TABLE properties. | None |
offset | No | BIGINT | The byte position at which to start reading. Must be >= 0. | 0 (start of object) |
length | No | BIGINT | The number of bytes to read. | -1 (no limit) |
object_not_found_policy | No | STRING | The behavior when an object key exists in the metadata cache but the object has been deleted from OSS. See the table below for valid values. | OUTPUT_NULL |
object_not_found_policy values
This parameter applies when the metadata cache contains a key that no longer exists in OSS — for example, because the object was deleted after the last ALTER TABLE ... REFRESH METADATA.
| Value | Throws exception | Returns NULL | Logs warning | Notes |
|---|---|---|---|---|
OUTPUT_NULL | No | Yes | No | Default. Silent null output. |
THROW_EXCEPTION | Yes | — | No | Terminates the current task immediately. |
WARN_AND_NULL | No | Yes | Yes | Adds a log entry per missing object. May affect overall task performance if many objects are missing. |
Return value
Returns a value of the BINARY type.
Examples
In the following examples, replace <project_name> with your actual MaxCompute project name.
Prerequisites
Before running the examples, complete these setup steps.
Step 1: Upload test data to OSS
Log on to the OSS console and upload the test file signedget.txt to the object-table-test/object_table_folder folder. For upload instructions, see Upload files.
Step 2: Create an Object Table
Using the local client (odpscmd) or a MaxCompute SQL node in DataWorks, run the following SQL. This example names the Object Table ot_demo_day. For details, see Create an OBJECT TABLE.
-- Enable the three-layer model (required for Object Table)
SET odps.namespace.schema=true;
-- Select the MaxCompute project
USE <project_name>;
-- Select the schema
USE SCHEMA <schema_name>;
-- Enable the 2.0 data type system (required for Object Table)
SET odps.sql.type.system.odps2=true;
-- Create the Object Table
CREATE OBJECT TABLE ot_demo_day
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::xxxxxx:role/aliyunodpsdefaultrole')
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/object_table_folder';Step 3: Refresh the metadata cache
ALTER TABLE ot_demo_day REFRESH METADATA;Example 1: Read all object content
The following statements all read the full content of the Object Table <project_name>.default.ot_demo_day. All six forms are equivalent — they each default offset to 0, length to -1, and object_not_found_policy to OUTPUT_NULL.
-- Complete form with all parameters explicit
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL') FROM ot_demo_day;
-- The following statements are equivalent to the one above
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key) FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0) FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1) FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 'OUTPUT_NULL') FROM ot_demo_day;
SELECT GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, 'OUTPUT_NULL') FROM ot_demo_day;Result:
+------+
| _c0 |
+------+
| test=20maxcompute=20download=20files=20by=20url=20 |
+------+Example 2: Cast output to STRING
GET_DATA_FROM_OSS returns BINARY. To work with the content as text, wrap it in the STRING() function.
SELECT STRING(
GET_DATA_FROM_OSS('<project_name>.default.ot_demo_day', key, 0, -1, 'OUTPUT_NULL')
)
FROM ot_demo_day;Result:
+-----+
| _c0 |
+-----+
| test maxcompute download files by url |
+-----+Related functions
GET_DATA_FROM_OSS is part of MaxCompute's unstructured data processing functions. For the full list, see Functions for unstructured data processing.