MaxCompute supports schemaless queries that read Parquet data directly from Object Storage Service (OSS), without requiring you to define an external table schema first. You can export the parsed dataset to OSS, write it to an internal table, or embed it as a subquery in other SQL operations. This feature simplifies data operations in a data lake.
Background information
When you use Spark to process structured data for ad hoc exploration, a fixed data warehouse model is usually unnecessary. With a conventional external table, however, you must define a table schema for both input and output, map the fields in the OSS files to the external table for read and write operations, and maintain partitions, which adds complexity.
Schemaless Query simplifies this process. When you run a query against Parquet files in OSS, MaxCompute automatically parses the file format and reads the data into a dataset with a schema. You can then select specific columns from the dataset for data processing, just as you would with a table. The results can be exported to OSS by using the UNLOAD command or imported into an internal table by using the CREATE TABLE AS statement. You can also use the dataset as a subquery in other SQL statements. This provides a flexible way to operate on data in a data lake by using MaxCompute.
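In outline, the three output paths look like the following minimal sketch. The endpoint, bucket, paths, and table name are placeholders, and the UNLOAD clause is abbreviated (see Example 5 for the full form):

-- Read OSS Parquet files directly; MaxCompute infers the schema.
SELECT * FROM LOCATION 'oss://<oss_endpoint>/<bucket_name>/<path>/' ('file_format'='parquet');

-- Persist the parsed dataset in an internal table.
CREATE TABLE <internal_table> AS
SELECT * FROM LOCATION 'oss://<oss_endpoint>/<bucket_name>/<path>/' ('file_format'='parquet');

-- Or unload it back to OSS (abbreviated; see Example 5 for the full statement).
UNLOAD FROM (
    SELECT * FROM LOCATION 'oss://<oss_endpoint>/<bucket_name>/<path>/' ('file_format'='parquet')
)
INTO LOCATION 'oss://<oss_endpoint>/<bucket_name>/<output_path>/'
STORED AS PARQUET;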
Limits
Schemaless Query does not support treating subdirectories in an OSS bucket as partitions.
Syntax
SELECT *, <col_name>, <table_alias>.<col_name>
FROM
LOCATION '<location_path>'
('key'='value' [, 'key1'='value1', ...])
[AS <table_alias>];

Parameters
| Parameter | Required | Description |
| --- | --- | --- |
| * | Yes | Queries all fields in the Parquet file. |
| col_name | Yes | Queries a field with a known name in the Parquet file. |
| table_alias.col_name | Yes | Queries a field with a known name in the Parquet file. The field is specified by its full path, which consists of the table alias and the field name. |
| table_alias | No | A custom alias for the table. |
| location_path | Yes | The location of the Parquet files. This must be an OSS directory in the oss://oss_endpoint/bucket_name/path/ format. A subdirectory under path can be a partition directory in the partition_name=partition_value format. Note: Schemaless Query does not support treating subdirectories in the location as partitions. Therefore, partition pruning is not supported. |
| key, value | Yes | The parameters of the query and their values. For more information, see the following table. |
Key-value parameters:

| Key | Value | Required | Description | Default value |
| --- | --- | --- | --- | --- |
| file_format | parquet | Yes | Specifies the format of the files in the location. Only Parquet is supported. If you specify another format, an error is reported. | parquet |
| rolearn | acs:ram::xxxxxx:role/aliyunodpsdefaultrole | No | Specifies the RoleArn required to access the location. You can obtain the RoleArn in the RAM console by searching for AliyunODPSDefaultRole on the Roles page. Note: If you do not specify a RoleArn in the SQL statement, the system uses the ARN of the AliyunODPSDefaultRole role by default. | acs:ram::1234****:role/aliyunodpsdefaultrole |
| file_pattern_blacklist | A regular expression, such as .*test_oss.* | No | Specifies a blacklist of files to skip. If the name of a scanned file matches the blacklist, the file is not read. | None |
| file_pattern_whitelist | A regular expression, such as .*20250610TableSink.* | No | Specifies a whitelist of files to read. A file is read only if its name matches the whitelist. | None |
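Putting these parameters together, a query might look like the following minimal sketch. The endpoint, bucket, path, account ID, column name, and blacklist pattern are placeholders:

SELECT t.col1
FROM LOCATION 'oss://<oss_endpoint>/<bucket_name>/<path>/'
(
    'file_format'='parquet',
    'rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    'file_pattern_blacklist'='.*\.csv'  -- skip any CSV files in the directory
)
AS t;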
Examples
Example 1: Read OSS data by setting blacklist and whitelist parameters
Prepare the data.
Log on to the OSS console and upload the test data to the object-table-test/schema/ directory in an OSS bucket. For more information, see Upload files to OSS.
Prepare a Parquet file to read and to verify the whitelist parameter.
Prepare a CSV file to verify the blacklist parameter.
Read the Parquet file.
Log on to the MaxCompute client (odpscmd) and execute the following SQL statements.
Add test_oss.csv to the blacklist and read the Parquet file from OSS.

SELECT tinyint_name, int_name, binary_name, float_name, varchar_name
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet',
    'rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    'file_pattern_blacklist'='.*test_oss.*'
);

Add 20250610TableSink to the whitelist and read the Parquet file from OSS.

SELECT tinyint_name, int_name, binary_name, float_name, varchar_name
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet',
    'rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    'file_pattern_whitelist'='.*20250610TableSink.*'
);
In both cases, the 20250610TableSink file is read. The following result is returned:

+--------------+------------+-------------+------------+--------------+
| tinyint_name | int_name   | binary_name | float_name | varchar_name |
+--------------+------------+-------------+------------+--------------+
| 1            | 100        | abc         | 3.14       | N            |
+--------------+------------+-------------+------------+--------------+
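The syntax also supports a table alias so that you can qualify column references. The following is a minimal sketch based on the syntax above, reusing the directory and columns from this example:

SELECT t.tinyint_name, t.int_name
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet',
    'file_pattern_whitelist'='.*20250610TableSink.*'
)
AS t;  -- t is the custom table alias used to qualify the columns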
Example 2: Read data written by Spark
Generate Parquet data by using Serverless Spark. For more information, see Create an SQL task.
Note: If your OSS directory already contains Parquet files generated by Spark, you can skip this step.
For more information about Schemaless Query operations, see Use Schemaless Query to read Parquet data from a data lake.
Log on to the E-MapReduce console and create an EMR Serverless Spark workspace.
On the EMR Serverless > Spark page, click the newly created workspace. On the Data Development page, execute the following SQL statement.
CREATE TABLE example_table_parquet04 (
    id STRING,
    name STRING,
    age STRING,
    salary DOUBLE,
    is_active BOOLEAN,
    created_at TIMESTAMP,
    details STRUCT<department:STRING, position:STRING>
) USING PARQUET;

INSERT INTO example_table_parquet04 VALUES
('1', 'Alice',   '30', 5000.50, TRUE,  TIMESTAMP '2024-01-01 10:00:00', STRUCT('HR', 'Manager')),
('2', 'Bob',     '25', 6000.75, FALSE, TIMESTAMP '2024-02-01 11:00:00', STRUCT('Engineering', 'Developer')),
('3', 'Charlie', '35', 7000.00, TRUE,  TIMESTAMP '2024-03-01 12:00:00', STRUCT('Marketing', 'Analyst')),
('4', 'David',   '40', 8000.25, FALSE, TIMESTAMP '2024-04-01 13:00:00', STRUCT('Sales', 'Representative')),
('5', 'Eve',     '28', 5500.50, TRUE,  TIMESTAMP '2024-05-01 14:00:00', STRUCT('Support', 'Technician'));

SELECT * FROM example_table_parquet04;
Log on to the OSS console and view the generated data files in the destination path.

Log on to the MaxCompute client, add _SUCCESS to the blacklist, and read the Parquet files from OSS. Assume that the specified OSS directory is object-table-test/spark.

SELECT *
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/spark/example_table_parquet04/'
(
    'file_format'='parquet',
    'rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole',
    'file_pattern_blacklist'='.*_SUCCESS.*'
);

The following result is returned:
+----+---------+-----+------------+-----------+---------------------+----------------------------------------------+
| id | name    | age | salary     | is_active | created_at          | details                                      |
+----+---------+-----+------------+-----------+---------------------+----------------------------------------------+
| 1  | Alice   | 30  | 5000.5     | true      | 2024-01-01 10:00:00 | {department:HR, position:Manager}            |
| 2  | Bob     | 25  | 6000.75    | false     | 2024-02-01 11:00:00 | {department:Engineering, position:Developer} |
| 3  | Charlie | 35  | 7000.0     | true      | 2024-03-01 12:00:00 | {department:Marketing, position:Analyst}     |
| 4  | David   | 40  | 8000.25    | false     | 2024-04-01 13:00:00 | {department:Sales, position:Representative}  |
| 5  | Eve     | 28  | 5500.5     | true      | 2024-05-01 14:00:00 | {department:Support, position:Technician}    |
+----+---------+-----+------------+-----------+---------------------+----------------------------------------------+
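Nested fields in the parsed dataset can be referenced as well. The following is a minimal sketch that assumes the dot notation MaxCompute uses for STRUCT fields also applies to a Schemaless Query result, reusing the directory and columns from this example:

SELECT name,
       details.department  -- dot notation for the STRUCT field (assumed to work as in regular MaxCompute SQL)
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/spark/example_table_parquet04/'
(
    'file_format'='parquet',
    'file_pattern_blacklist'='.*_SUCCESS.*'
);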
Example 3: Use Schemaless Query in a subquery
Prepare the data.
Log on to the OSS console and upload the test data part-00001.snappy.parquet to the object-table-test/schema/ directory in an OSS bucket. For more information, see Upload files to OSS.
Log on to the MaxCompute client and create an internal table to store the data that is automatically discovered from OSS.

CREATE TABLE ow_test (
    id INT,
    name STRING,
    age INT
);

Read the OSS data by using Schemaless Query. The command is as follows:
SELECT *
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet'
);

The following result is returned:
+------------+------------+------------+
| id         | name       | age        |
+------------+------------+------------+
| 3          | Charlie    | 35         |
| 4          | David      | 40         |
| 5          | Eve        | 28         |
+------------+------------+------------+

Pass the data read from OSS as a subquery to an outer SQL statement that writes to the ow_test table, and then query the table.
INSERT OVERWRITE TABLE ow_test
SELECT id, name, age
FROM (
    SELECT *
    FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
    (
        'file_format'='parquet'
    )
);

SELECT * FROM ow_test;

The following result is returned:
+------------+------------+------------+
| id         | name       | age        |
+------------+------------+------------+
| 3          | Charlie    | 35         |
| 4          | David      | 40         |
| 5          | Eve        | 28         |
+------------+------------+------------+
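Because the parsed dataset behaves like an ordinary subquery, you can also filter it before writing. A minimal sketch using the same directory and columns; the filter condition is only an illustration:

INSERT OVERWRITE TABLE ow_test
SELECT id, name, age
FROM (
    SELECT *
    FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
    ('file_format'='parquet')
)
WHERE age > 30;  -- keep only the rows of interest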
Example 4: Save the Schemaless Query result to an internal data warehouse table
Prepare the data.
Log on to the OSS console and upload the test data part-00001.snappy.parquet to the object-table-test/schema/ directory in an OSS bucket. For more information, see Upload files to OSS.
Log on to the MaxCompute client and read the OSS data by using Schemaless Query.
SELECT *
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet'
);

The following result is returned:
+------------+------------+------------+
| id         | name       | age        |
+------------+------------+------------+
| 3          | Charlie    | 35         |
| 4          | David      | 40         |
| 5          | Eve        | 28         |
+------------+------------+------------+

Copy the automatically discovered data from OSS to an internal table by using the CREATE TABLE AS statement, and then query the table.

CREATE TABLE ow_test_2 AS
SELECT *
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet'
);

-- Query the result table ow_test_2
SELECT * FROM ow_test_2;

The following result is returned:
+------------+------------+------------+
| id         | name       | age        |
+------------+------------+------------+
| 3          | Charlie    | 35         |
| 4          | David      | 40         |
| 5          | Eve        | 28         |
+------------+------------+------------+
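If you need only part of the dataset, the same pattern works with an explicit column list. A minimal sketch; ow_test_subset is a hypothetical table name:

CREATE TABLE ow_test_subset AS
SELECT id, name  -- keep only the columns you need
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet'
);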
Example 5: Unload the Schemaless Query result back to the data lake
Prepare the data.
Log on to the OSS console and upload the test data part-00001.snappy.parquet to the object-table-test/schema/ directory in an OSS bucket. For more information, see Upload files to OSS.
Log on to the MaxCompute client and read the OSS data by using Schemaless Query.
SELECT *
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
(
    'file_format'='parquet'
);

The following result is returned:
+------------+------------+------------+
| id         | name       | age        |
+------------+------------+------------+
| 3          | Charlie    | 35         |
| 4          | David      | 40         |
| 5          | Eve        | 28         |
+------------+------------+------------+

Unload the automatically discovered data to OSS. For more information about the UNLOAD operation, see UNLOAD.
UNLOAD FROM (
    SELECT *
    FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/schema/'
    ('file_format'='parquet')
)
INTO LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/unload/ow_test_3/'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('odps.external.data.enable.extension'='true')
STORED AS PARQUET;

View the files generated in the OSS directory.
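To confirm the round trip, you can point a second Schemaless Query at the unload directory. A minimal sketch; it assumes the directory contains only the Parquet files that UNLOAD generated:

SELECT *
FROM LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/object-table-test/unload/ow_test_3/'
(
    'file_format'='parquet'
);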
