This topic describes how to create, read from, and write to a JSON external table in OSS.
Scope
OSS external tables do not support the cluster property.
The size of a single file cannot exceed 2 GB. You must split files that are larger than 2 GB.
MaxCompute and OSS must be in the same region.
Create an external table
Syntax
If a JSON file has fewer columns than the external table definition, MaxCompute populates the missing columns with NULL. If the file has more columns, MaxCompute discards the extra columns.
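As a minimal sketch of this behavior (the table name, columns, and data line below are hypothetical placeholders, not from the examples in this topic), suppose an OSS file contains the line `{"a": 1, "c": 99}`:

```sql
-- Hypothetical table for illustration only.
CREATE EXTERNAL TABLE json_mismatch_demo
(
    a INT,
    b INT   -- Not present in the file: populated with NULL.
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';

-- The extra field "c" in the file is discarded; the missing column b reads as NULL.
SELECT a, b FROM json_mismatch_demo;
```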
Streamlined syntax
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';
```
Full syntax
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
[WITH serdeproperties (
    ['<property_name>'='<property_value>',...])
]
STORED AS textfile
LOCATION '<oss_location>'
[tblproperties ('<tblproperty_name>'='<tblproperty_value>',...)];
```
Common parameters
For more information, see Common syntax parameters.
Exclusive parameters
Tblproperties parameters
| Parameter | Use case | Description | Value | Default |
| --- | --- | --- | --- | --- |
| mcfed.mapreduce.output.fileoutputformat.compress | Write TEXTFILE data to OSS in a compressed format. | TEXTFILE compression property. If set to true, TEXTFILE data files are compressed before they are written to OSS. | True or False | False |
| mcfed.mapreduce.output.fileoutputformat.compress.codec | Write TEXTFILE data to OSS in a compressed format. | TEXTFILE compression property. Sets the compression codec for TEXTFILE data files. By default, files are compressed by using the Deflate codec. Note: Only the compression methods in com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.SnappyCodec, and com.aliyun.odps.io.compress.SnappyRawCodec are supported. | A supported codec class name | org.apache.hadoop.io.compress.DeflateCodec |
| odps.external.data.output.prefix (backward compatible with odps.external.data.prefix) | Add a custom prefix to the names of output files. | Specifies a custom prefix for the names of files that are written to OSS. | A valid character combination, such as mc_ | None |
| odps.external.data.enable.extension | Add a file extension to the names of output files. | If set to true, the names of output files include a file extension. | True or False | False |
| odps.external.data.output.suffix | Add a custom suffix to the names of output files. | The suffix can contain only digits (0-9), letters (a-z, A-Z), and underscores (_). | A valid string, such as '_hangzhou' | None |
| odps.external.data.output.explicit.extension | Add a custom file extension to the names of output files. | Specifies a custom file extension for output files. | A valid character combination, such as jsonl | None |
| odps.text.option.bad.row.skipping | Skip dirty data in OSS data files. | Specifies whether to skip dirty data when MaxCompute reads from OSS files. For more information, see BadRowSkipping. | rigid or flexible | None |
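For example, the two compression properties can be combined in a table DDL. The following is a minimal sketch, not a tested configuration: the table name, columns, and OSS path are placeholders, and the codec shown is the default Deflate codec from the table above.

```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    a INT,
    b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>'
TBLPROPERTIES (
    -- Compress TEXTFILE data files before they are written to OSS.
    'mcfed.mapreduce.output.fileoutputformat.compress'='true',
    -- Explicitly select the default Deflate codec.
    'mcfed.mapreduce.output.fileoutputformat.compress.codec'='org.apache.hadoop.io.compress.DeflateCodec'
);
```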
When you create a JSON external table for data that contains nested objects (structs), do not define a nested object field as the STRING or JSON data type. Otherwise, MaxCompute cannot parse its subfields.
The following two approaches are recommended. For detailed steps, see the examples in this topic:
- Create a TEXTFILE external table that stores each JSON record in a single STRING column, and use functions such as get_json_object in your queries to extract the content of internal subfields as needed.
- Use the STRUCT type to define the field structurally, mapping each subfield of the JSON object to a separate sub-column. This allows you to access the internal data directly by using the field_name.subfield_name syntax.
Write data
For information about the syntax for writing data from MaxCompute to OSS, see Write data to OSS.
Query data
For information about the SELECT syntax, see Read data from OSS.
For information about how to optimize query plans, see Query optimization.
For more information about BadRowSkipping, see BadRowSkipping.
BadRowSkipping
The BadRowSkipping feature lets you skip rows that contain dirty data or cause parsing errors. It does not change how the underlying data format is interpreted.
Parameters
Table-level parameter
odps.text.option.bad.row.skipping
- rigid: The skip logic is always executed and cannot be overridden by session-level or project-level configurations.
- flexible: Enables skipping at the table level, but this setting can be overridden by session-level or project-level configurations.
Session-level and project-level parameters
odps.sql.unstructured.text.bad.row.skipping can override flexible table-level settings, but cannot override rigid table-level settings.
- on: Enables the feature. If the feature is not configured for a table, it is enabled by the session.
- off: Explicitly disables the feature. If a table is configured as flexible, skipping is disabled. Otherwise, the behavior is determined by the table-level parameter.
- <null> or an invalid value: Only the table-level configuration determines the behavior.
odps.sql.unstructured.text.bad.row.skipping.debug.num specifies the number of bad rows that can be printed to the stdout log in Logview.
- The maximum value is 1000.
- If the value is less than or equal to 0, or the value is invalid, printing is disabled.
Interaction between session-level and table-level parameters
| Table property | Session flag | Result |
| --- | --- | --- |
| rigid | on | Enabled, forced on by the table |
| rigid | off | Enabled, forced on by the table |
| rigid | <null>, an invalid value, or not configured | Enabled, forced on by the table |
| flexible | on | Enabled |
| flexible | off | Disabled, disabled by the session |
| flexible | <null>, an invalid value, or not configured | Enabled |
| Not configured | on | Enabled, enabled by the session |
| Not configured | off | Disabled |
| Not configured | <null>, an invalid value, or not configured | Disabled |
Examples
Prepare data
Upload the test data json_bad_row_skipping.json, which contains some dirty data, to the oss-mc-test/badrow/ directory in OSS.
Create a JSON external table
The behavior differs based on the table-level property and the session-level flag. The following three cases are possible:
Table parameter: odps.text.option.bad.row.skipping = flexible, rigid, or <unspecified>
Session flag: odps.sql.unstructured.text.bad.row.skipping = on, off, or <not set>
No table-level property
```sql
-- No table-level property is set. Errors are handled based on the session-level flag.
CREATE EXTERNAL TABLE test_json_bad_data_skipping_flag
(
    a INT,
    b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss://databucketpath>';
```
Flexible skipping
```sql
-- The table-level property skips error rows, but the session-level flag can disable this behavior.
CREATE EXTERNAL TABLE test_json_bad_data_skipping_flexible
(
    a INT,
    b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss://databucketpath>'
TBLPROPERTIES (
    -- Flexible mode, can be disabled at the session level.
    'odps.text.option.bad.row.skipping' = 'flexible'
);
```
Rigid skipping
```sql
-- The table-level property forces error skipping. This behavior cannot be disabled at the session level.
CREATE EXTERNAL TABLE test_json_bad_data_skipping_rigid
(
    a INT,
    b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss://databucketpath>'
TBLPROPERTIES (
    -- Forced on.
    'odps.text.option.bad.row.skipping' = 'rigid'
);
```
Verify query results
No table-level property
```sql
-- Enable at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Disable at the session level. If the table property is 'flexible', skipping is disabled. If the table property is 'rigid', this setting has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;
-- Print problematic rows. The maximum is 1,000. If the value is less than or equal to 0, printing is disabled.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;

SELECT * FROM test_json_bad_data_skipping_flag;
```
If error skipping is disabled at the session level (SET odps.sql.unstructured.text.bad.row.skipping=off), the query fails with the following error:
```
FAILED: ODPS-0123131:User defined function exception
```
Flexible skipping
```sql
-- Enable at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Disable at the session level. If the table property is 'flexible', skipping is disabled. If the table property is 'rigid', this setting has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;
-- Print problematic rows. The maximum is 1,000. If the value is less than or equal to 0, printing is disabled.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;

SELECT * FROM test_json_bad_data_skipping_flexible;
```
If error skipping is disabled at the session level (SET odps.sql.unstructured.text.bad.row.skipping=off), the query fails with the following error:
```
FAILED: ODPS-0123131:User defined function exception
```
Rigid skipping
```sql
-- Enable at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Disable at the session level. If the table property is 'flexible', skipping is disabled. If the table property is 'rigid', this setting has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;
-- Print problematic rows. The maximum is 1,000. If the value is less than or equal to 0, printing is disabled.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;

SELECT * FROM test_json_bad_data_skipping_rigid;
```
The following result is returned:
```
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 2          |
| 15         | 16         |
+------------+------------+
```
Examples
Prerequisites
A MaxCompute project has been created. For more information, see Create a MaxCompute project.
An OSS bucket and directory have been prepared. For more information, see Create a bucket and Manage directories.
MaxCompute is deployed only in some regions. To avoid potential cross-region data connectivity issues, we recommend using a bucket in the same region as your MaxCompute project.
The required permissions are granted:
The permission to access OSS. Alibaba Cloud accounts, RAM users, or RAM roles can be used to access OSS external tables. For more information about authorization, see STS-mode authorization for OSS.
The CreateTable permission in your MaxCompute project. For more information about table operation permissions, see MaxCompute permissions.
Example 1: Create, write, and query a JSON table
This example shows how to create a JSON external table using the built-in open source data parser, write data to OSS, and then query the data.
Prepare data.
Log on to the OSS console and upload the test data file json2025.txt to the external-table-test/json/dt=20250526/ directory in an OSS bucket. For more information, see Upload files to OSS.
Create a JSON external table.
```sql
CREATE EXTERNAL TABLE mc_oss_extable_name_json
(
    action STRING,
    time STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json/';
```
If your OSS external table is a partitioned table, you must also import the partitioned data. For more information, see OSS external table.
```sql
-- Add partitions.
MSCK REPAIR TABLE mc_oss_extable_name_json ADD PARTITIONS;
```
Read data from the JSON external table.
```sql
SELECT * FROM mc_oss_extable_name_json WHERE dt='20250526';
```
The following result is returned:
```
+------------+------------+------------+
| action     | time       | dt         |
+------------+------------+------------+
| Close      | 1469679568 | 20250526   |
| Close      | 1469679568 | 20250526   |
+------------+------------+------------+
```
Write data to the JSON external table.
```sql
INSERT INTO mc_oss_extable_name_json PARTITION (dt='20250526') VALUES ('test','1627273823');
```
View the written data.
```sql
SELECT * FROM mc_oss_extable_name_json WHERE dt='20250526';
```
The following result is returned:
```
+------------+------------+------------+
| action     | time       | dt         |
+------------+------------+------------+
| test       | 1627273823 | 20250526   |
| Close      | 1469679568 | 20250526   |
| Close      | 1469679568 | 20250526   |
+------------+------------+------------+
```
Example 2: Read nested JSON fields
Prepare data
Create the JSON file events.json:
```json
{"a":{"x":1, "y":2}, "id":"123"}
{"a":{"x":3, "y":4}, "id":"345"}
```
Log on to the OSS console and upload the test data to the external-table-test/json_struct/ directory in an OSS bucket. For more information, see Upload files to OSS.
Method 1: Create a TEXTFILE external table and use the get_json_object function to read field values
Create a TEXTFILE external table that contains only a single column of the STRING type:
```sql
CREATE EXTERNAL TABLE extable_json_test01
(
    col STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS textfile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';

SELECT * FROM extable_json_test01;
```
The following result is returned:
```
+------------------------------------+
| col                                |
+------------------------------------+
| {"a": {"x": 1, "y": 2},"id":"123"} |
| {"a": {"x": 3, "y": 4},"id":"345"} |
+------------------------------------+
```
Use the get_json_object function to read the a and id fields:
```sql
SELECT get_json_object(col, '$.a') AS a,
       get_json_object(col, '$.id') AS id
FROM extable_json_test01;
```
The following result is returned:
```
+-------------------+-----+
| a                 | id  |
+-------------------+-----+
| {"x":1,"y":2}     | 123 |
| {"x":3,"y":4}     | 345 |
+-------------------+-----+
```
Read the nested fields x, y, and id:
```sql
SELECT get_json_object(get_json_object(col,'$.a'),'$.x') AS x,
       get_json_object(get_json_object(col,'$.a'),'$.y') AS y,
       get_json_object(col,'$.id') AS id
FROM extable_json_test01;
```
The following result is returned:
```
+---+---+-----+
| x | y | id  |
+---+---+-----+
| 1 | 2 | 123 |
| 3 | 4 | 345 |
+---+---+-----+
```
Method 2: Create a JSON external table and use the STRUCT type to receive the data
Create a JSON-formatted external table and use the STRUCT type to receive the nested fields:
```sql
CREATE EXTERNAL TABLE extable_json_test02
(
    a STRUCT<x: BIGINT, y: BIGINT>,
    id STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';
```
Query the data in the table directly:
```sql
SELECT * FROM extable_json_test02;
```
The following result is returned:
```
+------------+-----+
| a          | id  |
+------------+-----+
| {x:1, y:2} | 123 |
| {x:3, y:4} | 345 |
+------------+-----+
```
You can also use the get_json_object and TO_JSON functions to read the x and y fields:
```sql
SELECT get_json_object(TO_JSON(a), '$.x') AS x,
       get_json_object(TO_JSON(a), '$.y') AS y,
       id
FROM extable_json_test02;
```
The following result is returned:
```
+---+---+-----+
| x | y | id  |
+---+---+-----+
| 1 | 2 | 123 |
| 3 | 4 | 345 |
+---+---+-----+
```
Example 3: Customize output filenames
Set the custom prefix for files written to OSS to test06_. The DDL is as follows:
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
    -- Add a custom prefix.
    'odps.external.data.output.prefix'='test06_'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');
```
After the write operation, the names of the generated files in OSS start with the test06_ prefix.
To customize the suffix for files written to OSS as _beijing, the DDL is as follows:
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
    -- Add a custom suffix.
    'odps.external.data.output.suffix'='_beijing'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');
```
After the write operation, the names of the generated files in OSS include the _beijing suffix.
To automatically generate a file extension for output files, use the following DDL:
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
    -- Automatically generate a file extension.
    'odps.external.data.enable.extension'='true'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');
```
After the write operation, the names of the generated files in OSS include an automatically generated file extension.
To customize the file extension to jsonl for files written to OSS, the DDL is as follows:
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
    -- Add a custom file extension.
    'odps.external.data.output.explicit.extension'='jsonl'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');
```
After the write operation, the names of the generated files in OSS end with the jsonl extension.
For files written to OSS, set the prefix to mc_, the suffix to _beijing, and the file extension to jsonl. The DDL is as follows:
```sql
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
    vehicleId INT,
    recordId INT,
    patientId INT,
    calls INT,
    locationLatitute DOUBLE,
    locationLongitude DOUBLE,
    recordTime STRING,
    direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
    -- Add a custom prefix.
    'odps.external.data.output.prefix'='mc_',
    -- Add a custom suffix.
    'odps.external.data.output.suffix'='_beijing',
    -- Add a custom file extension.
    'odps.external.data.output.explicit.extension'='jsonl'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');
```
After the write operation, the names of the generated files in OSS start with the mc_ prefix, include the _beijing suffix, and end with the jsonl extension.
FAQ
Error: Unexpected end-of-input: expected close marker for OBJECT
Error message
```
ODPS-0123131:User defined function exception - Traceback:
com.aliyun.odps.serde.SerDeException: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.ByteArrayInputStream@5a021cb9; line: 1, column: 0])
 at [Source: java.io.ByteArrayInputStream@5a021cb9; line: 1, column: 3]
    at com.aliyun.odps.hive.wrapper.HiveSerDeWrapper.deserialize(HiveSerDeWrapper.java:122)
    at com.aliyun.odps.udf.HiveReaderHandler.next(HiveReaderHandler.java:152)
    at com.aliyun.odps.udf.HiveReaderHandler5c9b68c118d14fb2b392e8d916ddea8c.next(Unknown Source)
```
Cause
This error typically occurs when a JSON Lines (JSONL) file contains invalid data: a record includes an unescaped newline character, which violates the one-record-per-line rule.
Solution
Escape the newline characters in the JSON file, and then read the data again.
Troubleshooting
When you enable BadRowSkipping, details about skipped rows are printed to the stdout log in Logview. This helps you diagnose data errors. Use the following settings to enable this feature:
```sql
-- Set the BadRowSkipping parameter to on at the session level to skip the error data.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Specify the number of error records that can be printed to stdout in Logview.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=<number>;
```
Supported data types
For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).
| Type | Supported | Type | Supported |
| --- | --- | --- | --- |
| TINYINT |  | STRING |  |
| SMALLINT |  | DATE |  |
| INT |  | DATETIME |  |
| BIGINT |  | TIMESTAMP |  |
| BINARY |  | TIMESTAMP_NTZ |  |
| FLOAT |  | BOOLEAN |  |
| DOUBLE |  | ARRAY |  |
| DECIMAL(precision, scale) |  | MAP |  |
| VARCHAR(n) |  | STRUCT |  |
| CHAR(n) |  | JSON |  |
Schema evolution
JSON external tables map table columns to JSON fields by name.
In the following table, Data compatibility indicates whether the table can still read historical data correctly after a schema change.
| Operation | Supported | Description | Data compatibility |
| --- | --- | --- | --- |
| Add a column | Yes | JSON external tables map columns by name. For historical files that do not contain the new column, the column is populated with NULL. | Compatible |
| Delete a column | Yes | JSON external tables map columns by name. | Compatible |
| Change column order | Yes | JSON external tables map columns by name. | Compatible |
| Change column data type | Yes | For information about supported data type conversions, see Change column data type. | Compatible |
| Rename a column | Yes | This operation is not recommended. JSON external tables map columns by name. After you rename a column, the column name in the JSON file no longer matches the new schema, which may cause read operations to fail. | Incompatible |
| Modify a column comment | Yes | The comment must be a valid string of no more than 1,024 bytes. Otherwise, an error is reported. | Compatible |
| Modify the non-null property of a column | No | This operation is not supported. Columns are nullable by default. | Not applicable |
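As a sketch of the add-a-column operation, the following statement adds a hypothetical column (new_col) to the external table from Example 1; this is an illustration under the name-mapping behavior described above, not a required step:

```sql
-- Add a column to an existing JSON external table.
-- For historical files that do not contain the field, new_col reads as NULL.
ALTER TABLE mc_oss_extable_name_json ADD COLUMNS (new_col STRING);
```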