This topic describes how to create, read from, and write to JSON-formatted external tables in OSS.
Applicability
OSS external tables do not support the cluster property.
The size of a single file cannot exceed 2 GB. You must split files that are larger than 2 GB.
MaxCompute and OSS must be in the same region.
Create an External Table
Syntax
When you read JSON data, if a JSON file has fewer columns than what is defined in the external table, MaxCompute populates the missing column values with NULL. If the file has more columns, MaxCompute discards the extra columns.
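For example, the following minimal sketch illustrates both behaviors; the table name, location, and data are illustrative and not part of the examples below:

-- Assume the OSS file at <oss_location> contains these two JSON lines:
--   {"a": 1}                   -- missing key b
--   {"a": 2, "b": 3, "c": 4}   -- extra key c
CREATE EXTERNAL TABLE mc_oss_extable_mismatch_demo
(
a INT,
b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';

SELECT * FROM mc_oss_extable_mismatch_demo;
-- Expected result: b is NULL in the first row, and key c is discarded.
-- +------------+------------+
-- | a          | b          |
-- +------------+------------+
-- | 1          | NULL       |
-- | 2          | 3          |
-- +------------+------------+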
Streamlined syntax structure
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';

Full Syntax
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
[WITH serdeproperties (
['<property_name>'='<property_value>',...])
]
STORED AS textfile
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

Common Parameters
For more information, see Common syntax parameters.
Table-Specific Parameters
tblproperties Parameters
| property_name | Use case | Description | property_value | Default |
| --- | --- | --- | --- | --- |
| mcfed.mapreduce.output.fileoutputformat.compress | Add this property when you want to compress TEXTFILE data files before writing them to OSS. | TEXTFILE compression property. Set this parameter to True to enable compression. If set to False, no compression occurs. | True, False | False |
| mcfed.mapreduce.output.fileoutputformat.compress.codec | Add this property when you want to compress TEXTFILE data files before writing them to OSS. | TEXTFILE compression property. Specifies the compression codec for TEXTFILE data files. Note: Only a fixed set of compression codecs is supported. | A supported compression codec class name. | org.apache.hadoop.io.compress.DeflateCodec |
| odps.external.data.output.prefix (backward compatible with odps.external.data.prefix) | Add this property when you want to add a custom prefix to output files. | Must contain only letters, digits, or underscores (a-z, A-Z, 0-9, _). | A valid combination of allowed characters, such as test06_. | None |
| odps.external.data.enable.extension | Add this property when you want to display file extensions in output filenames. | If set to True, output files include extensions. If set to False, extensions are omitted. | True, False | False |
| odps.external.data.output.suffix | Add this property when you want to add a custom suffix to output files. | Must contain only letters, digits, or underscores (a-z, A-Z, 0-9, _). | A valid combination of allowed characters, such as _hangzhou. | None |
| odps.external.data.output.explicit.extension | Add this property when you want to specify a custom file extension for output files. | Must contain only letters, digits, or underscores (a-z, A-Z, 0-9, _). | A valid combination of allowed characters, such as jsonl. | None |
| odps.text.option.bad.row.skipping | Add this property when you want to skip dirty data in OSS data files. | When MaxCompute reads OSS data files, you can choose whether to skip dirty data. For details, see BadRowSkipping. | rigid, flexible | None |
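Neither compression property appears in the examples below, so here is a minimal sketch that combines them. The table name and location are placeholders, and org.apache.hadoop.io.compress.GzipCodec is assumed to be among the codecs your project supports:

-- A minimal sketch, assuming GzipCodec is a supported codec in your project.
CREATE EXTERNAL TABLE mc_oss_extable_compress_demo
(
a INT,
b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>'
tblproperties (
-- Enable TEXTFILE compression for files written to OSS.
'mcfed.mapreduce.output.fileoutputformat.compress'='true',
-- Choose the compression codec. If omitted, DeflateCodec is used.
'mcfed.mapreduce.output.fileoutputformat.compress.codec'='org.apache.hadoop.io.compress.GzipCodec'
);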
When you create a JSON external table, if a field contains a nested JSON object and you define its data type as STRING or JSON, MaxCompute cannot automatically parse the subfields.
You can use one of the two approaches described in the following examples:
- Define the field as STRING. Then, use functions such as get_json_object during queries to extract subfield values.
- Use the STRUCT type to define the field, mapping each subfield of the JSON object to a separate column in the table. You can directly access the data by using the field_name.subfield_name syntax.
Write Data
For more information about the syntax for writing data from MaxCompute to OSS, see Write data to OSS.
Query and Analysis
For more information about SELECT syntax, see Read data from OSS.
For more information about query plan optimization, see Query optimization.
For more information about BadRowSkipping, see BadRowSkipping.
BadRowSkipping
If dirty data is present in your JSON data, you can use the BadRowSkipping feature to skip or retain the rows that contain errors. Enabling or disabling this feature does not change how the underlying data format is interpreted.
Parameter settings
Table-level parameter
odps.text.option.bad.row.skipping
- rigid: The skip logic must be executed and cannot be overwritten by session-level or project-level settings.
- flexible: Enables flexible skipping at the data plane. This setting can be overwritten by session-level or project-level settings.
Session-level or project-level parameters
odps.sql.unstructured.text.bad.row.skipping: This parameter can overwrite the flexible table-level parameter but not the rigid table-level parameter.
- on: Actively enabled. If the table-level parameter is not configured, the feature is still enabled.
- off: Actively disabled. If the table-level parameter is set to flexible, this feature is disabled. In other cases, the table-level parameter takes precedence.
- <null> or an invalid value: The behavior is determined only by the table-level configuration.
odps.sql.unstructured.text.bad.row.skipping.debug.num: Specifies the number of error results that can be printed to standard output (stdout) in Logview. The maximum value is 1000.
- If the value is less than or equal to 0, debug output is disabled.
- If the input is invalid, debug output is disabled.
Interaction between session-level and table-level parameters

| Table-level property | Session-level flag | Result |
| --- | --- | --- |
| rigid | on | On, forcibly enabled |
| rigid | off | On, forcibly enabled |
| rigid | <null>, an invalid value, or not configured | On, forcibly enabled |
| flexible | on | On |
| flexible | off | Off, disabled by the session |
| flexible | <null>, an invalid value, or not configured | On |
| Not configured | on | On, enabled by the session |
| Not configured | off | Off |
| Not configured | <null>, an invalid value, or not configured | Off |
Examples
Prepare the data.
Upload the test file json_bad_row_skipping.json, which contains dirty data, to the oss-mc-test/badrow/ path in your OSS bucket.
Create a JSON external table.
The behavior differs based on the table-level parameter and the session-level flag. The following three cases are possible:
Table-level parameter: odps.text.option.bad.row.skipping = flexible/rigid/<not set>
Session-level flag: odps.sql.unstructured.text.bad.row.skipping = on/off/<not set>
No table-level parameter set
-- No table-level parameter is set. Errors occur as expected, and the session flag controls the behavior.
CREATE EXTERNAL TABLE test_json_bad_data_skipping_flag
(
a INT,
b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss://databucketpath>';

Table-level parameter skips errors. Session-level flag can disable it.
-- The table-level parameter skips errors, but the session-level flag can override it.
CREATE EXTERNAL TABLE test_json_bad_data_skipping_flexible
(
a INT,
b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss://databucketpath>'
tblproperties (
'odps.text.option.bad.row.skipping' = 'flexible' -- Flexible mode. Can be disabled by the session flag.
);

Table-level parameter forces error skipping. Session-level flag cannot disable it.
-- The table-level parameter forces error skipping. The session-level flag cannot override it.
CREATE EXTERNAL TABLE test_json_bad_data_skipping_rigid
(
a INT,
b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss://databucketpath>'
tblproperties (
'odps.text.option.bad.row.skipping' = 'rigid' -- Forced mode.
);

Verify the query results.
No table-level parameter set
-- Enable at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Disable at the session level. If the table is set to 'flexible', this disables skipping. If the table is set to 'rigid', it has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;
-- Print up to 10 problematic rows to Logview stdout. Values less than or equal to 0 disable printing.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
SELECT * FROM test_json_bad_data_skipping_flag;

The query fails and returns the following error:
FAILED: ODPS-0123131:User defined function exception
Table-level parameter skips errors. Session-level flag can disable it.
-- Enable at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Disable at the session level. If the table is set to 'flexible', this disables skipping. If the table is set to 'rigid', it has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;
-- Print up to 10 problematic rows to Logview stdout. Values less than or equal to 0 disable printing.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
SELECT * FROM test_json_bad_data_skipping_flexible;

The query fails and returns the following error:
FAILED: ODPS-0123131:User defined function exception
Table-level parameter forces error skipping. Session-level flag cannot disable it.
-- Enable at the session level.
SET odps.sql.unstructured.text.bad.row.skipping=on;
-- Disable at the session level. If the table is set to 'flexible', this disables skipping. If the table is set to 'rigid', it has no effect.
SET odps.sql.unstructured.text.bad.row.skipping=off;
-- Print up to 10 problematic rows to Logview stdout. Values less than or equal to 0 disable printing.
SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
SELECT * FROM test_json_bad_data_skipping_rigid;

The query returns the following result:
+------------+------------+
| a          | b          |
+------------+------------+
| 1          | 2          |
| 15         | 16         |
+------------+------------+
Examples
Prerequisites
You have created a MaxCompute project.
You have prepared an OSS bucket and OSS directory. For more information, see Create a bucket and Manage folders.
Because MaxCompute is available only in specific regions, cross-region connectivity issues may occur. We recommend using an OSS bucket in the same region as your MaxCompute project.
Authorization
You have permissions to access OSS. An Alibaba Cloud account (primary account), Resource Access Management (RAM) user, or RAM role can access OSS external tables. For authorization details, see STS-mode authorization for OSS.
You have CreateTable permissions in the MaxCompute project. For details about table operation permissions, see MaxCompute permissions.
Example 1: Create, write to, and query a JSON external table
This example shows how to create a JSON external table using the built-in open-source JSON parser, and then write data to and query data from OSS.
Prepare the data.
Log on to the OSS console and upload the test file json2025.txt to the external-table-test/json/dt=20250526/ directory in your OSS bucket. For more information, see Upload OSS files.
Create a JSON external table.
CREATE EXTERNAL TABLE mc_oss_extable_name_json
(
action STRING,
time STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json/';

Add partitions. If your OSS external table is partitioned, you must run this command to add the existing partitions. For more information, see Add partitions to an OSS external table.
-- Add partitions.
MSCK REPAIR TABLE mc_oss_extable_name_json ADD PARTITIONS;

Read the JSON external table.
SELECT * FROM mc_oss_extable_name_json WHERE dt='20250526';

The query returns the following result:
+------------+------------+------------+
| action     | time       | dt         |
+------------+------------+------------+
| Close      | 1469679568 | 20250526   |
| Close      | 1469679568 | 20250526   |
+------------+------------+------------+

Write to the JSON external table.
INSERT INTO mc_oss_extable_name_json PARTITION (dt='20250526') VALUES ('test','1627273823');

Check the written data.
SELECT * FROM mc_oss_extable_name_json WHERE dt='20250526';

The query returns the following result:
+------------+------------+------------+
| action     | time       | dt         |
+------------+------------+------------+
| test       | 1627273823 | 20250526   |
| Close      | 1469679568 | 20250526   |
| Close      | 1469679568 | 20250526   |
+------------+------------+------------+
Example 2: Read JSON field values
Prepare the data
Create a JSON file named events.json with the following content:
{"a":{"x":1, "y":2}, "id":"123"}
{"a":{"x":3, "y":4}, "id":"345"}Log on to the OSS console and upload the test file to the external-table-test/json-sturct/ directory in your OSS bucket. For more information, see Upload OSS files.
Method 1: Create a TEXTFILE external table and use the get_json_object function to read field values
Create a TEXTFILE external table with a single STRING column:

CREATE EXTERNAL TABLE extable_json_test01
(
col STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
STORED AS textfile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';

SELECT * FROM extable_json_test01;

The query returns the following result:
+------------------------------------+
| col                                |
+------------------------------------+
| {"a": {"x": 1, "y": 2},"id":"123"} |
| {"a": {"x": 3, "y": 4},"id":"345"} |
+------------------------------------+

Use the get_json_object function to read the a and id fields:

SELECT get_json_object(col, '$.a') AS a,
       get_json_object(col, '$.id') AS id
FROM extable_json_test01;

The query returns the following result:
+-------------------+-----+
| a                 | id  |
+-------------------+-----+
| {"x":1,"y":2}     | 123 |
| {"x":3,"y":4}     | 345 |
+-------------------+-----+

Read the nested fields x, y, and id:

SELECT get_json_object(get_json_object(col,'$.a'),'$.x') AS x,
       get_json_object(get_json_object(col,'$.a'),'$.y') AS y,
       get_json_object(col,'$.id') AS id
FROM extable_json_test01;

The query returns the following result:
+---+---+-----+
| x | y | id  |
+---+---+-----+
| 1 | 2 | 123 |
| 3 | 4 | 345 |
+---+---+-----+
Method 2: Create a JSON external table and use the STRUCT type to handle nested fields
Create a JSON external table that uses the STRUCT type to represent nested fields:

CREATE EXTERNAL TABLE extable_json_test02
(
a STRUCT<x: BIGINT, y: BIGINT>,
id STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';

Query the table directly:

SELECT * FROM extable_json_test02;

The query returns the following result:
+------------+-----+
| a          | id  |
+------------+-----+
| {x:1, y:2} | 123 |
| {x:3, y:4} | 345 |
+------------+-----+

Use the get_json_object and TO_JSON functions to read the x and y fields:

SELECT get_json_object(TO_JSON(a), '$.x') AS x,
       get_json_object(TO_JSON(a), '$.y') AS y,
       id
FROM extable_json_test02;

The query returns the following result:
+---+---+-----+
| x | y | id  |
+---+---+-----+
| 1 | 2 | 123 |
| 3 | 4 | 345 |
+---+---+-----+
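Because the nested fields are defined with the STRUCT type, you can also read them with the field_name.subfield_name syntax described earlier, without converting to JSON first. A minimal sketch; verify that dot access on STRUCT columns is available in your project's data type edition:

-- Access the STRUCT subfields directly by name.
SELECT a.x AS x,
       a.y AS y,
       id
FROM extable_json_test02;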
Example 3: Set custom prefixes, suffixes, and file extensions for OSS output files
Set a custom prefix test06_ for the output files. The DDL is as follows:

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitute DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
-- Set the custom prefix.
'odps.external.data.output.prefix'='test06_'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

The output files are displayed in the following figure:

Set a custom suffix _beijing for the output files. The DDL is as follows:

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitute DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
-- Set the custom suffix.
'odps.external.data.output.suffix'='_beijing'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

The output files are displayed in the following figure:

Enable automatic file extensions for the output files. The DDL is as follows:

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitute DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
-- Enable automatic file extensions.
'odps.external.data.enable.extension'='true'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

The output files are displayed in the following figure:
Set a custom file extension jsonl for the output files. The DDL is as follows:

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitute DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
-- Set the custom file extension.
'odps.external.data.output.explicit.extension'='jsonl'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

The output files are displayed in the following figure:

Set a custom prefix mc_, a suffix _beijing, and an extension jsonl for the output files. The DDL is as follows:

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
vehicleId INT,
recordId INT,
patientId INT,
calls INT,
locationLatitute DOUBLE,
locationLongitude DOUBLE,
recordTime STRING,
direction STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
STORED AS textfile
LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
TBLPROPERTIES (
-- Set the custom prefix.
'odps.external.data.output.prefix'='mc_',
-- Set the custom suffix.
'odps.external.data.output.suffix'='_beijing',
-- Set the custom file extension.
'odps.external.data.output.explicit.extension'='jsonl'
);

-- Write data to the external table.
INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

The output files are displayed in the following figure:

Supported Data Types
For more information about MaxCompute data types, see Data types version 1.0 and Data types version 2.0.
JSON external tables support the following data types: TINYINT, SMALLINT, INT, BIGINT, BINARY, FLOAT, DOUBLE, DECIMAL(precision,scale), VARCHAR(n), CHAR(n), STRING, DATE, DATETIME, TIMESTAMP, TIMESTAMP_NTZ, BOOLEAN, ARRAY, MAP, STRUCT, and JSON.
Schema Evolution Support
JSON external tables map column values by name between the table schema and the columns in the JSON file.
The Data compatibility notes column in the following table describes whether an external table with an evolved schema can read both new data that matches the updated schema and historical data that does not.
| Operation type | Supported | Description | Data compatibility notes |
| --- | --- | --- | --- |
| Add column | Yes | JSON external tables map column values by name. | Compatible. For historical files that do not contain the new column, the column values are NULL. |
| Drop column | Yes | JSON external tables map column values by name. | Compatible. |
| Modify column order | Yes | JSON external tables map column values by name. | Compatible. |
| Change column data type | Yes | See Change column data type for supported conversions. | Compatible. |
| Modify column name | Yes, but not recommended | Because JSON external tables map by name, renaming breaks alignment between JSON file keys and the table schema. This causes query failures. | Not compatible. |
| Modify column comment | Yes | Comments must be valid strings up to 1024 bytes. Invalid comments cause errors. | Compatible. |
| Modify column nullability | No | This operation is not supported. All columns are nullable by default. | Not applicable. |
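For instance, the add-column case can be sketched against the Example 2 table. The new column name here is hypothetical, and the ALTER TABLE ... ADD COLUMNS syntax is assumed to be available in your project:

-- Add a hypothetical column "name" to the external table schema.
ALTER TABLE extable_json_test02 ADD COLUMNS (name STRING);

-- Historical JSON files contain no "name" key, so the new column reads
-- as NULL for existing rows. Files written after the change can include
-- {"name": ...}, and the value is mapped to the column by name.
SELECT a, id, name FROM extable_json_test02;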