This topic describes how to create, read from, and write to JSON-formatted external tables in Object Storage Service (OSS).
Prerequisites
Ensure your Alibaba Cloud account, RAM user, or RAM role has the required permissions to access OSS external tables. For more information about authorization, see STS authorization for OSS.
(Optional) Prepare an OSS bucket, directories, and data files. For more information, see Create buckets, Manage directories, and Simple upload.
MaxCompute can automatically create directories in OSS. You can use a single SQL statement to read from or write to an external table that uses a UDF. Manual directory creation is no longer required, but the legacy method is still supported.
A MaxCompute project has been created. For more information, see Create a MaxCompute project.
MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
The Alibaba Cloud account or RAM user has the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.
Limits
The cluster attribute is not supported for OSS external tables.
A single file cannot exceed 3 GB. If a file exceeds this limit, we recommend that you split it.
Usage notes
If the file contains more columns than the external table, the extra columns are discarded.
Data type support
For more information about MaxCompute data types, see Data types (V1.0) and Data types (V2.0).
Data type | Supported | Data type | Supported |
TINYINT | STRING | ||
SMALLINT | DATE | ||
INT | DATETIME | ||
BIGINT | TIMESTAMP | ||
BINARY | TIMESTAMP_NTZ | ||
FLOAT | BOOLEAN | ||
DOUBLE | ARRAY | ||
DECIMAL(precision,scale) | MAP | ||
VARCHAR(n) | STRUCT | ||
CHAR(n) | JSON |
Create an external table
Syntax
Simplified syntax
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';Complete syntax
CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
[WITH serdeproperties (
['<property_name>'='<property_value>',...])
]
STORED AS textfile
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];Common parameters
For more information, see Basic syntax parameters.
Unique parameters
tblproperties parameters
property_name | Scenario | Description | property_value | Default value |
mcfed.mapreduce.output.fileoutputformat.compress | Specifies whether to compress TEXTFILE data when writing to OSS. | TEXTFILE compression property. Specifies whether to compress TEXTFILE data files when writing to OSS. If you set this property to True, MaxCompute writes TEXTFILE data files to OSS in a compressed format. Otherwise, the files are not compressed. |
| False |
mcfed.mapreduce.output.fileoutputformat.compress.codec | Specifies whether to compress TEXTFILE data when writing to OSS. | TEXTFILE compression property. Specifies the compression method for TEXTFILE data files. By default, files are compressed in the Note: Only the compression methods in |
| org.apache.hadoop.io.compress.DeflateCodec |
odps.external.data.output.prefix (Compatible with odps.external.data.prefix) | Specifies a custom prefix for the output filename. |
| A string that meets the requirements, such as 'mc_'. | None |
odps.external.data.enable.extension | Specifies whether to display the filename extension of the output file. | If you set this property to True, the extension of the output file is displayed. Otherwise, the extension is not displayed. |
| False |
odps.external.data.output.suffix | Specifies a custom suffix for the output filename. | The suffix can contain only letters (a-z and A-Z), digits (0-9), and underscores (_). | A string that meets the requirements, such as '_hangzhou'. | None |
odps.external.data.output.explicit.extension | Specifies a custom extension for the output file. |
| A string that meets the requirements, such as "jsonl". | None |
Usage notes
When you create a JSON external table, if a field contains a nested JSON object (the value of the field is a JSON struct), do not define the data type of this field as STRING or JSON. Otherwise, the system cannot automatically parse the subfields. You can use one of the following methods:
Define the field as STRING. When you query data, you can use functions such as
get_json_objectto extract the content of the internal subfields as needed.You can use the STRUCT type to define the structure of the field. This maps the subfields of the JSON object to separate subcolumns in the table. You can then directly access the internal data using the
field_name.subfield_nameformat.
For more information about these operations, see Example 2.
Write data
For more information about the syntax for writing data from MaxCompute to OSS, see Write data to OSS.
Query and analyze data
For more information about the SELECT syntax, see Read data from OSS.
For more information about optimizing query plans, see Query optimization.
Examples
Example 1: Create, write to, and query a JSON external table
This example shows how to create a JSON-formatted external table using a built-in open-source data parser, write data to OSS, and query the data.
Prepare the data.
Log on to the OSS console and upload the test data file json2025.txt to the
external-table-test/json/dt=20250521/folder in an OSS bucket. For more information, see Upload files.Create a JSON-formatted external table.
CREATE EXTERNAL TABLE mc_oss_extable_name_json ( action STRING, time STRING ) PARTITIONED BY (dt STRING) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS textfile LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json/';Discover and register partitions. If the OSS external table that you created is a partitioned table, you must run an additional command to register the partitions from your OSS directory. For more information, see Syntax for adding partition data to an OSS external table.
-- Discover and register partitions. MSCK REPAIR TABLE mc_oss_extable_name_json ADD PARTITIONS;Read data from the JSON external table.
SELECT * FROM mc_oss_extable_name_json WHERE dt=20250526;The following result is returned:
+------------+------------+------------+ | action | time | dt | +------------+------------+------------+ | Close | 1469679568 | 20250526 | | Close | 1469679568 | 20250526 | +------------+------------+------------+Write data to the JSON external table.
INSERT INTO mc_oss_extable_name_json PARTITION (dt='20250526') VALUES ('test','1627273823');View the written data.
SELECT * FROM mc_oss_extable_name_json WHERE dt=20250526;The following result is returned:
+------------+------------+------------+ | action | time | dt | +------------+------------+------------+ | test | 1627273823 | 20250526 | | Close | 1469679568 | 20250526 | | Close | 1469679568 | 20250526 | +------------+------------+------------+
Example 2: Read values from JSON-formatted fields
Prepare the data
Create the JSON data file events.json:
{"a":{"x":1, "y":2}, "id":"123"}
{"a":{"x":3, "y":4}, "id":"345"}Log on to the OSS console and upload the test data to the external-table-test/json-struct/ folder in an OSS bucket. For more information, see Upload files.
Method 1: Create a TEXTFILE external table and use the get_json_object function to read field values
Create a TEXTFILE external table that contains only one column of the
stringtype:CREATE EXTERNAL TABLE extable_json_test01 ( col STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n' STORED AS textfile LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/'; SELECT * FROM extable_json_test01;The following result is returned:
+------------------------------------+ | col | +------------------------------------+ | {"a": {"x": 1, "y": 2},"id":"123"} | | {"a": {"x": 3, "y": 4},"id":"345"} | +------------------------------------+You can use the
get_json_objectfunction to read theaandidfields:SELECT get_json_object(col, '$.a') AS a, get_json_object(col, '$.id') AS id FROM extable_json_test01;The following result is returned:
+-------------------+-----+ | a | id | +-------------------+-----+ | {"x":1,"y":2} | 123 | | {"x":3,"y":4} | 345 | +-------------------+-----+You can read the nested fields
x,y, andid:SELECT get_json_object(get_json_object(col,'$.a'),'$.x') AS x, get_json_object(get_json_object(col,'$.a'),'$.y') AS y, get_json_object(col,'$.id') AS id FROM extable_json_test01;The following result is returned:
+---+---+-----+ | x | y | id | +---+---+-----+ | 1 | 2 |123 | | 3 | 4 |345 | +---+---+-----+
Method 2: Create a JSON external table and use the STRUCT type to define the data structure
Create a JSON-formatted external table and use the
STRUCTtype to define nested fields:CREATE EXTERNAL TABLE extable_json_test02 ( a STRUCT<x: BIGINT, y: BIGINT>, id STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS textfile LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';You can directly query the table content:
SELECT * FROM extable_json_test02;The following result is returned:
+----------+-----+ | a | id | +----------+-----+ | {x:1, y:2}|123 | | {x:3, y:4}|345 | +----------+-----+You can also use the
get_json_objectandTO_JSONfunctions to read thexandyfields:SELECT get_json_object(TO_JSON(a), '$.x') AS x, get_json_object(TO_JSON(a), '$.y') AS y, id FROM extable_json_test02;The following result is returned:
+---+---+-----+ | x | y | id | +---+---+-----+ | 1 | 2 |123 | | 3 | 4 |345 | +---+---+-----+
Example 3: Set prefixes, suffixes, and extensions for output OSS files
Set the custom prefix of the output OSS file to
test06_. The DDL statement is as follows:CREATE EXTERNAL TABLE <mc_oss_extable_name> ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongitude DOUBLE, recordTime STRING, direction STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS textfile LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/' TBLPROPERTIES ( -- Set a custom prefix. 'odps.external.data.output.prefix'='test06_') ; -- Write data to the external table. INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');The following figure shows the generated file.

Set the custom suffix of the output OSS file to
_beijing. The DDL statement is as follows:CREATE EXTERNAL TABLE <mc_oss_extable_name> ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongitude DOUBLE, recordTime STRING, direction STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS textfile LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/' TBLPROPERTIES ( -- Set a custom suffix. 'odps.external.data.output.suffix'='_beijing') ; -- Write data to the external table. INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');The following figure shows the generated file.

An extension is automatically generated for the output OSS file. The DDL statement is as follows:
CREATE EXTERNAL TABLE <mc_oss_extable_name> ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongitude DOUBLE, recordTime STRING, direction STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS textfile LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/' TBLPROPERTIES ( -- Automatically generate an extension. 'odps.external.data.enable.extension'='true') ; -- Write data to the external table. INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');The figure below shows the generated file.
Set the custom extension of the output OSS file to
jsonl. The DDL statement is as follows:CREATE EXTERNAL TABLE <mc_oss_extable_name> ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongitude DOUBLE, recordTime STRING, direction STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS textfile LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/' TBLPROPERTIES ( -- Set a custom extension. 'odps.external.data.output.explicit.extension'='jsonl') ; -- Write data to the external table. INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');The following figure shows the generated file.

Set the prefix of the output OSS file to
mc_, the suffix to_beijing, and the extension tojsonl. The DDL statement is as follows:CREATE EXTERNAL TABLE <mc_oss_extable_name> ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongitude DOUBLE, recordTime STRING, direction STRING ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' WITH serdeproperties ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) STORED AS textfile LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/' TBLPROPERTIES ( -- Set a custom prefix. 'odps.external.data.output.prefix'='mc_', -- Set a custom suffix. 'odps.external.data.output.suffix'='_beijing', -- Set a custom extension. 'odps.external.data.output.explicit.extension'='jsonl') ; -- Write data to the external table. INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');The following figure shows the generated file.
