All Products
Search
Document Center

MaxCompute:JSON external tables

Last Updated:Sep 20, 2025

This topic describes how to create, read from, and write to JSON-formatted external tables in Object Storage Service (OSS).

Prerequisites

  • Ensure your Alibaba Cloud account, RAM user, or RAM role has the required permissions to access OSS external tables. For more information about authorization, see STS authorization for OSS.

  • (Optional) Prepare an OSS bucket, directories, and data files. For more information, see Create buckets, Manage directories, and Simple upload.

    MaxCompute can automatically create directories in OSS. You can use a single SQL statement to read from or write to an external table that uses a UDF. Manual directory creation is no longer required, but the legacy method is still supported.
  • A MaxCompute project has been created. For more information, see Create a MaxCompute project.

    MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
  • The Alibaba Cloud account or RAM user has the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.

Limits

  • The cluster attribute is not supported for OSS external tables.

  • A single file cannot exceed 3 GB. If a file exceeds this limit, we recommend that you split it.

Usage notes

If the file contains more columns than the external table, the extra columns are discarded.

Data type support

For more information about MaxCompute data types, see Data types (V1.0) and Data types (V2.0).

Data type

Supported

Data type

Supported

TINYINT

Supported

STRING

Supported

SMALLINT

Supported

DATE

Supported

INT

Supported

DATETIME

Supported

BIGINT

Supported

TIMESTAMP

Not supported

BINARY

Not supported

TIMESTAMP_NTZ

Supported

FLOAT

Supported

BOOLEAN

Supported

DOUBLE

Supported

ARRAY

Not supported

DECIMAL(precision,scale)

Supported

MAP

Not supported

VARCHAR(n)

Supported

STRUCT

Supported

CHAR(n)

Supported

JSON

Not supported

Create an external table

Syntax

Simplified syntax

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';

Complete syntax

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  [WITH serdeproperties (
    ['<property_name>'='<property_value>',...])
  ]
STORED AS textfile
LOCATION '<oss_location>'
[tblproperties ('<tbproperty_name>'='<tbproperty_value>',...)];

Common parameters

For more information, see Basic syntax parameters.

Unique parameters

tblproperties parameters

property_name

Scenario

Description

property_value

Default value

mcfed.mapreduce.output.fileoutputformat.compress

Specifies whether to compress TEXTFILE data when writing to OSS.

TEXTFILE compression property. Specifies whether to compress TEXTFILE data files when writing to OSS. If you set this property to True, MaxCompute writes TEXTFILE data files to OSS in a compressed format. Otherwise, the files are not compressed.

  • True

  • False

False

mcfed.mapreduce.output.fileoutputformat.compress.codec

Specifies whether to compress TEXTFILE data when writing to OSS.

TEXTFILE compression property. Specifies the compression method for TEXTFILE data files. By default, files are compressed in the .deflate format.

Note: Only the compression methods in property_value are supported.

  • com.hadoop.compression.lzo.LzoCodec

  • com.hadoop.compression.lzo.LzopCodec

  • org.apache.hadoop.io.compress.SnappyCodec

  • com.aliyun.odps.io.compress.SnappyRawCodec

  • org.apache.hadoop.io.compress.BZip2Codec

  • org.apache.hadoop.io.compress.Lz4Codec

  • org.apache.hadoop.io.compress.DeflateCodec

  • org.apache.hadoop.io.compress.GzipCodec

org.apache.hadoop.io.compress.DeflateCodec

odps.external.data.output.prefix

(Compatible with odps.external.data.prefix)

Specifies a custom prefix for the output filename.

  • The prefix can contain only letters (a-z and A-Z), digits (0-9), and underscores (_).

  • The prefix must be 1 to 10 characters in length.

A string that meets the requirements, such as 'mc_'.

None

odps.external.data.enable.extension

Specifies whether to display the filename extension of the output file.

If you set this property to True, the extension of the output file is displayed. Otherwise, the extension is not displayed.

  • True

  • False

False

odps.external.data.output.suffix

Specifies a custom suffix for the output filename.

The suffix can contain only letters (a-z and A-Z), digits (0-9), and underscores (_).

A string that meets the requirements, such as '_hangzhou'.

None

odps.external.data.output.explicit.extension

Specifies a custom extension for the output file.

  • The extension can contain only letters (a-z and A-Z), digits (0-9), and underscores (_).

  • The extension must be 1 to 10 characters in length.

  • This property has a higher priority than the odps.external.data.enable.extension property.

A string that meets the requirements, such as "jsonl".

None

Usage notes

When you create a JSON external table, if a field contains a nested JSON object (the value of the field is a JSON struct), do not define the data type of this field as STRING or JSON. Otherwise, the system cannot automatically parse the subfields. You can use one of the following methods:

  • Define the field as STRING. When you query data, you can use functions such as get_json_object to extract the content of the internal subfields as needed.

  • You can use the STRUCT type to define the structure of the field. This maps the subfields of the JSON object to separate subcolumns in the table. You can then directly access the internal data using the field_name.subfield_name format.

For more information about these operations, see Example 2.

Write data

For more information about the syntax for writing data from MaxCompute to OSS, see Write data to OSS.

Query and analyze data

Examples

Example 1: Create, write to, and query a JSON external table

This example shows how to create a JSON-formatted external table using a built-in open-source data parser, write data to OSS, and query the data.

  1. Prepare the data.

    Log on to the OSS console and upload the test data file json2025.txt to the external-table-test/json/dt=20250521/ folder in an OSS bucket. For more information, see Upload files.

  2. Create a JSON-formatted external table.

    CREATE EXTERNAL TABLE mc_oss_extable_name_json
    (
      action STRING,
      time STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    STORED AS textfile
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json/';
  3. Discover and register partitions. If the OSS external table that you created is a partitioned table, you must run an additional command to register the partitions from your OSS directory. For more information, see Syntax for adding partition data to an OSS external table.

    -- Discover and register partitions.
    MSCK REPAIR TABLE mc_oss_extable_name_json ADD PARTITIONS;
  4. Read data from the JSON external table.

    SELECT * FROM mc_oss_extable_name_json WHERE dt=20250526;

    The following result is returned:

    +------------+------------+------------+
    | action     | time       | dt         |
    +------------+------------+------------+
    | Close      | 1469679568 | 20250526   |
    | Close      | 1469679568 | 20250526   |
    +------------+------------+------------+
  5. Write data to the JSON external table.

    INSERT INTO mc_oss_extable_name_json PARTITION (dt='20250526') VALUES ('test','1627273823');
  6. View the written data.

    SELECT * FROM mc_oss_extable_name_json WHERE dt=20250526;

    The following result is returned:

    +------------+------------+------------+
    | action     | time       | dt         |
    +------------+------------+------------+
    | test       | 1627273823 | 20250526   |
    | Close      | 1469679568 | 20250526   |
    | Close      | 1469679568 | 20250526   |
    +------------+------------+------------+

Example 2: Read values from JSON-formatted fields

Prepare the data

Create the JSON data file events.json:

{"a":{"x":1, "y":2}, "id":"123"}
{"a":{"x":3, "y":4}, "id":"345"}

Log on to the OSS console and upload the test data to the external-table-test/json-struct/ folder in an OSS bucket. For more information, see Upload files.

Method 1: Create a TEXTFILE external table and use the get_json_object function to read field values

  1. Create a TEXTFILE external table that contains only one column of the string type:

    CREATE EXTERNAL TABLE extable_json_test01 (
      col STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
    STORED AS textfile
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';  
    
    SELECT * FROM extable_json_test01;      

    The following result is returned:

    +------------------------------------+
    |               col                  |
    +------------------------------------+
    | {"a": {"x": 1, "y": 2},"id":"123"} |
    | {"a": {"x": 3, "y": 4},"id":"345"} |
    +------------------------------------+
  2. You can use the get_json_object function to read the a and id fields:

    SELECT 
        get_json_object(col, '$.a') AS a,
        get_json_object(col, '$.id') AS id
    FROM extable_json_test01;         

    The following result is returned:

    +-------------------+-----+
    |        a          | id  |
    +-------------------+-----+
    | {"x":1,"y":2}     | 123 |
    | {"x":3,"y":4}     | 345 |
    +-------------------+-----+
  3. You can read the nested fields x, y, and id:

    SELECT 
        get_json_object(get_json_object(col,'$.a'),'$.x') AS x,
        get_json_object(get_json_object(col,'$.a'),'$.y') AS y,
        get_json_object(col,'$.id') AS id
    FROM extable_json_test01;          

    The following result is returned:

    +---+---+-----+
    | x | y | id  |
    +---+---+-----+
    | 1 | 2 |123  |
    | 3 | 4 |345  |
    +---+---+-----+       

Method 2: Create a JSON external table and use the STRUCT type to define the data structure

  1. Create a JSON-formatted external table and use the STRUCT type to define nested fields:

    CREATE EXTERNAL TABLE extable_json_test02
    (
      a STRUCT<x: BIGINT, y: BIGINT>,
      id STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS textfile
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';          
  2. You can directly query the table content:

    SELECT * FROM extable_json_test02;

    The following result is returned:

    +----------+-----+
    |    a     | id  |
    +----------+-----+
    | {x:1, y:2}|123 |
    | {x:3, y:4}|345 |
    +----------+-----+
  3. You can also use the get_json_object and TO_JSON functions to read the x and y fields:

    SELECT 
        get_json_object(TO_JSON(a), '$.x') AS x,
        get_json_object(TO_JSON(a), '$.y') AS y,
        id
    FROM extable_json_test02;         

    The following result is returned:

    +---+---+-----+
    | x | y | id  |
    +---+---+-----+
    | 1 | 2 |123  |
    | 3 | 4 |345  |
    +---+---+-----+       

Example 3: Set prefixes, suffixes, and extensions for output OSS files

  1. Set the custom prefix of the output OSS file to test06_. The DDL statement is as follows:

    CREATE EXTERNAL TABLE  <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Set a custom prefix.
        'odps.external.data.output.prefix'='test06_') 
    ;
    
    -- Write data to the external table.
    INSERT INTO  <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The following figure shows the generated file.

    image

  2. Set the custom suffix of the output OSS file to _beijing. The DDL statement is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Set a custom suffix.
        'odps.external.data.output.suffix'='_beijing') 
    ;
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The following figure shows the generated file.

    image

  3. An extension is automatically generated for the output OSS file. The DDL statement is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Automatically generate an extension.
        'odps.external.data.enable.extension'='true') 
    ;
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The figure below shows the generated file.

  4. Set the custom extension of the output OSS file to jsonl. The DDL statement is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Set a custom extension.
       'odps.external.data.output.explicit.extension'='jsonl') 
    ;
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The following figure shows the generated file.

    image.png

  5. Set the prefix of the output OSS file to mc_, the suffix to _beijing, and the extension to jsonl. The DDL statement is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
        -- Set a custom prefix.
        'odps.external.data.output.prefix'='mc_', 
        -- Set a custom suffix.
        'odps.external.data.output.suffix'='_beijing', 
        -- Set a custom extension.
        'odps.external.data.output.explicit.extension'='jsonl') 
    ;  
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The following figure shows the generated file.

    image.png