
MaxCompute:JSON External Tables

Last Updated: Feb 12, 2026

This topic describes how to create, read from, and write to JSON-formatted external tables in OSS.

Applicability

  • OSS external tables do not support the cluster property.

  • The size of a single file cannot exceed 2 GB. You must split files that are larger than 2 GB.

  • MaxCompute and OSS must be in the same region.

  • For supported data types, see Supported Data Types in this topic.

  • For schema evolution behavior, see Schema Evolution Support in this topic.

Create an External Table

Syntax

When you read JSON data, if a JSON file has fewer columns than what is defined in the external table, MaxCompute populates the missing column values with NULL. If the file has more columns, MaxCompute discards the extra columns.
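The following hypothetical sketch illustrates this mapping behavior. The table name, sample data, and OSS path are placeholders; the DDL syntax itself is described below.

-- Sample JSON lines stored in OSS (one object per line):
--   {"a":1, "b":2}                  -- key c is missing: column c reads as NULL
--   {"a":3, "b":4, "c":5, "d":6}    -- extra key d: it is discarded on read
CREATE EXTERNAL TABLE json_column_mapping_demo
(
  a INT,
  b INT,
  c INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';

-- Expected rows: (1, 2, NULL) and (3, 4, 5).
SELECT * FROM json_column_mapping_demo;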

Streamlined Syntax

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';

Full Syntax

CREATE EXTERNAL TABLE <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
  [WITH serdeproperties (
    ['<property_name>'='<property_value>',...])
  ]
STORED AS textfile
LOCATION '<oss_location>'
[tblproperties ('<tblproperty_name>'='<tblproperty_value>',...)];

Common Parameters

For more information, see Common syntax parameters.

Table-Specific Parameters

tblproperties Parameters

mcfed.mapreduce.output.fileoutputformat.compress

  • Use case: Add this property when you want to compress TEXTFILE data files before writing them to OSS.

  • Description: TEXTFILE compression property. Set this property to True to enable compression; set it to False to disable compression.

  • Valid values: True, False

  • Default: False

mcfed.mapreduce.output.fileoutputformat.compress.codec

  • Use case: Add this property when you want to compress TEXTFILE data files before writing them to OSS.

  • Description: TEXTFILE compression property. Specifies the compression codec for TEXTFILE data files. The default output format is .deflate. Note: Only the codecs listed under Valid values are supported.

  • Valid values: com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, org.apache.hadoop.io.compress.SnappyCodec, com.aliyun.odps.io.compress.SnappyRawCodec, org.apache.hadoop.io.compress.BZip2Codec, org.apache.hadoop.io.compress.Lz4Codec, org.apache.hadoop.io.compress.DeflateCodec, org.apache.hadoop.io.compress.GzipCodec

  • Default: org.apache.hadoop.io.compress.DeflateCodec

odps.external.data.output.prefix (backward compatible with odps.external.data.prefix)

  • Use case: Add this property when you want to add a custom prefix to output files.

  • Description: The prefix must contain only letters, digits, or underscores (a-z, A-Z, 0-9, _) and must be 1 to 10 characters long.

  • Valid values: A valid combination of allowed characters, such as mc_.

  • Default: None

odps.external.data.enable.extension

  • Use case: Add this property when you want output filenames to include file extensions.

  • Description: If set to True, output files include extensions. If set to False, extensions are omitted.

  • Valid values: True, False

  • Default: False

odps.external.data.output.suffix

  • Use case: Add this property when you want to add a custom suffix to output files.

  • Description: The suffix must contain only letters, digits, or underscores (a-z, A-Z, 0-9, _).

  • Valid values: A valid combination of allowed characters, such as _hangzhou.

  • Default: None

odps.external.data.output.explicit.extension

  • Use case: Add this property when you want to specify a custom file extension for output files.

  • Description: The extension must contain only letters, digits, or underscores (a-z, A-Z, 0-9, _) and must be 1 to 10 characters long. This property takes precedence over odps.external.data.enable.extension.

  • Valid values: A valid combination of allowed characters, such as jsonl.

  • Default: None

odps.text.option.bad.row.skipping

  • Use case: Add this property when you want to skip dirty data in OSS data files.

  • Description: Controls whether MaxCompute skips dirty rows when it reads OSS data files.

  • Valid values:

    • rigid: Skips dirty rows unconditionally. This setting cannot be overridden by session-level or project-level settings.

    • flexible: Enables skipping at the table level, but the setting can be overridden by session-level or project-level settings.

  • Default: None. If this property is not set, the behavior is controlled by the session-level or project-level flags described in the BadRowSkipping section.
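As an illustration, the following sketch enables Gzip compression for files written to OSS; the table name, columns, and location are placeholders:

CREATE EXTERNAL TABLE mc_oss_json_gzip_demo
(
  a INT,
  b INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>'
TBLPROPERTIES (
  -- Compress TEXTFILE output before it is written to OSS.
  'mcfed.mapreduce.output.fileoutputformat.compress'='true',
  -- Use Gzip instead of the default Deflate codec.
  'mcfed.mapreduce.output.fileoutputformat.compress.codec'='org.apache.hadoop.io.compress.GzipCodec'
);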

Note: When you create a JSON external table, if a field contains a nested JSON object, do not define that field as STRING or JSON; MaxCompute cannot automatically parse the subfields of such a column. Instead, use one of the two approaches shown in Example 2 of this topic:

  • Create a plain TEXTFILE external table that reads each line into a single STRING column. Then, use functions such as get_json_object during queries to extract subfield values.

  • Define the field as a STRUCT that maps each subfield of the JSON object to a named struct field. You can then access the data directly with the field_name.subfield_name syntax.

Write Data

For more information about the syntax for writing data from MaxCompute to OSS, see Write data to OSS.

Query and Analysis

BadRowSkipping

If dirty data is present in your JSON data, you can use the BadRowSkipping feature to skip or retain the rows that contain errors. Enabling or disabling this feature does not change how the underlying data format is interpreted.

Parameter settings

  • Table-level parameter odps.text.option.bad.row.skipping

    • rigid: The skip logic must be executed and cannot be overwritten by session-level or project-level settings.

    • flexible: Enables flexible skipping at the data plane. This setting can be overwritten by session-level or project-level settings.

  • Session-level or project-level parameters

    • odps.sql.unstructured.text.bad.row.skipping: This parameter can overwrite the flexible table-level parameter but not the rigid table-level parameter.

      • on: Enables skipping. If the table-level parameter is not configured, this flag enables skipping on its own.

      • off: Disables skipping when the table-level parameter is flexible or not configured. If the table-level parameter is rigid, the table-level setting takes precedence.

      • Not set or invalid: The behavior is determined solely by the table-level configuration.

    • odps.sql.unstructured.text.bad.row.skipping.debug.num: Specifies the number of error results that can be printed to standard output (stdout) in Logview.

      • The maximum value is 1000.

      • If the value is less than or equal to 0, this feature is disabled.

      • If the input is invalid, this feature is disabled.

  • Interaction between session-level and table-level parameters

    +----------------+--------------------+--------------------------+
    | Table property | Session flag       | Result                   |
    +----------------+--------------------+--------------------------+
    | rigid          | on                 | On, forcibly enabled     |
    | rigid          | off                | On, forcibly enabled     |
    | rigid          | not set or invalid | On, forcibly enabled     |
    | flexible       | on                 | On                       |
    | flexible       | off                | Off, disabled by session |
    | flexible       | not set or invalid | On                       |
    | not configured | on                 | On, enabled by session   |
    | not configured | off                | Off                      |
    | not configured | not set or invalid | Off                      |
    +----------------+--------------------+--------------------------+

Examples

  1. Prepare the data.

    Upload the test file json_bad_row_skipping.json, which contains dirty data, to the oss-mc-test/badrow/ path in your OSS bucket.

  2. Create a JSON external table.

    The behavior differs based on the table-level parameter and the session-level flag. The following three cases are possible:

    • Table-level parameter: odps.text.option.bad.row.skipping = flexible/rigid/<not set>

    • Session-level flag: odps.sql.unstructured.text.bad.row.skipping = on/off/<not set>

    No table-level parameter set

    -- No table-level parameter is set; the session flag alone controls whether dirty rows are skipped.
    CREATE EXTERNAL TABLE test_json_bad_data_skipping_flag
    (
      a INT,
      b INT
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS textfile
    LOCATION '<oss://databucketpath>';

    Table-level parameter skips errors. Session-level flag can disable it.

    -- Table-level parameter skips dirty rows, but the session-level flag can override it.
    CREATE EXTERNAL TABLE test_json_bad_data_skipping_flexible
    (
      a INT,
      b INT
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS textfile
    LOCATION '<oss://databucketpath>'
    TBLPROPERTIES (
      'odps.text.option.bad.row.skipping' = 'flexible'   -- Flexible mode. Can be disabled by the session flag.
    );

    Table-level parameter forces error skipping. Session-level flag cannot disable it.

    -- Table-level parameter forces dirty-row skipping. The session-level flag cannot override it.
    CREATE EXTERNAL TABLE test_json_bad_data_skipping_rigid
    (
      a INT,
      b INT
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS textfile
    LOCATION '<oss://databucketpath>'
    TBLPROPERTIES (
      'odps.text.option.bad.row.skipping' = 'rigid'  -- Forced mode.
    );
  3. Verify the query results.

    No table-level parameter set

    -- Enable at session level.
    SET odps.sql.unstructured.text.bad.row.skipping=on;
    
    -- Disable at session level. If table is set to 'flexible', it disables skipping. If table is set to 'rigid', it has no effect.
    SET odps.sql.unstructured.text.bad.row.skipping=off;
    
    -- Print up to 10 problematic rows to Logview Stdout. Values ≤ 0 disable printing.
    SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
    
    SELECT * FROM test_json_bad_data_skipping_flag;

    With the session flag set to off, the query fails and returns the following error: FAILED: ODPS-0123131:User defined function exception

    Table-level parameter skips errors. Session-level flag can disable it.

    -- Enable at session level.
    SET odps.sql.unstructured.text.bad.row.skipping=on;
    
    -- Disable at session level. If table is set to 'flexible', it disables skipping. If table is set to 'rigid', it has no effect.
    SET odps.sql.unstructured.text.bad.row.skipping=off;
    
    -- Print up to 10 problematic rows to Logview Stdout. Values ≤ 0 disable printing.
    SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
    
    SELECT * FROM test_json_bad_data_skipping_flexible;

    With the session flag set to off, the query fails and returns the following error: FAILED: ODPS-0123131:User defined function exception

    Table-level parameter forces error skipping. Session-level flag cannot disable it.

    -- Enable at session level.
    SET odps.sql.unstructured.text.bad.row.skipping=on;
    
    -- Disable at session level. If table is set to 'flexible', it disables skipping. If table is set to 'rigid', it has no effect.
    SET odps.sql.unstructured.text.bad.row.skipping=off;
    
    -- Print up to 10 problematic rows to Logview Stdout. Values ≤ 0 disable printing.
    SET odps.sql.unstructured.text.bad.row.skipping.debug.num=10;
    
    SELECT * FROM test_json_bad_data_skipping_rigid;

    Because the table uses rigid mode, dirty rows are skipped regardless of the session flag, and the query returns the following result:

    +------------+------------+
    | a          | b          | 
    +------------+------------+
    | 1          | 2          | 
    | 15         | 16         | 
    +------------+------------+

Examples

Prerequisites

  1. You have created a MaxCompute project.

  2. You have prepared an OSS bucket and OSS directory. For more information, see Create a bucket and Manage folders.

    Because MaxCompute is available only in specific regions, cross-region connectivity issues may occur. We recommend using an OSS bucket in the same region as your MaxCompute project.
  3. Authorization

    1. You have permissions to access OSS. An Alibaba Cloud account (primary account), Resource Access Management (RAM) user, or RAM role can access OSS external tables. For authorization details, see STS-mode authorization for OSS.

    2. You have CreateTable permissions in the MaxCompute project. For details about table operation permissions, see MaxCompute permissions.

Example 1: Create, write to, and query a JSON external table

This example shows how to create a JSON external table using the built-in open-source JSON parser, and then write data to and query data from OSS.

  1. Prepare the data.

    Log on to the OSS console and upload the test file json2025.txt to the external-table-test/json/dt=20250526/ directory in your OSS bucket. For more information, see Upload OSS files.

  2. Create a JSON external table.

    CREATE EXTERNAL TABLE mc_oss_extable_name_json
    (
      action STRING,
      time STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    STORED AS textfile
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json/';
  3. Add partitions. If your OSS external table is partitioned, you must run this command to add the existing partitions. For more information, see Add partitions to an OSS external table.

    -- Add partitions.
    MSCK REPAIR TABLE mc_oss_extable_name_json ADD PARTITIONS;
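
    Alternatively, you can add a single known partition explicitly with ALTER TABLE. This is a sketch; the partition value matches the uploaded directory:

    -- Add one partition explicitly instead of scanning OSS for all partitions.
    ALTER TABLE mc_oss_extable_name_json ADD IF NOT EXISTS PARTITION (dt='20250526');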
  4. Read the JSON external table.

    SELECT * FROM mc_oss_extable_name_json WHERE dt='20250526';

    The query returns the following result:

    +------------+------------+------------+
    | action     | time       | dt         |
    +------------+------------+------------+
    | Close      | 1469679568 | 20250526   |
    | Close      | 1469679568 | 20250526   |
    +------------+------------+------------+
  5. Write to the JSON external table.

    INSERT INTO mc_oss_extable_name_json PARTITION (dt='20250526') VALUES ('test','1627273823');
  6. Check the written data.

    SELECT * FROM mc_oss_extable_name_json WHERE dt='20250526';

    The query returns the following result:

    +------------+------------+------------+
    | action     | time       | dt         |
    +------------+------------+------------+
    | test       | 1627273823 | 20250526   |
    | Close      | 1469679568 | 20250526   |
    | Close      | 1469679568 | 20250526   |
    +------------+------------+------------+

Example 2: Read JSON field values

Prepare the data

Create a JSON file named events.json with the following content:

{"a":{"x":1, "y":2}, "id":"123"}
{"a":{"x":3, "y":4}, "id":"345"}

Log on to the OSS console and upload the test file to the external-table-test/json_struct/ directory in your OSS bucket. For more information, see Upload OSS files.

Method 1: Create a TEXTFILE external table and use the get_json_object function to read field values

  1. Create a TEXTFILE external table with a single string column:

    CREATE EXTERNAL TABLE extable_json_test01 (
      col STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n'
    STORED AS textfile
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';  
    
    SELECT * FROM extable_json_test01;      

    The query returns the following result:

    +------------------------------------+
    |               col                  |
    +------------------------------------+
    | {"a": {"x": 1, "y": 2},"id":"123"} |
    | {"a": {"x": 3, "y": 4},"id":"345"} |
    +------------------------------------+
  2. Use the get_json_object function to read the a and id fields:

    SELECT 
        get_json_object(col, '$.a') AS a,
        get_json_object(col, '$.id') AS id
    FROM extable_json_test01;         

    The query returns the following result:

    +-------------------+-----+
    |        a          | id  |
    +-------------------+-----+
    | {"x":1,"y":2}     | 123 |
    | {"x":3,"y":4}     | 345 |
    +-------------------+-----+
  3. Read the nested fields x, y, and id:

    SELECT 
        get_json_object(get_json_object(col,'$.a'),'$.x') AS x,
        get_json_object(get_json_object(col,'$.a'),'$.y') AS y,
        get_json_object(col,'$.id') AS id
    FROM extable_json_test01;          

    The query returns the following result:

    +---+---+-----+
    | x | y | id  |
    +---+---+-----+
    | 1 | 2 | 123 |
    | 3 | 4 | 345 |
    +---+---+-----+

Method 2: Create a JSON external table and use the STRUCT type to handle nested fields

  1. Create a JSON external table that uses the STRUCT type to represent nested fields:

    CREATE EXTERNAL TABLE extable_json_test02
    (
      a STRUCT<x: BIGINT, y: BIGINT>,
      id STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS textfile
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/external-table-test/json_struct/';          
  2. Query the table directly:

    SELECT * FROM extable_json_test02;

    The query returns the following result:

    +------------+-----+
    | a          | id  |
    +------------+-----+
    | {x:1, y:2} | 123 |
    | {x:3, y:4} | 345 |
    +------------+-----+
  3. Use the get_json_object and TO_JSON functions to read the x and y fields:

    SELECT 
        get_json_object(TO_JSON(a), '$.x') AS x,
        get_json_object(TO_JSON(a), '$.y') AS y,
        id
    FROM extable_json_test02;         

    The query returns the following result:

    +---+---+-----+
    | x | y | id  |
    +---+---+-----+
    | 1 | 2 | 123 |
    | 3 | 4 | 345 |
    +---+---+-----+
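
    Because a is a STRUCT column, you can also reference its fields directly with the field_name.subfield_name syntax mentioned earlier, which avoids the JSON round-trip. This sketch assumes the same table:

    -- Access struct fields with dot notation.
    SELECT a.x, a.y, id
    FROM extable_json_test02;

    This should return the same x, y, and id values as the previous query.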

Example 3: Set custom prefixes, suffixes, and file extensions for OSS output files

  1. Set a custom prefix test06_ for the output files. The DDL is as follows:

    CREATE EXTERNAL TABLE  <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitude DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Set custom prefix.
        'odps.external.data.output.prefix'='test06_') 
    ;
    
    -- Write data to the external table.
    INSERT INTO  <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The files written to OSS are named with the test06_ prefix.

  2. Set a custom suffix _beijing for the output files. The DDL is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitude DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Set custom suffix.
        'odps.external.data.output.suffix'='_beijing') 
    ;
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The files written to OSS are named with the _beijing suffix.

  3. Enable automatic file extensions for the output files. The DDL is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitude DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Enable automatic file extensions.
        'odps.external.data.enable.extension'='true') 
    ;
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    When this property is enabled, the output filenames include a file extension.

  4. Set a custom file extension jsonl for the output files. The DDL is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitude DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
    -- Set custom file extension.
       'odps.external.data.output.explicit.extension'='jsonl') 
    ;
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The files written to OSS are named with the .jsonl extension.

  5. Set a custom prefix mc_, a suffix _beijing, and an extension jsonl for the output files. The DDL is as follows:

    CREATE EXTERNAL TABLE <mc_oss_extable_name>
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitude DOUBLE,
      locationLongitude DOUBLE,
      recordTime STRING,
      direction STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    WITH serdeproperties (
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    ) 
    STORED AS textfile
    LOCATION 'oss://oss-cn-beijing-internal.aliyuncs.com/***/'
    TBLPROPERTIES (
        -- Set custom prefix.
        'odps.external.data.output.prefix'='mc_', 
        -- Set custom suffix.
        'odps.external.data.output.suffix'='_beijing', 
        -- Set custom file extension.
        'odps.external.data.output.explicit.extension'='jsonl') 
    ;  
    
    -- Write data to the external table.
    INSERT INTO <mc_oss_extable_name> VALUES (1,32,76,1,63.32106,-92.08174,'9/14/2014 0:10','NW');

    The files written to OSS are named with the mc_ prefix and the _beijing suffix, and have the .jsonl extension.

Supported Data Types

For more information about MaxCompute data types, see Data types version 1.0 and Data types version 2.0.

  • Supported: TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, DECIMAL(precision,scale), VARCHAR(n), CHAR(n), STRING, DATE, TIMESTAMP_NTZ, BOOLEAN, ARRAY, MAP, STRUCT

  • Not supported: BINARY, DATETIME, TIMESTAMP, JSON
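
For illustration, the following sketch shows how the complex types map to JSON values; the table name, sample data, and location are placeholders:

-- Sample JSON line in OSS:
--   {"tags":["a","b"], "attrs":{"k1":"v1","k2":"v2"}, "point":{"x":1,"y":2}}
CREATE EXTERNAL TABLE mc_oss_json_types_demo
(
  tags  ARRAY<STRING>,                 -- JSON array
  attrs MAP<STRING, STRING>,           -- JSON object with arbitrary keys
  point STRUCT<x: BIGINT, y: BIGINT>   -- JSON object with fixed keys
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS textfile
LOCATION '<oss_location>';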

Schema Evolution Support

JSON external tables map column values by name between the table schema and the columns in the JSON file.

For each operation in the following list, the data compatibility notes describe whether an external table with an evolved schema can read both new data that matches the updated schema and historical data that does not.

Add column

  • Supported: Yes

  • Description: You cannot specify the position of a new column; it is always appended after the existing columns. Default values for new columns apply only to data written by MaxCompute.

  • Data compatibility: New data that matches the updated schema reads correctly. Historical data written with the old schema is not modified; it is read using the new schema. For example, after you add a column, historical rows lack that column, so the new column returns NULL for those rows.

Drop column

  • Supported: Yes

  • Description: JSON external tables map column values by name.

  • Data compatibility: Compatible.

Modify column order

  • Supported: Yes

  • Description: JSON external tables map column values by name.

  • Data compatibility: Compatible.

Change column data type

  • Supported: Yes

  • Description: See Change column data type for supported conversions.

  • Data compatibility: Compatible.

Modify column name

  • Supported: Yes, but not recommended

  • Description: Because JSON external tables map columns by name, renaming a column breaks the alignment between the keys in the JSON files and the table schema, so the renamed column can no longer read historical data.

  • Data compatibility: New data that matches the updated schema reads correctly. Historical data written with the old schema is read using the new schema. For example, after you rename a column, JSON files that still use the old key name return NULL for the renamed column.

Modify column comment

  • Supported: Yes

  • Description: Comments must be valid strings of up to 1,024 bytes. Invalid comments cause errors.

  • Data compatibility: Compatible.

Modify column nullability

  • Supported: No

  • Description: This operation is not supported. All columns are nullable by default.

  • Data compatibility: Not applicable.
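
As a sketch of the add-column case, assuming the table from Example 1 and a hypothetical new column named device:

-- The new column is appended after the existing columns.
ALTER TABLE mc_oss_extable_name_json ADD COLUMNS (device STRING);

-- Historical JSON files contain no "device" key, so the new column
-- reads as NULL for those rows; newly written data populates it.
SELECT action, time, device FROM mc_oss_extable_name_json WHERE dt='20250526';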