All Products
Search
Document Center

MaxCompute:JSON data type

Last Updated:Mar 26, 2026

Use the JSON type to store semi-structured data in MaxCompute. On write, MaxCompute automatically extracts a public schema and stores those fields in columnar format. On read, column pruning scans only the fields your query touches — giving you faster query performance and lower storage than STRING.

When to use the JSON type

Situation Recommendation
Data is semi-structured JSON; schema varies across rows Use the JSON type
Schema is fully fixed; all fields are always present Use typed columns — better type safety and broader SQL compatibility
You need ORDER BY, GROUP BY, or JOIN on the JSON column Use typed columns — these operations are not supported on JSON

How it works

When you insert JSON data, MaxCompute automatically extracts a public schema from the data and performs optimizations. Fields in the public schema are stored in columnar format. Fields that are not in the public schema are stored as BINARY.

For example, given three rows with fields a, b, and c:

INSERT INTO json_table
SELECT json_parse(string_val)
FROM   string_table;

MaxCompute extracts the public schema <"a":binary, "b":bigint, "c":bigint>. A subsequent query that reads only b and c scans just those columns:

SELECT json_val["b"], json_val["c"]
FROM   json_table;
-- Column pruning keeps only b and c.
+------+------+
| _c0  | _c1  |
+------+------+
| 2    | NULL |
| 2    | NULL |
| NULL | 3    |
+------+------+

JSON values are stored using MaxCompute internal types. Because of this mapping, values outside the BIGINT or DOUBLE range will overflow or lose precision.

Prerequisites

Before you begin, make sure you have:

  • A MaxCompute project with the JSON type enabled (see Enable the JSON type)

  • Java SDK V0.44.0 or later, or PyODPS V0.11.4.1 or later

  • If using odpscmd: version V0.46.5 or later, with use_instance_tunnel=false in conf\odps_config.ini

Enable the JSON type

The odps.sql.type.json.enable flag controls JSON type availability:

Project type Default value
New projects true
Existing projects false

To enable the JSON type in an existing project:

SET odps.sql.type.json.enable=true;

To check the current value:

setproject;

Limitations

Constraint reference

Category Constraint Implication
Table operations Cannot add a JSON column to an existing table Create a new table; migrate data with INSERT INTO ... SELECT
Table types Clustered tables and Delta Table type are not supported Use standard tables
SQL operations ORDER BY, GROUP BY, and JOIN keys on JSON columns are not supported Extract the value to a typed column first
SQL operations Comparison operations on JSON type are not supported Cast to STRING or a typed column before comparing
Nesting Maximum 20 levels deep Flatten deeply nested structures before ingestion
Engine compatibility Hologres cannot read JSON columns Keep a STRING copy if cross-engine reads are required
User-defined functions (UDFs) Java UDFs and Python UDFs do not support JSON type Use built-in JSON functions instead
Tooling Dataphin and other external ecosystems are not supported Verify compatibility before use

Type storage and precision

JSON values are mapped to MaxCompute internal types. Values outside these ranges cause overflow or precision loss:

JSON type Internal storage Note
NUMBER (integer part) BIGINT Overflow if outside BIGINT range
NUMBER (decimal part) DOUBLE Precision loss possible
STRING BINARY (non-public) or typed column \u0000 is not supported
BOOLEAN BOOLEAN
NULL json 'null' differs from SQL NULL
ARRAY ARRAY
OBJECT OBJECT

Tool and SDK requirements

Tool or SDK Requirement
odpscmd (MaxCompute client) V0.46.5 or later; set use_instance_tunnel=false in conf\odps_config.ini
MaxCompute Studio Supported
DataWorks Supported
Java SDK V0.44.0 or later
PyODPS V0.11.4.1 or later

Work with the JSON type

All examples below use a shared order record to show create, insert, and query patterns end-to-end:

{"id": 1001, "customer": "Molly", "amount": 299.50}

Create a JSON table

No schema definition is required — declare the column as JSON:

CREATE TABLE orders (record JSON);

Generate JSON data

From a JSON literal:

INSERT INTO orders VALUES (JSON '{"id": 1001, "customer": "Molly", "amount": 299.50}');

Using JSON_OBJECT and JSON_ARRAY:

-- JSON_OBJECT builds a JSON object from key-value pairs.
INSERT INTO orders SELECT JSON_OBJECT("id", 1002, "customer", "Frank", "amount", 150.00);

-- JSON_ARRAY builds a JSON array.
SELECT JSON_ARRAY("tag1", "tag2", "promo");
-- Returns: ["tag1","tag2","promo"]

By converting a STRING column:

Use json_parse to convert existing string data. Wrap it with json_valid to skip malformed rows:

INSERT INTO orders
SELECT json_parse(raw_json)
FROM   staging_table
WHERE  json_valid(raw_json);
CAST("abc" AS JSON) and json_parse("abc") behave differently for edge cases. See JSON functions for details.

Access JSON data

All examples below query the row inserted above: {"id": 1001, "customer": "Molly", "amount": 299.50}.

Index access

Index access uses strict mode — NULL is returned when the path does not match the data structure.

-- Returns 1001
SELECT record['id']
FROM   orders
WHERE  record['id'] IS NOT NULL;

-- Returns "Molly"
SELECT record['customer']
FROM   orders;

-- Returns NULL (field does not exist)
SELECT record['email']
FROM   orders;

Index access is equivalent to JSON_EXTRACT in strict mode:

-- These two expressions return the same result:
SELECT record['id']                          FROM orders;
SELECT JSON_EXTRACT(record, 'strict $.id')   FROM orders;
-- Both return: 1001

Access using JSON functions

Two functions are available:

Function Returns JSON Path parser
JSON_EXTRACT JSON type New standardized parser (PostgreSQL-compatible subset)
GET_JSON_OBJECT STRING type Legacy parser

Use JSON_EXTRACT in new SQL — its parser is consistent with the index accessor and supports column pruning in strict mode.

-- JSON_EXTRACT returns a JSON value (with quotes).
SELECT JSON_EXTRACT(record, '$.customer')
FROM   orders;
-- Returns: "Molly"

-- GET_JSON_OBJECT returns a STRING value (no quotes).
SELECT GET_JSON_OBJECT(record, '$.customer')
FROM   orders;
-- Returns: Molly

JSON Path reference

A JSON Path identifies a node in JSON data. The parser used by the JSON type is a subset of the PostgreSQL JSON Path specification.

Sample data for JSON Path examples:

{
  "name": "Molly",
  "phones": [
    { "phonetype": "work",  "phone#": "650-506-7000" },
    { "phonetype": "cell",  "phone#": "650-555-5555" }
  ]
}

Accessor syntax:

Accessor Example Description
Member $.name Access a field by name
Member (special chars) $."phone#" Use quotes for names with special characters
Wildcard member $.* All fields of an object
Element $.phones[1] Array element by index
Element range $.phones[0, 1] or $[1, 2, 4 to 7] Elements at specified indexes or a range
Wildcard element $.phones[*] All array elements

Modes:

JSON Path supports two modes. The default is lax.

Mode Behavior Column pruning
lax Automatically wraps scalars as arrays and unwraps arrays to objects when the path expects a different structure. Returns results permissively. Not supported
strict Returns NULL if the path does not match the actual data structure exactly. Supported

lax mode examples (using the sample data above):

Expression Result Reason
lax $.phones.phonetype ["work","cell"] Unwraps the phones array and reads phonetype from each object
lax $.phones[*].phonetype ["work","cell"] Direct wildcard element access
lax $.name[*] ["Molly"] Wraps the string "Molly" into an array
lax $.name.* NULL Expects an object under name, finds a string

strict mode examples:

Expression Result Reason
strict $.phones[1]."phone#" "650-555-5555" Exact path match
strict $.phones.phonetype NULL phones is an array; an object is expected
strict $.address NULL Field does not exist
Important

Use strict mode when you need column pruning. Lax mode does not support column pruning optimization.

Design considerations

  • Use strict mode for column pruning. Lax mode does not trigger column pruning optimization, so queries in lax mode scan more data.

  • Keep JSON documents small. Each JSON column is stored as a single column value. Large documents increase the memory and I/O cost per row.

  • Validate before ingesting. Use json_valid() to filter malformed rows before calling json_parse().

  • Check precision before inserting large numbers. JSON numbers are stored as BIGINT (integer part) and DOUBLE (decimal part). Numbers outside those ranges overflow.

  • Plan your schema before creating tables. You cannot add a JSON column to an existing table. Design the table with the JSON column from the start.

End-to-end example

-- Enable the JSON type if your project was created before the feature was enabled.
SET odps.sql.type.json.enable=true;

-- Create a JSON table.
CREATE TABLE orders (record JSON);

-- Ingest from a staging STRING table, skipping malformed rows.
CREATE TABLE staging (raw_json STRING);
INSERT INTO staging VALUES ('{"id": 1001, "customer": "Molly", "amount": 299.50}');

INSERT INTO orders
SELECT json_parse(raw_json)
FROM   staging
WHERE  json_valid(raw_json);

-- Query all non-null records.
SELECT * FROM orders WHERE record IS NOT NULL;
-- Returns:
-- +--------------------------------------------------+
-- | record                                           |
-- +--------------------------------------------------+
-- | {"id":1001,"customer":"Molly","amount":299.5}    |
-- +--------------------------------------------------+

-- Access a specific field.
SELECT record['customer'] FROM orders WHERE record IS NOT NULL;
-- Returns:
-- +-----------+
-- | _c0       |
-- +-----------+
-- | "Molly"   |
-- +-----------+

What's next

  • JSON functions — full reference for JSON_EXTRACT, GET_JSON_OBJECT, JSON_OBJECT, JSON_ARRAY, json_parse, json_valid, and related functions

  • CAST expressions — type conversion behavior for JSON