Use the JSON type to store semi-structured data in MaxCompute. On write, MaxCompute automatically extracts a public schema and stores those fields in columnar format. On read, column pruning scans only the fields your query touches — giving you faster query performance and lower storage than STRING.
When to use the JSON type
| Situation | Recommendation |
|---|---|
| Data is semi-structured JSON; schema varies across rows | Use the JSON type |
| Schema is fully fixed; all fields are always present | Use typed columns — better type safety and broader SQL compatibility |
You need ORDER BY, GROUP BY, or JOIN on the JSON column |
Use typed columns — these operations are not supported on JSON |
How it works
When you insert JSON data, MaxCompute automatically extracts a public schema from the data and performs optimizations. Fields in the public schema are stored in columnar format. Fields that are not in the public schema are stored as BINARY.
For example, given three rows with fields a, b, and c:
INSERT INTO json_table
SELECT json_parse(string_val)
FROM string_table;
MaxCompute extracts the public schema <"a":binary, "b":bigint, "c":bigint>. A subsequent query that reads only b and c scans just those columns:
SELECT json_val["b"], json_val["c"]
FROM json_table;
-- Column pruning keeps only b and c.
+------+------+
| _c0 | _c1 |
+------+------+
| 2 | NULL |
| 2 | NULL |
| NULL | 3 |
+------+------+
JSON values are stored using MaxCompute internal types. Because of this mapping, values outside the BIGINT or DOUBLE range will overflow or lose precision.
Prerequisites
Before you begin, make sure you have:
-
A MaxCompute project with the JSON type enabled (see Enable the JSON type)
-
Java SDK V0.44.0 or later, or PyODPS V0.11.4.1 or later
-
If using odpscmd: version V0.46.5 or later, with
use_instance_tunnel=falseinconf\odps_config.ini
Enable the JSON type
The odps.sql.type.json.enable flag controls JSON type availability:
| Project type | Default value |
|---|---|
| New projects | true |
| Existing projects | false |
To enable the JSON type in an existing project:
SET odps.sql.type.json.enable=true;
To check the current value:
setproject;
Limitations
Constraint reference
| Category | Constraint | Implication |
|---|---|---|
| Table operations | Cannot add a JSON column to an existing table | Create a new table; migrate data with INSERT INTO ... SELECT |
| Table types | Clustered tables and Delta Table type are not supported | Use standard tables |
| SQL operations | ORDER BY, GROUP BY, and JOIN keys on JSON columns are not supported |
Extract the value to a typed column first |
| SQL operations | Comparison operations on JSON type are not supported | Cast to STRING or a typed column before comparing |
| Nesting | Maximum 20 levels deep | Flatten deeply nested structures before ingestion |
| Engine compatibility | Hologres cannot read JSON columns | Keep a STRING copy if cross-engine reads are required |
| User-defined functions (UDFs) | Java UDFs and Python UDFs do not support JSON type | Use built-in JSON functions instead |
| Tooling | Dataphin and other external ecosystems are not supported | Verify compatibility before use |
Type storage and precision
JSON values are mapped to MaxCompute internal types. Values outside these ranges cause overflow or precision loss:
| JSON type | Internal storage | Note |
|---|---|---|
| NUMBER (integer part) | BIGINT | Overflow if outside BIGINT range |
| NUMBER (decimal part) | DOUBLE | Precision loss possible |
| STRING | BINARY (non-public) or typed column | \u0000 is not supported |
| BOOLEAN | BOOLEAN | |
| NULL | — | json 'null' differs from SQL NULL |
| ARRAY | ARRAY | |
| OBJECT | OBJECT |
Tool and SDK requirements
| Tool or SDK | Requirement |
|---|---|
| odpscmd (MaxCompute client) | V0.46.5 or later; set use_instance_tunnel=false in conf\odps_config.ini |
| MaxCompute Studio | Supported |
| DataWorks | Supported |
| Java SDK | V0.44.0 or later |
| PyODPS | V0.11.4.1 or later |
Work with the JSON type
All examples below use a shared order record to show create, insert, and query patterns end-to-end:
{"id": 1001, "customer": "Molly", "amount": 299.50}
Create a JSON table
No schema definition is required — declare the column as JSON:
CREATE TABLE orders (record JSON);
Generate JSON data
From a JSON literal:
INSERT INTO orders VALUES (JSON '{"id": 1001, "customer": "Molly", "amount": 299.50}');
Using JSON_OBJECT and JSON_ARRAY:
-- JSON_OBJECT builds a JSON object from key-value pairs.
INSERT INTO orders SELECT JSON_OBJECT("id", 1002, "customer", "Frank", "amount", 150.00);
-- JSON_ARRAY builds a JSON array.
SELECT JSON_ARRAY("tag1", "tag2", "promo");
-- Returns: ["tag1","tag2","promo"]
By converting a STRING column:
Use json_parse to convert existing string data. Wrap it with json_valid to skip malformed rows:
INSERT INTO orders
SELECT json_parse(raw_json)
FROM staging_table
WHERE json_valid(raw_json);
CAST("abc" AS JSON)andjson_parse("abc")behave differently for edge cases. See JSON functions for details.
Access JSON data
All examples below query the row inserted above: {"id": 1001, "customer": "Molly", "amount": 299.50}.
Index access
Index access uses strict mode — NULL is returned when the path does not match the data structure.
-- Returns 1001
SELECT record['id']
FROM orders
WHERE record['id'] IS NOT NULL;
-- Returns "Molly"
SELECT record['customer']
FROM orders;
-- Returns NULL (field does not exist)
SELECT record['email']
FROM orders;
Index access is equivalent to JSON_EXTRACT in strict mode:
-- These two expressions return the same result:
SELECT record['id'] FROM orders;
SELECT JSON_EXTRACT(record, 'strict $.id') FROM orders;
-- Both return: 1001
Access using JSON functions
Two functions are available:
| Function | Returns | JSON Path parser |
|---|---|---|
JSON_EXTRACT |
JSON type | New standardized parser (PostgreSQL-compatible subset) |
GET_JSON_OBJECT |
STRING type | Legacy parser |
Use JSON_EXTRACT in new SQL — its parser is consistent with the index accessor and supports column pruning in strict mode.
-- JSON_EXTRACT returns a JSON value (with quotes).
SELECT JSON_EXTRACT(record, '$.customer')
FROM orders;
-- Returns: "Molly"
-- GET_JSON_OBJECT returns a STRING value (no quotes).
SELECT GET_JSON_OBJECT(record, '$.customer')
FROM orders;
-- Returns: Molly
JSON Path reference
A JSON Path identifies a node in JSON data. The parser used by the JSON type is a subset of the PostgreSQL JSON Path specification.
Sample data for JSON Path examples:
{
"name": "Molly",
"phones": [
{ "phonetype": "work", "phone#": "650-506-7000" },
{ "phonetype": "cell", "phone#": "650-555-5555" }
]
}
Accessor syntax:
| Accessor | Example | Description |
|---|---|---|
| Member | $.name |
Access a field by name |
| Member (special chars) | $."phone#" |
Use quotes for names with special characters |
| Wildcard member | $.* |
All fields of an object |
| Element | $.phones[1] |
Array element by index |
| Element range | $.phones[0, 1] or $[1, 2, 4 to 7] |
Elements at specified indexes or a range |
| Wildcard element | $.phones[*] |
All array elements |
Modes:
JSON Path supports two modes. The default is lax.
| Mode | Behavior | Column pruning |
|---|---|---|
lax |
Automatically wraps scalars as arrays and unwraps arrays to objects when the path expects a different structure. Returns results permissively. | Not supported |
strict |
Returns NULL if the path does not match the actual data structure exactly. | Supported |
lax mode examples (using the sample data above):
| Expression | Result | Reason |
|---|---|---|
lax $.phones.phonetype |
["work","cell"] |
Unwraps the phones array and reads phonetype from each object |
lax $.phones[*].phonetype |
["work","cell"] |
Direct wildcard element access |
lax $.name[*] |
["Molly"] |
Wraps the string "Molly" into an array |
lax $.name.* |
NULL |
Expects an object under name, finds a string |
strict mode examples:
| Expression | Result | Reason |
|---|---|---|
strict $.phones[1]."phone#" |
"650-555-5555" |
Exact path match |
strict $.phones.phonetype |
NULL |
phones is an array; an object is expected |
strict $.address |
NULL |
Field does not exist |
Use strict mode when you need column pruning. Lax mode does not support column pruning optimization.
Design considerations
-
Use strict mode for column pruning. Lax mode does not trigger column pruning optimization, so queries in lax mode scan more data.
-
Keep JSON documents small. Each JSON column is stored as a single column value. Large documents increase the memory and I/O cost per row.
-
Validate before ingesting. Use
json_valid()to filter malformed rows before callingjson_parse(). -
Check precision before inserting large numbers. JSON numbers are stored as BIGINT (integer part) and DOUBLE (decimal part). Numbers outside those ranges overflow.
-
Plan your schema before creating tables. You cannot add a JSON column to an existing table. Design the table with the JSON column from the start.
End-to-end example
-- Enable the JSON type if your project was created before the feature was enabled.
SET odps.sql.type.json.enable=true;
-- Create a JSON table.
CREATE TABLE orders (record JSON);
-- Ingest from a staging STRING table, skipping malformed rows.
CREATE TABLE staging (raw_json STRING);
INSERT INTO staging VALUES ('{"id": 1001, "customer": "Molly", "amount": 299.50}');
INSERT INTO orders
SELECT json_parse(raw_json)
FROM staging
WHERE json_valid(raw_json);
-- Query all non-null records.
SELECT * FROM orders WHERE record IS NOT NULL;
-- Returns:
-- +--------------------------------------------------+
-- | record |
-- +--------------------------------------------------+
-- | {"id":1001,"customer":"Molly","amount":299.5} |
-- +--------------------------------------------------+
-- Access a specific field.
SELECT record['customer'] FROM orders WHERE record IS NOT NULL;
-- Returns:
-- +-----------+
-- | _c0 |
-- +-----------+
-- | "Molly" |
-- +-----------+
What's next
-
JSON functions — full reference for
JSON_EXTRACT,GET_JSON_OBJECT,JSON_OBJECT,JSON_ARRAY,json_parse,json_valid, and related functions -
CAST expressions — type conversion behavior for JSON