JSON type - Tablestore - Alibaba Cloud Documentation Center

Search Index supports the JSON type field to efficiently store and flexibly query semi-structured data. This feature lets you retrieve and analyze nested data structures in complex business scenarios, such as log analysis, recording user behavior, and configuration management. Using the JSON type field significantly improves query efficiency and data processing accuracy.

Important

To use the JSON type in Search Index, please and contact Tablestore technical support to enable this feature.

How it works

Map a String type field from your data table to a JSON type field in the search index. This lets you process semi-structured data with hierarchical structures. To meet different query and performance requirements, Tablestore offers two JSON storage modes: Object type and Nested type.

Object type: Flattens nested structures into individual fields. This type is suitable for scenarios that involve simple field retrieval and require high query performance.
Nested type: Maintains the independence of nested objects and the relationships between their fields. This type is suitable for complex query scenarios that require exact matches for field combinations within an object.

Core differences

The two JSON types have significant differences in data processing methods, query syntax, field relationships, and performance characteristics:

Data processing method: The Object type stores nested data in a flat structure. The Nested type stores each nested object as a separate document.
Query method: The Object type uses basic queries. The Nested type must use NestedQuery.
Field relationships: The Object type allows cross-matching of fields from different nested objects. The Nested type ensures that fields are matched independently within each nested object.
Performance: Object type queries are lightweight, efficient, and consume few resources. Nested type queries are powerful but consume more resources.

Selection principles

Choose the Nested type if your queries must strictly maintain the relationships and accuracy of fields within nested objects. Choose the Object type if you need high query performance and do not have strict requirements for field relationships.

Configure and use a JSON type index

This section describes the complete configuration flow for a JSON type index, which helps you build an efficient and reliable retrieval system for your semi-structured data. The process includes selecting a type, creating an index, and validating queries.

Whether you choose the Object or Nested JSON type, you must explicitly define the field type for all sub-fields. The indexing system ignores any fields without a defined type. These fields cannot be included in queries.

Step 1: Choose a JSON type and data format

Choose the most suitable JSON type and data format based on your specific query needs, performance requirements, and data structure.

JSON type selection policy

Object type: Suitable for independent queries on simple fields. It provides excellent query performance and low resource consumption.
Nested type: Suitable for precise queries that must maintain field relationships. This ensures the accuracy of query results.
Mixed use: Combine Object and Nested types in the same index to meet the requirements of complex business scenarios.

Data write format specifications

JSON fields support both array and non-array data formats. Choose the format that fits your data structure:

//Array format
[{ "country": "China", "city": "hangzhou" }, { "country": "usa", "city": "Seattle" }]

//Non-array format
{ "country": "China", "city": "hangzhou" }

Data write example

private static void putRow(SyncClient client) {
    // Construct the primary key.
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("id", PrimaryKeyValue.fromString("10001"));
    PrimaryKey primaryKey = primaryKeyBuilder.build();

    // Set the data table name.
    RowPutChange rowPutChange = new RowPutChange("<TABLE_NAME>", primaryKey);

    // Build the raw JSON data.
    List<Map<String, Object>> addresses = Arrays.asList(
    new HashMap<String, Object>() {{ put("country", "China"); put("city", "hangzhou"); }},
    new HashMap<String, Object>() {{ put("country", "usa");   put("city", "Seattle"); }}
    );
    String jsonString = JSON.toJSONString(addresses); 

    rowPutChange.addColumn(new Column("address", ColumnValue.fromString(jsonString)));

    client.putRow(new PutRowRequest(rowPutChange));
}

Step 2: Configure the field structure and create the index

This example demonstrates the complete index configuration process for a single-level JSON object with two sub-fields. Defining field types and setting properties correctly ensures index integrity and optimizes query performance.

List<FieldSchema> subFieldSchemas = new ArrayList<FieldSchema>();
subFieldSchemas.add(new FieldSchema("country", FieldType.KEYWORD)
    .setIndex(true).setEnableSortAndAgg(true));
subFieldSchemas.add(new FieldSchema("city", FieldType.KEYWORD)
    .setIndex(true).setEnableSortAndAgg(true));

FieldSchema jsonFieldSchema = new FieldSchema("address", FieldType.Json)
    .setJsonType(JsonType.OBJECT) // Set to JsonType.OBJECT or JsonType.NESTED
    .setSubFieldSchemas(subFieldSchemas);

Step 3: Query data

Test your queries to verify the index configuration. Compare the query behavior and results for Object and Nested JSON types using the same data and query conditions.

Object type: Flattens nested data. To query, use a path that connects the parent and child field names with a period (.). This flat structure allows field values from different nested objects to be cross-matched.
Nested type: Maintains the independence and integrity of each nested object. You must wrap query conditions in a NestedQuery. This ensures that fields are matched only within the same nested object.

Format after indexing

Assume the data for the `address` field is `[{ "country": "China", "city": "Hangzhou" }, { "country": "USA", "city": "Seattle" }]`.

Object type JSON format after indexing: {"address.country": ["China", "USA"], "address.city": ["Hangzhou","Seattle"]}
Nested type JSON format after indexing: Separate documents { "country": "China", "city": "Hangzhou" } and { "country": "USA", "city": "Seattle" }

Based on the preceding examples, if the query condition is address.country ="China" and address.city="Seattle", the Object type query returns a match, but the Nested type query does not. If the query condition is address.country ="China" and address.city="Hangzhou", queries on both the Object and Nested types return a match.

Query examples

JSON Nested type query example

The following example queries for rows that satisfy two conditions in the same nested object of the `address` field: address.country is "China" and address.city is "Seattle".

public static void nestedQuery(SyncClient client) {
    // Condition 1: The value of the country field in the address sub-row must be "China".
    TermQuery termQuery1 = new TermQuery();
    termQuery1.setFieldName("address.country");
    termQuery1.setTerm(ColumnValue.fromString("China"));

    // Condition 2: The value of the city field in the address sub-row must be "Seattle".
    TermQuery termQuery2 = new TermQuery();
    termQuery2.setFieldName("address.city");
    termQuery2.setTerm(ColumnValue.fromString("Seattle"));

    // Use the AND condition of BoolQuery to query for sub-rows that meet both conditions.
    List<Query> mustQueries = new ArrayList<>();
    mustQueries.add(termQuery1);
    mustQueries.add(termQuery2);
    BoolQuery boolQuery = new BoolQuery();
    boolQuery.setMustQueries(mustQueries);

    // Set BoolQuery within NestedQuery to require a sub-row to meet multiple query conditions at the same time.
    NestedQuery nestedQuery = new NestedQuery();    // Set the query type to NestedQuery.
    nestedQuery.setPath("address");   // Set the path of the nested type column, which is the parent path of the field to query.
    nestedQuery.setQuery(boolQuery);
    nestedQuery.setScoreMode(ScoreMode.None);

    SearchQuery searchQuery = new SearchQuery();
    searchQuery.setQuery(nestedQuery);

    SearchRequest searchRequest = new SearchRequest("<TABLE_NAME>", "<SEARCH_INDEX_NAME>", searchQuery);

    SearchResponse resp = client.search(searchRequest);
    System.out.println("Row: " + resp.getRows());
}

JSON Object type query example

The following example queries for rows where the `address` field satisfies two conditions across its nested objects: address.country is "China" and address.city is "Seattle".

public static void boolQuery(SyncClient client) {
    // Condition 1: The value of the country field in the address sub-row must be "China".
    TermQuery termQuery1 = new TermQuery();
    termQuery1.setFieldName("address.country");
    termQuery1.setTerm(ColumnValue.fromString("China"));

    // Condition 2: The value of the city field in the address sub-row must be "Seattle".
    TermQuery termQuery2 = new TermQuery();
    termQuery2.setFieldName("address.city");
    termQuery2.setTerm(ColumnValue.fromString("Seattle"));

    // Use the AND condition of BoolQuery to query for sub-rows that meet both conditions.
    List<Query> mustQueries = new ArrayList<>();
    mustQueries.add(termQuery1);
    mustQueries.add(termQuery2);
    BoolQuery boolQuery = new BoolQuery();
    boolQuery.setMustQueries(mustQueries);


    SearchQuery searchQuery = new SearchQuery();
    searchQuery.setQuery(boolQuery);

    SearchRequest searchRequest = new SearchRequest("<TABLE_NAME>", "<SEARCH_INDEX_NAME>", searchQuery);

    SearchResponse resp = client.search(searchRequest);
    System.out.println("Row: " + resp.getRows());
}

JSON type application examples

Whether you choose the Object or Nested JSON type, the schema configuration is the same for both single-level and multi-level structures. The only difference is the JSON type setting. This consistency allows for flexible switching between types and unified management.

Single-level JSON example

This Java code example creates a single-level JSON field named tags. Set the JSON type as needed. The field includes the following three sub-fields:

tagName: A Keyword (string) type for exact matching and aggregating tag names.
score: A Double (floating-point number) type for numerical calculations and sorting by tag weights.
time: A Date type that uses the millisecond UNIX timestamp format. It is used for time range queries and time series analysis.

Data can be written in an array or non-array format. Choose the format that fits your data structure:

// Array format
[{"tagName":"tag1", "score":0.8,"time": 1730690237000 }, {"tagName":"tag2", "score":0.2,"time": 1730691557000}]
// Non-array format
{"tagName":"tag1", "score":0.8,"time": 1730690237000 }

The complete schema configuration code is as follows:

List<FieldSchema> subFieldSchemas = new ArrayList<FieldSchema>();
subFieldSchemas.add(new FieldSchema("tagName", FieldType.KEYWORD)
    .setIndex(true).setEnableSortAndAgg(true));
subFieldSchemas.add(new FieldSchema("score", FieldType.DOUBLE)
    .setIndex(true).setEnableSortAndAgg(true));
subFieldSchemas.add(new FieldSchema("time", FieldType.DATE)
    .setDateFormats(Arrays.asList("epoch_millis")));

FieldSchema nestedFieldSchema = new FieldSchema("tags", FieldType.Json)
    .setJsonType(JsonType.OBJECT) // Replace with JsonType.NESTED as needed
    .setSubFieldSchemas(subFieldSchemas);

Multi-level JSON example

This Java code example creates a multi-level JSON field named user. Set the JsonType as needed. This field builds a complete data structure that contains basic user information and nested address information. You can use a mix of Nested and Object types in a multi-level JSON structure.

Base fields: name (Keyword type for exact name queries), age (Long type for age range filtering), birth (Date type in yyyy-MM-dd HH:mm:ss.SSS format for birthday queries), and phone (Keyword type for matching contact information).
Nested field: address (Set the JsonType as needed). It contains three address-level fields: province, city, and street. All are Keyword types that support hierarchical queries for geographical locations.

The following is a typical example of writing user data:

{
  "name": "Zhang San",
  "age": 20,
  "birth": "2014-10-10 12:00:00.000",
  "phone": "1390000****",
  "address": {
    "province": "Zhejiang Province",
    "city": "Hangzhou City",
    "street": "No. 1201, Xingfu Community, Yangguang Avenue"
  }
}

The complete schema configuration for the multi-level JSON is as follows:

// Schema for the address sub-field (path: user.address)
List<FieldSchema> addressSubFiledSchemas = new ArrayList<>();
addressSubFiledSchemas.add(new FieldSchema("province",FieldType.KEYWORD));
addressSubFiledSchemas.add(new FieldSchema("city",FieldType.KEYWORD));
addressSubFiledSchemas.add(new FieldSchema("street",FieldType.KEYWORD));

// Schema for the user sub-field (path: user)
List<FieldSchema> subFieldSchemas = new ArrayList<>();
subFieldSchemas.add(new FieldSchema("name",FieldType.KEYWORD));
subFieldSchemas.add(new FieldSchema("age",FieldType.LONG));
subFieldSchemas.add(new FieldSchema("birth",FieldType.DATE)
    .setDateFormats(Arrays.asList("yyyy-MM-dd HH:mm:ss.SSS")));
subFieldSchemas.add(new FieldSchema("phone",FieldType.KEYWORD));
subFieldSchemas.add(new FieldSchema("address",FieldType.JSON)
    .setJsonType(JsonType.NESTED) // Replace with JsonType.OBJECT as needed
    .setSubFieldSchemas(addressSubFiledSchemas));

// Create the parent field user
List<FieldSchema> fieldSchemas = new ArrayList<>();
fieldSchemas.add(new FieldSchema("user",FieldType.JSON)
    .setJsonType(JsonType.OBJECT) // Replace with JsonType.NESTED as needed
    .setSubFieldSchemas(subFieldSchemas));

Quotas and limits

Note the following limits and configuration requirements when you use JSON type fields:

Vector type limit: A vector type field cannot be a sub-field of a JSON field. You must configure a separate index for vector fields.
Nested type limit: When a JSON field type is set to Nested, you cannot use index pre-sorting for that field. You must use a NestedQuery to query the field.
Sub-field array configuration: If you write data to a non-JSON sub-field in an array format, you must set the `IsArray` property to `true` in the field configuration. You must also ensure that the data is written in the standard array format, such as `"[a, b, c]"`. If you do not meet these requirements, the index cannot correctly sync or retrieve the sub-field data.

Developer integration

Use the JSON type with the Java SDK, Go SDK, and Python SDK.