AnalyticDB for MySQL provides JSON indexes as a semi-structured data retrieval feature to simplify semi-structured data processing.

Background information

In the era of big data, a variety of solutions are available to retrieve structured data. However, most big data is semi-structured, and the amount of semi-structured data is rapidly increasing. Semi-structured data is much more difficult to understand and analyze than structured data. Mature solutions are required to process semi-structured data. AnalyticDB for MySQL provides JSON indexes as a semi-structured data retrieval feature to simplify semi-structured data processing.

Precautions

  • If you set the data type of a column to JSON when you create a table, a JSON index is automatically created in AnalyticDB for MySQL clusters that run a minor engine version earlier than 3.1.5.10.

    After the JSON index is created, you can add or remove an index by executing the ALTER TABLE statement. For more information, see the "ALTER TABLE" of this topic.

  • If you set the data type of a column to JSON when you create a table, no JSON index is created in AnalyticDB for MySQL clusters that run the minor engine version 3.1.5.10 or later. You must execute DDL statements to create a JSON index.
    ALTER table <table_name> ADD key|index <index_name>(json_column);
  • AnalyticDB for MySQL supports the standard JSON format. When you write JSON strings to a table, you must comply with the standard JSON format.
  • Default values cannot be set for JSON data columns.

Create a table

In the following sample statement, the data in the vj column is of the JSON type. After a table is created, AnalyticDB for MySQL automatically creates a JSON index for the vj column.

CREATE TABLE json_test(
  id int,
  vj json COMMENT 'The vj column is of the JSON type. AnalyticDB for MySQL automatically creates a JSON index for this column.'
)
DISTRIBUTED BY HASH(id);
After the JSON index is created, you can add or remove an index by executing the ALTER TABLE statement. Sample statements:
  • Remove an index
    ALTER TABLE db_name.table_name DROP KEY index_name;
    Note You can execute the SHOW INDEXES statement to view information about all indexes in the table, including index names. key_name indicates the name of an index. For more information, see SHOW.
  • Add an index
    ALTER TABLE db_name.table_name ADD KEY index_name(column_name);

JSON format

When you write JSON data to a table in AnalyticDB for MySQL, take note of the following limits on keys and values:
  • Keys

    You must enclose a key with double quotation marks (""). Example: "addr" in {"addr":"xyz"}.

  • Values
    • Data types available for a value include BOOLEAN, NUMBER, VARCHAR, ARRAY, OBJECT, and NULL.
      Note
      • A value of the NUMBER type must be within the allowed value range of the DOUBLE type. Otherwise, an error is returned.
      • AnalyticDB for MySQL allows you to write JSON arrays, including plain arrays and nested arrays. Examples: {"hobby":["basketball", "football"]} and {"addr":[{"city":"beijing", "no":0}, {"city":"shenzhen", "no":0}]}.
    • If a value is of the STRING type, you must enclose the value with double quotation marks ("").
      Note If a value is of the STRING type and contains double quotation marks (""), you must add escape characters before the double quotation marks to indicate that the double quotation marks are part of the value. For example, the value in {"addr":"xyz"ab"c"} is "xyz"ab"c". The value must be escaped as "xyz\"ab\"c". Backslashes (\) are escaped when data is written, so you must write {"addr":"xyz\\"ab\\"c"}.
    • If a value is of the NUMBER type, you must write the value to the table instead of enclosing the value with double quotation marks ("").
    • If a value is of the BOOLEAN type, you must write true or false instead of 1 or 0 to the table. true and false must be in lowercase.
    • If a value is NULL, you must write Null to the table.
    • A key can have values of different data types. The value of a specified data type is returned during queries.

      For example, if the INSERT INTO test_tb1 VALUES ({"id":1}) statement is executed, the value of id is 1. If the INSERT INTO test_tb1 VALUES ({"id":"1"}) statement is executed, the value of id is "1".

      If the SELECT id FROM test_tb1 WHERE json_extract(col, '$.id')= 1; statement is executed, "id": 1 is returned. If the SELECT id FROM test_tb1 WHERE json_extract(col, '$.id')= '1'; statement is executed, "id": "1" is returned.

Write data

Fields of the JSON type are written to tables in single quotation marks (''), in the same manner as fields of the VARCHAR type.

The following SQL statements provide examples of different JSON data types:

INSERT INTO json_test VALUES(0, '{"id":0, "name":"abc", "age":0}');
INSERT INTO json_test VALUES(1, '{"id":1, "name":"abc", "age":10, "gender":"f"}');
INSERT INTO json_test VALUES(2, '{"id":3, "name":"xyz", "age":30, "company":{"name":"alibaba", "place":"hangzhou"}}');
INSERT INTO json_test VALUES(3, '{"id":5, "name":"a\\"b\\"c", "age":50, "company":{"name":"alibaba", "place":"america"}}');
INSERT INTO json_test VALUES(4, '{"a":1, "b":"abc-char", "c":true}');
INSERT INTO json_test VALUES(5, '{"uname":{"first":"lily", "last":"chen"}, "addr":[{"city":"beijing", "no":1}, {"city":"shenzhen", "no":0}], "age":10, "male":true, "like":"fish", "hobby":["basketball", "football"]}');

Query data

AnalyticDB for MySQL allows you to use the JSON_EXTRACT() function for data query.

  • Syntax
    json_extract(json, jsonpath)
  • Description

    This function extracts one or more values specified by jsonpath from a valid JSON string.

  • Parameters
    • json: the name of the column of the JSON type.
    • jsonpath: the path of the key that is separated by periods (.). $ indicates the outermost path.

    For more information about JSON functions, see JSON functions.

  • Example
    • Basic query
      SELECT json_extract(vj,'$.name') FROM json_test WHERE id=1;
    • Equivalent query
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') = 'abc';
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.c') = true;
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.age') = 30;
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.company.name') = 'alibaba';
    • Range query
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.age') > 0;
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.age') < 100;
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') > 'a' and json_extract(vj, '$.name') < 'z';
    • IS NULL or IS NOT NULL query
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.remark') is null;
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') is not null;
    • IN query
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') in ('abc','xyz');
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.age') in (10,20);
    • LIKE query
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') like 'ab%';
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') like '%bc%';
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.name') like '%bc';
    • ARRAY query
      SELECT id, vj FROM json_test WHERE json_extract(vj, '$.addr[0].city') = 'beijing' and json_extract(vj, '$.addr[1].no') = 0;
      Note You can use subscripts when you query arrays. However, you cannot iterate through an entire array.