All Products
Search
Document Center

MaxCompute:GET_JSON_OBJECT

Last Updated:Sep 22, 2023

Extracts a single string from a standard JSON string by using a specific method.

Usage notes

This function is used to extract a single string from a standard JSON string by using JSONPath. This function supports input parameters of the following data types:

  • JSON types: If input parameters are of JSON types, standard JSONPath expressions are used.

  • STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.

The methods and usage notes for using this function vary based on input parameter data types. This topic describes how to use the GET_JSON_OBJECT function when input parameters are of the JSON and STRING types.

Note

The JSONPath expressions that are used when input parameters are of JSON types and those that are used when input parameters are of the STRING type comply with different rules. This may cause incompatibility issues.

Input parameters of JSON types

Syntax

json get_json_object(json <json>, string <json_path>)

Parameters

  • json: required. A JSON string from which you want to extract a single string.

  • json_path: required. A JSONPath expression based on which you want to extract a single string.

Return value

A value of a JSON type is returned.

Examples

  • Example 1: Extract the value that corresponds to the key a from a JSON string.

    select get_json_object(json '{"a":1, "b":2}', '$.a');

    The following result is returned:

    +-----+
    | _c0 |
    +-----+
    | 1   |
    +-----+
  • Example 2: Extract the value that corresponds to the key c from a JSON string.

    select get_json_object(json '{"a":1, "b":2}', '$.c');

    The following result is returned:

    +-----+
    | _c0 |
    +-----+
    | NULL |
    +-----+
  • Example 3: If an invalid JSON path is specified, the return value is null.

    select get_json_object(json '{"a":1, "b":2}', '$invalid_json_path');

    The following result is returned:

    +-----+
    | _c0 |
    +-----+
    | NULL |
    +-----+

Input parameters of the STRING type

Background information

This function is used to extract a single string from a standard JSON string by using path. The original data is read each time this function is called. Repeated calls may affect system performance and increase costs. To prevent repeated calls, you can use the GET_JSON_OBJECT function together with user-defined table-valued functions (UDTFs). For more information, see Convert JSON log data by using MaxCompute built-in functions and UDTFs.

Parameters

  • json: required. A value of the STRING type. This parameter specifies a standard JSON object in the {Key:Value, Key:Value,...} format. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark (") before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark (') before extraction.

  • path: required. A value of the STRING type. This parameter specifies the path in the value of the json parameter and starts with $. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON data from OSS to MaxCompute. Meanings of different characters:

    • $: indicates the root node.

    • . or ['']: indicates a child node. MaxCompute parses JSON objects by using . or ['']. If a key in a JSON object contains a period (.), you can use [''].

    • [] ([number]): indicates an array subscript, which starts from 0.

    • *: indicates the wildcard for []. If this character is used in the path parameter, an entire array is returned. An asterisk (*) cannot be escaped.

Limits

Only the function of the latest version allows you to extract data by using [''] in the path parameter. To use [''], you must add the set odps.sql.udf.getjsonobj.new=true; configuration.

Syntax

string get_json_object(string <json>, string <path>)

Return value

  • If the json parameter is left empty or invalid, null is returned.

  • If the format of the json parameter is valid and path exists, the related string is returned.

  • You can configure the flagodps.sql.udf.getjsonobj.new parameter for a session to specify how the function returns a value.

    • If you run the set odps.sql.udf.getjsonobj.new=true; command, the function of the latest version is used and the function return behavior is more standard. This facilitates data processing and improves data processing performance. We recommend that you use the latest function version. In this case, the function complies with the following rules when it returns a value:

      • The return value is a JSON string, which can be parsed as JSON data, without the need to use the REPLACE or REGEXP_REPLACE function to replace backslashes (\).

      • Duplicate keys are allowed in a JSON object. If duplicate keys exist, the data can be parsed, and the first value is returned.

        -- The return value is 1. 
        select get_json_object('{"a":"1","a":"2"}', '$.a');
      • The output results are displayed in alphabetical order.

        -- The return value is {"b":"1","a":"2"}. 
        select get_json_object('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');
    • If you run the set odps.sql.udf.getjsonobj.new=false; command, the function complies with the following rules when it returns a value:

      Note

      If the Hive-compatible data type edition is enabled, the function of the latest version is used.

      • JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as '\n' and '\"'.

      • Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed.

        -- The return value is null. 
        select get_json_object('{"a":"1","a":"2"}', '$.a');
      • The output results are displayed in alphabetical order.

        -- The return value is {"a":"2","b":"1"}. 
        select get_json_object('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');

Examples

  • Example 1: Extract information from the JSON object src_json.json. Sample statements:

    -- The JSON string src_json.json contains the following content: 
    +----+
    json
    +----+
    {"store":
    {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
    "bicycle":{"price":19.95,"color":"red"}
    },
    "email":"amy@only_for_json_udf_test.net",
    "owner":"amy"
    }
    -- Extract the information of the owner field and return amy. 
    select get_json_object(src_json.json, '$.owner') from src_json;
    -- Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. 
    select get_json_object(src_json.json, '$.store.fruit[0]') from src_json;
    -- Extract the information of a non-existent field and return null. 
    select get_json_object(src_json.json, '$.non_exist_key') from src_json;
  • Example 2: Extract information from a JSON object of the ARRAY type. Sample statements:

    -- The return value is 2222. 
    select get_json_object('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]');
    -- The return value is ["h0","h1","h2"]. 
    set odps.sql.udf.getjsonobj.new=true;
    select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]');
    -- The return value is ["h0","h1","h2"]. 
    set odps.sql.udf.getjsonobj.new=false;
    select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh');
    -- The return value is h1. 
    select get_json_object('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');
  • Example 3: Extract information from a JSON object that includes a period (.). Sample statements:

    -- Create a table. 
    create table mf_json (id string, json string);
    -- Insert data into the table. The key in the data contains a period (.). 
    insert into table mf_json (id, json) values ("1", "{
    \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
    \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
    -- Insert data into the table. The key in the data does not contain a period (.). 
    insert into table mf_json (id, json) values ("2", "{
    \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\",
    \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}");
    -- Query the value of id in the JSON object whose key is China.beijing. The value 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). This way, MaxCompute can parse the key. 
    select get_json_object(json, "$['China.beijing'].school['id']") from mf_json where id =1;
    -- Query the value of id in the JSON object whose key is China_beijing. The value 0 is returned. You can use one of the following statements: 
    select get_json_object(json, "$['China_beijing'].school['id']") from mf_json where id =2;
    select get_json_object(json, "$.China_beijing.school['id']") from mf_json where id =2;
  • Example 4: The json parameter is empty or invalid. Sample statements:

    -- The return value is null. 
    select get_json_object('','$.array[1][1]');
    -- The return value is null. 
    select get_json_object('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');
  • Example 5: Escape a JSON string. Sample statements:

    set odps.sql.udf.getjsonobj.new=true;
    -- The return value is "1". 
    select get_json_object('{"a":"\\"1\\"","b":"2"}', '$.a'); 
    -- The return value is '1'. 
    select get_json_object('{"a":"\'1\'","b":"2"}', '$.a'); 

Related functions

GET_JSON_OBJECT is a complex type function or a string function.

  • For more information about the functions that are used to process data of complex data types, such as ARRAY, MAP, STRUCT, and JSON, see Complex type functions.

  • For more information about functions related to string searches and conversion, see String functions.