Extracts a specific string from a standard JSON string. This topic describes the usage and considerations of the GET_JSON_OBJECT function when input parameters of the JSON and STRING types are used.
Usage notes
The GET_JSON_OBJECT function extracts a specific string from a standard JSON string using JSONPath. This function supports input parameters of the following data types:
JSON type: If input parameters are of JSON types, standard JSONPath expressions are used.
STRING type: If input parameters are of the STRING type, the original JSONPath expressions are used.
The JSONPath expressions used for JSON type input parameters and STRING type input parameters follow different rules. This may cause incompatibility issues.
The GET_JSON_OBJECT function does not support the syntax of JSONPath expressions.
Input parameters of JSON types
Syntax
STRING GET_JSON_OBJECT(JSON <json>, STRING <json_path>)Parameters
json: required. A JSON string from which you want to extract a single string.
json_path: required. This parameter specifies the JSON path in which a specific JSON value is checked.
Return value
The return value is of the STRING type.
Examples
Example 1: Extract the value that corresponds to the key a from a JSON string.
SELECT GET_JSON_OBJECT(JSON '{"a":1, "b":2}', '$.a');The following result is returned:
+-----+ | _c0 | +-----+ | 1 | +-----+Example 2: Extract the value that corresponds to the key c from a JSON string.
SELECT GET_JSON_OBJECT(JSON '{"a":1, "b":2}', '$.c');The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+Example 3: If an invalid JSON path is specified, the return value is NULL.
SELECT GET_JSON_OBJECT(JSON '{"a":1, "b":2}', '$invalid_json_path');The following result is returned:
+-----+ | _c0 | +-----+ | NULL | +-----+
Input parameters of the STRING type
Background information
This function extracts a single string from a standard JSON string using path.
The original data is read each time this function is called. Therefore, repeated calls may affect system performance and increase costs. You can use the GET_JSON_OBJECT function together with user-defined table-valued functions (UDTFs) to convert JSON log data. This prevents repeated function calls. For more information, see Convert JSON log data using MaxCompute built-in functions and UDTFs.
Parameters
json: required. The value must be of the STRING type. This parameter specifies a standard JSON object in the
{Key:Value, Key:Value,...}format. If the string contains a double quotation mark ("), use two backslashes (\\) to escape the double quotation mark before extraction. If the string contains a single quotation mark ('), use a single backslash (\) to escape the single quotation mark before extraction.path: required. The value must be of the STRING type. This parameter specifies the path in the value of the json parameter and starts with
$. For more information about the path parameter, see LanguageManual UDF. For more information about best practices, see Migrate JSON-formatted data from OSS to MaxCompute. Different characters have the following meanings:$: indicates the root node..or['']: Indicates a child node. MaxCompute supports both notations to parse JSON objects. If a key in a JSON object contains a period (.), you can use['']to specify the key.[]:[number]indicates an array subscript, which starts from 0.*:Wildcard for [], returns an entire array.*cannot be escaped.
Limits
Only the function of the latest version lets you extract data using [''] in the path parameter. To use [''], you must add the SET odps.sql.udf.getjsonobj.new=true; configuration.
Syntax
STRING GET_JSON_OBJECT(STRING <json>, STRING <path>)Return value
If the json parameter is empty or is in an invalid json format, NULL is returned.
If the json parameter is valid and path exists, the related string is returned.
You can configure the
odps.sql.udf.getjsonobj.newparameter for a session to specify how the function returns a value:If you run the
SET odps.sql.udf.getjsonobj.new=true;command, the function returns values by preserving the original strings.We recommend that you use this configuration because it results in more standard function return behavior. This facilitates data processing and improves data processing performance. If your MaxCompute project has existing jobs that use the behavior of escaping JSON reserved characters, we recommend that you retain the original behavior to avoid errors or correctness issues that may occur if you use the new behavior without verification. The function complies with the following rules when it returns a value:
The return value is a JSON string, which can be parsed as JSON data, without the need to use the
replaceorregexp_replacefunction to replace backslashes (\).Duplicate keys are allowed in a JSON object and can be successfully parsed.
--The return value is 1. SELECT GET_JSON_OBJECT('{"a":"1","a":"2"}', '$.a');The encoded strings that correspond to emojis are supported. However, DataWorks does not support emoji input. You can only use tools such as Data Integration to write the encoded strings that correspond to emojis to MaxCompute, and then process them using the
GET_JSON_OBJECTfunction.--The emoji symbol is returned. SELECT GET_JSON_OBJECT('{"a":"<Emoji symbol>"}', '$.a');The output results are displayed in the original order of the JSON string.
--The return value is {"b":"1","a":"2"}. SELECT GET_JSON_OBJECT('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');
If you run the SET odps.sql.udf.getjsonobj.new=false; command, the function returns values by escaping JSON reserved characters. The function complies with the following rules when it returns a value:
NoteIf the Hive-compatible data type edition is enabled, the GET_JSON_OBJECT function returns values by preserving the original strings.
JSON reserved characters such as line feeds (\n) and quotation marks (") are displayed as
'\n'and'\"'.Each key in a JSON object must be unique. If duplicate keys exist, the data may fail to be parsed.
--The return value is NULL. SELECT GET_JSON_OBJECT('{"a":"1","a":"2"}', '$.a');The encoded strings that correspond to emojis cannot be parsed.
--The return value is NULL. SELECT GET_JSON_OBJECT('{"a":"<Emoji symbol>"}', '$.a');The output results are displayed in alphabetical order.
--The return value is {"a":"2","b":"1"}. SELECT GET_JSON_OBJECT('{"b":{"b":"1","a":"2"},"a":"2"}', '$.b');
NoteFor MaxCompute projects created on or after January 21, 2021, the default return behavior of the
GET_JSON_OBJECTfunction is to preserve the original strings. For MaxCompute projects created before January 21, 2021, the default return behavior of theGET_JSON_OBJECTfunction is to escape JSON reserved characters. You can run the following command to determine which behavior theGET_JSON_OBJECTfunction uses in your MaxCompute project:SELECT GET_JSON_OBJECT('{"a":"[\\"1\\"]"}', '$.a'); --Return JSON reserved characters using escape characters: [\"1\"] --Return by preserving the original strings: ["1"]You can use the application link or search for DingTalk group number 11782920 to join the MaxCompute Developer Community DingTalk group. You can contact the MaxCompute technical support team to switch the return behavior of the
GET_JSON_OBJECTfunction in your project to preserve the original strings. This eliminates the need to frequently configure the property at the session level.
Examples
Extract information from the JSON object
src_json.json. Examples:--The JSON string src_json.json contains the following content: +----+ json +----+ {"store": {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" } --Extract the information of the owner field and return amy. SELECT GET_JSON_OBJECT(src_json.json, '$.owner') FROM src_json; --Extract the information of the first array in the store.fruit field and return {"weight":8,"type":"apple"}. SELECT GET_JSON_OBJECT(src_json.json, '$.store.fruit[0]') FROM src_json; --Extract the information of a non-existent field and return NULL. SELECT GET_JSON_OBJECT(src_json.json, '$.non_exist_key') FROM src_json;Extract information from a JSON object of the ARRAY type. Examples:
--The return value is 2222. SELECT GET_JSON_OBJECT('{"array":[["aaaa",1111],["bbbb",2222],["cccc",3333]]}','$.array[1][1]'); --The return value is ["h0","h1","h2"]. SET odps.sql.udf.getjsonobj.new=true; SELECT GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); --The return value is ["h0","h1","h2"]. SET odps.sql.udf.getjsonobj.new=false; SELECT GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[*]'); --The return value is h1. SELECT GET_JSON_OBJECT('{"aaa":"bbb","ccc":{"ddd":"eee","fff":"ggg","hhh":["h0","h1","h2"]},"iii":"jjj"}','$.ccc.hhh[1]');Extract information from a JSON object that includes a period (
.). Examples:--Create a table. CREATE TABLE mf_json (id string, json string); --Insert data into the table. The key in the data contains a period (.). INSERT INTO TABLE mf_json (id, json) VALUES ("1", "{ \"China.beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); --Insert data into the table. The key in the data does not contain a period (.). INSERT INTO TABLE mf_json (id, json) VALUES ("2", "{ \"China_beijing\":{\"school\":{\"id\":0,\"book\":[{\"title\": \"A\", \"price\": 8.95},{\"title\": \"B\",\"price\": 10.2}]}}}"); --Query the value of id in the JSON object whose key is China.beijing. The value 0 is returned. Only [''] can be used to specify the key because the key contains a period (.). SELECT GET_JSON_OBJECT(json, "$['China.beijing'].school['id']") FROM mf_json WHERE id =1; --Query the value of id in the JSON object whose key is China_beijing. The value 0 is returned. You can use one of the following statements: SELECT GET_JSON_OBJECT(json, "$['China_beijing'].school['id']") FROM mf_json WHERE id =2; SELECT GET_JSON_OBJECT(json, "$.China_beijing.school['id']") FROM mf_json WHERE id =2;The json parameter is empty or invalid. Examples:
--The return value is NULL. SELECT GET_JSON_OBJECT('','$.array[1][1]'); --The return value is NULL. SELECT GET_JSON_OBJECT('"array":["aaaa",1111],"bbbb":["cccc",3333]','$.array[1][1]');Escape a JSON string. Examples:
SET odps.sql.udf.getjsonobj.new=true; --The return value is "1". SELECT GET_JSON_OBJECT('{"a":"\\"1\\"","b":"2"}', '$.a'); --The return value is '1'. SELECT GET_JSON_OBJECT('{"a":"\'1\'","b":"2"}', '$.a');If a key in a JSON object contains a period (
.), you can use['']instead. Examples:SET odps.sql.udf.getjsonobj.new=true; --The return value is "1". SELECT GET_JSON_OBJECT('{"a.1":"1","a":"2"}', '$[\'a.1\']');