All Products
Search
Document Center

Data Lake Analytics - Deprecated:Expand JSON data into row-column relation form

Last Updated:May 10, 2019

It is often necessary to process JSON data in databases. For example, some property fields in the JSON data need to be expanded and converted into a relational form of rows and columns. JSON data can be expanded flexibly by using various methods. This topic provides detailed examples of SQL statements to describe how to expand JSON data into a relational form of rows and columns.

General roadmap

  1. Use JSON functions to analyze JSON strings and extract data.

  2. Convert the extracted data to an array or map structure, or use lambda expressions to convert the data.

  3. Use UNNEST to expand the converted data into a relational form of rows and columns.

Use UNNEST to expand a map into a relational form of rows and columns

SQL statement example:

  1. SELECT t.m, t.n
  2. FROM (
  3. SELECT MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]) as map_data
  4. )
  5. CROSS JOIN unnest(map_data) AS t(m, n);

Result:

  1. +------+------+
  2. | m | n |
  3. +------+------+
  4. | foo | 1 |
  5. | bar | 2 |
  6. +------+------+

Use UNNEST to expand JSON data into a relational form of rows and columns

SQL statement example 1:

  1. SELECT json_extract(t.a, '$.a') AS a,
  2. json_extract(t.a, '$.b') AS b
  3. FROM (
  4. SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')
  5. AS array<JSON>) AS package_array
  6. )
  7. CROSS JOIN UNNEST(package_array) AS t(a);

Result:

  1. +------+------+
  2. | a | b |
  3. +------+------+
  4. | 1 | 2 |
  5. | 3 | 4 |
  6. +------+------+

SQL statement example 2:

  1. SELECT t.m AS _col1, t.n AS _col2
  2. FROM (
  3. SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')
  4. AS array<JSON>) AS array_1,
  5. cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x')
  6. AS array<JSON>) AS array_2
  7. )
  8. CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

Result:

  1. +---------------+-----------------+
  2. | _col1 | _col2 |
  3. +---------------+-----------------+
  4. | {"a":1,"b":2} | {"a":5,"b":6} |
  5. | {"a":3,"b":4} | {"a":7,"b":8} |
  6. | NULL | {"a":9,"b":10} |
  7. | NULL | {"a":11,"b":12} |
  8. +---------------+-----------------+

SQL statement example 3:

  1. SELECT json_extract(t.m, '$.a') AS _col1,
  2. json_extract(t.m, '$.b') AS _col2,
  3. json_extract(t.n, '$.a') AS _col3,
  4. json_extract(t.n, '$.b') AS _col4
  5. FROM (
  6. SELECT cast(json_extract('{"x":[{"a":1,"b":2},{"a":3,"b":4}]}', '$.x')
  7. AS array<JSON>) AS array_1,
  8. cast(json_extract('{"x":[{"a":5,"b":6}, {"a":7,"b":8}, {"a":9,"b":10}, {"a":11,"b":12}]}', '$.x')
  9. AS array<JSON>) AS array_2
  10. )
  11. CROSS JOIN UNNEST(array_1, array_2) AS t(m, n);

Result:

  1. +-------+-------+-------+-------+
  2. | _col1 | _col2 | _col3 | _col4 |
  3. +-------+-------+-------+-------+
  4. | 1 | 2 | 5 | 6 |
  5. | 3 | 4 | 7 | 8 |
  6. | NULL | NULL | 9 | 10 |
  7. | NULL | NULL | 11 | 12 |
  8. +-------+-------+-------+-------+

Use UNNEST with lambda expressions to expand JSON data into a relational form of rows and columns

SQL statement example:

  1. SELECT count(*) AS cnt,
  2. package_name
  3. FROM (
  4. SELECT t.a AS package_name
  5. FROM (
  6. SELECT transform(packages_map_array, x -> Element_at(x, 'packageName'))
  7. AS package_array
  8. FROM (
  9. SELECT cast(Json_extract(data_json, '$.packages')
  10. AS array<map<VARCHAR, VARCHAR>>) AS packages_map_array
  11. FROM (
  12. SELECT json_parse(data) AS data_json
  13. FROM (
  14. SELECT '{
  15. "packages": [
  16. {
  17. "appName": "China Railway 12306",
  18. "packageName": "com.MobileTicket",
  19. "versionName": "4.1.9",
  20. "versionCode": "194"
  21. },
  22. {
  23. "appName": "QQ Speed",
  24. "packageName": "com.tencent.tmgp.speedmobile",
  25. "versionName": "1.11.0.13274",
  26. "versionCode": "1110013274"
  27. },
  28. {
  29. "appName": "iReader",
  30. "packageName": "com.chaozh.iReaderFree",
  31. "versionName": "7.11.0",
  32. "versionCode": "71101"
  33. }
  34. ]
  35. }'
  36. AS data
  37. )
  38. )
  39. )
  40. ) AS x (package_array)
  41. CROSS JOIN UNNEST(package_array) AS t (a)
  42. )
  43. GROUP BY package_name
  44. ORDER BY cnt DESC;

Result:

  1. +------+------------------------------+
  2. | cnt | package_name |
  3. +------+------------------------------+
  4. | 1 | com.MobileTicket |
  5. | 1 | com.tencent.tmgp.speedmobile |
  6. | 1 | com.chaozh.iReaderFree |
  7. +------+------------------------------+