All Products
Search
Document Center

Data Lake Analytics - Deprecated:JSON

Last Updated:May 21, 2020

This topic describes how to create tables for JSON data in DLA.

Prerequisites

  • Before creating tables, you must create an OSS schema first. In the example, all tables use the following OSS schema:

    1. CREATE SCHEMA dla_oss_db with DBPROPERTIES(
    2. catalog='oss',
    3. location 'oss://dlaossfile1/dla/';
    4. );
  • DLA allows you to create tables for JSON files that meet the following conditions:

    • DLA can only process JSON files in plain text format.

    • The JSON files are UTF-8-encoded.

    • In JSON files, each line must contain a complete JSON record.

      For example, DLA does not allow you to create a table for a JSON file that contains the following data:

      1. {"id": 123, "name": "jack",
      2. "c3": "2001-02-03 12:34:56"}
      3. {"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
      4. {"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
      5. {"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}

      You can rewrite the data as follows:

      1. {"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
      2. {"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
      3. {"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
      4. {"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}

JSON file without nested data

For example, the OSS instance has a file oss://dlaossfile1/dla/json_table_1.txt that stores the following data:

  1. {"id": 123, "name": "jack", "c3": "2001-02-03 12:34:56"}
  2. {"id": 456, "name": "rose", "c3": "1906-04-18 05:12:00"}
  3. {"id": 789, "name": "tom", "c3": "2001-02-03 12:34:56"}
  4. {"id": 234, "name": "alice", "c3": "1906-04-18 05:12:00"}

The following is a sample statement for creating a table based on the file:

  1. CREATE EXTERNAL TABLE json_table_1
  2. (id int,
  3. name string,
  4. c3 timestamp
  5. )STORED AS JSON
  6. LOCATION 'oss://dlaossfile1/dla/';

The query result is as follows:

  1. SELECT * FROM dla_oss_db.json_table_1
  2. |----|------|---------------------|
  3. | id | name | c3 |
  4. | 123| jack | 2001-02-03 12:34:560|
  5. | 456| rose | 1906-04-18 05:12:00 |
  6. | 789| tome | 2001-02-03 12:34:56 |
  7. | 234| alice|1906-04-18 05:12:00 |

JSON file with nested data

For example, the JSON file contains the following data:

  1. {
  2. "DocId": "Alibaba",
  3. "User_1": {
  4. "Id": 1234,
  5. "Username": "bob1234",
  6. "Name": "Bob",
  7. "ShippingAddress": {
  8. "Address1": "969 Wenyi West St.",
  9. "Address2": null,
  10. "City": "Hangzhou",
  11. "Province": "Zhejiang"
  12. },
  13. "Orders": [
  14. {
  15. "ItemId": 6789,
  16. "OrderDate": "11/11/2017"
  17. },
  18. {
  19. "ItemId": 4352,
  20. "OrderDate": "12/12/2017"
  21. }
  22. ]
  23. }
  24. }

The following is a sample statement for creating a table based on the file:

  1. CREATE EXTERNAL TABLE json_table_2 (
  2. docid string,
  3. user_1 struct<
  4. id:INT,
  5. username:string,
  6. name:string,
  7. shippingaddress:struct<
  8. address1:string,
  9. address2:string,
  10. city:string,
  11. province:string
  12. >,
  13. orders:array<
  14. struct<
  15. itemid:INT,
  16. orderdate:string
  17. >
  18. >
  19. >
  20. )
  21. STORED AS JSON
  22. LOCATION 'oss://dlaossfile1/dla/json_table_2.txt';

The query result is as follows:

  1. select * from json_table_2;
  2. +---------+----------------------------------------------------------------------------------------------------------------+
  3. | docid | user_1 |
  4. +---------+----------------------------------------------------------------------------------------------------------------+
  5. | Alibaba | [1234, bob1234, Bob, [969 Wenyi West St., null, Hangzhou, Zhejiang], [[6789, 11/11/2017], [4352, 12/12/2017]]] |
  6. +---------+----------------------------------------------------------------------------------------------------------------+

Note: No matter whether the JSON file has nested data, a complete JSON record must be placed in one line and then it can be processed by DLA.