JSON & JSONB 資料類型操作 - AnalyticDB for PostgreSQL

JSON 類型幾乎已成為互連網及物聯網（IoT）的基礎資料類型，AnalyticDB PostgreSQL資料庫對JSON資料類型做了完善的支援。並且AnalyticDB PostgreSQL 6.0版支援JSONB類型。

這部分介紹對JSON & JSONB資料類型的操作，包括：

JSON & JSONB的異同
JSON輸入輸出文法
JSON操作符
JSONB操作符
JSON建立函數
JSON處理函數
JSONB建立索引

具體協議請參見 JSON 官網。

JSON & JSONB的異同

JSON和JSONB類型在使用上幾乎完全一致，兩者的區別主要在儲存上，json資料類型直接儲存輸入文本的完全的拷貝，JSONB資料類型以二進位格式進行儲存。同時JSONB相較於JSON更高效，處理速度提升非常大，且支援索引，一般情況下，AnalyticDB PostgreSQL 6.0版都建議使用JSONB類型替代JSON類型

JSON輸入輸出文法

AnalyticDB PostgreSQL支援的JSON資料類型的輸入和輸出文法詳見RFC 7159。

一個JSON數值可以是一個簡單值（數字、字串、true/null/false），數組，對象。下列都是合法的JSON運算式：

-- 簡單標量/簡單值
-- 簡單值可以是數字、帶引號的字串、true、false或者null
SELECT '5'::json;

-- 零個或者更多個元素的數組（元素類型可以不同）
SELECT '[1, 2, "foo", null]'::json;

-- 含有鍵/值對的對象
-- 注意對象的鍵必須總是帶引號的字串
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;

-- 數組和對象可以任意嵌套
SELECT '{"foo": [true, "bar"], "tags": {"a": 1, "b": null}}'::json;

以上的JSON類型都可以寫成JSONB類型的運算式，例如：

-- 簡單標量/簡單值,轉化為jsonb類型
SELECT '5'::jsonb;

JSON操作符

下表說明了可以用於JSON & JSONB資料類型的操作符。

操作符	右運算元類型	描述	例子	結果
->	int	獲得JSON數組元素（索引從零開始）。	`'[{"a":"foo"}, {"b":"bar"}, {"c":"baz"}]'::json->2`	`{"c":"baz"}`
->	text	根據鍵獲得JSON對象的域。	`'{"a": {"b":"foo"}}'::json->'a'`	`{"b":"foo"}`
->>	int	獲得JSON數組元素的文本形式。	`'[1,2,3]'::json->>2`	`3`
->>	text	獲得JSON對象域的文本形式。	`'{"a":1,"b":2}'::json->>'b'`	`2`
#>	text[]	獲得在指定路徑上的JSON對象。	`'{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}'`	`{"c": "foo"}`
#>>	text[]	獲得在指定路徑上的JSON對象的文本形式。	`'{"a":[1,2,3], "b":[4,5,6]}'::json#>>'{a,2}'`	`3`

下表說明了可以用於JSONB資料類型的操作符。

JSONB操作符

下表說明了可以用於JSONB資料類型的操作符。

操作符	右運算元類型	描述	例子
=	jsonb	兩個JSON對象的內容是否相等	`'[1,2]'::jsonb= '[1,2]'::jsonb`
@>	jsonb	左邊的JSON對象是否包含右邊的JSON對象	`'{"a":1, "b":2}'::jsonb @> '{"b":2}'::jsonb`
<@	jsonb	左邊的JSON對象是否包含於右邊的JSON對象	`'{"b":2}'::jsonb <@ '{"a":1, "b":2}'::jsonb`
?	text	指定的字串是否存在於JSON對象中的key或者字串類型的元素中	`'{"a":1, "b":2}'::jsonb ? 'b'`
?\|	text[]	右值字串數組是否存在任一元素在JSON對象字串類型的key或者元素中	`'{"a":1, "b":2, "c":3}'::jsonb ?\| array['b', 'c']`
?&	text[]	右值字串數組是否所有元素在JSON對象字串類型的key或者元素中	`'["a", "b"]'::jsonb ?& array['a', 'b']`

JSON建立函數

下表說明了用於建立JSON值的函數。

函數	描述	例子	結果
`to_json (anyelement)`	返回該值作為一個合法的JSON對象。數組和組合會被遞迴處理並且轉換成數組和對象。如果輸入包含一個從該類型到JSON的造型，會使用該cast函數來執行轉換，否則將會產生一個JSON標量值。對於任何非數字、布爾值或空值的標量類型，會使用其文本表示，並且加上適當的引號和轉義讓它變成一個合法的JSON字串。	`to_json ('Fred said "Hi."'::text)`	`"Fred said \"Hi.\""`
`array_to_json (anyarray [, pretty_bool])`	返回該數組為一個JSON數組。一個多維陣列會變成一個JSON數組的數組。說明如果pretty_bool為true，在第一維元素之間會增加換行。	`array_to_json ('{{1,5},{99,100}}'::int[])`	`[[1,5],[99,100]]`
`row_to_json (record [, pretty_bool])`	返回該行為一個JSON對象。說明如果pretty_bool為true，在第一層級元素之間會增加換行。	`row_to_json (row(1,'foo'))`	`{"f1":1,"f2":"foo"}`

JSON處理函數

下表說明了處理JSON值的函數。

函數	傳回型別	描述	例子	例子結果
`json_each(json)`	`set of key text, value json set of key text, value jsonb`	把最外層的JSON對象展開成鍵/值對的集合。	`select * from json_each('{"a":"foo", "b":"bar"}')`	`key \| value -----+------- a \| "foo" b \| "bar"`
`json_each_text(json)`	`set of key text, value text`	把最外層的JSON對象展開成鍵/值對的集合。傳回值的類型是text。	`select * from json_each_text('{"a":"foo", "b":"bar"}')`	`key \| value -----+------- a \| foo b \| bar`
`json_extract_path(from_json json, VARIADIC path_elems text[])`	`json`	返回path_elems指定的JSON值。等效於#>操作符。	`json_extract_path('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}','f4')`	`{"f5":99,"f6":"foo"}`
`json_extract_path_text(from_json json, VARIADIC path_elems text[])`	`text`	返回path_elems指定的JSON值為文本。等效於#>>操作符。	`json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}','f4', 'f6')`	`foo`
`json_object_keys(json)`	`setof text`	返回最外層JSON對象中的鍵集合。	`json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}')`	`json_object_keys ------------------ f1 f2`
`json_populate_record(base anyelement, from_json json)`	`anyelement`	把Expands the object in from_json中的對象展開成一行，其中的列匹配由base定義的記錄類型。	`select * from json_populate_record(null::myrowtype, '{"a":1,"b":2}')`	`a \| b ---+--- 1 \| 2`
`json_populate_recordset(base anyelement, from_json json)`	`set of anyelement`	將from_json中最外層的對象數組展開成一個行集合，其中的列匹配由base定義的記錄類型。	`select * from json_populate_recordset(null::myrowtype, '[{"a":1,"b":2},{"a":3,"b":4}]')`	`a \| b ---+--- 1 \| 2 3 \| 4`
`json_array_elements(json)`	`set of json`	將一個JSON數組展開成JSON值的一個集合。	`select * from json_array_elements('[1,true, [2,false]]')`	`value ----------- 1 true [2,false]`

JSONB建立索引

JSONB類型支援GIN, BTree索引。一般情況下，我們會在JSONB類型欄位上建GIN索引，文法如下：

CREATE INDEX idx_name ON table_name USING gin (idx_col);
CREATE INDEX idx_name ON table_name USING gin (idx_col jsonb_path_ops);

說明在JSONB上建立GIN索引的方式有兩種：使用預設的jsonb_ops操作符建立和使用jsonb_path_ops操作符建立。兩者的區別在jsonb_ops的GIN索引中，JSONB資料中的每個key和value都是作為一個單獨的索引項目的，而jsonb_path_ops則只為每個value建立一個索引項目。

JSON操作舉例

建立表

create table tj(id serial, ary int[], obj json, num integer);
=> insert into tj(ary, obj, num) values('{1,5}'::int[], '{"obj":1}', 5);
INSERT 0 1
=> select row_to_json(q) from (select id, ary, obj, num from tj) as q;
                row_to_json
-------------------------------------------
 {"f1":1,"f2":[1,5],"f3":{"obj":1},"f4":5}
(1 row)
=> insert into tj(ary, obj, num) values('{2,5}'::int[], '{"obj":2}', 5);
INSERT 0 1
=> select row_to_json(q) from (select id, ary, obj, num from tj) as q;
                row_to_json
-------------------------------------------
 {"f1":1,"f2":[1,5],"f3":{"obj":1},"f4":5}
 {"f1":2,"f2":[2,5],"f3":{"obj":2},"f4":5}
(2 rows)

說明 JSON 類型不能支援作為分布鍵來使用；也不支援 JSON 彙總函式。

多表JOIN

create table tj2(id serial, ary int[], obj json, num integer);
=> insert into tj2(ary, obj, num) values('{2,5}'::int[], '{"obj":2}', 5);
INSERT 0 1
=> select * from tj, tj2 where tj.obj->>'obj' = tj2.obj->>'obj';
 id |  ary  |    obj    | num | id |  ary  |    obj    | num
----+-------+-----------+-----+----+-------+-----------+-----
  2 | {2,5} | {"obj":2} |   5 |  1 | {2,5} | {"obj":2} |   5
(1 row)
=> select * from tj, tj2 where json_object_field_text(tj.obj, 'obj') = json_object_field_text(tj2.obj, 'obj');
 id |  ary  |    obj    | num | id |  ary  |    obj    | num
----+-------+-----------+-----+----+-------+-----------+-----
  2 | {2,5} | {"obj":2} |   5 |  1 | {2,5} | {"obj":2} |   5
(1 row)

JSON 函數索引

CREATE TEMP TABLE test_json (
       json_type text,
       obj json
);
=> insert into test_json values('aa', '{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}');
INSERT 0 1
=> insert into test_json values('cc', '{"f7":{"f3":1},"f8":{"f5":99,"f6":"foo"}}');
INSERT 0 1
=> select obj->'f2' from test_json where json_type = 'aa';
 ?column?
----------
 {"f3":1}
(1 row)
=> create index i on test_json (json_extract_path_text(obj, '{f4}'));
CREATE INDEX
=> select * from test_json where json_extract_path_text(obj, '{f4}') = '{"f5":99,"f6":"foo"}';
 json_type |                 obj
-----------+-------------------------------------------
 aa        | {"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}
(1 row)

JSONB建立索引

-- 建立測試表並產生資料
CREATE TABLE jtest1 (
    id int,
    jdoc json
);

CREATE OR REPLACE FUNCTION random_string(INTEGER)
RETURNS TEXT AS
$BODY$
SELECT array_to_string(
    ARRAY (
        SELECT substring(
            '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
            FROM (ceil(random()*62))::int FOR 1
        )
        FROM generate_series(1, $1)
    ),
    ''
)
$BODY$
LANGUAGE sql VOLATILE;

insert into jtest1 select t.seq, ('{"a":{"a1":"a1a1", "a2":"a2a2"}, 
"name":"'||random_string(10)||'","b":"bbbbb"}')::json from
generate_series(1, 10000000) as t(seq);

CREATE TABLE jtest2 (
    id int,
    jdoc jsonb
);

CREATE TABLE jtest3 (
    id int,
    jdoc jsonb
);

insert into jtest2 select id, jdoc::jsonb from jtest1;
insert into jtest3 select id, jdoc::jsonb from jtest1;

-- 建立索引
CREATE INDEX idx_jtest2 ON jtest2 USING gin(jdoc);
CREATE INDEX idx_jtest3 ON jtest3 USING gin(jdoc jsonb_path_ops);

-- 未建索引執行
EXPLAIN ANALYZE SELECT * FROM jtest1 where jdoc @> '{"name":"N9WP5txmVu"}';
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..162065.73 rows=10100 width=88) (actual time=1343.248..1777.605 rows=1 loops=1)
   ->  Seq Scan on jtest2  (cost=0.00..162065.73 rows=5050 width=88) (actual time=0.042..1342.426 rows=1 loops=1)
         Filter: (jdoc @> '{"name": "N9WP5txmVu"}'::jsonb)
 Planning time: 0.172 ms
   (slice0)    Executor memory: 59K bytes.
   (slice1)    Executor memory: 91K bytes avg x 2 workers, 91K bytes max (seg0).
 Memory used:  2047000kB
 Optimizer: Postgres query optimizer
 Execution time: 1778.234 ms
(9 rows)

-- 使用jsonb_ops操作符建立索引執行
EXPLAIN ANALYZE SELECT * FROM jtest2 where jdoc @> '{"name":"N9WP5txmVu"}';
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=88.27..13517.81 rows=10100 width=88) (actual time=0.655..0.659 rows=1 loops=1)
   ->  Bitmap Heap Scan on jtest2  (cost=88.27..13517.81 rows=5050 width=88) (actual time=0.171..0.172 rows=1 loops=1)
         Recheck Cond: (jdoc @> '{"name": "N9WP5txmVu"}'::jsonb)
         ->  Bitmap Index Scan on idx_jtest2  (cost=0.00..85.75 rows=5050 width=0) (actual time=0.217..0.217 rows=1 loops=1)
               Index Cond: (jdoc @> '{"name": "N9WP5txmVu"}'::jsonb)
 Planning time: 0.151 ms
   (slice0)    Executor memory: 69K bytes.
   (slice1)    Executor memory: 628K bytes avg x 2 workers, 632K bytes max (seg1).  Work_mem: 9K bytes max.
 Memory used:  2047000kB
 Optimizer: Postgres query optimizer
 Execution time: 1.266 ms
(11 rows)

-- 使用jsonb_path_ops操作符建立索引執行
EXPLAIN ANALYZE SELECT * FROM jtest3 where jdoc @> '{"name":"N9WP5txmVu"}';
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 2:1  (slice1; segments: 2)  (cost=84.28..13513.81 rows=10101 width=88) (actual time=0.710..0.711 rows=1 loops=1)
   ->  Bitmap Heap Scan on jtest3  (cost=84.28..13513.81 rows=5051 width=88) (actual time=0.179..0.181 rows=1 loops=1)
         Recheck Cond: (jdoc @> '{"name": "N9WP5txmVu"}'::jsonb)
         ->  Bitmap Index Scan on idx_jtest3  (cost=0.00..81.75 rows=5051 width=0) (actual time=0.106..0.106 rows=1 loops=1)
               Index Cond: (jdoc @> '{"name": "N9WP5txmVu"}'::jsonb)
 Planning time: 0.144 ms
   (slice0)    Executor memory: 69K bytes.
   (slice1)    Executor memory: 305K bytes avg x 2 workers, 309K bytes max (seg1).  Work_mem: 9K bytes max.
 Memory used:  2047000kB
 Optimizer: Postgres query optimizer
 Execution time: 1.291 ms
(11 rows)

下面是Python訪問的一個例子：

#! /bin/env python
import time
import json
import psycopg2
def gpquery(sql):
    conn = None
    try:
        conn = psycopg2.connect("dbname=sanity1x2")
        conn.autocommit = True
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    except Exception as e:
        if conn:
            try:
                conn.close()
            except:
                pass
            time.sleep(10)
        print e
    return None
def main():
    sql = "select obj from tj;"
    #rows = Connection(host, port, user, pwd, dbname).query(sql)
    rows = gpquery(sql)
    for row in rows:
        print json.loads(row[0])
if __name__ == "__main__":
    main()