MaxCompute支援將包含過濾條件或部分運算元類型的原始SQL查詢語句改寫為物化視圖。本文介紹SQL改寫的具體操作步驟。
適用範圍
使用物化視圖查詢改寫的原則是查詢語句中需要的資料必須從物化視圖中得到,包括輸出資料行、篩選條件中需要的列、彙總函式需要的列、JOIN條件需要的列。如果查詢語句中需要的列不包含在物化視圖中或彙總函式不支援,則無法基於物化視圖進行查詢改寫。
使用物化視圖查詢改寫,需要在查詢語句前添加配置:
SET odps.sql.materialized.view.enable.auto.rewriting=true;當物化視圖處於失效狀態時不支援查詢改寫,查詢語句會直接查詢源表而無法獲得加速作用。
跨專案物化視圖改寫
預設每個MaxCompute專案只能利用自身的物化視圖查詢改寫,如果需要利用其他專案中的物化視圖改寫,需要在查詢語句前添加配置指定其他MaxCompute專案列表,語句如下:
SET odps.sql.materialized.view.source.project.white.list = <project_name1>,<project_name2>,<project_name3>;帶有
LEFT/RIGHT JOIN和UNION ALL關鍵字物化視圖改寫,需要在查詢語句前添加配置:SET odps.sql.materialized.view.enable.substitute.rewriting=true;
運算元類型對照表
MaxCompute中物化視圖的查詢改寫支援的運算元類型及與其他產品的對照關係如下。
運算元類型 | 分類 | MaxCompute | BigQuery | Amazon RedShift | Hive |
FILTER | 運算式完全符合 | 支援 | 支援 | 支援 | 支援 |
運算式部分匹配 | 支援 | 支援 | 支援 | 支援 | |
AGGREGATE | 單個AGGREGATE | 支援 | 支援 | 支援 | 支援 |
多個AGGREGATE | 不支援 | 不支援 | 不支援 | 不支援 | |
JOIN | JOIN類型 | INNER JOIN | 不支援 | INNER JOIN | INNER JOIN |
單個JOIN | 支援 | 不支援 | 支援 | 支援 | |
多個JOIN | 支援 | 不支援 | 支援 | 支援 | |
AGGREGATE+JOIN | - | 支援 | 不支援 | 支援 | 支援 |
使用樣本
樣本一:改寫帶過濾條件的查詢語句
建立物化視圖。
CREATE MATERIALIZED VIEW mv AS SELECT a,b,c FROM src WHERE a>5;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT a,b FROM src WHERE a>5;SELECT a,b FROM mv;SELECT a, b FROM src WHERE a=10;SELECT a,b FROM mv WHERE a=10;SELECT a, b FROM src WHERE a=10 AND b=3;SELECT a,b FROM mv WHERE a=10 AND b=3;SELECT a, b FROM src WHERE a>3;(SELECT a,b FROM src WHERE a>3 AND a<=5) UNION (SELECT a,b FROM mv);SELECT a, b FROM src WHERE a=10 AND d=4;改寫不成功,因為mv中沒有d列。
SELECT d, e FROM src WHERE a=10;改寫不成功,因為mv中沒有d、e列。
SELECT a, b FROM src WHERE a=1;改寫不成功,因為mv中沒有a=1的資料。
樣本二:改寫帶彙總函式的查詢語句
如果物化視圖的SQL語句和查詢語句的彙總Key相同,那麼所有彙總函式都可以改寫,如果彙總Key不相同,只支援SUM、MIN和MAX。
建立物化視圖。
CREATE MATERIALIZED VIEW mv AS SELECT a, b, sum(c) AS sum, count(d) AS cnt FROM src GROUP BY a, b;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT a, sum(c) FROM src GROUP BY a;SELECT a, sum(sum) FROM mv GROUP BY a;SELECT a, count(d) FROM src GROUP BY a, b;SELECT a, cnt FROM mv;SELECT a, count(b) FROM (SELECT a, b FROM src GROUP BY a, b) GROUP BY a;SELECT a,count(b) FROM mv GROUP BY a;SELECT a,count(b) FROM mv GROUP BY a;改寫不成功,視圖對a、b列進行過彙總,不能再對b進行彙總。
SELECT a, count(c) FROM src GROUP BY a;改寫不成功,對於COUNT函數不支援重新彙總。
如果彙總函式中有DISTINCT,當物化視圖語句和查詢語句彙總Key相同,可以改寫,否則不可以改寫。
建立物化視圖。
CREATE MATERIALIZED VIEW mv AS SELECT a, b, sum(DISTINCT c) AS sum, count(DISTINCT d) AS cnt FROM src GROUP BY a, b;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT a, count(DISTINCT d) FROM src GROUP BY a, b;SELECT a, cnt FROM mv;SELECT a, count(c) FROM src GROUP BY a, b;改寫不成功,對於COUNT函數不支援重新彙總。
SELECT a, count(DISTINCT c) FROM src GROUP BY a;改寫不成功,因為需要對a再進行彙總。
樣本三:改寫帶JOIN的查詢語句
改寫JOIN輸入
建立物化視圖。
CREATE MATERIALIZED VIEW mv1 AS SELECT a, b FROM j1 WHERE b > 10; CREATE MATERIALIZED VIEW mv2 AS SELECT a, b FROM j2 WHERE b > 10;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT j1.a,j1.b,j2.a FROM (SELECT a,b FROM j1 WHERE b > 10) j1 JOIN j2 ON j1.a=j2.a;SELECT mv1.a, mv1.b, j2.a FROM mv1 JOIN j2 ON mv1.a=j2.a;SELECT j1.a,j1.b,j2.a FROM (SELECT a,b FROM j1 WHERE b > 10) j1 JOIN (SELECT a,b FROM j2 WHERE b > 10) j2 ON j1.a=j2.a;SELECT mv1.a,mv1.b,mv2.a FROM mv1 JOIN mv2 ON mv1.a=mv2.a;
JOIN帶過濾條件
建立物化視圖。
--建立非分區物化視圖。 CREATE MATERIALIZED VIEW mv1 AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a; CREATE MATERIALIZED VIEW mv2 AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 10; --建立分區物化視圖。 CREATE MATERIALIZED VIEW mv LIFECYCLE 7 PARTITIONED BY (ds) AS SELECT t1.id, t1.ds AS ds FROM t1 JOIN t2 ON t1.id = t2.id;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a=4;SELECT a, b FROM mv1 WHERE a=4;SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 20;SELECT a,b FROM mv2 WHERE a>20;SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 5;(SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 5 AND j1.a <= 10) UNION SELECT * FROM mv2;SELECT key FROM t1 JOIN t2 ON t1.id= t2.id WHERE t1.ds='20210306';SELECT key FROM mv WHERE ds='20210306';SELECT key FROM t1 JOIN t2 ON t1.id= t2.id WHERE t1.ds>='20210306';SELECT key FROM mv WHERE ds>='20210306';SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j2.a=4;改寫不成功,因為物化視圖沒有j2.a列。
JOIN增加表
建立物化視圖
CREATE MATERIALIZED VIEW mv AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT j1.a, j1.b FROM j1 JOIN j2 JOIN j3 ON j1.a=j2.a AND j1.a=j3.a;SELECT mv.a, mv.b FROM mv JOIN j3 ON mv.a=j3.a;SELECT j1.a, j1.b FROM j1 JOIN j2 JOIN j3 ON j1.a=j2.a AND j2.a=j3.a;SELECT mv.a,mv.b FROM mv JOIN j3 ON mv.a=j3.a;
以上三種語句可以相互結合,如果查詢語句符合改寫條件,就可以改寫。
使用物化視圖查詢改寫的目標是加速查詢,因此MaxCompute會優先運行更好的改寫規則,若改寫後增加了一些操作,導致查詢加速效果不佳,那麼該改寫最後不會被選中。
樣本四:改寫帶LEFT JOIN的查詢語句
建立物化視圖。
CREATE MATERIALIZED VIEW mv LIFECYCLE 7( user_id, job, total_amount ) AS SELECT t1.user_id, t1.job, sum(t2.order_amount) AS total_amount FROM user_info AS t1 LEFT JOIN sale_order AS t2 ON t1.user_id=t2.user_id GROUP BY t1.user_id;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT t1.user_id, sum(t2.order_amout) AS total_amount FROM user_info AS t1 LEFT JOIN sale_order AS t2 ON t1.user_id=t2.user_id GROUP BY t1.user_id;SELECT user_id, total_amount FROM mv;
樣本五:改寫帶UNION ALL的查詢語句
建立物化視圖。
CREATE MATERIALIZED VIEW mv LIFECYCLE 7( user_id, tran_amount, tran_date ) AS SELECT user_id, tran_amount, tran_date FROM alipay_tran UNION ALL SELECT user_id, tran_amount, tran_date FROM unionpay_tran;基於建立的物化視圖執行查詢語句,查詢改寫對照如下。
原始查詢語句
改寫後的查詢語句
SELECT user_id, tran_amount FROM alipay_tran UNION ALL SELECT user_id, tran_amount FROM unionpay_tran;SELECT user_id, total_amount FROM mv;
樣本六:情境樣本
情境假設
假設有一張頁面訪問表visit_records,記錄了各個使用者訪問的頁面ID、使用者ID、訪問時間。使用者經常要對不同頁面的訪問量進行查詢分析。
此時,可以給visit_records表建立一個以頁面ID分組,統計各個頁面訪問次數的物化視圖,並基於物化視圖執行後續查詢操作。
visit_records的結構如下:
+------------------------------------------------------------------------------------+ | Field | Type | Label | Comment | +------------------------------------------------------------------------------------+ | page_id | string | | | | user_id | string | | | | visit_time | string | | | +------------------------------------------------------------------------------------+建立物化視圖。
-- 給visit_records表建立一個以頁面ID分組,統計各個頁面訪問次數的物化視圖。 CREATE MATERIALIZED VIEW count_mv AS SELECT page_id, count(*) FROM visit_records GROUP BY page_id;執行查詢語句如下。
SET odps.sql.materialized.view.enable.auto.rewriting=true; SELECT page_id, count(*) FROM visit_records GROUP BY page_id;執行該查詢語句時,MaxCompute能自動匹配到物化視圖
count_mv,從count_mv中讀取彙總好的資料。執行如下命令檢驗查詢語句是否匹配到物化視圖。
EXPLAIN SELECT page_id, count(*) FROM visit_records GROUP BY page_id;返回結果如下:
job0 is root job In Job job0: root Tasks: M1 In Task M1: Data source: doc_test_dev.count_mv TS: doc_test_dev.count_mv FS: output: Screen schema: page_id (string) _c1 (bigint) OK從返回結果中的Data source可查看到當前查詢讀取的表是
doc_test_dev專案下的count_mv,說明物化視圖有效,查詢改寫成功。