StarRocks的非同步物化視圖採用了主流的基於SPJG(select-project-join-group-by)模式透明查詢改寫演算法。在不修改查詢語句的前提下,StarRocks可以自動將在基表上的查詢改寫為在物化視圖上的查詢,通過其中包含的預計算結果,協助您顯著降低計算成本,並大幅加速查詢執行。本文為您介紹如何利用StarRocks的非同步物化視圖來改寫並加速查詢。
適用情境
StarRocks的非同步物化視圖查詢改寫功能,主要應用於以下情境:
指標預彙總
如果您需要處理高維度資料,可以使用物化視圖來建立預彙總指標層。
寬表Join
物化視圖允許您在複雜情境下透明加速包含大寬表Join的查詢。
湖倉加速
構建基於External Catalog的物化視圖可以輕鬆加速針對資料湖中資料的查詢。
基於JDBC Catalog表構建的非同步物化視圖暫不支援查詢改寫。
功能特點
StarRocks的非同步物化視圖自動查詢改寫功能具有以下特點:
強資料一致性:如果基表是StarRocks內表,StarRocks可以保證通過物化視圖查詢改寫獲得的結果與直接查詢基表的結果一致。
Staleness rewrite:StarRocks支援Staleness rewrite,即允許容忍一定程度的資料到期,以應對資料變更頻繁的情況。
多表Join:StarRocks的非同步物化視圖支援各種類型的Join,包括一些複雜的Join情境,如View Delta Join和Join派生改寫,可用於加速涉及大寬表的查詢情境。
彙總改寫:StarRocks可以改寫帶有彙總操作的查詢,以提高報表效能。
嵌套物化視圖:StarRocks支援基於嵌套物化視圖改寫複雜查詢,擴充了可改寫的查詢範圍。
Union改寫:您可以將Union改寫特性與物化視圖分區的存留時間(TTL)相結合,實現冷熱資料的分離,允許您從物化視圖查詢熱資料,從基表查詢歷史資料。
基於視圖構建物化視圖:您可以在基於視圖建模的情景下加速查詢。
基於External Catalog構建物化視圖:您可以通過該特性加速資料湖中的查詢。
複雜運算式改寫:支援在運算式中調用函數和算術運算,滿足複雜分析和計算需求。
使用限制
單物化視圖查詢改寫能力,StarRocks目前存在以下限制:
StarRocks不支援非確定性函數的改寫,包括rand、random、uuid以及sleep。
StarRocks不支援視窗函數的改寫。
如果物化視圖定義語句中包含LIMIT、ORDER BY、UNION、EXCEPT、INTERSECT、MINUS、GROUPING SETS、WITH CUBE或WITH ROLLUP,則無法用於改寫。
基於External Catalog的物化視圖不保證查詢結果強一致。
基於JDBC Catalog表構建的非同步物化視圖暫不支援查詢改寫。
針對基於視圖的物化視圖查詢改寫,StarRocks目前存在以下限制:
StarRocks不支援分區Union改寫。
如果視圖包含隨機函數,則不支援查詢改寫,包括rand()、random()、uuid()和sleep()。
如果視圖包含具有相同名稱的列,則不支援查詢改寫。您必須為具有相同名稱的列設定不同的別名。
用於建立物化視圖的視圖必須至少包含以下資料類型之一的列:
整數類型
日期類型
字串類型
Join改寫
StarRocks 支援改寫具有各種類型 Join 的查詢,包括 Inner Join、Cross Join、Left Outer Join、Full Outer Join、Right Outer Join、Semi Join 和 Anti Join。
以下樣本展示 Join 查詢的改寫。建立以下資料庫及基表:
CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;
CREATE TABLE customer (
c_custkey INT(11) NOT NULL,
c_name VARCHAR(26) NOT NULL,
c_address VARCHAR(41) NOT NULL,
c_city VARCHAR(11) NOT NULL,
c_nation VARCHAR(16) NOT NULL,
c_region VARCHAR(13) NOT NULL,
c_phone VARCHAR(16) NOT NULL,
c_mktsegment VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(c_custkey)
DISTRIBUTED BY HASH(c_custkey) BUCKETS 12;
CREATE TABLE lineorder (
lo_orderkey INT(11) NOT NULL,
lo_linenumber INT(11) NOT NULL,
lo_custkey INT(11) NOT NULL,
lo_partkey INT(11) NOT NULL,
lo_suppkey INT(11) NOT NULL,
lo_orderdate INT(11) NOT NULL,
lo_orderpriority VARCHAR(16) NOT NULL,
lo_shippriority INT(11) NOT NULL,
lo_quantity INT(11) NOT NULL,
lo_extendedprice INT(11) NOT NULL,
lo_ordtotalprice INT(11) NOT NULL,
lo_discount INT(11) NOT NULL,
lo_revenue INT(11) NOT NULL,
lo_supplycost INT(11) NOT NULL,
lo_tax INT(11) NOT NULL,
lo_commitdate INT(11) NOT NULL,
lo_shipmode VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(lo_orderkey)
DISTRIBUTED BY HASH(lo_orderkey) BUCKETS 48;基於上述基表,建立以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW join_mv1
DISTRIBUTED BY HASH(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, lo_partkey, c_name, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;該物化視圖可以改寫以下查詢。
SELECT lo_orderkey, lo_linenumber, lo_revenue, lo_partkey, c_name, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;其原始查詢計劃和改寫後的計劃如下。

StarRocks支援改寫具有複雜運算式的Join查詢,如算術運算、字串函數、日期函數、CASE WHEN運算式和謂詞OR等。例如,上述物化視圖可以改寫以下查詢:
SELECT
lo_orderkey,
lo_linenumber,
(2 * lo_revenue + 1) * lo_linenumber,
upper(c_name),
substr(c_address, 3)
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;除了常規情境,StarRocks還支援在更複雜的情景下改寫Join查詢。
Query Delta Join改寫
Query Delta Join是指查詢中Join的表是物化視圖中Join的表的超集的情況。例如,以下查詢Join了表lineorder、表customer和表part。如果物化視圖join_mv1僅包含lineorder和customer的Join,StarRocks可以使用join_mv1來改寫查詢。
樣本:
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_name, c_address, p_name
FROM
lineorder INNER JOIN customer ON lo_custkey = c_custkey
INNER JOIN part ON lo_partkey = p_partkey;其原始查詢計劃和改寫後的計劃如下:

View Delta Join改寫
View Delta Join指的是查詢中Join的表是物化視圖中Join的表的子集的情況。通常在涉及大寬表的情景中使用此功能。例如,在Star Schema Benchmark (SSB)的背景下,您可以通過建立物化視圖,Join所有表以提高查詢效能。測試發現在通過物化視圖透明改寫查詢後,多表Join的查詢效能可以達到與查詢相應大寬表相同的效能水平。
要啟用View Delta Join改寫,必須確保物化視圖包含了與查詢中所有相關表存在的1:1的Cardinality Preservation Join。滿足以下約束條件的九種Join都被視為Cardinality Preservation Join,可以用於啟用View Delta Join改寫。

以SSB測試為例,建立以下資料庫和基表。
CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;
CREATE TABLE customer (
c_custkey INT(11) NOT NULL,
c_name VARCHAR(26) NOT NULL,
c_address VARCHAR(41) NOT NULL,
c_city VARCHAR(11) NOT NULL,
c_nation VARCHAR(16) NOT NULL,
c_region VARCHAR(13) NOT NULL,
c_phone VARCHAR(16) NOT NULL,
c_mktsegment VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(c_custkey)
DISTRIBUTED BY HASH(c_custkey) BUCKETS 12
PROPERTIES (
"unique_constraints" = "c_custkey" -- 指定唯一鍵。
);
CREATE TABLE dates (
d_datekey DATE NOT NULL,
d_date VARCHAR(20) NOT NULL,
d_dayofweek VARCHAR(10) NOT NULL,
d_month VARCHAR(11) NOT NULL,
d_year INT(11) NOT NULL,
d_yearmonthnum INT(11) NOT NULL,
d_yearmonth VARCHAR(9) NOT NULL,
d_daynuminweek INT(11) NOT NULL,
d_daynuminmonth INT(11) NOT NULL,
d_daynuminyear INT(11) NOT NULL,
d_monthnuminyear INT(11) NOT NULL,
d_weeknuminyear INT(11) NOT NULL,
d_sellingseason VARCHAR(14) NOT NULL,
d_lastdayinweekfl INT(11) NOT NULL,
d_lastdayinmonthfl INT(11) NOT NULL,
d_holidayfl INT(11) NOT NULL,
d_weekdayfl INT(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(d_datekey)
DISTRIBUTED BY HASH(d_datekey) BUCKETS 1
PROPERTIES (
"unique_constraints" = "d_datekey" -- 指定唯一鍵。
);
CREATE TABLE supplier (
s_suppkey INT(11) NOT NULL,
s_name VARCHAR(26) NOT NULL,
s_address VARCHAR(26) NOT NULL,
s_city VARCHAR(11) NOT NULL,
s_nation VARCHAR(16) NOT NULL,
s_region VARCHAR(13) NOT NULL,
s_phone VARCHAR(16) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(s_suppkey)
DISTRIBUTED BY HASH(s_suppkey) BUCKETS 12
PROPERTIES (
"unique_constraints" = "s_suppkey" -- 指定唯一鍵。
);
CREATE TABLE part (
p_partkey INT(11) NOT NULL,
p_name VARCHAR(23) NOT NULL,
p_mfgr VARCHAR(7) NOT NULL,
p_category VARCHAR(8) NOT NULL,
p_brand VARCHAR(10) NOT NULL,
p_color VARCHAR(12) NOT NULL,
p_type VARCHAR(26) NOT NULL,
p_size TINYINT(11) NOT NULL,
p_container VARCHAR(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(p_partkey)
DISTRIBUTED BY HASH(p_partkey) BUCKETS 12
PROPERTIES (
"unique_constraints" = "p_partkey" -- 指定唯一鍵。
);
CREATE TABLE lineorder (
lo_orderdate DATE NOT NULL, -- 指定為 NOT NULL。
lo_orderkey INT(11) NOT NULL,
lo_linenumber TINYINT NOT NULL,
lo_custkey INT(11) NOT NULL, -- 指定為 NOT NULL。
lo_partkey INT(11) NOT NULL, -- 指定為 NOT NULL。
lo_suppkey INT(11) NOT NULL, -- 指定為 NOT NULL。
lo_orderpriority VARCHAR(100) NOT NULL,
lo_shippriority TINYINT NOT NULL,
lo_quantity TINYINT NOT NULL,
lo_extendedprice INT(11) NOT NULL,
lo_ordtotalprice INT(11) NOT NULL,
lo_discount TINYINT NOT NULL,
lo_revenue INT(11) NOT NULL,
lo_supplycost INT(11) NOT NULL,
lo_tax TINYINT NOT NULL,
lo_commitdate DATE NOT NULL,
lo_shipmode VARCHAR(100) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(lo_orderdate,lo_orderkey)
PARTITION BY RANGE(lo_orderdate)
(PARTITION p1 VALUES [("0000-01-01"), ("1993-01-01")),
PARTITION p2 VALUES [("1993-01-01"), ("1994-01-01")),
PARTITION p3 VALUES [("1994-01-01"), ("1995-01-01")),
PARTITION p4 VALUES [("1995-01-01"), ("1996-01-01")),
PARTITION p5 VALUES [("1996-01-01"), ("1997-01-01")),
PARTITION p6 VALUES [("1997-01-01"), ("1998-01-01")),
PARTITION p7 VALUES [("1998-01-01"), ("1999-01-01")))
DISTRIBUTED BY HASH(lo_orderkey) BUCKETS 48
PROPERTIES (
"foreign_key_constraints" = "(
(lo_custkey) REFERENCES customer(c_custkey),
(lo_partkey) REFERENCES part(p_partkey),
(lo_suppkey) REFERENCES supplier(s_suppkey)
)" -- 指定外鍵約束。
);建立Join表lineorder、表customer、表supplier、表part和表dates的物化視圖lineorder_flat_mv。
USE test_db;
CREATE MATERIALIZED VIEW lineorder_flat_mv
DISTRIBUTED BY HASH(LO_ORDERDATE, LO_ORDERKEY) BUCKETS 48
PARTITION BY LO_ORDERDATE
REFRESH MANUAL
PROPERTIES (
"partition_refresh_number"="1"
)
AS SELECT /*+ SET_VAR(query_timeout = 7200) */ -- 設定重新整理逾時時間。
l.LO_ORDERDATE AS LO_ORDERDATE,
l.LO_ORDERKEY AS LO_ORDERKEY,
l.LO_LINENUMBER AS LO_LINENUMBER,
l.LO_CUSTKEY AS LO_CUSTKEY,
l.LO_PARTKEY AS LO_PARTKEY,
l.LO_SUPPKEY AS LO_SUPPKEY,
l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
l.LO_QUANTITY AS LO_QUANTITY,
l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
l.LO_DISCOUNT AS LO_DISCOUNT,
l.LO_REVENUE AS LO_REVENUE,
l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
l.LO_TAX AS LO_TAX,
l.LO_COMMITDATE AS LO_COMMITDATE,
l.LO_SHIPMODE AS LO_SHIPMODE,
c.C_NAME AS C_NAME,
c.C_ADDRESS AS C_ADDRESS,
c.C_CITY AS C_CITY,
c.C_NATION AS C_NATION,
c.C_REGION AS C_REGION,
c.C_PHONE AS C_PHONE,
c.C_MKTSEGMENT AS C_MKTSEGMENT,
s.S_NAME AS S_NAME,
s.S_ADDRESS AS S_ADDRESS,
s.S_CITY AS S_CITY,
s.S_NATION AS S_NATION,
s.S_REGION AS S_REGION,
s.S_PHONE AS S_PHONE,
p.P_NAME AS P_NAME,
p.P_MFGR AS P_MFGR,
p.P_CATEGORY AS P_CATEGORY,
p.P_BRAND AS P_BRAND,
p.P_COLOR AS P_COLOR,
p.P_TYPE AS P_TYPE,
p.P_SIZE AS P_SIZE,
p.P_CONTAINER AS P_CONTAINER,
d.D_DATE AS D_DATE,
d.D_DAYOFWEEK AS D_DAYOFWEEK,
d.D_MONTH AS D_MONTH,
d.D_YEAR AS D_YEAR,
d.D_YEARMONTHNUM AS D_YEARMONTHNUM,
d.D_YEARMONTH AS D_YEARMONTH,
d.D_DAYNUMINWEEK AS D_DAYNUMINWEEK,
d.D_DAYNUMINMONTH AS D_DAYNUMINMONTH,
d.D_DAYNUMINYEAR AS D_DAYNUMINYEAR,
d.D_MONTHNUMINYEAR AS D_MONTHNUMINYEAR,
d.D_WEEKNUMINYEAR AS D_WEEKNUMINYEAR,
d.D_SELLINGSEASON AS D_SELLINGSEASON,
d.D_LASTDAYINWEEKFL AS D_LASTDAYINWEEKFL,
d.D_LASTDAYINMONTHFL AS D_LASTDAYINMONTHFL,
d.D_HOLIDAYFL AS D_HOLIDAYFL,
d.D_WEEKDAYFL AS D_WEEKDAYFL
FROM lineorder AS l
INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY
INNER JOIN dates AS d ON l.LO_ORDERDATE = d.D_DATEKEY; SSB Q2.1樣本如下。
USE test_db;
SELECT
SUM(lo_revenue) AS lo_revenue,
d_year,
p_brand
FROM lineorder
JOIN dates ON lo_orderdate = d_datekey
JOIN part ON lo_partkey = p_partkey
JOIN supplier ON lo_suppkey = s_suppkey
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY d_year, p_brand
ORDER BY d_year, p_brand;SSB Q2.1涉及四個表的Join,但與物化視圖lineorder_flat_mv相比,缺少了customer表。在lineorder_flat_mv中,lineorder INNER JOIN customer本質上是一個Cardinality Preservation Join。因此邏輯上,可以消除該Join而不影響查詢結果。因此,Q2.1可以使用lineorder_flat_mv進行改寫。
其原始查詢計劃和改寫後的計劃如下。

同樣,SSB中的其他查詢也可以通過使用lineorder_flat_mv進行透明改寫,從而最佳化查詢效能。
Join派生改寫
Join派生是指物化視圖和查詢中的Join類型不一致,但物化視圖的Join結果包含查詢Join結果的情況。
目前支援以下兩種情境:
三表或以上Join
假設物化視圖包含表t1和表t2之間的Left Outer Join,以及表t2和表t3之間的Inner Join。兩個Join的條件都包括來自表t2的列。
而查詢則包含t1和t2之間的Inner Join,以及t2和t3之間的Inner Join。兩個Join的條件都包括來自表t2的列。
在這種情況下,上述查詢可以通過物化視圖改寫。這是因為在物化視圖中,首先執行Left Outer Join,然後執行Inner Join。Left Outer Join產生的右表沒有匹配結果(即右表中的列為NULL)。這些結果在執行Inner Join期間被過濾掉。因此,物化視圖和查詢的邏輯是等效的,可以對查詢進行改寫。
例如,建立以下物化視圖join_mv5。
USE test_db;
CREATE MATERIALIZED VIEW join_mv5
PARTITION BY lo_orderdate
DISTRIBUTED BY hash(lo_orderkey)
PROPERTIES (
"partition_refresh_number" = "1"
)
AS
SELECT lo_orderkey, lo_orderdate, lo_linenumber, lo_revenue, c_custkey, c_address, p_name
FROM customer LEFT OUTER JOIN lineorder
ON c_custkey = lo_custkey
INNER JOIN part
ON p_partkey = lo_partkey;join_mv5可改寫以下查詢。
USE test_db;
SELECT lo_orderkey, lo_orderdate, lo_linenumber, lo_revenue, c_custkey, c_address, p_name
FROM customer INNER JOIN lineorder
ON c_custkey = lo_custkey
INNER JOIN part
ON p_partkey = lo_partkey;其原始查詢計劃和改寫後的計劃如下。

同樣,如果物化視圖定義為t1 INNER JOIN t2 INNER JOIN t3,而查詢為LEFT OUTER JOIN t2 INNER JOIN t3,那麼查詢也可以被改寫。而且,在涉及超過三個表的情況下,也具備上述的改寫能力。
兩表Join
兩表Join的派生改寫支援以下幾種細分情境。

在情境1至9中,需要向改寫結果補償過濾謂詞,以確保語義等效性。例如,建立以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW join_mv3
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address
FROM lineorder LEFT OUTER JOIN customer
ON lo_custkey = c_custkey;則join_mv3可以改寫以下查詢,其查詢結果需補償謂詞c_custkey IS NOT NULL。
USE test_db;
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;其原始查詢計劃和改寫後的計劃如下。

在情境10中,需要Left Outer Join查詢中包含右表中IS NOT NULL的過濾謂詞,如=、<>、>、<、<=、>=、LIKE、IN、NOT LIKE或NOT IN。例如,建立以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW join_mv4
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;則join_mv4可以改寫以下查詢,其中customer.c_address = "Sb4gxKs7"為IS NOT NULL謂詞。
USE test_db;
SELECT lo_orderkey, lo_linenumber, lo_revenue, c_custkey, c_address
FROM lineorder LEFT OUTER JOIN customer
ON lo_custkey = c_custkey
WHERE customer.c_address = "Sb4gxKs7";其原始查詢計劃和改寫後的計劃如下。

彙總改寫
StarRocks非同步物化視圖的多表彙總查詢改寫支援所有彙總函式,包括bitmap_union、hll_union和percentile_union等。例如,建立以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW agg_mv1
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
lo_linenumber,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey, lo_linenumber, c_name;該物化視圖可以改寫以下查詢。
USE test_db;
SELECT
lo_orderkey,
lo_linenumber,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey, lo_linenumber, c_name;其原始查詢計劃和改寫後的計劃如下。

以下詳細闡述了彙總改寫功能可用的情境。
彙總上卷改寫
StarRocks支援通過彙總上卷改寫查詢,即StarRocks可以使用通過GROUP BY a,b子句建立的非同步物化視圖改寫帶有GROUP BY a子句的彙總查詢。例如,agg_mv1可以改寫以下查詢。
USE test_db;
SELECT
lo_orderkey,
c_name,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey, c_name;其原始查詢計劃和改寫後的計劃如下。

當前暫不支援grouping set、grouping set with rollup以及grouping set with cube的改寫。
僅有部分彙總函式支援彙總上卷查詢改寫。下表展示了原始查詢中的彙總函式與用於構建物化視圖的彙總函式之間的對應關係。您可以根據自己的業務情境,選擇相應的彙總函式構建物化視圖。
原始查詢彙總函式 | 支援 Aggregate Rollup 的物化視圖構建彙總函式 |
sum | sum |
count | count |
min | min |
max | max |
avg | sum/count |
bitmap_union、bitmap_union_count、count(distinct) | bitmap_union |
hll_raw_agg、hll_union_agg、ndv、approx_count_distinct | hll_union |
percentile_approx、percentile_union | percentile_union |
沒有相應GROUP BY列的DISTINCT彙總無法使用彙總上卷查詢改寫。但是,從StarRocks v3.1開始,如果彙總上卷對應DISTINCT彙總函式的查詢沒有GROUP BY列,但有等價的謂詞,該查詢也可以被相關物化視圖改寫,因為StarRocks可以將等價謂詞轉換為GROUP BY常量運算式。
在以下樣本中,StarRocks可以使用物化視圖order_agg_mv1改寫對應查詢Query。
USE test_db;
CREATE MATERIALIZED VIEW order_agg_mv1
DISTRIBUTED BY HASH(`order_id`) BUCKETS 12
REFRESH ASYNC START('2022-09-01 10:00:00') EVERY (interval 1 day)
AS
SELECT
order_date,
count(distinct client_id)
FROM order_list
GROUP BY order_date;
-- Query
USE test_db;
SELECT
order_date,
count(distinct client_id)
FROM order_list WHERE order_date='2023-07-03';彙總下推
從v3.3.0版本開始,StarRocks支援物化視圖查詢改寫的彙總下推功能。啟用此功能後,彙總函式將在查詢執行期間下推至Scan Operator,並在執行Join Operator之前被物化視圖改寫。此舉可以緩解Join操作導致的資料膨脹,從而提高查詢效能。
系統預設禁用該功能。要啟用此功能,必須將系統變數enable_materialized_view_agg_pushdown_rewrite設定為true。
假設需要加速以下基於SSB的查詢SQL1。
-- SQL1
USE test_db;
SELECT
LO_ORDERDATE, sum(LO_REVENUE), max(LO_REVENUE), count(distinct LO_REVENUE)
FROM lineorder l JOIN dates d
ON l.LO_ORDERDATE = d.d_date
GROUP BY LO_ORDERDATE
ORDER BY LO_ORDERDATE;SQL1包含lineorder表內的彙總以及lineorder和dates表之間的Join。彙總發生在lineorder內部,與dates的Join僅用於資料過濾。所以SQL1在邏輯上等同於以下SQL2。
-- SQL2
USE test_db;
SELECT
LO_ORDERDATE, sum(sum1), max(max1), bitmap_union_count(bitmap1)
FROM
(SELECT
LO_ORDERDATE, sum(LO_REVENUE) AS sum1, max(LO_REVENUE) AS max1, bitmap_union(to_bitmap(LO_REVENUE)) AS bitmap1
FROM lineorder
GROUP BY LO_ORDERDATE) l JOIN dates d
ON l.LO_ORDERDATE = d.d_date
GROUP BY LO_ORDERDATE
ORDER BY LO_ORDERDATE;SQL2將彙總提前,大量減少Join的資料量。您可以基於SQL2的子查詢建立物化視圖,並啟用彙總下推以改寫和加速彙總。
-- 建立物化視圖 mv0
USE test_db;
CREATE MATERIALIZED VIEW mv0 REFRESH MANUAL AS
SELECT
LO_ORDERDATE,
sum(LO_REVENUE) AS sum1,
max(LO_REVENUE) AS max1,
bitmap_union(to_bitmap(LO_REVENUE)) AS bitmap1
FROM lineorder
GROUP BY LO_ORDERDATE;
-- 啟用彙總下推
SET enable_materialized_view_agg_pushdown_rewrite=true;此時,SQL1將通過物化視圖進行改寫和加速。改寫後的查詢如下。
USE test_db;
SELECT
LO_ORDERDATE, sum(sum1), max(max1), bitmap_union_count(bitmap1)
FROM
(SELECT LO_ORDERDATE, sum1, max1, bitmap1 FROM mv0) l JOIN dates d
ON l.LO_ORDERDATE = d.d_date
GROUP BY LO_ORDERDATE
ORDER BY LO_ORDERDATE;請注意,只有部分支援彙總上卷改寫的彙總函式可以下推。目前支援下推的彙總函式有:
MIN
MAX
COUNT
COUNT DISTINCT
SUM
BITMAP_UNION
HLL_UNION
PERCENTILE_UNION
BITMAP_AGG
ARRAY_AGG_DISTINCT
下推後的彙總函式需要進行上卷才能對齊原始語義。有關彙總上卷的更多說明,請參閱彙總上卷改寫。
彙總下推支援基於Bitmap或HLL函數的Count Distinct上卷改寫。
彙總下推僅支援將查詢中的彙總函式下推至Join、Filter、Where Operator之下的Scan Operator之上。
彙總下推僅支援基於單張表構建的物化視圖進行查詢改寫和加速。
COUNT DISTINCT改寫
StarRocks支援將COUNT DISTINCT計算改寫為BITMAP類型的計算,從而使用物化視圖實現高效能、精確的去重。例如,建立以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW distinct_mv
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, bitmap_union(to_bitmap(lo_custkey)) AS distinct_customer
FROM lineorder
GROUP BY lo_orderkey;該物化視圖可以改寫以下查詢。
USE test_db;
SELECT lo_orderkey, count(distinct lo_custkey)
FROM lineorder
GROUP BY lo_orderkey;嵌套物化視圖改寫
StarRocks支援使用嵌套物化視圖改寫查詢。例如,建立以下物化視圖join_mv2、agg_mv2和agg_mv3。
USE test_db;
CREATE MATERIALIZED VIEW join_mv2
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT lo_orderkey, lo_linenumber, lo_revenue, lo_discount, c_name, c_address
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey;
CREATE MATERIALIZED VIEW agg_mv2
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
lo_linenumber,
c_name,
SUM(lo_revenue) AS total_revenue,
MAX(lo_discount) AS max_discount
FROM join_mv2
GROUP BY lo_orderkey, lo_linenumber, c_name;
CREATE MATERIALIZED VIEW agg_mv3
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
SUM(total_revenue) AS total_revenue,
MAX(max_discount) AS max_discount
FROM agg_mv2
GROUP BY lo_orderkey;其關係如下。

agg_mv3可改寫以下查詢。
USE test_db;
SELECT
lo_orderkey,
SUM(lo_revenue) AS total_revenue,
MAX(lo_discount) AS max_discount
FROM lineorder INNER JOIN customer
ON lo_custkey = c_custkey
GROUP BY lo_orderkey;其原始查詢計劃和改寫後的計劃如下。

Union改寫
謂詞Union改寫
當物化視圖的謂詞範圍是查詢的謂詞範圍的子集時,可以使用UNION操作改寫查詢。
例如,建立以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW agg_mv4
DISTRIBUTED BY hash(lo_orderkey)
AS
SELECT
lo_orderkey,
SUM(lo_revenue) AS total_revenue,
MAX(lo_discount) AS max_discount
FROM lineorder
WHERE lo_orderkey < 300000000
GROUP BY lo_orderkey;該物化視圖可以改寫以下查詢。
USE test_db;
SELECT
lo_orderkey,
SUM(lo_revenue) AS total_revenue,
MAX(lo_discount) AS max_discount
FROM lineorder
GROUP BY lo_orderkey;其原始查詢計劃和改寫後的計劃如下。

其中,agg_mv4包含lo_orderkey < 300000000的資料,lo_orderkey >= 300000000的資料通過直接查詢表lineorder得到,最終通過Union操作之後再彙總,擷取最終結果。
分區Union改寫
假設基於分區表建立了一個分區物化視圖。當查詢掃描的分區範圍是物化視圖最新分區範圍的超集時,查詢可被UNION改寫。
例如,有如下的物化視圖agg_mv5。基表lineorder當前包含分區p1至p7,物化視圖也包含分區p1至p7。
USE test_db;
CREATE MATERIALIZED VIEW agg_mv5
DISTRIBUTED BY hash(lo_orderkey)
PARTITION BY RANGE(lo_orderdate)(
START ("1993-01-01") END ("1999-01-01") EVERY (INTERVAL 1 YEAR)
)
REFRESH MANUAL
AS
SELECT
lo_orderdate,
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder
GROUP BY lo_orderdate,lo_orderkey;如果lineorder新增分區p8,其範圍為[("19990101"), ("20000101")),則以下查詢可被UNION改寫。
USE test_db;
SELECT
lo_orderdate,
lo_orderkey,
sum(lo_revenue) AS total_revenue,
max(lo_discount) AS max_discount
FROM lineorder
GROUP BY lo_orderdate,lo_orderkey;其原始查詢計劃和改寫後的計劃如下。

如上所示,agg_mv5包含來自分區p1到p7的資料,而分區p8的資料來源於lineorder。最後,這兩組資料使用UNION操作合并。
基於視圖的物化視圖查詢改寫
自v3.1.0起,StarRocks支援基於視圖建立物化視圖。如果基於視圖的查詢為SPJG類型,StarRocks將會內聯展開查詢,然後進行改寫。預設情況下,對視圖的查詢會自動延伸為對視圖的基表的查詢,然後進行透明匹配和改寫。
然而,在實際情境中,您可能會基於複雜的嵌套視圖進行資料建模,這些視圖無法直接展開。因此,基於這些視圖建立的物化視圖無法改寫查詢。為了改進在上述情況下的能力,從v3.3.0開始,StarRock最佳化了基於視圖的物化視圖查詢改寫邏輯。
基本原理
在前面的查詢改寫邏輯中,StarRocks會將基於視圖的查詢展開為針對視圖基表的查詢。如果展開後查詢的執行計畫與SPJG模式不匹配,物化視圖將無法改寫查詢。
為瞭解決這個問題,StarRocks引入了一個新的運算元LogicalViewScanOperator,該運算元用於簡化執行計畫樹的結構,且無需展開查詢,使查詢執行計畫樹盡量滿足SPJG模式,從而最佳化查詢改寫。
以下樣本展示了一個包含彙總子查詢的查詢,一個建立在子查詢之上的視圖,基於視圖展開之後的查詢,以及建立在視圖之上的物化視圖。
-- 原始查詢。
SELECT
v1.a,
t2.b,
v1.total
FROM(
SELECT
a,
sum(c) AS total
FROM t1
GROUP BY a
) v1
INNER JOIN t2 ON v1.a = t2.a;
-- 建立在子查詢之上的視圖。
CREATE VIEW view_1 AS
SELECT
t1.a,
sum(t1.c) AS total
FROM t1
GROUP BY t1.a;
-- 基於視圖的查詢。
SELECT
v1.a,
t2.b,
v1.total
FROM view_1 v1
JOIN t2 ON v1.a = t2.a;
-- 基於視圖的物化視圖。
CREATE MATERIALIZED VIEW mv1
DISTRIBUTED BY hash(a)
REFRESH MANUAL
AS
SELECT
v1.a,
t2.b,
v1.total
FROM view_1 v1
JOIN t2 ON v1.a = t2.a;原始查詢的執行計畫如下圖左側所示。由於JOIN內的LogicalAggregateOperator與SPJG模式不匹配,StarRocks不支援這種情況下的查詢改寫。然而,如果將子查詢定義為一個視圖,原始查詢可以展開為針對該視圖的查詢。通過LogicalViewScanOperator,StarRocks可以將不匹配的部分轉換為SPJG模式,從而允許改寫查詢。

配置項
StarRocks預設禁用基於視圖的物化視圖查詢改寫。
要啟用此功能,您必須設定以下變數。
SET enable_view_based_mv_rewrite = true;使用情境
基於單個視圖的物化視圖查詢改寫
StarRocks支援通過基於單個視圖的物化視圖進行查詢改寫,包括彙總查詢。
例如,您可以為TPC-H Query 18構建以下視圖和物化視圖。
USE test_db;
CREATE VIEW q18_view
AS
SELECT
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice,
sum(l_quantity)
FROM
customer,
orders,
lineitem
WHERE
o_orderkey IN (
SELECT
l_orderkey
FROM
lineitem
GROUP BY
l_orderkey having
sum(l_quantity) > 315
)
AND c_custkey = o_custkey
AND o_orderkey = l_orderkey
GROUP BY
c_name,
c_custkey,
o_orderkey,
o_orderdate,
o_totalprice;
CREATE MATERIALIZED VIEW q18_mv
DISTRIBUTED BY hash(c_custkey, o_orderkey)
REFRESH MANUAL
AS
SELECT * FROM q18_view;物化視圖可以改寫以下兩個查詢。
USE test_db;
-- SQL1
EXPLAIN LOGICAL SELECT * FROM q18_view;
+-------------------------------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------------------------------+
| - Output => [2:c_name, 1:c_custkey, 9:o_orderkey, 10:o_orderdate, 13:o_totalprice, 52:sum] |
| - SCAN [q18_mv] => [1:c_custkey, 2:c_name, 52:sum, 9:o_orderkey, 10:o_orderdate, 13:o_totalprice] |
| MaterializedView: true |
| Estimates: {row: 9, cpu: 486.00, memory: 0.00, network: 0.00, cost: 243.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 1:c_custkey := 60:c_custkey |
| 2:c_name := 59:c_name |
| 52:sum := 64:sum(l_quantity) |
| 9:o_orderkey := 61:o_orderkey |
| 10:o_orderdate := 62:o_orderdate |
| 13:o_totalprice := 63:o_totalprice |
+-------------------------------------------------------------------------------------------------------+-- SQL2
EXPLAIN LOGICAL SELECT c_name, sum(`sum(l_quantity)`) FROM q18_view GROUP BY c_name;
+-----------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------+
| - Output => [2:c_name, 59:sum] |
| - AGGREGATE(GLOBAL) [2:c_name] |
| Estimates: {row: 9, cpu: 306.00, memory: 306.00, network: 0.00, cost: 1071.00} |
| 59:sum := sum(59:sum) |
| - EXCHANGE(SHUFFLE) [2] |
| Estimates: {row: 9, cpu: 30.60, memory: 0.00, network: 30.60, cost: 306.00} |
| - AGGREGATE(LOCAL) [2:c_name] |
| Estimates: {row: 9, cpu: 61.20, memory: 30.60, network: 0.00, cost: 244.80} |
| 59:sum := sum(52:sum) |
| - SCAN [q18_mv] => [2:c_name, 52:sum] |
| MaterializedView: true |
| Estimates: {row: 9, cpu: 306.00, memory: 0.00, network: 0.00, cost: 153.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 2:c_name := 60:c_name |
| 52:sum := 65:sum(l_quantity) |
+-----------------------------------------------------------------------------------------------------+基於視圖的物化視圖改寫JOIN查詢
StarRocks支援對包含視圖之間或視圖與表之間的 JOIN 的查詢進行改寫,包括在JOIN上進行彙總。
例如,您可以建立以下視圖和物化視圖。
USE test_db;
CREATE VIEW view_1 AS
SELECT
l_partkey,
l_suppkey,
sum(l_quantity) AS total_quantity
FROM lineitem
GROUP BY
l_partkey,
l_suppkey;
CREATE VIEW view_2 AS
SELECT
l_partkey,
l_suppkey,
sum(l_tax) AS total_tax
FROM lineitem
GROUP BY
l_partkey,
l_suppkey;
CREATE MATERIALIZED VIEW mv_1
DISTRIBUTED BY hash(l_partkey, l_suppkey)
REFRESH MANUAL AS
SELECT
v1.l_partkey,
v2.l_suppkey,
total_quantity,
total_tax
FROM view_1 v1
JOIN view_2 v2 ON v1.l_partkey = v2.l_partkey
AND v1.l_suppkey = v2.l_suppkey;物化視圖可以改寫以下兩個查詢。
USE test_db;
-- SQL1
EXPLAIN LOGICAL SELECT v1.l_partkey,
v2.l_suppkey,
total_quantity,
total_tax
FROM view_1 v1
JOIN view_2 v2 ON v1.l_partkey = v2.l_partkey
AND v1.l_suppkey = v2.l_suppkey;
+--------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------+
| - Output => [4:l_partkey, 25:l_suppkey, 17:sum, 37:sum] |
| - SCAN [mv_1] => [17:sum, 4:l_partkey, 37:sum, 25:l_suppkey] |
| MaterializedView: true |
| Estimates: {row: 799541, cpu: 31981640.00, memory: 0.00, network: 0.00, cost: 15990820.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 17:sum := 43:total_quantity |
| 4:l_partkey := 41:l_partkey |
| 37:sum := 44:total_tax |
| 25:l_suppkey := 42:l_suppkey |
+--------------------------------------------------------------------------------------------------------+-- SQL2
EXPLAIN LOGICAL SELECT v1.l_partkey,
sum(total_quantity),
sum(total_tax)
FROM view_1 v1
JOIN view_2 v2 ON v1.l_partkey = v2.l_partkey
AND v1.l_suppkey = v2.l_suppkey
group by v1.l_partkey;
+--------------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------------+
| - Output => [4:l_partkey, 41:sum, 42:sum] |
| - AGGREGATE(GLOBAL) [4:l_partkey] |
| Estimates: {row: 196099, cpu: 4896864.00, memory: 3921980.00, network: 0.00, cost: 29521223.20} |
| 41:sum := sum(41:sum) |
| 42:sum := sum(42:sum) |
| - EXCHANGE(SHUFFLE) [4] |
| Estimates: {row: 136024, cpu: 489686.40, memory: 0.00, network: 489686.40, cost: 19228831.20} |
| - AGGREGATE(LOCAL) [4:l_partkey] |
| Estimates: {row: 136024, cpu: 5756695.20, memory: 489686.40, network: 0.00, cost: 18249458.40} |
| 41:sum := sum(17:sum) |
| 42:sum := sum(37:sum) |
| - SCAN [mv_1] => [17:sum, 4:l_partkey, 37:sum] |
| MaterializedView: true |
| Estimates: {row: 799541, cpu: 28783476.00, memory: 0.00, network: 0.00, cost: 14391738.00} |
| partitionRatio: 1/1, tabletRatio: 96/96 |
| 17:sum := 45:total_quantity |
| 4:l_partkey := 43:l_partkey |
| 37:sum := 46:total_tax |
+--------------------------------------------------------------------------------------------------------------------+基於視圖的物化視圖改寫外表查詢
您可以在External Catalog中的外表上構建視圖,然後基於這些視圖構建物化視圖來改寫查詢。其使用方式類似於內部表。
基於External Catalog構建物化視圖
StarRocks支援基於Hive Catalog、Hudi Catalog、Iceberg Catalog和Paimon Catalog的外部資料源上構建非同步物化視圖,並支援透明地改寫查詢。基於External Catalog的物化視圖支援大多數查詢改寫功能,但存在以下限制:
基於Hudi、Paimon和JDBC Catalog建立的物化視圖不支援Union改寫。
基於Hudi、Paimon和JDBC Catalog建立的物化視圖不支援View Delta Join改寫。
基於Hudi和JDBC Catalog建立的物化視圖不支援分區增量重新整理。
基於文本的物化視圖改寫
自v3.3.0起,StarRocks支援基於文本的物化視圖改寫,極大地拓展了自身的查詢改寫能力。
基本原理
為實現基於文本的物化視圖改寫,StarRocks將對查詢(或其子查詢)的抽象文法樹與物化視圖定義的抽象文法樹進行比較。當雙方匹配時,StarRocks就可以基於物化視圖改寫該查詢。基於文本的物化視圖改寫簡單高效,與常規的SPJG類型物化視圖查詢改寫相比限制更少。正確使用此功能可顯著增強查詢效能。
基於文本的物化視圖改寫不僅支援SPJG類型運算元,還支援Union、Window、Order、Limit和CTE等運算元。
配置項
配置名稱 | 預設值 | 描述 |
| true | 用於控制是否啟用基於文本的物化視圖改寫。此功能預設啟用。將此項設定為 |
| true | 用於控制是否在建立非同步物化視圖時構建抽象文法樹。此功能預設啟用。將此項設定為 |
| 4 | 用於控制系統比對子查詢是否與物化視圖定義匹配的最大次數。預設值為 |
只有當物化視圖滿足時效性(資料一致性)要求時,才能用於基於文本的查詢改寫。您可以在建立物化視圖時通過屬性query_rewrite_consistency手動設定一致性檢查規則。
使用情境
符合以下情況的查詢可以被改寫:
原始查詢與物化視圖的定義一致。
原始查詢的子查詢與物化視圖的定義一致。
與常規的SPJG類型物化視圖查詢改寫相比,基於文本的物化視圖改寫支援更複雜的查詢,例如多層彙總。
建議您將需要匹配的查詢封裝至原始查詢的子查詢中。
請不要在物化視圖的定義或原始查詢的子查詢中封裝ORDER BY子句,否則查詢將無法被改寫。這是由於子查詢中的ORDER BY子句會預設被消除。
例如,您可以構建以下物化視圖。
USE test_db;
CREATE MATERIALIZED VIEW mv1 REFRESH MANUAL AS
SELECT
user_id,
count(1)
FROM (
SELECT
user_id,
time,
bitmap_union(to_bitmap(tag_id)) AS a
FROM user_tags
GROUP BY
user_id,
time) t
GROUP BY user_id;該物化視圖可以改寫以下兩個查詢。
USE test_db;
-- SQL1
SELECT
user_id,
count(1)
FROM (
SELECT
user_id,
time,
bitmap_union(to_bitmap(tag_id)) AS a
FROM user_tags
GROUP BY
user_id,
time) t
GROUP BY user_id;
-- SQL2
SELECT count(1)
FROM
(
SELECT
user_id,
count(1)
FROM (
SELECT
user_id,
time,
bitmap_union(to_bitmap(tag_id)) AS a
FROM user_tags
GROUP BY
user_id,
time) t
GROUP BY user_id
)m;但是該物化視圖無法改寫以下包含ORDER BY子句的查詢。
USE test_db;
SELECT
user_id,
count(1)
FROM (
SELECT
user_id,
time,
bitmap_union(to_bitmap(tag_id)) AS a
FROM user_tags
GROUP BY
user_id,
time) t
GROUP BY user_id
ORDER BY user_id;設定物化視圖查詢改寫
您可以通過以下Session變數設定非同步物化視圖查詢改寫。
變數 | 預設值 | 描述 |
| true | 是否開啟物化視圖Union改寫。 |
| true | 是否開啟基於規則的物化視圖查詢改寫功能,主要用於處理單表查詢改寫。 |
| 3 | 可用於查詢改寫的嵌套物化視圖的最大層數。類型:INT。取值範圍:[1, +∞)。取值為 |
驗證查詢改寫是否生效
您可以使用EXPLAIN語句查看對應Query Plan。如果其中OlapScanNode專案下的TABLE為對應非同步物化視圖名稱,則表示該查詢已基於非同步物化視圖改寫。
USE test_db;
EXPLAIN SELECT
order_id, sum(goods.price) AS total
FROM order_list INNER JOIN goods
ON goods.item_id1 = order_list.item_id2
GROUP BY order_id;
+------------------------------------+
| Explain String |
+------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:1: order_id | 8: sum |
| PARTITION: RANDOM |
| |
| RESULT SINK |
| |
| 1:Project |
| | <slot 1> : 9: order_id |
| | <slot 8> : 10: total |
| | |
| 0:OlapScanNode |
| TABLE: order_mv |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: order_mv |
| tabletRatio=0/12 |
| tabletList= |
| cardinality=3 |
| avgRowSize=4.0 |
| numNodes=0 |
+------------------------------------+
20 rows in set (0.01 sec)禁用查詢改寫
StarRocks預設開啟基於Default Catalog建立的非同步物化視圖查詢改寫。您可以通過將Session變數enable_materialized_view_rewrite設定為false禁用該功能。
對於基於External Catalog建立的非同步物化視圖,你可以通過ALTER MATERIALIZED VIEW將物化視圖 Propertyforce_external_table_query_rewrite設定為false來禁用此功能。