本文以遊戲行業計算留存為例,介紹在Hologres實現超長周期的留存計算最佳實務。
背景資訊
在遊戲行業中,從角色建立到玩家每日登入的留存業務含義,主要關注的是玩家參與度和遊戲的長期吸引力。這一過程涉及到幾個關鍵的留存指標,用以評估遊戲的使用者粘性及玩家持續迴歸遊戲的傾向。具體包括如下表所示。
留存指標 | 說明 |
角色建立 | 玩家首次與遊戲互動的關鍵點,標誌著玩家從潛在使用者轉變為活躍使用者的轉換。良好的新手引導流程、迷人的角色定製選項和低門檻的入門體驗對於提升角色建立後的初次體驗至關重要。 |
次日留存率 | 是指新建立角色的玩家在首次登入遊戲後的第二天再次登入的比例。該指標反映了遊戲在最初接觸後給玩家留下的印象,以及遊戲是否有足夠的吸引力讓玩家願意快速迴歸。 |
日留存率、7日留存率 | 分別是衡量玩家在首次登入後第三天和第七天的回訪情況。這些中期留存指標可以協助理解遊戲在新玩家中建立習慣形成的能力,以及遊戲內容是否足夠豐富和有趣以保持玩家的興趣。 |
長期留存率 | 超過7日之後的留存資料,比如30~365日的時間,可評估遊戲能否持續吸引玩家並構建穩定的玩家基礎。這往往與遊戲的深度、社區互動、持續的內容更新等因素緊密相關。 |
遊戲公司可以通過分析這些留存指標,瞭解遊戲在哪些方面有效吸引了玩家,哪些環節可能導致玩家流失,進而指導產品最佳化、營銷策略和內容更新計劃。例如:
如果次日留存率較低,可能意味著遊戲的初始使用者體驗需要改進。
如果長期留存率較低,則可能需要增加遊戲的可玩性、社交功能或定期更新內容來維持玩家興趣。
總之,遊戲角色建立到每日登入的留存過程是評估遊戲健康情況和制定增長策略的重要依據。
即時數倉引擎的挑戰
在進行留存指標分析時,即時數倉Hologres面臨多重挑戰:
計算周期較長,需要處理大量歷史資料,這導致了較高的IO開銷,並且在JOIN操作中關聯維表和即時表時,效率降低,長周期的使用者唯一值(UV)計算對記憶體造成較大壓力。
計算多個群體的交集時,需要處理長時間範圍內的笛卡爾積交集,進一步增加了計算複雜性。
SQL代碼的複雜性影響了引擎的計算效率,大量的
CASE WHEN語句用於判斷留存日期,使得代碼開發效率低下,同時也加大了營運的複雜度。因此,最佳化這些方面對於實現高效、簡潔的超長周期留存計算至關重要。
基於CUBE的增量計算方案
基於CUBE的增量計算是先初始化並計算周期內的留存唯一使用者(UV),然後每日處理新增的角色登入資料,與關聯0~T-1歷史建立的角色,以得出從day0~dayn的留存情況。這種方法的優勢在於:
通過T+1的方式預先計算固定維度下的留存,利用儲存換取查詢速度,提高了效能和QPS(每秒查詢率)。
新增角色登入資料只需與歷史角色建立表進行一對一關聯,減少了計算複雜性,避免了N * N的計算粒度,並降低了計算的複雜性。

基於CUBE的增量計算方案,查詢的維度和時間範圍是基於預先計算好的UV結果值,因此查詢操作較為高效。然而,這種實現的擴充性較差,只能適用於固定的CUBE維度。
WITH aa AS (
SELECT
aa.start_date,
aa.gameid,
aa.gameversion,
cc.statdate as end_date,
'day' || cc.statdate::date - aa.start_date AS nday,
aa.actorid AS new_actorid,
cc.actorid AS log_actorid
FROM
(
SELECT
createtime::date AS start_date,
gameid,
gameversion,
actorid
FROM dim.dim_sdkactor
WHERE createtime > '2024-01-01'::date
) aa
LEFT JOIN
(
SELECT
*
FROM app.dws_actor_day_agg_di
WHERE statdate = '2024-06-04'
) cc
ON aa.gameid = cc.gameid
AND aa.actorid = cc.actorid
WHERE aa.start_date <= cc.statdate
)
SELECT
aa.start_date,
aa.gameid,
aa.gameversion,
aa.end_date,
aa.nday,
COUNT(DISTINCT aa.new_actorid) as NewCount,
COUNT(DISTINCT aa.log_actorid) as LogCount
FROM aa
GROUP BY
aa.start_date,
aa.gameid,
aa.gameversion,
aa.end_date,
aa.nday;基於Hologres RoaringBitmap計算方案
Hologres RoaringBitmap方案能夠實現高效靈活的查詢方式。通過構建每日維度最小粒度的Bitmap,可以實現對維度多屬性和任意時間範圍進行GROUP BY操作,以計算UV的交集。RoaringBitmap的核心優勢在於:
分桶策略(Bucketing):將整個資料範圍劃分成多個小的區間(Buckets),每個容器(Container)負責儲存一定範圍內的資料。這種劃分方式可以實現局部化資料訪問,減少了查詢時不必要的遍曆。
動態容器選擇:根據每個container內資料的分布特點(稀疏或密集),RoaringBitmap會自動選擇最合適的容器類型來儲存資料,確保資料在不同密度下都能保持高效。容器類型說明如下:
Bitmap Container:適合資料密集型情境,利用位操作高效地表示資料的存在狀態。
Array Container:適用於資料稀疏且數量不多的情況,直接儲存數值列表。
Run-Length Encoding Container:針對連續或近似連續的資料,通過編碼連續的值和長度來壓縮資料。
高效集合運算:由於每個Container內部資料群組織有序且採用高效編碼,進行集合間的交、並、差等運算時,可以針對性地最佳化這些操作,避免全量資料掃描,僅在必要時進行資料互動,從而顯著提高了運算速度。
位元影像壓縮技術:通過上述的容器選擇和編碼方式,RoaringBitmap在保持查詢效能的同時,能夠顯著減少儲存空間需求,特別是在處理大量稀疏資料時效果更為顯著。
平行處理友好:RoaringBitmap的資料結構天然支援平行處理,因為不同的Container可以獨立處理,這對於分布式系統和並行計算環境尤其有利。
Hologres RoaringBitmap計算方案樣本
資料準備
基礎角色維表和角色登入事實表
基礎角色維表(
dim.dim_sdkactor),程式碼範例如下。遊戲角色建立表,預計每款遊戲每天建立的角色在50萬層級。
BEGIN; CREATE SCHEMA dim; CALL HG_CREATE_TABLE_GROUP ('flink_tg_48', 48); CREATE TABLE dim.dim_sdkactor ( gameid text NOT NULL, channelid text NOT NULL, sdkuserid text NOT NULL default ''::text, gameversion text NOT NULL, worldid text, actorid text NOT NULL, actorname text, actorlevel integer, ip text, createtime timestamp with time zone NOT NULL default CURRENT_TIMESTAMP, createtime_user timestamp with time zone, first_logintime timestamp with time zone, os text, country text, uuid text, radid text, rsid text, activetime timestamp with time zone, adclicktime timestamp with time zone, cur_worldid text, last_logintime timestamp with time zone, first_rechargetime timestamp with time zone, viplevel integer, total_amount integer, is_old integer, first_recharge_level integer, ca_adclciktime timestamp with time zone, ca_radid text, ca_rsid text, devicetype text, match_flag integer, viptime timestamp with time zone, vip1time timestamp with time zone ,PRIMARY KEY (gameid, gameversion, actorid) ) PARTITION BY LIST (gameid); CALL set_table_property('dim.dim_sdkactor', 'orientation', 'row,column'); CALL set_table_property('dim.dim_sdkactor', 'storage_format', 'sst,orc'); CALL set_table_property('dim.dim_sdkactor', 'bitmap_columns', 'gameid,channelid,sdkuserid,gameversion,worldid,actorid,actorname,ip,os,country,uuid,radid,rsid'); CALL set_table_property('dim.dim_sdkactor', 'clustering_key', 'createtime:asc,gameversion:asc'); CALL set_table_property('dim.dim_sdkactor', 'dictionary_encoding_columns', 'gameid:auto,channelid:auto,sdkuserid:auto,gameversion:auto,worldid:auto,actorid:auto,actorname:auto,ip:auto,os:auto,country:auto,uuid:auto,radid:auto,rsid:auto'); CALL set_table_property('dim.dim_sdkactor', 'distribution_key', 'gameversion,actorid'); CALL set_table_property('dim.dim_sdkactor', 'segment_key', 'createtime'); CALL set_table_property('dim.dim_sdkactor', 'table_group', 'flink_tg_48'); CALL set_table_property('dim.dim_sdkactor', 'table_storage_mode', 'any'); CALL set_table_property('dim.dim_sdkactor', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE dim.dim_sdkactor IS NULL; COMMENT ON COLUMN dim.dim_sdkactor.gameid IS '遊戲ID'; COMMENT ON COLUMN dim.dim_sdkactor.channelid IS '渠道ID(pid)'; COMMENT ON COLUMN dim.dim_sdkactor.sdkuserid IS 'SDK使用者ID'; COMMENT ON COLUMN dim.dim_sdkactor.worldid IS '遊戲世界ID'; COMMENT ON COLUMN dim.dim_sdkactor.actorid IS '遊戲角色ID'; COMMENT ON COLUMN dim.dim_sdkactor.actorname IS '遊戲角色名稱'; COMMENT ON COLUMN dim.dim_sdkactor.ip IS 'IP地址'; COMMENT ON COLUMN dim.dim_sdkactor.createtime IS '角色建立時間'; COMMENT ON COLUMN dim.dim_sdkactor.createtime_user IS '使用者建立時間'; COMMENT ON COLUMN dim.dim_sdkactor.first_logintime IS '首次登入時間'; COMMENT ON COLUMN dim.dim_sdkactor.os IS '系統'; COMMENT ON COLUMN dim.dim_sdkactor.country IS '國家'; COMMENT ON COLUMN dim.dim_sdkactor.radid IS '廣告radid'; COMMENT ON COLUMN dim.dim_sdkactor.rsid IS '廣告rsid'; COMMENT ON COLUMN dim.dim_sdkactor.activetime IS '歸屬裝置啟用時間'; COMMENT ON COLUMN dim.dim_sdkactor.adclicktime IS '歸屬裝置點擊廣告時間'; COMMENT ON COLUMN dim.dim_sdkactor.viplevel IS 'vip等級'; COMMENT ON COLUMN dim.dim_sdkactor.total_amount IS '付費總金額'; COMMENT ON COLUMN dim.dim_sdkactor.is_old IS '是否是滾服角色'; COMMENT ON COLUMN dim.dim_sdkactor.first_recharge_level IS '首次付費等級'; COMMENT ON COLUMN dim.dim_sdkactor.viptime IS 'vip等級更新時間'; COMMENT ON COLUMN dim.dim_sdkactor.vip1time IS '到達vip1的時間'; COMMIT;角色登入事實表(
app.dws_actor_day_agg_di),程式碼範例如下。遊戲角色登入事件輕度匯總表,預計每15分鐘執行一次Upsert操作,每款遊戲每天登入的角色在1000萬層級。
BEGIN; CREATE SCHEMA app; CALL HG_CREATE_TABLE_GROUP ('dw_tg_48', 48); CREATE TABLE app.dws_actor_day_agg_di ( statdate timestamp with time zone NOT NULL, gameid text NOT NULL, gameversion text NOT NULL, actorid text NOT NULL, worldid text NOT NULL, channelid text NOT NULL, sdkuserid text NOT NULL, paybegincnt integer, payendcnt integer, payamount bigint, test_payamount bigint, logincnt integer, gameonlinetime integer, createtime timestamp with time zone, ad_flag text, dt_year text, dt_month text, dt_week text, one_week_range text, country text, country_code text, country_area text, timezone text NOT NULL, os text, channelname text, createtime_user text, first_worldid text, is_old text, first_channelid_actor text, language text, realamount bigint ,PRIMARY KEY (statdate, gameid, gameversion, actorid, worldid, channelid, sdkuserid, timezone) ); CALL set_table_property('app.dws_actor_day_agg_di', 'orientation', 'column'); CALL set_table_property('app.dws_actor_day_agg_di', 'storage_format', 'orc'); CALL set_table_property('app.dws_actor_day_agg_di', 'bitmap_columns', 'statdate,gameid,gameversion,actorid,worldid,channelid,sdkuserid'); CALL set_table_property('app.dws_actor_day_agg_di', 'clustering_key', 'statdate:asc,gameid:asc,channelid:asc'); CALL set_table_property('app.dws_actor_day_agg_di', 'dictionary_encoding_columns', 'ad_flag:auto,dt_year:auto,dt_month:auto,dt_week:auto,one_week_range:auto,country:auto,country_code:auto,country_area:auto,timezone:auto,os:auto,channelname:auto,createtime_user:auto,first_worldid:auto,statdate:auto,gameid:auto,gameversion:auto,actorid:auto,worldid:auto,channelid:auto,sdkuserid:auto'); CALL set_table_property('app.dws_actor_day_agg_di', 'distribution_key', 'statdate,gameid,gameversion'); CALL set_table_property('app.dws_actor_day_agg_di', 'segment_key', 'statdate'); CALL set_table_property('app.dws_actor_day_agg_di', 'table_group', 'dw_tg_48'); CALL set_table_property('app.dws_actor_day_agg_di', 'table_storage_mode', 'any'); CALL set_table_property('app.dws_actor_day_agg_di', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE app.dws_actor_day_agg_di IS NULL; COMMENT ON COLUMN app.dws_actor_day_agg_di.statdate IS '統計日期'; COMMENT ON COLUMN app.dws_actor_day_agg_di.gameid IS '遊戲ID'; COMMENT ON COLUMN app.dws_actor_day_agg_di.gameversion IS '遊戲版本'; COMMENT ON COLUMN app.dws_actor_day_agg_di.actorid IS '角色ID'; COMMENT ON COLUMN app.dws_actor_day_agg_di.worldid IS '區服ID'; COMMENT ON COLUMN app.dws_actor_day_agg_di.channelid IS '渠道ID'; COMMENT ON COLUMN app.dws_actor_day_agg_di.sdkuserid IS '帳號id'; COMMENT ON COLUMN app.dws_actor_day_agg_di.paybegincnt IS '日儲值意向次數'; COMMENT ON COLUMN app.dws_actor_day_agg_di.payendcnt IS '日儲值次數'; COMMENT ON COLUMN app.dws_actor_day_agg_di.payamount IS '日儲值金額'; COMMENT ON COLUMN app.dws_actor_day_agg_di.test_payamount IS '測試的日儲值金額'; COMMENT ON COLUMN app.dws_actor_day_agg_di.logincnt IS '登入遊戲次數'; COMMENT ON COLUMN app.dws_actor_day_agg_di.gameonlinetime IS '遊戲線上時間長度'; COMMENT ON COLUMN app.dws_actor_day_agg_di.createtime IS '角色建立時間'; COMMENT ON COLUMN app.dws_actor_day_agg_di.ad_flag IS '來源'; COMMENT ON COLUMN app.dws_actor_day_agg_di.dt_year IS '年'; COMMENT ON COLUMN app.dws_actor_day_agg_di.dt_month IS '月'; COMMENT ON COLUMN app.dws_actor_day_agg_di.dt_week IS '周'; COMMENT ON COLUMN app.dws_actor_day_agg_di.one_week_range IS '一周的日期範圍'; COMMENT ON COLUMN app.dws_actor_day_agg_di.country IS '國家'; COMMENT ON COLUMN app.dws_actor_day_agg_di.country_code IS '國家代碼'; COMMENT ON COLUMN app.dws_actor_day_agg_di.country_area IS '國家地區'; COMMENT ON COLUMN app.dws_actor_day_agg_di.timezone IS '時區(UTC+8:0;UTC+0:-8)'; COMMENT ON COLUMN app.dws_actor_day_agg_di.os IS '系統'; COMMENT ON COLUMN app.dws_actor_day_agg_di.channelname IS '渠道名稱'; COMMENT ON COLUMN app.dws_actor_day_agg_di.createtime_user IS '使用者建立時間'; COMMENT ON COLUMN app.dws_actor_day_agg_di.first_worldid IS '角色建立時的區服id'; COMMENT ON COLUMN app.dws_actor_day_agg_di.is_old IS '是否滾服角色(1:是,0:否)'; COMMENT ON COLUMN app.dws_actor_day_agg_di.first_channelid_actor IS '角色最開始進入遊戲的渠道id'; COMMENT ON COLUMN app.dws_actor_day_agg_di.language IS '角色首次選擇的語言'; COMMENT ON COLUMN app.dws_actor_day_agg_di.realamount IS '實收金額'; COMMIT;
Mapping表
Hologres RoaringBitmap要求使用者ID為32位INT類型且應盡量密集以達到最優效能,鑒於實際業務情境中使用者標識(如userId、actorId)多為字串形式,為解決這個問題,採取構建映射表的策略,通過Hologres的Serial類型,即自增的32位INT類型,實現了從字串到整型ID的高效、自動化映射管理,確保了資料的一致性和映射關係的穩定性。

遊戲角色建立表的映射表(
dws.actor_mapping),程式碼範例如下。BEGIN; CREATE SCHEMA dws; CALL HG_CREATE_TABLE_GROUP ('dw_tg_24', 24); CREATE TABLE dws.actor_mapping ( gameid integer NOT NULL, gameversion character varying(64) NOT NULL, actorid text NOT NULL, encode_actorid bigserial ,PRIMARY KEY (gameid, gameversion, actorid) ); CALL set_table_property('dws.actor_mapping', 'orientation', 'row,column'); CALL set_table_property('dws.actor_mapping', 'storage_format', 'sst,orc'); CALL set_table_property('dws.actor_mapping', 'bitmap_columns', 'gameversion,actorid'); CALL set_table_property('dws.actor_mapping', 'clustering_key', 'gameid:asc,gameversion:asc,actorid:asc'); CALL set_table_property('dws.actor_mapping', 'dictionary_encoding_columns', 'gameversion:auto,actorid:auto'); CALL set_table_property('dws.actor_mapping', 'distribution_key', 'gameid,gameversion,actorid'); CALL set_table_property('dws.actor_mapping', 'table_group', 'dw_tg_24'); CALL set_table_property('dws.actor_mapping', 'table_storage_mode', 'any'); CALL set_table_property('dws.actor_mapping', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE dws.actor_mapping IS '角色映射表'; COMMENT ON COLUMN dws.actor_mapping.gameid IS '遊戲ID'; COMMENT ON COLUMN dws.actor_mapping.gameversion IS '遊戲版本'; COMMENT ON COLUMN dws.actor_mapping.actorid IS '角色ID'; COMMENT ON COLUMN dws.actor_mapping.encode_actorid IS '角色編碼'; END;dws.actor_mapping映射表的資料寫入,程式碼範例如下。INSERT INTO dws.actor_mapping (gameid, gameversion, actorid) SELECT gameid::INT, gameversion, actorid FROM dim.dim_sdkactor ORDER BY createtime, gameid, actorid;
Roaringbitmap的結果表
RoaringBitmap使用前需要執行以下語句,建立EXTENSION,且EXTENSION是DB層級的函數,一個DB只需執行一次即可,建立DB需要重新執行。
--建立EXTENSION
CREATE EXTENSION roaringbitmap;遊戲角色建立的Roaringbitmap結果表
遊戲角色建立的Roaringbitmap結果表(
dws.actor_all_new_tag),程式碼範例如下。BEGIN; CREATE TABLE dws.actor_all_new_tag ( statdate date NOT NULL, gameid integer NOT NULL, tag_name text NOT NULL, tag_val text NOT NULL, rb_new roaringbitmap ,PRIMARY KEY (statdate, gameid, tag_name, tag_val) ); CALL set_table_property('dws.actor_all_new_tag', 'orientation', 'column'); CALL set_table_property('dws.actor_all_new_tag', 'storage_format', 'orc'); CALL set_table_property('dws.actor_all_new_tag', 'bitmap_columns', 'tag_name,tag_val'); CALL set_table_property('dws.actor_all_new_tag', 'dictionary_encoding_columns', 'tag_name:auto,tag_val:auto'); CALL set_table_property('dws.actor_all_new_tag', 'distribution_key', 'statdate,gameid,tag_name,tag_val'); CALL set_table_property('dws.actor_all_new_tag', 'segment_key', 'statdate'); CALL set_table_property('dws.actor_all_new_tag', 'table_group', 'dw_tg_24'); CALL set_table_property('dws.actor_all_new_tag', 'table_storage_mode', 'any'); CALL set_table_property('dws.actor_all_new_tag', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE dws.actor_all_new_tag IS '角色新建立列表'; COMMENT ON COLUMN dws.actor_all_new_tag.statdate IS '統計日期'; COMMENT ON COLUMN dws.actor_all_new_tag.gameid IS '遊戲ID'; COMMENT ON COLUMN dws.actor_all_new_tag.tag_name IS '標籤名(遊戲版本、渠道ID、遊戲世界ID、作業系統、國家編碼、來源)'; COMMENT ON COLUMN dws.actor_all_new_tag.tag_val IS '標籤值'; COMMENT ON COLUMN dws.actor_all_new_tag.rb_new IS '角色編碼列表'; END;dws.actor_all_new_tag結果表的資料寫入,程式碼範例如下。INSERT INTO dws.actor_all_new_tag (statdate, gameid, tag_name, tag_val, rb_new) SELECT createtime::DATE statdate, a.gameid::int, 'gameversion', a.gameversion, RB_BUILD_AGG (m.encode_actorid ::int) FROM dim.dim_sdkactor a INNER JOIN dws.actor_mapping m ON a.gameid::int = m.gameid AND a.gameversion = m.gameversion AND a.actorid = m.actorid GROUP BY a.createtime::DATE, a.gameid, a.gameversion
遊戲角色登入的Roaringbitmap結果表
遊戲角色登入的Roaringbitmap結果表(
dws.actor_all_login),程式碼範例如下。BEGIN; CREATE TABLE dws.actor_all_login ( statdate date NOT NULL, gameid integer NOT NULL, rb_login roaringbitmap ,PRIMARY KEY (statdate, gameid) ); CALL set_table_property('dws.actor_all_login', 'orientation', 'column'); CALL set_table_property('dws.actor_all_login', 'storage_format', 'orc'); CALL set_table_property('dws.actor_all_login', 'distribution_key', 'statdate,gameid'); CALL set_table_property('dws.actor_all_login', 'segment_key', 'statdate'); CALL set_table_property('dws.actor_all_login', 'table_group', 'dw_tg_24'); CALL set_table_property('dws.actor_all_login', 'table_storage_mode', 'any'); CALL set_table_property('dws.actor_all_login', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE dws.actor_all_login IS '角色登入行為指標表'; COMMENT ON COLUMN dws.actor_all_login.statdate IS '統計日期'; COMMENT ON COLUMN dws.actor_all_login.gameid IS '遊戲ID'; COMMENT ON COLUMN dws.actor_all_login.rb_login IS '角色登入指標列表'; END;dws.actor_all_login結果表的資料寫入,程式碼範例如下。INSERT INTO dws.actor_all_login (statdate, gameid, rb_login) SELECT statdate, A.gameid::int, RB_BUILD_AGG (m.encode_actorid ::int) FROM APP.dws_actor_day_agg_di A INNER JOIN dws.actor_mapping m ON A.gameid::int = M.gameid AND A.gameversion = M.gameversion AND A.actorid = M.actorid GROUP BY statdate, A.gameid::int
長周期留存查詢SQL效能驗證
擷取從遊戲角色建立到角色登入150天的留存分析。
Explain Analyze
WITH
actor_new AS
( SELECT statdate,
gameid,
RB_AND_AGG (rb_new) rb_new
FROM dws.actor_all_new_tag n
WHERE statdate >= '20240101'::date
AND gameid = '2131'
AND tag_name = 'gameversion'
AND tag_val = '2131-CN-ZS'
GROUP BY statdate,
gameid
)
,
actor_list
AS
( SELECT n.statdate start_date,
s.ds::date end_date,
'day' || s.ds::date - n.statdate AS nday,
gameid,
rb_new
FROM actor_new n,
generate_series ('20240101'::date, '20240604'::date, '1 day'::interval) AS s (ds)
WHERE n.statdate <= s.ds::date
)
SELECT count(1)
FROM ( SELECT al.start_date,
al.end_date,
al.nday,
RB_CARDINALITY (rb_new) NewCount,
RB_CARDINALITY (RB_AND (rb_new, rb_login)) LogCount
FROM actor_list al
INNER JOIN dws.actor_all_login aal
ON al.end_date = aal.statdate
AND al.gameid = aal.gameid
) t
;返回結果如下,執行耗時預估為6 s。若您有更低延遲的期望,您還可以增加分桶計算和分桶最佳化。
Gather (cost=0.00..7383.37 rows=777 width=32)
[24:1 id=100006 dop=1 time=6547/6547/6547ms rows=8001(8001/8001/8001) mem=75/75/75KB open=0/0/0ms get_next=6547/6547/6547ms]
-> Project (cost=0.00..7383.31 rows=777 width=32)
[id=27 dop=24 time=6549/3544/1202ms rows=8001(691/333/27) mem=2060/1314/131KB open=826/636/514ms get_next=5723/2908/688ms]
-> Hash Join (cost=0.00..7382.31 rows=777 width=32)
Hash Cond: ((actor_all_login.statdate = (date(actor_all_login.*))) AND (actor_all_login.gameid = actor_all_new_tag.gameid))
[id=25 dop=24 time=5768/3144/1160ms rows=8001(691/333/27) mem=2048/813/64MB open=826/636/513ms get_next=4942/2508/647ms]
-> Local Gather (cost=0.00..5.16 rows=777 width=16)
[id=22 dop=24 time=827/518/76ms rows=759(48/31/19) mem=600/600/600B open=0/0/0ms get_next=827/518/76ms local_dop=1/1/1]
-> Decode (cost=0.00..5.16 rows=777 width=16)
[id=20 split_count=24 time=1520/1154/765ms rows=759(48/31/19) mem=0/0/0B open=1/0/0ms get_next=1520/1154/765ms]
-> Seq Scan on actor_all_login (cost=0.00..5.06 rows=777 width=16)
Filter: (gameid = 2131)
RowGroupFilter: (gameid = 2131)
[id=19 split_count=24 time=1520/1154/765ms rows=759(48/31/19) mem=64/54/32MB open=1/0/0ms get_next=1520/1154/765ms physical_reads=51(3/2/1) scan_rows=37189(1634/1549/1495)]
-> Hash (cost=7377.15..7377.15 rows=606 width=28)
[id=24 dop=24 time=826/636/513ms rows=8001(691/333/27) mem=64/24/2MB open=672/571/512ms get_next=154/64/1ms rehash=1/1/1 hash_mem=384/384/384KB]
-> Redistribution (cost=0.00..7377.15 rows=606 width=28)
Hash Key: (date(actor_all_login.*)), actor_all_new_tag.gameid
[24:24 id=100005 dop=24 time=672/571/512ms rows=8001(691/333/27) mem=64/24/2MB open=2/0/0ms get_next=672/571/510ms]
-> Filter (cost=0.00..7377.14 rows=606 width=28)
Filter: (actor_all_new_tag.gameid = 2131)
[id=18 dop=24 time=380/376/371ms rows=8001(572/333/79) mem=2/2/2KB open=1/0/0ms get_next=379/375/371ms]
-> Project (cost=0.00..7377.14 rows=606 width=28)
[id=17 dop=24 time=380/375/371ms rows=8001(572/333/79) mem=10/7/4KB open=1/0/0ms get_next=379/375/371ms]
-> Project (cost=0.00..7376.14 rows=76 width=20)
[id=16 dop=24 time=379/375/371ms rows=8001(572/333/79) mem=4/3/2KB open=1/0/0ms get_next=379/374/370ms]
-> Nested Loop (cost=0.00..7375.14 rows=76 width=24)
Join Filter: (actor_all_ne w_tag.statdate <= date(actor_all_login.*))
[id=14 dop=24 time=379/375/370ms rows=8001(572/333/79) mem=453/276/115KB open=1/0/0ms get_next=379/374/370ms]
-> Filter (cost=0.00..1.10 rows=2 width=16)
Filter: (actor_all_new_tag.gameid = 2131)
[id=11 dop=24 time=371/370/368ms rows=126(9/5/2) mem=2/2/2KB open=1/0/0ms get_next=370/369/368ms]
-> GroupAggregate (cost=0.00..1.10 rows=2 width=16)
Group Key: actor_all_new_tag.statdate, actor_all_new_tag.gameid
[id=9 dop=24 time=371/370/368ms rows=126(9/5/2) mem=2107/1503/851KB open=1/0/0ms get_next=370/369/368ms]
-> Redistribution (cost=0.00..1.10 rows=2 width=16)
Hash Key: actor_all_new_tag.statdate, actor_all_new_tag.gameid
[24:24 id=100004 dop=24 time=370/368/367ms rows=126(9/5/2) mem=512/331/120KB open=1/0/0ms get_next=369/368/367ms]
-> Local Gather (cost=0.00..1.10 rows=2 width=16)
[id=5 dop=24 time=368/140/0ms rows=126(10/5/2) mem=600/600/600B open=0/0/0ms get_next=368/140/0ms local_dop=3/3/3]
-> Decode (cost=0.00..1.10 rows=2 width=16)
[id=3 split_count=72 time=373/47/0ms rows=126(10/1/0) mem=0/0/0B open=5/0/0ms get_next=372/47/0ms]
-> Index Scan using Clustering_index on actor_all_new_tag (cost=0.00..1.00 rows=2 width=16)
Filter: (gameid = 2131)
Segment Filter: (statdate >= '2024-01-01'::date)
Bitmap Filter: ((tag_name = 'gameversion'::text) AND (tag_val = '2131-CN-ZS'::text))
RowGroupFilter: ((gameid = 2131) AND (statdate >= '2024-01-01'::date))
[id=2 split_count=34 time=373/101/0ms rows=69(10/2/0) mem=621/199/5KB open=5/1/0ms get_next=372/100/0ms physical_reads=30(3/2/1) scan_rows=106496(8192/8192/8192) bitmap_used=37]
-> Materialize (cost=0.00..1.09 rows=3024 width=8)
[id=13 dop=24 time=1/0/0ms rows=15876(1134/661/252) mem=0/0/0B open=0/0/0ms get_next=1/0/0ms]
-> Broadcast (cost=0.00..1.09 rows=3024 width=8)
[1:24 id=100003 dop=24 time=3/0/0ms rows=3024(126/126/126) mem=600/600/600B open=0/0/0ms get_next=3/0/0ms]
-> Forward (cost=0.00..1.00 rows=126 width=8)
[0:1 id=100002 dop=1 time=3/3/3ms rows=126(126/126/126) mem=264/264/264B open=3/3/3ms get_next=0/0/0ms scan_rows=126(126/126/126)]
-> ExecuteExternalSQL on PQE (cost=0.00..0.00 rows=0 width=8)
External SQL: SELECT * FROM generate_series('2024-01-01 00:00:00+08'::timestamptz, '2024-06-04 00:00:00+08'::timestamptz, '1 day'::interval) AS TVF(c_b2615fb8_97)
ADVICE:
[node id : 2] Table actor_all_new_tag misses bitmap index: gameid.
[node id : 100004] distribution key miss match! table actor_all_new_tag defined distribution keys : statdate, gameid, tag_name, tag_val; request distribution columns : statdate, gameid;
[node id : 100005] shuffle data skew in different shards! max rows is 691, min rows is 27
[node id : 19] Table actor_all_login misses bitmap index: gameid.
Query id:[260095200828505288]
======================cost======================
Total cost:[6675] ms
Optimizer cost:[81] ms
Build execution plan cost:[4] ms
Init execution plan cost:[2] ms
Start query cost:[25] ms
- Wait schema cost:[0] ms
- Lock query cost:[0] ms
- Create dataset reader cost:[0] ms
- Create split reader cost:[6] ms
Get result cost:[6563] ms
- Get the first block cost:[2884] ms
====================resource====================
Memory: total 21322 MB. Worker stats: max 2137 MB, avg 592 MB, min 18 MB, max memory worker id: 810630269575977312.
CPU time: total 71368 ms. Worker stats: max 5967 ms, avg 1982 ms, min 41 ms, max CPU time worker id: 810630269575977312.
Ec wait time: total 683 ms. Worker stats: max 129 ms, avg 18 ms, min 0 ms, max ec wait time worker id: 729778468166517122.
Physical read bytes: total 3113 MB. Worker stats: max 169 MB, avg 86 MB, min 0 MB, max physical read bytes worker id: 729777124760795845.
Read bytes: total 2809 MB. Worker stats: max 147 MB, avg 78 MB, min 0 MB, max read bytes worker id: 729777124760795845.
DAG instance count: total 74. Worker stats: max 4, avg 2, min 1, max DAG instance count worker id: 729697499219933893.
Fragment instance count: total 170. Worker stats: max 8, avg 4, min 1, max fragment instance count worker id: 729697499219933893.增加分桶計算
在長周期範圍內,需要計算多個日期的Bitmap交集組合,由於長周期Bitmap匯總計算效能較差。因此,採用分桶的方法,將Bitmap拆分成多段進行打散儲存,以充分利用並發計算的能力,實現對人群的高度壓縮,減少資料的IO操作,從而提升計算效率。
建立基於分桶的RoaringBitmap結果表。
基於分桶的遊戲角色建立RoaringBitmap結果表
基於分桶的角色建立RoaringBitmap結果表(
dws.actor_all_new_tag_bucket),程式碼範例如下。BEGIN; CREATE TABLE dws.actor_all_new_tag_bucket ( statdate date NOT NULL, gameid integer NOT NULL, tag_name text NOT NULL, tag_val text NOT NULL, bucket integer NOT NULL, rb_new roaringbitmap ,PRIMARY KEY (statdate, gameid, tag_name, tag_val, bucket) ); CALL set_table_property('dws.actor_all_new_tag_bucket', 'orientation', 'column'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'storage_format', 'orc'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'bitmap_columns', 'tag_name,tag_val'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'dictionary_encoding_columns', 'tag_name:auto,tag_val:auto'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'distribution_key', 'statdate,gameid,bucket'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'segment_key', 'statdate'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'table_group', 'dw_tg_24'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'table_storage_mode', 'any'); CALL set_table_property('dws.actor_all_new_tag_bucket', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE dws.actor_all_new_tag_bucket IS NULL; END;dws.actor_all_new_tag_bucket結果表的資料寫入,程式碼範例如下。INSERT INTO dws.actor_all_new_tag_bucket (statdate, gameid, bucket, tag_name, tag_val, rb_new) SELECT createtime::DATE statdate, a.gameid::int, m.encode_actorid >> 24 as bucket,--右移24位(即高8位)作為桶號,低24位為作為Bitmap 'gameversion', a.gameversion, RB_BUILD_AGG (m.encode_actorid ::int) FROM dim.dim_sdkactor a INNER JOIN dws.actor_mapping m ON a.gameid::int = m.gameid AND a.gameversion = m.gameversion AND a.actorid = m.actorid GROUP BY a.createtime::DATE, a.gameid, m.encode_actorid >> 24, a.gameversion
基於分桶的遊戲角色登入RoaringBitmap結果表
基於分桶的遊戲角色登入RoaringBitmap結果表(
dws.actor_all_login_bucket),程式碼範例如下。BEGIN; CREATE TABLE dws.actor_all_login_bucket ( statdate date NOT NULL, gameid integer NOT NULL, bucket integer NOT NULL, rb_login roaringbitmap ,PRIMARY KEY (statdate, gameid, bucket) ); CALL set_table_property('dws.actor_all_login_bucket', 'orientation', 'column'); CALL set_table_property('dws.actor_all_login_bucket', 'storage_format', 'orc'); CALL set_table_property('dws.actor_all_login_bucket', 'distribution_key', 'statdate,gameid,bucket'); CALL set_table_property('dws.actor_all_login_bucket', 'segment_key', 'statdate'); CALL set_table_property('dws.actor_all_login_bucket', 'table_group', 'dw_tg_24'); CALL set_table_property('dws.actor_all_login_bucket', 'table_storage_mode', 'any'); CALL set_table_property('dws.actor_all_login_bucket', 'time_to_live_in_seconds', '3153600000'); COMMENT ON TABLE dws.actor_all_login_bucket IS '角色登入行為指標表'; COMMENT ON COLUMN dws.actor_all_login_bucket.statdate IS '統計日期'; COMMENT ON COLUMN dws.actor_all_login_bucket.gameid IS '遊戲ID'; COMMENT ON COLUMN dws.actor_all_login_bucket.rb_login IS '角色登入指標列表'; END;dws.actor_all_login_bucket結果表的資料寫入,程式碼範例如下。INSERT INTO dws.actor_all_login_bucket (statdate, gameid,bucket, rb_login) SELECT statdate, A.gameid::int, m.encode_actorid >> 24 as bucke,--右移24位(即高8位)作為桶號,低24位為作為Bitmap RB_BUILD_AGG (m.encode_actorid ::int) FROM APP.dws_actor_day_agg_di A INNER JOIN dws.actor_mapping m ON A.gameid::int = M.gameid AND A.gameversion = M.gameversion AND A.actorid = M.actorid GROUP BY statdate, A.gameid::int, m.encode_actorid >> 24
驗證查詢效能。
擷取從遊戲角色建立到角色登入150天的留存分析。
explain analyze WITH actor_new AS ( SELECT statdate, gameid, bucket, RB_AND_AGG (rb_new) rb_new FROM dws.actor_all_new_tag_bucket n WHERE statdate >= '20240101'::date AND gameid = '2131' AND tag_name = 'gameversion' AND tag_val = '2131-CN-ZS' GROUP BY statdate, gameid, bucket ) , actor_list AS ( SELECT n.statdate start_date, s.ds::date end_date, 'day' || s.ds::date - n.statdate AS nday, gameid, bucket, rb_new FROM actor_new n, generate_series ('20240101'::date, '20240604'::date, '1 day'::interval) AS s (ds) WHERE n.statdate <= s.ds::date ) SELECT al.start_date, al.end_date, al.nday, RB_CARDINALITY(rb_new), RB_CARDINALITY(rb_login) FROM actor_list al INNER JOIN dws.actor_all_login_bucket aal ON al.end_date = aal.statdate AND al.gameid = aal.gameid AND al.bucket = aal.bucket;返回結果如下,查詢所需時間為3 s+。如果您期望更低的延遲,可以參考下文進行分桶最佳化。
QUERY PLAN Gather (cost=0.00..18005.55 rows=29454 width=32) [24:1 id=100005 dop=1 time=3527/3527/3527ms rows=24804(24804/24804/24804) mem=90/90/90KB open=0/0/0ms get_next=3527/3527/3527ms] -> Project (cost=0.00..18003.29 rows=29454 width=32) [id=23 dop=24 time=3540/2588/1711ms rows=24804(1569/1033/556) mem=27/19/11KB open=1458/1233/931ms get_next=2082/1355/549ms] -> Hash Join (cost=0.00..18002.21 rows=29454 width=32) Hash Cond: (((date(actor_all_new_tag_bucke.*)) = actor_all_login_bucket.statdate) AND (actor_all_new_tag_bucket.gameid = actor_all_login_bucket.gameid) AND (actor_all_new_tag_bucket.bucket = actor_all_login_bucket.bucket)) [id=22 dop=24 time=3367/2422/1611ms rows=24804(1569/1033/556) mem=536/442/264MB open=1458/1233/931ms get_next=1909/1188/438ms] -> Redistribution (cost=0.00..17996.60 rows=29454 width=32) Hash Key: (date(actor_all_new_tag_bucket.*)), actor_all_new_tag_bucke.gameid, actor_all_new_tag_bucket.bucket [24:24 id=100004 dop=24 time=117/57/26ms rows=24804(1569/1033/556) mem=32/22/16MB open=1/0/0ms get_next=117/56/26ms] -> Filter (cost=0.00..17996.47 rows=29454 width=32) Filter: (actor_all_new_tag_bucket.gameid = 2131) [id=16 dop=24 time=241/182/142ms rows=24804(1772/1033/652) mem=3/3/3KB open=1/0/0ms get_next=241/182/141ms] -> Project (cost=0.00..17996.43 rows=29454 width=32) [id=15 dop=24 time=241/182/141ms rows=24804(1772/1033/652) mem=22/15/11KB open=1/0/0ms get_next=241/182/141ms] -> Project (cost=0.00..17995.36 rows=3711 width=24) [id=14 dop=24 time=240/181/141ms rows=24804(1772/1033/652) mem=9/6/5KB open=1/0/0ms get_next=240/181/141ms] -> Nested Loop (cost=0.00..17994.35 rows=3711 width=28) Join Filter: (actor_all_new_tag_bucket.statdate <= date(actor_all_new_tag_bucke.*)) [id=12 dop=24 time=239/181/141ms rows=24804(1772/1033/652) mem=491/274/122KB open=1/0/0ms get_next=239/181/141ms] -> Filter (cost=0.00..1.10 rows=74 width=20) Filter: (actor_all_new_tag_bucket.gameid = 2131) [id=9 dop=24 time=223/173/134ms rows=350(21/14/10) mem=2/2/2KB open=1/0/0ms get_next=223/173/134ms] -> GroupAggregate (cost=0.00..1.10 rows=74 width=20) Group Key: actor_all_new_tag_bucke.statdate, actor_all_new_tag_bucket.gameid, actor_all_new_tag_bucket.bucket [id=7 dop=24 time=223/173/134ms rows=350(21/14/10) mem=2171/1446/647KB open=1/0/0ms get_next=223/173/134ms] -> Local Gather (cost=0.00..1.10 rows=74 width=20) [id=6 dop=24 time=221/171/133ms rows=350(21/14/10) mem=600/600/600B open=1/0/0ms get_next=221/171/133ms local_dop=1/1/1] -> Decode (cost=0.00..1.10 rows=74 width=20) [id=5 split_count=24 time=230/180/141ms rows=350(21/14/10) mem=0/0/0B open=9/1/1ms get_next=228/179/139ms] -> Index Scan using Clustering_index on actor_all_new_tag_bucket (cost=0.00..1.00 rows=74 width=20) Filter: (gameid = 2131) Segment Filter: (statdate >= '2024-01-25'::date) Bitmap Filter: ((tag_name = 'gameversion'::text) AND (tag_val = '2131-CN-ZS'::text)) RowGroupFilter: ((gameid = 2131) AND (statdate >= '2024-01-25'::date)) [id=4 split_count=24 time=230/180/141ms rows=350(21/14/10) mem=609/469/223KB open=9/1/1ms get_next=228/179/139ms physical_reads=96(4/4/4) scan_rows=124824(5443/5201/ 4870) bitmap_used=24] -> Materialize (cost=0.00..1.09 rows=3024 width=8) [id=11 dop=24 time=0/0/0ms rows=44100(2646/1837/1260) mem=0/0/0B open=0/0/0ms get_next=0/0/0ms] -> Broadcast (cost=0.00..1.09 rows=3024 width=8) [1:24 id=100003 dop=24 time=3/0/0ms rows=3024(126/126/126) mem=600/600/600B open=0/0/0ms get_next=3/0/0ms] -> Forward (cost=0.00..1.00 rows=126 width=8) [0:1 id=100002 dop=1 time=3/3/3ms rows=126(126/126/126) mem=264/264/264B open=3/3/3ms get_next=0/0/0ms scan_rows=126(126/126/126)] -> ExecuteExternalSQL on PQE (cost=0.00..0.00 rows=0 width=8) External SQL: SELECT * FROM generate_series('2024-01-25 00:00:00+08'::timestamptz, '2024-05-29 00:00:00+08'::timestamptz, '1 day'::interval) AS TVF(c_2c230678_101) -> Hash (cost=5.39..5.39 rows=6387 width=20) [id=21 dop=24 time=1458/1233/931ms rows=6360(316/265/239) mem=64/60/32MB open=1253/1066/864ms get_next=247/167/67ms rehash=1/1/1 hash_mem=384/384/384KB] -> Local Gather (cost=0.00..5.39 rows=6387 width=20) [id=20 dop=24 time=1253/1066/864ms rows=6360(316/265/239) mem=600/600/600B open=0/0/0ms get_next=1253/1066/864ms local_dop=1/1/1] -> Decode (cost=0.00..5.39 rows=6387 width=20) [id=19 split_count=24 time=1252/1065/863ms rows=6360(316/265/239) mem=0/0/0B open=1/0/0ms get_next=1252/1065/862ms] -> Seq Scan on actor_all_login_bucke (cost=0.00..5.29 rows=6387 width=20) Filter: (gameid = 2131) RowGroupFilter: (gameid = 2131) [id=18 split_count=24 time=1252/1065/862ms rows=6360(316/265/239) mem=64/60/32MB open=1/0/0ms get_next=1252/1065/862ms physical_reads=96(4/4/4) scan_rows=186373(7882/7765/7624)] ADVICE: [node id : 18] Table actor_all_login_bucke misses bitmap index: gameid. [node id : 4] Table actor_all_new_tag_bucke misses bitmap index: gameid. Query id:[70041957309637499] ======================cost====================== Total cost:[3765] ms Optimizer cost:[91] ms Build execution plan cost:[4] ms Init execution plan cost:[2] ms Start query cost:[32] ms - Wait schema cost:[0] ms - Lock query cost:[0] ms - Create dataset reader cost:[0] ms - Create split reader cost:[6] ms Get result cost:[3636] ms - Get the first block cost:[1752] ms ====================resource==================== Memory: total 11943 MB. Worker stats: max 605 MB, avg 497 MB, min 309 MB, max memory worker id: 729777686154468446. CPU time: total 48015 ms. Worker stats: max 2857 ms, avg 2000 ms, min 1080 ms, max CPU time worker id: 729778165074082501. Ec wait time: total 523 ms. Worker stats: max 42 ms, avg 21 ms, min 9 ms, max ec wait time worker id: 729776995498738050. Physical read bytes: total 2975 MB. Worker stats: max 134 MB, avg 123 MB, min 111 MB, max physical read bytes worker id: 729776995498738050. Read bytes: total 2976 MB. Worker stats: max 134 MB, avg 124 MB, min 111 MB, max read bytes worker id: 729776995498738050. DAG instance count: total 50. Worker stats: max 3, avg 2, min 2, max DAG instance count worker id: 810630269575977312. Fragment instance count: total 98. Worker stats: max 5, avg 4, min 4, max fragment instance count worker id: 810630269575977312. (80 rows)
分桶最佳化
分桶資料分析。
排查分桶的大小以及資料量是否合理。
查詢
dws.actor_all_new_tag_bucket結果表的分桶大小以及資料量。SELECT bucket, count(1) FROM dws.actor_all_new_tag_bucket GROUP BY bucket;返回結果如下。
bucket | count -------+------- 8 | 7000 0 | 1100 2 | 1200 3 | 8000 14 | 4000 5 | 3777 9 | 8000 13 | 6000 10 | 9000 12 | 7000 1 | 17000 7 | 7000 4 | 3000 6 | 6000 11 | 8000 (15 rows)查詢
dws.actor_all_login_bucket結果表的分桶大小以及資料量。SELECT bucket,count(1) FROM dws.actor_all_login_bucket GROUP BY bucket;返回結果如下。
bucket | count -------+------- 0 | 10000 5 | 15000 6 | 15000 11 | 10000 7 | 15000 13 | 5000 9 | 12000 4 | 13000 3 | 15000 2 | 19000 12 | 8000 1 | 15000 10 | 11000 8 | 13000 14 | 3000 (15 rows)
分桶資料均衡。
根據分桶資料顯示,按照高8位進行分桶後,實際上有256個桶,但僅有15個桶含有資料。因此,分桶效果不均衡,需要調整分桶策略,採用低8位進行分桶,並仍然按照256個桶進行。
修改分桶資料。
修改
dws.actor_all_new_tag_bucket結果表的分桶資料。INSERT INTO dws.actor_all_new_tag_bucket (statdate, gameid, bucket, tag_name, tag_val, rb_new) SELECT createtime::DATE statdate, a.gameid::int, m.encode_actorid & 255 as bucket,--高8位作為桶 'gameversion', a.gameversion, RB_BUILD_AGG (m.encode_actorid ::int >> 8) --低8位來構建Bitmap資料 FROM dim.dim_sdkactor a INNER JOIN dws.actor_mapping m ON a.gameid::int = m.gameid AND a.gameversion = m.gameversion AND a.actorid = m.actorid GROUP BY a.createtime::DATE, a.gameid, m.encode_actorid & 255, a.gameversion修改
dws.actor_all_login_bucket結果表的分桶資料。INSERT INTO dws.actor_all_login_bucket (statdate, gameid,bucket, rb_login) SELECT statdate, A.gameid::int, m.encode_actorid & 255 as bucket, RB_BUILD_AGG (m.encode_actorid ::int >> 8) FROM APP.dws_actor_day_agg_di A INNER JOIN dws.actor_mapping m ON A.gameid::int = M.gameid AND A.gameversion = M.gameversion AND A.actorid = M.actorid GROUP BY statdate, A.gameid::int, m.encode_actorid & 255
調整Shard。
為了充分利用256個桶的平行處理能力,由於該情境對QPS(每秒查詢率)的要求較低,因此將表的Shard擴大至480(需根據執行個體規格具體設定)。由於本文的測試執行個體使用了48個Worker,為了盡量確保資料均衡,此處使用480個Shard ,以確保資料能夠均勻分配。調整Shard操作如下:
call hg_create_table_group('tg_480',480); call hg_move_table_to_table_group('dws.actor_all_login_bucket','tg_480'); call hg_move_table_to_table_group('dws.actor_all_new_tag_bucket','tg_480');
驗證資料均衡效能。
擷取從遊戲角色建立到角色登入150天的留存分析。
Explain Analyze WITH actor_new AS ( SELECT statdate, gameid, bucket, RB_AND_AGG (rb_new) rb_new FROM dws.actor_all_new_tag_bucket n WHERE statdate >= '20240101'::date AND gameid = '2131' AND tag_name = 'gameversion' AND tag_val = '2131-CN-ZS' GROUP BY statdate, gameid, bucket ) , actor_list AS ( SELECT n.statdate start_date, s.ds::date end_date, 'day' || s.ds::date - n.statdate AS nday, gameid, bucket, rb_new FROM actor_new n, generate_series ('20240101'::date, '20240604'::date, '1 day'::interval) AS s (ds) WHERE n.statdate <= s.ds::date ) SELECT al.start_date, al.end_date, al.nday, RB_CARDINALITY (rb_new) NewCount, RB_AND_CARDINALITY (rb_new, rb_login) LogCount FROM actor_list al INNER JOIN dws.actor_all_login_bucket aal ON al.end_date = aal.statdate AND al.gameid = aal.gameid AND al.bucket = aal.bucket返回結果如下,查詢所需時間為1 s+,效能得到更大的提升。
QUERY PLAN Gather (cost=0.00..48922.60 rows=1736538 width=32) [480:1 id=100006 dop=1 time=107/107/107ms rows=3129600(3129600/3129600/3129600) mem=134/134/134KB open=4/4/4ms get_next=103/103/103ms] -> Project (cost=0.00..48799.82 rows=1736538 width=32) [id=26 dop=480 time=681/298/173ms rows=3129600(9867/6520/3761) mem=164/103/100KB open=193/58/7ms get_next=641/240/145ms] -> Hash Join (cost=0.00..48798.59 rows=1736538 width=32) Hash Cond: (((date(actor_all_new_tag_bucke.*)) = actor_all_login_bucket.statdate) AND (actor_all_new_tag_bucket.gameid = actor_all_login_bucket.gameid) AND (actor_all_new_tag_bucket.bucket = actor_all_login_b ucket.bucke)) [id=25 dop=480 time=627/247/130ms rows=3129600(9867/6520/3761) mem=16/16/16MB open=193/57/7ms get_next=587/189/96ms] -> Redistribution (cost=0.00..48796.94 rows=1736538 width=32) Hash Key: (date(actor_all_new_tag_bucket.*)), actor_all_new_tag_bucket.gameid, actor_all_new_tag_bucket.bucket [480:480 id=100005 dop=480 time=116/22/11ms rows=3129600(9867/6520/3761) mem=760/579/480KB open=45/6/3ms get_next=111/15/8ms] -> Filter (cost=0.00..48796.58 rows=1736538 width=32) Filter: (actor_all_new_tag_bucket.gameid = 2131) [id=18 dop=480 time=170/38/11ms rows=3129600(9442/6520/4086) mem=3/3/3KB open=1/0/0ms get_next=170/38/11ms] -> Project (cost=0.00..48796.46 rows=1736538 width=32) [id=17 dop=480 time=169/36/11ms rows=3129600(9442/6520/4086) mem=29/27/25KB open=1/0/0ms get_next=169/36/11ms] -> Project (cost=0.00..48795.23 rows=270900 width=24) [id=16 dop=480 time=155/28/6ms rows=3129600(9442/6520/4086) mem=12/11/11KB open=1/0/0ms get_next=155/28/6ms] -> Nested Loop (cost=0.00..48794.21 rows=270900 width=28) Join Filter: (actor_all_new_tag_bucke_480.statdate <= date(actor_all_new_tag_bucket.*)) [id=14 dop=480 time=151/26/5ms rows=3129600(9442/6520/4086) mem=20/15/12KB open=1/0/0ms get_next=151/26/5ms] -> Filter (cost=0.00..1.10 rows=4341 width=20) Filter: (actor_all_new_tag_bucket.gameid = 2131) [id=11 dop=480 time=8/0/0ms rows=38400(114/80/55) mem=2/2/2KB open=1/0/0ms get_next=8/0/0ms] -> GroupAggregate (cost=0.00..1.10 rows=4341 width=20) Group Key: actor_all_new_tag_bucket.statdate, actor_all_new_tag_bucket.gameid, actor_all_new_tag_bucket.bucke [id=9 dop=480 time=8/0/0ms rows=38400(114/80/55) mem=291/283/211KB open=1/0/0ms get_next=8/0/0ms] -> Local Gather (cost=0.00..1.10 rows=4341 width=20) [id=8 dop=480 time=1/0/0ms rows=38400(114/80/55) mem=600/600/600B open=1/0/0ms get_next=1/0/0ms local_dop=1/1/1] -> Decode (cost=0.00..1.10 rows=4341 width=20) [id=6 split_count=480 time=48/5/0ms rows=38400(114/80/55) mem=0/0/0B open=15/0/0ms get_next=42/4/0ms] -> Index Scan using Clustering_index on actor_all_new_tag_bucket (cost=0.00..1.00 rows=4341 width=20) Segment Filter: (statdate >= '2024-01-01'::date) Cluster Filter: (gameid = 2131) Bitmap Filter: ((tag_name = 'gameversion'::text) AND (tag_val = '2131-CN-ZS'::text)) RowGroupFilter: (statdate >= '2024-01-01'::date) [id=5 split_count=480 time=48/5/0ms rows=38400(114/80/55) mem=139/130/117KB open=15/0/0ms get_next=42/4/0ms physical_reads=747(4/2/1) scan_rows=3285315(7441/6844/611 6) bitmap_used=480] -> Materialize (cost=0.00..1.17 rows=74880 width=8) [id=13 dop=480 time=11/0/0ms rows=5990400(17784/12480/8580) mem=0/0/0B open=0/0/0ms get_next=11/0/0ms] -> Local Broadcast From Worker (cost=0.00..1.17 rows=74880 width=8) [48:480 id=100004 dop=480 time=72/5/0ms rows=74880(156/156/156) mem=600/600/600B open=1/0/0ms get_next=71/5/0ms] -> Broadcast To Worker (cost=0.00..1.17 rows=9360 width=8) [1:48 id=100003 dop=48 time=1/0/0ms rows=7488(156/156/156) mem=600/600/600B open=1/0/0ms get_next=0/0/0ms] -> Forward (cost=0.00..1.00 rows=156 width=8) [0:1 id=100002 dop=1 time=25/25/25ms rows=156(156/156/156) mem=264/264/264B open=25/25/25ms get_next=0/0/0ms scan_rows=156(156/156/156)] -> ExecuteExternalSQL on PQE (cost=0.00..0.00 rows=0 width=8) External SQL: SELECT * FROM generate_series('2024-01-01 00:00:00+08'::timestamptz, '2024-06-04 00:00:00+08'::timestamptz, '1 day'::interval) AS TVF(c_2b35a738_101) -> Hash (cost=1.10..1.10 rows=189110 width=20) [id=24 dop=480 time=193/57/7ms rows=184991(443/385/319) mem=4/2/2MB open=189/53/5ms get_next=31/4/1ms rehash=1/1/1 hash_mem=384/384/384KB] -> Local Gather (cost=0.00..1.10 rows=189110 width=20) [id=23 dop=480 time=189/52/2ms rows=184991(443/385/319) mem=600/600/600B open=6/0/0ms get_next=184/52/1ms local_dop=1/1/1] -> Decode (cost=0.00..1.10 rows=189110 width=20) [id=21 split_count=480 time=162/43/0ms rows=184991(443/385/319) mem=0/0/0B open=42/2/0ms get_next=161/41/0ms] -> Index Scan using Clustering_index on actor_all_login_bucket (cost=0.00..1.00 rows=189110 width=20) Cluster Filter: (gameid = 2131) [id=20 split_count=473 time=152/39/0ms rows=182347(443/385/319) mem=4/2/2MB open=42/2/0ms get_next=152/36/0ms physical_reads=748(4/2/1) scan_rows=1875185(4348/3964/3599)] Query id:[70231656748298225] ======================cost====================== Total cost:[1403] ms Optimizer cost:[150] ms Build execution plan cost:[11] ms Init execution plan cost:[19] ms Start query cost:[218] ms - Wait schema cost:[0] ms - Lock query cost:[0] ms - Create dataset reader cost:[0] ms - Create split reader cost:[2] ms Get result cost:[1005] ms - Get the first block cost:[50] ms ====================resource==================== Memory: total 10047 MB. Worker stats: max 241 MB, avg 209 MB, min 182 MB, max memory worker id: 729692126084053701. CPU time: total 168268 ms. Worker stats: max 7134 ms, avg 3505 ms, min 2919 ms, max CPU time worker id: 745354404943128931. Ec wait time: total 56589 ms. Worker stats: max 2727 ms, avg 1178 ms, min 267 ms, max ec wait time worker id: 729692126084053701. Physical read bytes: total 1445 MB. Worker stats: max 59 MB, avg 30 MB, min 0 MB, max physical read bytes worker id: 729776188416688975. Read bytes: total 2718 MB. Worker stats: max 58 MB, avg 56 MB, min 51 MB, max read bytes worker id: 745354267590834885. DAG instance count: total 1010. Worker stats: max 22, avg 21, min 21, max DAG instance count worker id: 722022531458482562. Fragment instance count: total 1970. Worker stats: max 42, avg 41, min 41, max fragment instance count worker id: 722022531458482562. (76 rows)
通過JSON函數最佳化SQL語句
通常在查詢明細表資料用於做留存分析時,返回的資料格式是多行,包含留存日期(dayn)對應的新建立的角色UV和基於登入的留存UV資料。為了以多列展示從day0~dayn的所有結果,使用者經常使用N個CASE WHEN語句將這些行轉換為列,這樣SQL不僅降低了開發效率,也影響了計算效率和營運的複雜性。程式碼範例如下。
WITH
actor_new AS
( SELECT statdate,
gameid,
RB_AND_AGG (rb_new) rb_new
FROM dws.actor_all_new_tag n
WHERE statdate >= '20240125'::date
AND gameid = '2131'
AND tag_name = 'gameversion'
AND tag_val = '2131-CN-ZS'
GROUP BY statdate,
gameid
)
,
actor_list
AS
( SELECT n.statdate start_date,
s.ds::date end_date,
'day' || s.ds::date - n.statdate AS nday,
gameid,
rb_new
FROM actor_new n,
generate_series ('20240125'::date, '20240529'::date, '1 day'::interval) AS s (ds)
WHERE n.statdate <= s.ds::date
)
select
start_date,
CASE WHEN nday=0 and NewCount >0 THEN LogCount/NewCount END AS day0,
CASE WHEN nday=1 and NewCount >0 THEN LogCount/NewCount END AS day1,
CASE WHEN nday=2 and NewCount >0 THEN LogCount/NewCount END AS day2,
CASE WHEN nday=3 and NewCount >0 THEN LogCount/NewCount END AS day3,
........
CASE WHEN nday=365 and NewCount >0 THEN LogCount/NewCount END AS day365
from
(
SELECT
start_date,
nday,
SUM (LogCount) LogCount,
SUM (NewCount) NewCount
FROM ( SELECT al.start_date,
al.nday,
RB_CARDINALITY (rb_new) NewCount,
RB_AND_CARDINALITY (rb_new, rb_login) LogCount
FROM actor_list al
INNER JOIN dws.actor_all_login aal
ON al.end_date = aal.statdate
AND al.gameid = aal.gameid
) t
GROUP BY GROUPING sets ( (start_date, nday), (nday))
)A Hologres支援PostgreSQL生態的JSON相關函數,可以充分利用Hologres的jsonb_object_agg函數,將動態JSON資料轉換為一行,並且可以根據靈活的JSON資料通過jsonb_populate_record函數進行任意維度展開,實現行轉列。
關於Hologres支援PostgreSQL生態的JSON相關函數詳情,請參見JSON和JSONB類型。
在使用jsonb_populate_record需要先定義一個固定的Schema(public.rowtype_365) ,作為base anyelement對象來接收資料,程式碼範例如下。
CREATE TABLE public.rowtype_365 (
day0 text,
day1 text,
day2 text,
day3 text,
day4 text,
day5 text,
day6 text,
day7 text,
day8 text,
day9 text,
day10 text,
day11 text,
day12 text,
day13 text,
day14 text,
day15 text,
day16 text,
day17 text,
day18 text,
day19 text,
day20 text,
day21 text,
day22 text,
day23 text,
day24 text,
day25 text,
day26 text,
day27 text,
day28 text,
day29 text,
day30 text,
day31 text,
day32 text,
day33 text,
day34 text,
day35 text,
day36 text,
day37 text,
day38 text,
day39 text,
day40 text,
day41 text,
day42 text,
day43 text,
day44 text,
day45 text,
day46 text,
day47 text,
day48 text,
day49 text,
day50 text,
day51 text,
day52 text,
day53 text,
day54 text,
day55 text,
day56 text,
day57 text,
day58 text,
day59 text,
day60 text,
day61 text,
day62 text,
day63 text,
day64 text,
day65 text,
day66 text,
day67 text,
day68 text,
day69 text,
day70 text,
day71 text,
day72 text,
day73 text,
day74 text,
day75 text,
day76 text,
day77 text,
day78 text,
day79 text,
day80 text,
day81 text,
day82 text,
day83 text,
day84 text,
day85 text,
day86 text,
day87 text,
day88 text,
day89 text,
day90 text,
day91 text,
day92 text,
day93 text,
day94 text,
day95 text,
day96 text,
day97 text,
day98 text,
day99 text,
day100 text,
day101 text,
day102 text,
day103 text,
day104 text,
day105 text,
day106 text,
day107 text,
day108 text,
day109 text,
day110 text,
day111 text,
day112 text,
day113 text,
day114 text,
day115 text,
day116 text,
day117 text,
day118 text,
day119 text,
day120 text,
day121 text,
day122 text,
day123 text,
day124 text,
day125 text,
day126 text,
day127 text,
day128 text,
day129 text,
day130 text,
day131 text,
day132 text,
day133 text,
day134 text,
day135 text,
day136 text,
day137 text,
day138 text,
day139 text,
day140 text,
day141 text,
day142 text,
day143 text,
day144 text,
day145 text,
day146 text,
day147 text,
day148 text,
day149 text,
day150 text,
day151 text,
day152 text,
day153 text,
day154 text,
day155 text,
day156 text,
day157 text,
day158 text,
day159 text,
day160 text,
day161 text,
day162 text,
day163 text,
day164 text,
day165 text,
day166 text,
day167 text,
day168 text,
day169 text,
day170 text,
day171 text,
day172 text,
day173 text,
day174 text,
day175 text,
day176 text,
day177 text,
day178 text,
day179 text,
day180 text,
day181 text,
day182 text,
day183 text,
day184 text,
day185 text,
day186 text,
day187 text,
day188 text,
day189 text,
day190 text,
day191 text,
day192 text,
day193 text,
day194 text,
day195 text,
day196 text,
day197 text,
day198 text,
day199 text,
day200 text,
day201 text,
day202 text,
day203 text,
day204 text,
day205 text,
day206 text,
day207 text,
day208 text,
day209 text,
day210 text,
day211 text,
day212 text,
day213 text,
day214 text,
day215 text,
day216 text,
day217 text,
day218 text,
day219 text,
day220 text,
day221 text,
day222 text,
day223 text,
day224 text,
day225 text,
day226 text,
day227 text,
day228 text,
day229 text,
day230 text,
day231 text,
day232 text,
day233 text,
day234 text,
day235 text,
day236 text,
day237 text,
day238 text,
day239 text,
day240 text,
day241 text,
day242 text,
day243 text,
day244 text,
day245 text,
day246 text,
day247 text,
day248 text,
day249 text,
day250 text,
day251 text,
day252 text,
day253 text,
day254 text,
day255 text,
day256 text,
day257 text,
day258 text,
day259 text,
day260 text,
day261 text,
day262 text,
day263 text,
day264 text,
day265 text,
day266 text,
day267 text,
day268 text,
day269 text,
day270 text,
day271 text,
day272 text,
day273 text,
day274 text,
day275 text,
day276 text,
day277 text,
day278 text,
day279 text,
day280 text,
day281 text,
day282 text,
day283 text,
day284 text,
day285 text,
day286 text,
day287 text,
day288 text,
day289 text,
day290 text,
day291 text,
day292 text,
day293 text,
day294 text,
day295 text,
day296 text,
day297 text,
day298 text,
day299 text,
day300 text,
day301 text,
day302 text,
day303 text,
day304 text,
day305 text,
day306 text,
day307 text,
day308 text,
day309 text,
day310 text,
day311 text,
day312 text,
day313 text,
day314 text,
day315 text,
day316 text,
day317 text,
day318 text,
day319 text,
day320 text,
day321 text,
day322 text,
day323 text,
day324 text,
day325 text,
day326 text,
day327 text,
day328 text,
day329 text,
day330 text,
day331 text,
day332 text,
day333 text,
day334 text,
day335 text,
day336 text,
day337 text,
day338 text,
day339 text,
day340 text,
day341 text,
day342 text,
day343 text,
day344 text,
day345 text,
day346 text,
day347 text,
day348 text,
day349 text,
day350 text,
day351 text,
day352 text,
day353 text,
day354 text,
day355 text,
day356 text,
day357 text,
day358 text,
day359 text,
day360 text,
day361 text,
day362 text,
day363 text,
day364 text,
day365 text,
day366 text
);
CALL set_table_property('public.rowtype_365', 'orientation', 'column');
CALL set_table_property('public.rowtype_365', 'storage_format', 'orc');
CALL set_table_property('public.rowtype_365', 'table_group', 'tg_480');
CALL set_table_property('public.rowtype_365', 'table_storage_mode', 'any');
CALL set_table_property('public.rowtype_365', 'time_to_live_in_seconds', '3153600000');
COMMENT ON TABLE public.rowtype_365 IS NULL;
END;使用jsonb_object_agg函數和jsonb_populate_record函數實現行轉列,程式碼範例如下。
explain analyze
WITH
actor_new AS
( SELECT statdate,
gameid,
bucket,
RB_AND_AGG (rb_new) rb_new
FROM dws.actor_all_new_tag_bucket n
WHERE statdate >= '20240101'::date
AND gameid = '2131'
AND tag_name = 'gameversion'
AND tag_val = '2131-CN-ZS'
GROUP BY statdate,
gameid,
bucket
)
,
actor_list
AS
( SELECT n.statdate start_date,
s.ds::date end_date,
'day' || s.ds::date - n.statdate AS nday,
gameid,
bucket,
rb_new
FROM actor_new n,
generate_series ('20240101'::date, '20240604'::date, '1 day'::interval) AS s (ds)
WHERE n.statdate <= s.ds::date
)
SELECT start_date,
(jsonb_populate_record (NULL::rowtype_365, buff)) .*
FROM ( SELECT start_date,
jsonb_object_agg (nday, ROUND (LogCount * 100.0 / NewCount, 2)) buff
FROM ( SELECT start_date,
nday,
SUM (LogCount) LogCount,
SUM (NewCount) NewCount
FROM ( SELECT al.start_date,
al.end_date,
al.nday,
RB_CARDINALITY (rb_new) NewCount,
rb_and_cardinality(rb_new, rb_login) LogCount
FROM actor_list al
INNER JOIN dws.actor_all_login_bucket aal
ON al.end_date = aal.statdate
AND al.gameid = aal.gameid
AND al.bucket = aal.bucket) T
GROUP BY grouping sets ( (start_date, nday), (nday))) t
GROUP BY start_date) A
ORDER BY
CASE WHEN start_date IS NULL
THEN '1'
ELSE start_date::TEXT
END