Bloom filter pushdown mengurangi volume data yang ditransfer antara mesin penyimpanan dan lapisan SQL selama kueri hash join, sehingga menurunkan beban komputasi pada lapisan SQL dan meningkatkan performa kueri secara keseluruhan.
Prasyarat
Sebelum memulai, pastikan bahwa:
Versi kluster PolarDB for MySQL Anda adalah 8.0 dengan revisi versi 8.0.2.2.3 atau lebih baru. Untuk memeriksa versi, lihat Query the engine version.
Data bertipe INT.
Cara kerja
Saat PolarDB for MySQL menjalankan hash join, database membuat tabel hash dari tabel yang lebih kecil (build-side). Secara bersamaan, database juga membuat bloom filter dari tabel hash tersebut dan mendorongnya (push down) ke mesin probe.
Mesin probe menggunakan bloom filter untuk menyaring baris-baris yang tidak diperlukan oleh kueri sebelum mencapai lapisan SQL. Hal ini mengurangi volume data yang perlu diproses oleh lapisan SQL.
Aktifkan bloom filter pushdown
Gunakan parameter loose_bloom_filter_enabled untuk mengaktifkan atau menonaktifkan bloom filter. Parameter ini diaktifkan secara default.
Untuk mengonfigurasi parameter tersebut, lihat Specify cluster and node parameters.
| Parameter | Ruang Lingkup | Default | Nilai Valid |
|---|---|---|---|
loose_bloom_filter_enabled | Global dan session | ON | ON, OFF |
Verifikasi bahwa bloom filter pushdown aktif
Setelah mengaktifkan parameter tersebut, jalankan EXPLAIN pada kueri hash join. Pada output, cari hash join with bloom filter di bidang Extra. Ini menegaskan bahwa bloom filter sedang aktif.
Contoh dari kueri TPC-H Q3:
*************************** 2. row ***************************
Extra: Using where; Using join buffer (hash join with bloom filter)Pengujian performa
Hasil berikut didasarkan pada skema TPC-H (faktor skala 1) tanpa primary key atau indeks. Rencana eksekusi untuk Q3, Q11, dan Q16 ditampilkan di bawah. Setiap rencana mencakup setidaknya satu tabel di mana bidang Extra menampilkan hash join with bloom filter, yang mengonfirmasi bahwa bloom filter aktif.
Q3
EXPLAIN
SELECT l_orderkey,
SUM(l_extendedprice * (1 - l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM customer,
orders,
lineitem
WHERE c_mktsegment = 'MACHINERY'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < '1995-03-10'
AND l_shipdate > '1995-03-10'
GROUP BY l_orderkey,
o_orderdate,
o_shippriority
ORDER BY revenue DESC,
o_orderdate
LIMIT 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: customer
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 148463
filtered: 10.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: orders
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1486962
filtered: 3.33
Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: lineitem
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 5948979
filtered: 3.33
Extra: Using where; Using join buffer (hash join with bloom filter)Q11
EXPLAIN
SELECT ps_partkey,
SUM(ps_supplycost * ps_availqty) AS value
FROM partsupp,
supplier,
nation
WHERE ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'INDIA'
GROUP BY ps_partkey
HAVING SUM(ps_supplycost * ps_availqty) >
(SELECT SUM(ps_supplycost * ps_availqty) * 0.0001000000
FROM partsupp,
supplier,
nation
WHERE ps_suppkey = s_suppkey
AND s_nationkey = n_nationkey
AND n_name = 'INDIA' )
ORDER BY value DESC\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: nation
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 25
filtered: 10.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: supplier
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 10000
filtered: 10.00
Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 3. row ***************************
id: 1
select_type: PRIMARY
table: partsupp
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 791815
filtered: 10.00
Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 4. row ***************************
id: 2
select_type: SUBQUERY
table: nation
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 25
filtered: 10.00
Extra: Using where
*************************** 5. row ***************************
id: 2
select_type: SUBQUERY
table: supplier
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 10000
filtered: 10.00
Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 6. row ***************************
id: 2
select_type: SUBQUERY
table: partsupp
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 791815
filtered: 10.00
Extra: Using where; Using join buffer (hash join with bloom filter)Q16
EXPLAIN SELECT p_brand,
p_type,
p_size,
COUNT(DISTINCT ps_suppkey) AS supplier_cnt
FROM partsupp,
part
WHERE p_partkey = ps_partkey
AND p_brand <> 'Brand#33'
AND p_type NOT LIKE 'PROMO POLISHED%'
AND p_size IN (34,
45,
33,
42,
9,
24,
26,
7)
AND ps_suppkey NOT IN
(SELECT s_suppkey
FROM supplier
WHERE s_comment LIKE '%Customer%Complaints%' )
GROUP BY p_brand,
p_type,
p_size
ORDER BY supplier_cnt DESC,
p_brand,
p_type,
p_size\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: part
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 198116
filtered: 40.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: partsupp
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 791815
filtered: 10.00
Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 3. row ***************************
id: 2
select_type: SUBQUERY
table: supplier
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 10000
filtered: 11.11
Extra: Using whereHasil pengujian
Grafik di bawah menunjukkan waktu kueri dengan bloom filter diaktifkan dibandingkan dinonaktifkan untuk Q3, Q11, dan Q16 (TPC-H faktor skala 1, tanpa primary key atau indeks).
