Enable OSS File Filter to Speed Up Cold Data Queries - PolarDB

The OSS file filter feature skips irrelevant data blocks in OSS during query execution, reducing the volume of scanned data and accelerating cold data queries.

How it works

For each archived data block, PolarDB collects statistics, generates filter data, and stores it in OSS. When a query runs, the engine pushes the query conditions down to the CSV-format cold data layer, compares them against the filter data, and skips blocks that cannot contain matching rows.

Different filter strategies apply to different data types:

Filter strategy	Applicable data types	Best for
Min/max statistics	Numeric, time	Ordered or partially ordered data (e.g., timestamps, sequential IDs)
Character map	String (`CHAR`, `VARCHAR`)	Case-sensitive string prefix filtering
Bloom filter	All data types	Globally unordered data; UUID strings

For time-series data—which is typical of archived data—filtering on time-type columns can yield significant query performance gains.

Prerequisites

Before you begin, make sure that:

The cluster runs MySQL 8.0.2, revision version 8.0.2.2.25 or later
The cluster has cold data archiving enabled
A connection to the cluster is established. See Connect to a cluster
The table uses OSS cold data in CSV format and has OSS META=1 enabled

To check whether OSS META=1 is set, run SHOW CREATE TABLE and confirm the output contains OSS META=1:

SHOW CREATE TABLE t;

Example output:

*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int(11) DEFAULT NULL
) /*!99990 800020213 STORAGE OSS */ ENGINE=CSV DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!99990 800020204 NULL_MARKER='NULL' */ /*!99990 800020223 OSS META=1 */

1 row in set (0.00 sec)

For more information, see Perform DDL operations on cold data.

Limitations

Compatibility:

The OSS file filter feature cannot be enabled if Elastic Parallel Query (ePQ) is already enabled.
The table must have data. Empty OSS tables cannot use this feature for query acceleration.

Supported data types:

Category	Data types
Integer	`TINYINT`, `SMALLINT`, `MEDIUMINT`, `INT`, `BIGINT`, `TINYINT UNSIGNED`, `SMALLINT UNSIGNED`, `MEDIUMINT UNSIGNED`, `INT UNSIGNED`, `BIGINT UNSIGNED`
Floating-point	`FLOAT`, `DOUBLE`
Fixed-point	`DECIMAL`
Time	`DATE`, `DATETIME`, `TIME`, `TIMESTAMP`
String	`CHAR`, `VARCHAR`

All data types are supported with a Bloom filter.

Supported operators:

The following operators trigger filter pushdown: =, <=>, <, <=, >=, >, BETWEEN, LIKE, IS NULL, IS NOT NULL.

For LIKE, the wildcard percent sign (%) must appear at the right side only: LIKE 'ABC%'. Patterns such as LIKE '%ABC' or LIKE '%ABC%' do not trigger filter pushdown.

For string columns, only case-sensitive comparisons are supported.

Multiple conditions connected by AND are supported. Conditions connected by OR are not evaluated by the filter.

Patterns that do not trigger filter pushdown:

Pattern	Reason
`col LIKE '%ABC'` or `col LIKE '%ABC%'`	Wildcard must be on the right side only
`condition1 OR condition2`	`OR` conditions are not evaluated by the filter
Queries on empty OSS tables	Filter requires at least one row of data

Enable the OSS file filter

In the console, set the cluster parameter loose_use_oss_meta to ON to enable the USE_OSS_META feature. For more information, see Configure cluster and node parameters.
In the console, set the cluster parameter loose_optimizer_switch to ENGINE_CONDITION_PUSHDOWN=ON. For more information, see Configure cluster and node parameters.
In the console, set the cluster parameter loose_csv_oss_file_filter to ON. For more information, see Configure cluster and node parameters.

Connect to the cluster and verify all three parameters are active:

-- Verify that oss_file_filter is ON
SHOW VARIABLES LIKE 'oss_file_filter';

-- Verify that use_oss_meta is ON
SHOW VARIABLES LIKE 'use_oss_meta';

-- Verify that ENGINE_CONDITION_PUSHDOWN is ON
SHOW VARIABLES LIKE 'optimizer_switch';

Create an OSS file filter

Use the OSS_FILE_FILTER option when archiving a table or in an ALTER TABLE statement on an already-archived table.

Option syntax:

OSS_FILE_FILTER = 'field_filter[,field_filter]'
field_filter := field_name[:filter_type]
filter_type := bloom

By default, numeric and time columns use min/max statistics, and string columns use the character map. To use a Bloom filter on a column, append :bloom to the column name.

For UUID values stored in string columns, use a Bloom filter (:bloom) for better filtering efficiency.

Create a filter when archiving a table

Specify OSS_FILE_FILTER as part of the ALTER TABLE ... ENGINE = CSV STORAGE OSS statement:

-- Archive the table and create an OSS file filter on L_ORDERKEY, L_LINENUMBER, and L_SHIPDATE
ALTER TABLE lineitem ENGINE = CSV STORAGE OSS
  OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE';

The full table definition used in this example:

CREATE TABLE `lineitem` (
  `L_ORDERKEY` int(11) NOT NULL,
  `L_PARTKEY` int(11) NOT NULL,
  `L_SUPPKEY` int(11) NOT NULL,
  `L_LINENUMBER` int(11) NOT NULL,
  `L_QUANTITY` decimal(15,2) NOT NULL,
  `L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
  `L_DISCOUNT` decimal(15,2) NOT NULL,
  `L_TAX` decimal(15,2) NOT NULL,
  `L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
  `L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
  `L_SHIPDATE` date NOT NULL,
  `L_COMMITDATE` date NOT NULL,
  `L_RECEIPTDATE` date NOT NULL,
  `L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
  `L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
  `L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = innodb;

Add a filter to an already-archived table

Run ALTER TABLE with OSS_FILE_FILTER on a table that is already in CSV STORAGE OSS format:

-- Add an OSS file filter to an archived table
-- L_SHIPINSTRUCT uses a Bloom filter; the other columns use the default strategy
ALTER TABLE lineitem OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE,L_SHIPINSTRUCT:BLOOM';

The table definition for an already-archived table uses ENGINE = CSV STORAGE OSS:

CREATE TABLE `lineitem` (
  `L_ORDERKEY` int(11) NOT NULL,
  `L_PARTKEY` int(11) NOT NULL,
  `L_SUPPKEY` int(11) NOT NULL,
  `L_LINENUMBER` int(11) NOT NULL,
  `L_QUANTITY` decimal(15,2) NOT NULL,
  `L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
  `L_DISCOUNT` decimal(15,2) NOT NULL,
  `L_TAX` decimal(15,2) NOT NULL,
  `L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
  `L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
  `L_SHIPDATE` date NOT NULL,
  `L_COMMITDATE` date NOT NULL,
  `L_RECEIPTDATE` date NOT NULL,
  `L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
  `L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
  `L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = CSV STORAGE OSS;

Verify filter pushdown

After creating an OSS file filter, run EXPLAIN to confirm the filter conditions are being pushed down to the OSS data layer.

EXPLAIN SELECT * FROM lineitem WHERE l_orderkey = 96;

The output should include With pushed engine condition in the Extra column:

+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                                                           |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
|  1 | SIMPLE      | lineitem | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 6001215 |    10.00 | Using where; With pushed engine condition (`test`.`lineitem`.`L_ORDERKEY` = 96) |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+

For a tree-format plan that explicitly shows which filter conditions are applied to the OSS data scan:

EXPLAIN FORMAT = TREE SELECT * FROM lineitem WHERE l_orderkey = 96\G

Expected output:

*************************** 1. row ***************************
EXPLAIN: -> Filter: (lineitem.L_ORDERKEY = 96)  (cost=15010.00 rows=10000)
    -> Table scan on lineitem, extra (oss_file_filter conditions: (lineitem.L_ORDERKEY = 96))  (cost=15010.00 rows=100000)

The oss_file_filter conditions entry confirms the condition is evaluated at the data block level, skipping blocks that cannot match before they are read into memory.

Once the filter is confirmed, run the actual query:

SELECT COUNT(*) FROM lineitem WHERE l_orderkey = 96;

PolarDB:Use the OSS file filter feature to accelerate cold data queries