The OSS file filter feature skips irrelevant data blocks in OSS during query execution, reducing the volume of scanned data and accelerating cold data queries.
How it works
For each archived data block, PolarDB collects statistics, generates filter data, and stores it in OSS. When a query runs, the engine pushes the query conditions down to the CSV-format cold data layer, compares them against the filter data, and skips blocks that cannot contain matching rows.
Different filter strategies apply to different data types:
| Filter strategy | Applicable data types | Best for |
|---|---|---|
| Min/max statistics | Numeric, time | Ordered or partially ordered data (e.g., timestamps, sequential IDs) |
| Character map | String (CHAR, VARCHAR) | Case-sensitive string prefix filtering |
| Bloom filter | All data types | Globally unordered data; UUID strings |
For time-series data—which is typical of archived data—filtering on time-type columns can yield significant query performance gains.
Prerequisites
Before you begin, make sure that:
The cluster runs MySQL 8.0.2, revision version 8.0.2.2.25 or later
The cluster has cold data archiving enabled
A connection to the cluster is established. See Connect to a cluster
The table uses OSS cold data in CSV format and has
OSS META=1enabled
To check whether OSS META=1 is set, run SHOW CREATE TABLE and confirm the output contains OSS META=1:
SHOW CREATE TABLE t;Example output:
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
`id` int(11) DEFAULT NULL
) /*!99990 800020213 STORAGE OSS */ ENGINE=CSV DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci /*!99990 800020204 NULL_MARKER='NULL' */ /*!99990 800020223 OSS META=1 */
1 row in set (0.00 sec)For more information, see Perform DDL operations on cold data.
Limitations
Compatibility:
The OSS file filter feature cannot be enabled if Elastic Parallel Query (ePQ) is already enabled.
The table must have data. Empty OSS tables cannot use this feature for query acceleration.
Supported data types:
| Category | Data types |
|---|---|
| Integer | TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, TINYINT UNSIGNED, SMALLINT UNSIGNED, MEDIUMINT UNSIGNED, INT UNSIGNED, BIGINT UNSIGNED |
| Floating-point | FLOAT, DOUBLE |
| Fixed-point | DECIMAL |
| Time | DATE, DATETIME, TIME, TIMESTAMP |
| String | CHAR, VARCHAR |
All data types are supported with a Bloom filter.
Supported operators:
The following operators trigger filter pushdown: =, <=>, <, <=, >=, >, BETWEEN, LIKE, IS NULL, IS NOT NULL.
For LIKE, the wildcard percent sign (%) must appear at the right side only: LIKE 'ABC%'. Patterns such as LIKE '%ABC' or LIKE '%ABC%' do not trigger filter pushdown.
For string columns, only case-sensitive comparisons are supported.
Multiple conditions connected by AND are supported. Conditions connected by OR are not evaluated by the filter.
Patterns that do not trigger filter pushdown:
| Pattern | Reason |
|---|---|
col LIKE '%ABC' or col LIKE '%ABC%' | Wildcard must be on the right side only |
condition1 OR condition2 | OR conditions are not evaluated by the filter |
| Queries on empty OSS tables | Filter requires at least one row of data |
Enable the OSS file filter
In the console, set the cluster parameter
loose_use_oss_metatoONto enable theUSE_OSS_METAfeature. For more information, see Configure cluster and node parameters.In the console, set the cluster parameter
loose_optimizer_switchtoENGINE_CONDITION_PUSHDOWN=ON. For more information, see Configure cluster and node parameters.In the console, set the cluster parameter
loose_csv_oss_file_filtertoON. For more information, see Configure cluster and node parameters.Connect to the cluster and verify all three parameters are active:
-- Verify that oss_file_filter is ON SHOW VARIABLES LIKE 'oss_file_filter'; -- Verify that use_oss_meta is ON SHOW VARIABLES LIKE 'use_oss_meta'; -- Verify that ENGINE_CONDITION_PUSHDOWN is ON SHOW VARIABLES LIKE 'optimizer_switch';
Create an OSS file filter
Use the OSS_FILE_FILTER option when archiving a table or in an ALTER TABLE statement on an already-archived table.
Option syntax:
OSS_FILE_FILTER = 'field_filter[,field_filter]'
field_filter := field_name[:filter_type]
filter_type := bloomBy default, numeric and time columns use min/max statistics, and string columns use the character map. To use a Bloom filter on a column, append :bloom to the column name.
For UUID values stored in string columns, use a Bloom filter (:bloom) for better filtering efficiency.Create a filter when archiving a table
Specify OSS_FILE_FILTER as part of the ALTER TABLE ... ENGINE = CSV STORAGE OSS statement:
-- Archive the table and create an OSS file filter on L_ORDERKEY, L_LINENUMBER, and L_SHIPDATE
ALTER TABLE lineitem ENGINE = CSV STORAGE OSS
OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE';The full table definition used in this example:
CREATE TABLE `lineitem` (
`L_ORDERKEY` int(11) NOT NULL,
`L_PARTKEY` int(11) NOT NULL,
`L_SUPPKEY` int(11) NOT NULL,
`L_LINENUMBER` int(11) NOT NULL,
`L_QUANTITY` decimal(15,2) NOT NULL,
`L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
`L_DISCOUNT` decimal(15,2) NOT NULL,
`L_TAX` decimal(15,2) NOT NULL,
`L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
`L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
`L_SHIPDATE` date NOT NULL,
`L_COMMITDATE` date NOT NULL,
`L_RECEIPTDATE` date NOT NULL,
`L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
`L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
`L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = innodb;Add a filter to an already-archived table
Run ALTER TABLE with OSS_FILE_FILTER on a table that is already in CSV STORAGE OSS format:
-- Add an OSS file filter to an archived table
-- L_SHIPINSTRUCT uses a Bloom filter; the other columns use the default strategy
ALTER TABLE lineitem OSS_FILE_FILTER = 'L_ORDERKEY,L_LINENUMBER,L_SHIPDATE,L_SHIPINSTRUCT:BLOOM';The table definition for an already-archived table uses ENGINE = CSV STORAGE OSS:
CREATE TABLE `lineitem` (
`L_ORDERKEY` int(11) NOT NULL,
`L_PARTKEY` int(11) NOT NULL,
`L_SUPPKEY` int(11) NOT NULL,
`L_LINENUMBER` int(11) NOT NULL,
`L_QUANTITY` decimal(15,2) NOT NULL,
`L_EXTENDEDPRICE` decimal(15,2) NOT NULL,
`L_DISCOUNT` decimal(15,2) NOT NULL,
`L_TAX` decimal(15,2) NOT NULL,
`L_RETURNFLAG` char(1) COLLATE utf8_bin NOT NULL,
`L_LINESTATUS` char(1) COLLATE utf8_bin NOT NULL,
`L_SHIPDATE` date NOT NULL,
`L_COMMITDATE` date NOT NULL,
`L_RECEIPTDATE` date NOT NULL,
`L_SHIPINSTRUCT` char(25) COLLATE utf8_bin NOT NULL,
`L_SHIPMODE` char(10) COLLATE utf8_bin NOT NULL,
`L_COMMENT` varchar(44) COLLATE utf8_bin NOT NULL
) ENGINE = CSV STORAGE OSS;Verify filter pushdown
After creating an OSS file filter, run EXPLAIN to confirm the filter conditions are being pushed down to the OSS data layer.
EXPLAIN SELECT * FROM lineitem WHERE l_orderkey = 96;The output should include With pushed engine condition in the Extra column:
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+
| 1 | SIMPLE | lineitem | NULL | ALL | NULL | NULL | NULL | NULL | 6001215 | 10.00 | Using where; With pushed engine condition (`test`.`lineitem`.`L_ORDERKEY` = 96) |
+----+-------------+----------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------+For a tree-format plan that explicitly shows which filter conditions are applied to the OSS data scan:
EXPLAIN FORMAT = TREE SELECT * FROM lineitem WHERE l_orderkey = 96\GExpected output:
*************************** 1. row ***************************
EXPLAIN: -> Filter: (lineitem.L_ORDERKEY = 96) (cost=15010.00 rows=10000)
-> Table scan on lineitem, extra (oss_file_filter conditions: (lineitem.L_ORDERKEY = 96)) (cost=15010.00 rows=100000)The oss_file_filter conditions entry confirms the condition is evaluated at the data block level, skipping blocks that cannot match before they are read into memory.
Once the filter is confirmed, run the actual query:
SELECT COUNT(*) FROM lineitem WHERE l_orderkey = 96;