All Products
Search
Document Center

Tablestore:Configure a filter

Last Updated:Mar 07, 2024

After you configure a filter, Tablestore filters query results on the server. Only rows that match the filter conditions are returned. Therefore, this feature reduces the volume of data transferred and shortens the response time.

Scenarios

  • Directly filter results

    For example, an Internet of things (IoT)-based smart electric meter writes voltage, current, usage, and other information to a Tablestore table every 15 seconds. You want to query abnormal voltage data and other related data for daily analysis to determine whether to inspect cables.

    You can call the GetRange operation to read the monitoring data generated in one day by the electric meter and filter the data (5,760 records) to obtain the 10 records that are collected when the voltage is unstable.

    If you use a filter, only the 10 records that need to be analyzed are returned. A filter reduces the volume of returned data and eliminates the need for preliminary data processing. This reduces development costs.

  • Filter results after regular expression matching and data conversion

    If data is stored in a custom format such as JSON string in a column and you want to query a subfield value, you can use a regular expression to match and then convert the value into the data type you require. Then, you can use a filter to obtain the required data.

    For example, the data stored in a column is in the {cluster_name:name1,lastupdatetime:12345} format. If you need to filter and query the value of row lastupdatetime>12345, you can use the regular expression lastupdatetime:([0-9]+)} to match the data of the subfield in the column, convert the matched results into a numeric type, and then compare the numeric data with the matched results. This way, you can obtain the required data row.

Overview

When you query data by calling the GetRow, BatchGetRow, or GetRange operation, you can use a filter to return only the rows that match the filter conditions. In these cases, the native semantics and limits of these operations are not changed. For more information, see Read data.

When you use a filter, you can configure the following settings based on your business requirements:

  • If a reference column does not exist in a row, configure the PassIfMissing parameter to determine whether to return the row.

  • If a reference column contains data of multiple versions, specify whether to use only the latest version of data for comparison.

  • Use regular expressions to match subfield values.

  • Use relational operators and logical operators to create filter conditions.

The available filters are SingleColumnValueFilter, SingleColumnValueRegexFilter, and CompositeColumnValueFilter, which filter a row based on the column values of one or more reference columns.

Filter

Description

SingleColumnValueFilter

Determine whether to filter a row based on the value of a reference column.

SingleColumnValueRegexFilter

Use regular expressions to match column values of the String type and extract matched substrings. Then, convert the data type of the extracted substrings to String, Integer, or Double and filter the values after conversion.

Regular expressions must meet the following conditions:

  • A regular expression can be up to 256 bytes in length.

  • The syntax of regular expressions in Perl is supported.

  • Single-byte regular expressions are supported.

  • Regular expression matching in Chinese is not supported.

  • The full matching mode and partial matching mode of a regular expression are supported.

    In the partial matching mode, the subfield values to match are in a pair of parentheses () in a regular expression.

    If the full matching mode is used, the first matching result is returned. If the partial matching mode is used, the first submatch result is returned. For example, if the column value is 1aaa51bbb5 and the regular expression is 1[a-z]+5, the return value is 1aaa5. If the regular expression is 1([a-z]+)5, the return value is aaa.

    Important

    Only Tablestore SDK for Java supports the SingleColumnValueRegexFilter filter.

CompositeColumnValueFilter

Determine whether to filter a row based on a logical combination of the matching results for the values of multiple reference columns.

Usage notes

  • Filter conditions support relational operators of =, !=, >, >=, <, and <= and logical operators of NOT, AND, and OR. A filter condition can inlude up to 10 subconditions.

  • The reference columns that are used by a filter must be included in the query results. If the specified columns from which data is read do not include reference columns, the filter cannot query the values of reference columns.

  • When you use the GetRange operation, up to 5,000 rows or 4 MB of data can be scanned at a time.

    If no data matches the filter conditions in the range of the scan, the returned rows are empty. However, NextStartPrimaryKey may not be empty. If NextStartPrimaryKey is not empty, use the parameter value to continue scanning until the return value of NextStartPrimaryKey is empty.

Methods

Important

You can use filters only by using Tablestore SDKs.

You can use filters by using Tablestore SDK for Java, Tablestore SDK for Go, Tablestore SDK for Python, Tablestore SDK for Node.js, Tablestore SDK for .NET, and Tablestore SDK for PHP. In this example, Tablestore SDK for Java is used.

Use SingleColumnValueFilter to filter data

The following sample code provides an example on how to read data of the latest version from a row in a data table and use a filter to filter data based on the value of the Col0 column.

private static void getRow(SyncClient client, String pkValue) {
    // Construct the primary key. 
    PrimaryKeyBuilder primaryKeyBuilder = PrimaryKeyBuilder.createPrimaryKeyBuilder();
    primaryKeyBuilder.addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString(pkValue));
    PrimaryKey primaryKey = primaryKeyBuilder.build();

    // Specify the table name and primary key to read a row of data. 
    SingleRowQueryCriteria criteria = new SingleRowQueryCriteria("<TABLE_NAME>", primaryKey);
    // Set the MaxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);

    // Configure a filter to return a row in which the value of the Col0 column is 0. 
    SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("Col0",
            SingleColumnValueFilter.CompareOperator.EQUAL, ColumnValue.fromLong(0));
    // If the Col0 column does not exist, the row is not returned. 
    singleColumnValueFilter.setPassIfMissing(false);
    criteria.setFilter(singleColumnValueFilter);

    GetRowResponse getRowResponse = client.getRow(new GetRowRequest(criteria));
    Row row = getRowResponse.getRow();

    System.out.println("Read complete. Result:");
    System.out.println(row);
}

Use SingleColumnValueRegexFilter to filter data based on regular expression matching

The following sample code provides an example on how to read data whose primary key values are in the range of ["pk:2020-01-01.log", "pk:2021-01-01.log") from the Col1 column and use a regular expression to filter data in the Col1 column.

private static void getRange(SyncClient client) {
    // Specify the name of the data table. 
    RangeRowQueryCriteria criteria = new RangeRowQueryCriteria("<TABLE_NAME>");
 
    // Specify ["pk:2020-01-01.log", "pk:2021-01-01.log") as the range of the primary key of the data that you want to read. The range is a left-closed and right-open interval. 
    PrimaryKey pk0 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
        .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("2020-01-01.log"))
        .build();
    PrimaryKey pk1 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
        .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("2021-01-01.log"))
        .build();
    criteria.setInclusiveStartPrimaryKey(pk0);
    criteria.setExclusiveEndPrimaryKey(pk1);
 
    // Set the MaxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);
 
    // Configure a filter. A row is returned when cast<int>(regex(Col1)) is greater than 100. 
    RegexRule regexRule = new RegexRule("t1:([0-9]+),", RegexRule.CastType.VT_INTEGER);
    SingleColumnValueRegexFilter filter =  new SingleColumnValueRegexFilter("Col1",
        regexRule,SingleColumnValueRegexFilter.CompareOperator.GREATER_THAN,ColumnValue.fromLong(100));
    criteria.setFilter(filter);

    while (true) {
        GetRangeResponse resp = client.getRange(new GetRangeRequest(criteria));
        for (Row row : resp.getRows()) {
            // do something
            System.out.println(row);
        }
        if (resp.getNextStartPrimaryKey() != null) {
            criteria.setInclusiveStartPrimaryKey(resp.getNextStartPrimaryKey());
        } else {
            break;
        }
   }
}

Use CompositeColumnValueFilter to filter data

The following sample code is used to return the rows whose primary key is within the ["a","h") range. In addition, the value of Col0 is 0 and either the value of Col1 is greater than 100 or the value of Col2 is less than or equal to 10.

private static void getRange(SyncClient client) {
    // Specify the name of the data table. 
    RangeRowQueryCriteria criteria = new RangeRowQueryCriteria("<TABLE_NAME>");

    // Specify the primary key range. The primary key range is a left-closed and right-open interval. 
    PrimaryKey pk0 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
            .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("a"))
            .build();
    PrimaryKey pk1 = PrimaryKeyBuilder.createPrimaryKeyBuilder()
            .addPrimaryKeyColumn("pk", PrimaryKeyValue.fromString("h"))
            .build();
    criteria.setInclusiveStartPrimaryKey(pk0);
    criteria.setExclusiveEndPrimaryKey(pk1);

    // Set the maxVersions parameter to 1 to read the latest version of data. 
    criteria.setMaxVersions(1);

    // Set the composite1 condition to (Col0 == 0) AND (Col1 > 100). 
    CompositeColumnValueFilter composite1 = new CompositeColumnValueFilter(CompositeColumnValueFilter.LogicOperator.AND);
    SingleColumnValueFilter single1 = new SingleColumnValueFilter("Col0",
            SingleColumnValueFilter.CompareOperator.EQUAL, ColumnValue.fromLong(0));
    SingleColumnValueFilter single2 = new SingleColumnValueFilter("Col1",
            SingleColumnValueFilter.CompareOperator.GREATER_THAN, ColumnValue.fromLong(100));
    composite1.addFilter(single1);
    composite1.addFilter(single2);

    // Set the composite2 condition to ( (Col0 == 0) AND (Col1 > 100) ) OR (Col2 <= 10). 
    CompositeColumnValueFilter composite2 = new CompositeColumnValueFilter(CompositeColumnValueFilter.LogicOperator.OR);
    SingleColumnValueFilter single3 = new SingleColumnValueFilter("Col2",
            SingleColumnValueFilter.CompareOperator.LESS_EQUAL, ColumnValue.fromLong(10));
    composite2.addFilter(composite1);
    composite2.addFilter(single3);
    criteria.setFilter(composite2);

    while (true) {
        GetRangeResponse resp = client.getRange(new GetRangeRequest(criteria));
        for (Row row : resp.getRows()) {
            // do something
            System.out.println(row);
        }
        if (resp.getNextStartPrimaryKey() != null) {
            criteria.setInclusiveStartPrimaryKey(resp.getNextStartPrimaryKey());
        } else {
            break;
        }
    }
}

Billing

The implementation of filters does not affect existing billing rules.

Although filters reduce the volume of returned data, the disk I/O usage remains unchanged because filtering is performed on the server before data is returned. Therefore, the same number of read CUs are consumed regardless of whether the filters are used or not. For example, when you call the GetRange operation to read 100 records (200 KB) of data and then filter these records to obtain 10 records (20 KB), 50 read CUs are consumed.

FAQ

How do I select between a secondary index and a search index?

References

  • If an application needs to use different attributes as query conditions to query data, you can specify these attributes as the primary key columns of a secondary index table to speed up the query. For more information, see Overview.

  • If your business requires multi-dimensional queries and data analysis, you can create a search index and specify the required attributes as the fields of the search index. Then, you can query and analyze the data by using the search index. The multi-dimensional queries include queries based on non-primary key columns, Boolean queries, and fuzzy queries. The data analysis requirements include obtaining extreme values, counting rows, and grouping data. For more information, see Overview.