Tablestore filters results on the server before returning the rows that match the filter conditions. This feature reduces the volume of data transferred and shortens the response time because only matched rows are returned.

Scenarios

  • Directly filter results

    An Internet of things (IoT)-based smart electric meter writes voltage, current, usage, and other information to a Tablestore table every 15 seconds. You want to query abnormal voltage data and other related data for daily analysis to determine whether to inspect cables.

    You can use GetRange to read the monitoring data generated by the electric meter and filter the data (5,760 records) to obtain the 10 records that are collected when the voltage is unstable.

    If you use a filter, only the 10 records that need to be analyzed are returned. A filter reduces the volume of returned data and removes the need for preliminary data processing. This reduces development costs.

  • Filter results after regular expression matching and data conversion

    When data is stored in a custom format such as JSON string and you want to query a subfield value, you can use regular expressions to match and then convert the value into the data type you require. Then, you can use a filter to obtain the required data.

    For example, the data stored in a column is in the format of {cluster_name:name 1,lastupdatetime:12345}. If you need to filter and query the value of row lastupdatetime>12345, you can use the regular expression lastupdatetime:([0-9]+)} to match the data of the subfield in the column, use the CAST function to convert the matching results into a numeric type, and then compare the numeric data to the matched results. This way, you can obtain the required data row.

Limits

  • Filters support relational operator-based operations (=, ! =, >, >=, <, and <=) and logical operations (NOT, AND, and OR). You can use a combination of up to 10 filter conditions for a filter.
  • The reference columns that are used by a filter must be included in the read data. If the specified columns from which data is read do not include reference columns, the filter cannot query the values of reference columns.
  • When you use the GetRange operation, up to 5,000 rows or 4 MB of data can be scanned at a time.

    If no data matches the filter conditions in the range of the scan, the returned rows are empty. However, NextStartPrimaryKey may not be empty. If NextStartPrimaryKey is not empty, use this value to continue scanning until the return value of NextStartPrimaryKey is empty.

Operations

Filters can be used for the GetRow, BatchGetRow, and GetRange operations. You can use filters by calling the GetRow, BatchGetRow, and GetRange operations, which does not change the native semantics or limits of these operations. For more information, see Single-row operations and Multi-row operations.

The available filters are SingleColumnValueFilter, SingleColumnValueRegexFilter, and CompositeColumnValueFilter, which filter a row based on the column values of one or more reference columns.

  • SingleColumnValueFilter determines whether to filter a row based on the value of a reference column.

    SingleColumnValueFilter uses the PassIfMissing parameter to determine whether the filter conditions are met if the reference column does not exist. You can specify an action when the reference column does not exist.

  • SingleColumnValueRegexFilter uses regular expressions to match column values of the String type and extract matching substrings. Then, this filter converts the data type of the extracted substrings to String, Integer, or Double and filters the values after conversion.
    Notice Only Tablestore SDK for Java supports the SingleColumnValueRegexFilter filter.
  • CompositeColumnValueFilter determines whether to filter a row based on a logical combination of the check results for the values of multiple reference columns.

Use Tablestore SDKs

You can use the following Tablestore SDKs to implement filters:

Parameters

Parameter Description
ColumnName The name of the reference column used by a filter.
ColumnValue The value of the reference column used by a filter.
CompareOperator The relational operator used by a filter.

Relational operators include EQUAL (=), NOT_EQUAL (!=),GREATER_THAN (>), GREATER_EQUAL (>=), LESS_THAN (<), and LESS_EQUAL (<=).

LogicOperator The logical operator used by a filter.

Logical operators include NOT, AND, and OR.

PassIfMissing Specifies whether to return a row when a reference column does not exist in the row. Valid values:
  • true: If the reference column does not exist in a row, the row is returned. This is the default value.
  • false: If the reference column does not exist in a row, the row is not returned.
LatestVersionsOnly Specifies whether to use only the latest versions of data in a reference column for comparison when the reference column contains data of multiple versions. The value of this parameter is of the Bool type. The default value is true. If the default value is used, the latest versions of data are used for comparison when a reference column contains data of multiple versions.

If the value of LatestVersionsOnly is set to false, all versions of data in a reference column are used for comparison. If one version of data in the reference column meets the filter conditions, the row is returned.

Regex A regular expression used to match subfield values. The regular expression must meet the following conditions:
  • A regular expression can be up to 256 bytes in length.
  • The syntax of regular expressions in Perl is supported.
  • Single-byte regular expressions are supported.
  • Regular expression matching in Chinese is not supported.
  • Full matching mode and partial matching mode of regular expressions are supported.

    In partial matching mode, regular expressions are separated by a pair of parentheses ().

    If the full matching mode is used, the first matching result is returned. If particle matching mode is used, the first submatch is returned. For example, if the column value is 1aaa51bbb5 and the regular expression is 1[a-z]+5, the return value is 1aaa5. If the regular expression is 1([a-z]+)5, the return value is aaa.

VariantType The data type of the subfield value after conversion when you use a regular expression to match the subfield value. Valid values: VT_INTEGER (integer), VT_STRING (string type), and VT_DOUBLE (double-precision floating-point type).

Examples

  • Construct SingleColumnValueFilter.
      // // Configure a filter to return a row when the value of Col0 is 0 
      SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("Col0",
              SingleColumnValueFilter.CompareOperator.EQUAL, ColumnValue.fromLong(0));
      // If Col0 does not exist, the row is not returned. 
      singleColumnValueFilter.setPassIfMissing(false);
      // Only the latest version of data in the column is used for comparison. 
      singleColumnValueFilter.setLatestVersionsOnly(true);
                        
  • Construct SingleColumnValueRegexFilter.
     // Construct a rule to extract regular expressions. 
     RegexRule regexRule = new RegexRule("t1:([0-9]+),", VariantType.Type.VT_INTEGER);
     // Set a filter to implement cast<int>(regex(col1) >0. 
     // The SingleColumnValueRegexFilter constructing is in the format of column name, regular rule, comparison character, comparison value. 
     SingleColumnValueRegexFilter filter =  new SingleColumnValueRegexFilter("Col1",
         regexRule,SingleColumnValueFilter.CompareOperator.GREATER_THAN, ColumnValue.fromLong(0));
     // If Col0 does not exist, the row is not returned. 
     filter.setPassIfMissing(false);
  • Construct CompositeColumnValueFilter.
      // Set the composite1 condition to (Col0 == 0) AND (Col1 > 100). 
      CompositeColumnValueFilter composite1 = new CompositeColumnValueFilter(CompositeColumnValueFilter.LogicOperator.AND);
      SingleColumnValueFilter single1 = new SingleColumnValueFilter("Col0",
              SingleColumnValueFilter.CompareOperator.EQUAL, ColumnValue.fromLong(0));
      SingleColumnValueFilter single2 = new SingleColumnValueFilter("Col1",
              SingleColumnValueFilter.CompareOperator.GREATER_THAN, ColumnValue.fromLong(100));
      composite1.addFilter(single1);
      composite1.addFilter(single2);
    
      // Set the composite2 condition to ( (Col0 == 0) AND (Col1 > 100) ) OR (Col2 <= 10). 
      CompositeColumnValueFilter composite2 = new CompositeColumnValueFilter(CompositeColumnValueFilter.LogicOperator.OR);
      SingleColumnValueFilter single3 = new SingleColumnValueFilter("Col2",
              SingleColumnValueFilter.CompareOperator.LESS_EQUAL, ColumnValue.fromLong(10));
      composite2.addFilter(composite1);
      composite2.addFilter(single3);                   

Billing

The implementation of filters does not affect existing billing rules.

Although filters reduce the volume of returned data, the disk I/O usage remain unchanged because filtering is performed on the server side before data is returned. Therefore, the same number of read CUs are consumed regardless of whether the filters are used or not. For example, when you use GetRange to read 100 records (200 KB) of data and then filter these records to obtain 10 records (20 KB), 50 read CUs are consumed.