When you use search index for batch computing, you can customize predicate pushdown. You can configure predicate pushdown only on LONG and STRING columns.
Background information
You can use predicate pushdown when a query condition for the search index needs to filter large amounts of data, and it is time-consuming for Tablestore to join all intermediate query results. To address these needs, you can push down the value filtering of part of fields from the storage layer (Tablestore) to the compute layer (Spark), which improves query efficiency.
Take select * from table where a = 10 and b < 999999999;
as an example. If the number of entries returned for the condition (a = 10) is only 1,000, and the number of entries returned for the condition (b < 999999999) is up to 100 million, it is time-consuming for Tablestore to join the results of the two conditions. However, you can push down the condition (b < 999999999) to the compute layer. Spark needs only to filter data based on the 1,000 entries returned from the storage layer, which reduces most pressure on the storage layer.
Supported operators
The following table lists the operators supported by Spark.
Spark | Supported | Example of SQL statement |
---|---|---|
And | Yes | select * from table where a > 1 and b < 0; |
Or | Yes | select * from table where a > 1 or b < 0; |
Not | Yes | select * from table where a != 1; |
EqualTo | Yes | select * from table where a = 1; |
Not+EqualTo | Yes | select * from table where a != 1; |
IsNull | Yes | select * from table where a is null; Return rows that do not contain the a column from table. |
In | Yes | select * from table where a in {1,2,3}; By default, the maximum upper limit is 1024. |
LessThan | Yes | select * from table where a < 10; You can customize predicate pushdown when the SQL statement uses LONG or STRING columns. |
LessThanOrEqual | Yes | select * from table where a <=10; You can customize predicate pushdown when the SQL statement uses LONG or STRING columns. |
GreaterThan | Yes | select * from table where a > 10; You can customize predicate pushdown when the SQL statement uses LONG or STRING columns. |
GreaterThanOrEqual | Yes | select * from table where a >= 10; You can customize predicate pushdown when the SQL statement uses LONG or STRING columns. |
StringStartsWith | Yes | select * from table where a like "tablestore%"; Return rows that contain a column whose values are prefixed with tablestore from table. |
Coordinate pair of the central point, radius (STRING JSON) | Yes | select * from table where val_geo = '{"centerPoint":"3,0", "distanceInMeter": 100000}'; The geographical area is defined by the radius and coordinate pair of the central point. |
Coordinate pairs of the upper-left corner and lower-right corner in the rectangular geographical area (STRING JSON) | Yes | select * from table where geo = '{"topLeft":"8,0", "bottomRight": "0,10"}'; The rectangular geographical area is defined by the coordinate pairs of the upper-left corner and lower-right corner. |
Coordinate pairs of vertices in the polygon geographical area (STRING JSON) | Yes | select * from table where geo = '{"points":["5,0", "5,1", "6,1", "6,10"]}'; The polygon geographical area is defined by the coordinate pairs of multiple vertices. |
Configure predicate pushdown
The parameters for predicate pushdown must be configured when you create a Spark external table. Predicate pushdown uses the following rules:
When ADN and NOT are the only logical operators in the filter conditions, you can specify whether to enable predicate pushdown.
When the filter conditions contain logical operator OR , all predicates are pushed down. Custom configurations for predicate pushdown such as push.down.range.long=false and push.down.range.string=false (configured by using SQL in EMR) become invalid.
The following table describes the combination of logical operators with pushdown configurations, examples of SQL statements, and expected results.
Logical operator | Pushdown configuration | Example of SQL statement | Expected result |
---|---|---|---|
AND |
| select * from table where val_long1 > 1000 and val_long1 is null and name like 'table%' and pk in {12341,213432}; | The SQL statement is executed. All predicates are pushed down. |
AND only |
| select * from table where val_long1 > 1 and name like 'table%'; | Predicates where operators that are used to compare values with LONG column values are not pushed down. The results returned based on the predicate are filtered by the Spark compute layer. The Spark compute layer obtains the data based on the condition name like 'table%'. The condition val_long1 > 1 is executed by using the business code. |
AND only |
| select * from table where val_string1 > 'string1' and name like 'table%'; | Predicates where operators that are used to compare values with STRING column values are not pushed down. The results returned based on the predicate are filtered by the Spark compute layer. The Spark compute layer obtains the data based on the condition name like 'table%'. The condition val_long1 > 1 is executed by using the business code. |
AND only | Columns containing values for geographical locations exist | select * from table where val_geo = '{"centerPoint":"3,0", "distanceInMeter": 100000}' and val_long1 = 37691900 and val_long2 > 2134234; | The SQL statement is executed. The configurations of push.down.range.long determine whether the Spark compute layer filters the results returned based on the condition val_long2 > 2134234. |
OR |
| select * from table where val_long1 > 1000 or val_long1 is null or name like 'table%' and pk in {12341,213432}; | The SQL statement is executed. All predicates are pushed down. |
OR | Columns containing values for geographical locations exist | select * from table where val_geo = '{"centerPoint":"3,0", "distanceInMeter": 100000}' or val_long1 = 37691900; | The SQL statement is executed. All predicates are pushed down. |
OR |
| select * from table where val_long1 > 1 or name like 'table%'; | The custom configurations for predicate pushdown become invalid. All predicates are pushed down. |
OR |
| select * from table where val_string1 > 'string1' or name like 'table%'; | The custom configurations for predicate pushdown become invalid. All predicates are pushed down. |