All Products
Search
Document Center

Limit-k

Last Updated: Jan 04, 2020

Limit-k is an operator that constrains the number of returned rows to k. In ApsaraDB for OceanBase, Limit-k allows you to obtain k rows of data from a specified offset. The Limit-k statement is in the form of Limit offset, k.

Limit-k is often used with the ORDER BY clause to obtain the top k rows. When the ORDER BY and Limit-k clauses are used together, the ORDER BY operator may be changed to a Top-k sort operator depending on the cost model. The following figure shows an example in which the ORDER BY operator is replaced with a Top-k sort operator in a Limit-k statement.

  1. OceanBase (root@test)> create table t1(a int primary key, b int, c int);
  2. Query OK, 0 rows affected (0.14 sec)
  3. --The sort operator for the Limit-k clause is changed to a Top-k sort
  4. OceanBase (root@test)> explain select * from t1 order by b limit 0, 10;
  5. | =====================================
  6. |ID|OPERATOR |NAME|EST. ROWS|COST|
  7. -------------------------------------
  8. |0 |LIMIT | |10 |1151|
  9. |1 | TOP-N SORT | |10 |1149|
  10. |2 | TABLE SCAN|t1 |1000 |455 |
  11. =====================================
  12. Outputs & filters:
  13. -------------------------------------
  14. 0 - output([t1.a], [t1.b], [t1.c]), filter(nil), limit(10), offset(0)
  15. 1 - output([t1.a], [t1.b], [t1.c]), filter(nil), sort_keys([t1.b, ASC]), topn(10 + 0)
  16. 2 - output([t1.a], [t1.b], [t1.c]), filter(nil),
  17. access([t1.a], [t1.b], [t1.c]), partitions(p0)

The Limit-k statement is pushed down to the storage layer in some cases. The following figure shows an example of this. The primary key is “a”, so the sorting operator is eliminated and the limit operator is executed at the storage layer.

  1. OceanBase (root@test)> create table t1(a int primary key, b int, c int);
  2. Query OK, 0 rows affected (0.14 sec)
  3. OceanBase (root@test)> explain select * from t1 order by a limit 100, 10;
  4. | ===================================
  5. |ID|OPERATOR |NAME|EST. ROWS|COST|
  6. -----------------------------------
  7. |0 |TABLE SCAN|t1 |10 |59 |
  8. ===================================
  9. Outputs & filters:
  10. -------------------------------------
  11. 0 - output([t1.a], [t1.b], [t1.c]), filter(nil),
  12. access([t1.a], [t1.b], [t1.c]), partitions(p0),
  13. limit(10), offset(100)

In most cases, the performance of a Limit-k statement depends on the data distribution. For example, a table has 1 million rows of data with 10 thousand rows that satisfy b = 1. If the number of rows that satisfy b = 1 is less than 100, a full table scan is needed to find all the data. The performance of the query has the following two extreme cases:

  1. In the best case, if all the first 100 rows of data scanned in the table satisfy the conditions, the query only needs to scan 100 rows.

  2. In the worst case, if the 10 thousand rows that satisfy the condition are all located at the end, the query must scan nearly 1 million rows to find the 100 rows that satisfy the condition.

  1. OceanBase (root@test)> create table t1(a int primary key, b int, c int);
  2. Query OK, 0 rows affected (0.14 sec)
  3. OceanBase (root@test)> explain select * from t1 where b = 1 limit 100;
  4. | ===================================
  5. |ID|OPERATOR |NAME|EST. ROWS|COST|
  6. -----------------------------------
  7. |0 |TABLE SCAN|t1 |2 |622 |
  8. ===================================
  9. Outputs & filters:
  10. -------------------------------------
  11. 0 - output([t1.a], [t1.b], [t1.c]), filter([t1.b = 1]),
  12. access([t1.b], [t1.a], [t1.c]), partitions(p0),
  13. limit(100), offset(nil)

In this example, the data distribution heavily affects the performance of the Limit-k statement. Therefore, we recommend that you use Limit-k statements with caution.