All Products
Search
Document Center

Analytic functions

Last Updated: Jan 07, 2020

Overview

Analytic functions (also called window functions in some database services) are similar to aggregate functions. The calculation is based on a set of rows. The difference is that an aggregate function can return only one row in a group, while each window function can return one or more rows. Each row is based on the results computed by the window function. Analytic functions can improve queries that contain self-joins.

Supported functions

ApsaraDB for OceanBase supports the following analytic functions:

  • SUM

  • MIN

  • MAX

  • COUNT

  • AVG

  • GROUP_CONCAT

  • ROW_NUMBER

  • RANK

  • DENSE_RANK

  • PERCENT_RANK

  • CUME_DIST

  • FIRST_VALUE

  • LAST_VALUE

  • NTH_VALUE

  • NTILE

  • LEAD

  • LAG

A window is also called a frame. ApsaraDB for OceanBase supports the ROWS and RANGE options for the FRAME clauses. For ROWS, the frame is defined by beginning and ending row positions. For RANGE, the frame is defined by rows within a value range.

Execution plans

The operator for the analysis function in ApsaraDB for OceanBase is WINDOW FUNCTION. This operator requires the output of the lower-level operators to be partitioned and sorted. Therefore, a sort operator is assigned if necessary.

Example:

  1. mysql> explain select row_number() over (partition by table_id order by partition_id) from __all_root_table;
  2. +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | Query Plan |
  4. +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | ====================================================
  6. |ID|OPERATOR |NAME |EST. ROWS|COST|
  7. ----------------------------------------------------
  8. |0 |WINDOW FUNCTION| |1000 |2240|
  9. |1 | SORT | |1000 |2049|
  10. |2 | TABLE SCAN |__all_root_table|1000 |499 |
  11. ====================================================
  12. Outputs & filters:
  13. -------------------------------------
  14. 0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
  15. win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
  16. 1 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_table.partition_id, ASC])
  17. 2 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
  18. access([__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)

Eliminate sorting

If the optimizer recognizes that the output of the lower level is ordered, the optimizer does not assign a sort operator. In this case, (tenant_id, table_id) is the prefix of the primary key in the table __all_root_table.

  1. explain select row_number() over (partition by tenant_id, table_id order by partition_id) from __all_root_table;
  2. +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | Query Plan |
  4. +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | ====================================================
  6. |ID|OPERATOR |NAME |EST. ROWS|COST|
  7. ----------------------------------------------------
  8. |0 |WINDOW FUNCTION| |1000 |690 |
  9. |1 | TABLE SCAN |__all_root_table|1000 |499 |
  10. ====================================================
  11. Outputs & filters:
  12. -------------------------------------
  13. 0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
  14. win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.tenant_id], [__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
  15. 1 - output([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
  16. access([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)
  17. |

Combine calculations

When multiple analytic functions share the same sorted subset during calculation, all the functions use the same operator instead of multiple independent operators. This reduces memory overhead and range checks.

As shown in the following example, the calculation of the two analytic functions row_number() and rank() can share the same sorted output, and only one sort operator is allocated.

  1. mysql> explain select row_number() over (partition by table_id order by partition_id), rank() over (partition by table_id, partition_id order by svr_ip) from __all_root_table;
  2. | ====================================================
  3. |ID|OPERATOR |NAME |EST. ROWS|COST|
  4. ----------------------------------------------------
  5. |0 |WINDOW FUNCTION| |1000 |3017|
  6. |1 | SORT | |1000 |2826|
  7. |2 | TABLE SCAN |__all_root_table|1000 |499 |
  8. ====================================================
  9. Outputs & filters:
  10. -------------------------------------
  11. 0 - output([T_WIN_FUN_ROW_NUMBER()], [T_WIN_FUN_RANK()]), filter(nil),
  12. win_expr(T_WIN_FUN_RANK()), partition_by([__all_root_table.table_id], [__all_root_table.partition_id]), order_by([__all_root_table.svr_ip, ASC]), window_type(RANGE), uppe r(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
  13. win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING ), lower(UNBOUNDED FOLLOWING)
  14. 1 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_tabl e.partition_id, ASC], [__all_root_table.svr_ip, ASC])
  15. 2 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil),
  16. access([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), partitions(p0)

Store data on disks

Analytic functions also support storing intermediate data on disks during execution, and the logic for storing data is the same as sort operators.