If memory is insufficient, AnalyticDB for PostgreSQL may temporarily spill query results to disk during the query execution process. However, disk access is slower than memory access. We recommend that you prevent operators from spilling to disk during the query execution process to improve query efficiency.

Causes for operator spills to disk

When SORT, JOIN, or HASH operations are performed on tables that have large amounts of data, query results may temporarily spill to disk due to insufficient memory. You can execute an EXPLAIN ANALYZE statement to view the execution plan and check whether operators have spilled to disk.

Causes for operator spills to disk

The preceding figure shows a sample execution plan in which operators have spilled to disk. A positive integer in the Workfile field indicates that operators have spilled to disk. The following figure shows a sample execution plan in which no operators have spilled to disk. The value in the Workfile field is 0.

Causes for operator spills to disk - 2

Operator spills to disk may be caused by the following reasons:

  • Small memory capacity available for queries
  • Excessive volume of computations required for queries
  • Data skew

The following section describes solutions for operator spills to disk in different scenarios.

Solutions for operator spills to disk in different scenarios

  • Small memory capacity available for queriesSmall memory capacity available for queries

    The preceding figure shows that only a small memory capacity is required for the queries, but operators still spill to disk. This is because the available memory capacity is decreased due to a limit imposed on the resource group or queue, or a disproportionately small value of statement_mem.

    In this case, you can execute the following statement to increase the statement_mem value and prevent operators from spilling to disk:

    SET statement_mem TO '256MB';
  • Excessive volume of computations required for queriesExcessive volume of computations required for queries

    Sometimes, operators still spill to disk even though a large memory capacity is configured by the statement_mem parameter. This is because a large volume of computations requires a larger memory capacity. You can decrease the volume of computations by executing ANALYZE statements or creating indexes.

    In the preceding figure, the t2 table that has a large number of rows is improperly evaluated as a single-row table. As a result, the table is broadcast in a hash join, which leads to a large volume of computations. In this case, you can execute an ANALYZE statement on the t2 table to eliminate operator spills to disk.

  • Data skew

    Data skew may cause the volume of data and computations on a compute node to be much larger than those on other compute nodes. This can lead to operator spills to disk and insufficient memory for queries. For information about how to identify and eliminate data skew, see Data skew elimination.