Force scanning

Last Updated: Apr 07, 2025

This topic describes the guidelines for handling the "Force scanning" governance item.

Governance item ID: DG-C-44

Brute force scanning refers to scanning an entire table or a large amount of data during computing because partitions or filter conditions are used improperly. This can waste a large amount of computing resources. The "Force scanning" governance item exposes such issues in code and guides you through optimizing the code logic, which effectively reduces the consumption of computing resources.

Supported data sources

  • MaxCompute

Judgment conditions

A query is identified as a brute force scan if either of the following conditions is met:

  • The number of partitions queried in the table exceeds 90.

  • The total storage volume of the queried partitions exceeds 90 GB.
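For example, a query whose partition filter covers a long date range can exceed the 90-partition threshold even though a filter is present. The following sketch assumes a hypothetical MaxCompute table `dwd_log_di` that is partitioned by day on a `ds` column in the `yyyymmdd` format:

```sql
-- Hypothetical table dwd_log_di, partitioned by ds (yyyymmdd).
-- This query touches roughly 180 daily partitions, which is well above
-- the 90-partition threshold, so it is identified as a brute force scan.
SELECT user_id, COUNT(*) AS pv
FROM dwd_log_di
WHERE ds BETWEEN '20240101' AND '20240630'
GROUP BY user_id;
```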

A brute force scan can be triggered in the following scenarios:

  • A large partitioned table is read with no partition filter conditions specified.

  • Partition filter conditions are specified for a partitioned table, but the filter covers an excessively large time span, so a large number of partitions are still accessed.

  • The partition filter conditions are written incorrectly, so they do not take effect.
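A common way that a partition filter fails to take effect is wrapping the partition column in a function, which can prevent partition pruning. The following sketch illustrates the pattern with the hypothetical `dwd_log_di` table from above; the exact pruning behavior depends on the MaxCompute version:

```sql
-- Ineffective: the partition column is wrapped in a function, so the
-- engine may not be able to prune partitions and scans all of them.
SELECT COUNT(*)
FROM dwd_log_di
WHERE TO_DATE(ds, 'yyyymmdd') = DATE '2024-06-30';

-- Effective: compare the partition column directly against a literal
-- so that partition pruning restricts the scan to one partition.
SELECT COUNT(*)
FROM dwd_log_di
WHERE ds = '20240630';
```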

Handling guide

You can use one of the following methods to prevent brute force scanning:

  • Add partition filter conditions to reduce the number of partitions scanned, or extract the required data into small intermediate tables and scan the historical partitions of the small tables to reduce the amount of data scanned.

  • Specify partition filter conditions in a subquery.

  • For a table for which a large number of partitions must be scanned and computed every day, extract the data that is repeatedly computed and store it in an intermediate table once a day.

  • Check the partition filter conditions and fix any conditions that are rendered ineffective by coding errors.
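The methods above can be sketched as follows. The table and column names (`dwd_log_di`, `dim_user`, `mid_user_pv`, `ds`) are hypothetical and only illustrate the pattern:

```sql
-- Method: specify the partition filter in a subquery so pruning is
-- applied before the join instead of after it.
SELECT a.user_id, b.city
FROM (
  SELECT user_id
  FROM dwd_log_di
  WHERE ds = '20240630'   -- partition filter inside the subquery
) a
JOIN dim_user b
  ON a.user_id = b.user_id;

-- Method: once a day, store repeatedly computed data in a small
-- intermediate table.
INSERT OVERWRITE TABLE mid_user_pv PARTITION (ds = '20240630')
SELECT user_id, COUNT(*) AS pv
FROM dwd_log_di
WHERE ds = '20240630'
GROUP BY user_id;

-- Downstream jobs that need a long time span then scan the historical
-- partitions of the small table instead of the large source table.
SELECT user_id, SUM(pv) AS total_pv
FROM mid_user_pv
WHERE ds BETWEEN '20240101' AND '20240630'
GROUP BY user_id;
```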

Precautions

When you resolve brute force scans, make sure that actual business requirements are not compromised. If you must access data across a large time span, you can add the task to a whitelist or create an aggregate table for downstream access.