The allowed crawlers function maintains a whitelist of authorized search engines, such as Google, Bing, Baidu, Sogou, 360, and Yandex. Crawlers of these search engines are allowed to access all pages of the protected domain names.

Prerequisites

  • A WAF instance is purchased. The instance must meet the following requirements:
    • The instance is billed on a subscription basis.
    • Bot Management is enabled. This feature is a value-added service.

    For more information, see Purchase a WAF instance.

  • Your website is added to the WAF console. For more information, see Add domain names.

Background information

Rules defined in the function allow requests from specific crawlers to the target domain name based on the Alibaba Cloud crawler library. The library is updated dynamically in real time based on the analysis of network traffic that flows through Alibaba Cloud, and it captures the characteristics of requests that are initiated by crawlers. It contains the crawler IP addresses of mainstream search engines, including Google, Baidu, Sogou, 360, Bing, and Yandex.

After you enable the allowed crawlers function, requests initiated from the crawler IP addresses of the authorized search engines are directly sent to the target domain names. The bot management module no longer detects these requests.

Note: To filter some requests from the crawler IP addresses, use the Access Control/Throttling module. For more information, see Create a custom protection policy.
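
The behavior described above can be pictured as a source IP lookup against the crawler library, followed by a routing decision. The following is a minimal sketch for illustration only, not the WAF implementation. The names CRAWLER_LIBRARY, match_crawler, and route_request are hypothetical, and the IP ranges are reserved documentation ranges rather than real crawler addresses.

```python
import ipaddress

# Hypothetical stand-in for the crawler library: search engine -> IP ranges.
# The ranges below are reserved documentation ranges, not real crawler addresses.
CRAWLER_LIBRARY = {
    "Google": [ipaddress.ip_network("192.0.2.0/24")],
    "Bing": [ipaddress.ip_network("198.51.100.0/24")],
}


def match_crawler(source_ip, library=CRAWLER_LIBRARY):
    """Return the search engine whose range contains source_ip, or None."""
    addr = ipaddress.ip_address(source_ip)
    for engine, networks in library.items():
        if any(addr in net for net in networks):
            return engine
    return None


def route_request(source_ip, allowed_engines):
    """Decide whether a request skips bot detection (assumed two-way outcome)."""
    engine = match_crawler(source_ip)
    if engine is not None and engine in allowed_engines:
        return "forward"  # sent directly to the target domain name
    return "inspect"      # still examined by the bot management module


print(route_request("192.0.2.10", {"Google"}))
print(route_request("203.0.113.5", {"Google", "Bing"}))
```

In this sketch, a request from an address in the library is forwarded only if the rule for its search engine is turned on. All other requests continue through bot detection.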

Procedure

  1. Log on to the Web Application Firewall console.
  2. In the top navigation bar, select the resource group to which the instance belongs and the region, Mainland China or International, in which the instance is deployed.
  3. In the left-side navigation pane, choose Protection Settings > Website Protection.
  4. In the upper part of the Website Protection page, select the domain name for which you want to configure the whitelist.
  5. Click the Bot Management tab and find the Allowed Crawlers section. Then, turn on Status and click Settings.
  6. In the Allowed Crawlers list, find the target rule by Intelligence Name, and turn on Status.
    The default rules only allow crawler requests from the following search engines: Google, Bing, Baidu, Sogou, 360, and Yandex. You can enable the Legit Crawling Bots rule to allow requests from all search engine crawlers.
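
The way the rule switches combine can be sketched as follows. This is an illustrative model only. It assumes that the request has already been identified as coming from a search engine crawler and that each default rule is keyed by its search engine name. The dictionary layout and the function is_crawler_allowed are hypothetical, and the actual Intelligence Name values in the console may differ.

```python
# Search engines covered by the default rules, as listed above.
DEFAULT_ENGINES = ("Google", "Bing", "Baidu", "Sogou", "360", "Yandex")


def is_crawler_allowed(engine, rule_status):
    """Return True if a crawler of the given search engine is allowed.

    rule_status maps a rule name to its Status switch (True = on).
    Keying the rules by engine name is an assumption made for this sketch.
    """
    # The Legit Crawling Bots rule allows crawlers of all search engines.
    if rule_status.get("Legit Crawling Bots", False):
        return True
    # Otherwise only the engine-specific rule decides.
    return rule_status.get(engine, False)


status = {"Google": True, "Baidu": False}
print(is_crawler_allowed("Google", status))
print(is_crawler_allowed("Baidu", status))
```

Turning on Legit Crawling Bots in this model makes the per-engine switches irrelevant, which mirrors the statement that the rule allows requests from all search engine crawlers.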