The allowed crawlers function maintains a whitelist of authorized search engines, such as Google, Bing, Baidu, Sogou, and Yandex. Crawlers of the search engines on this whitelist are allowed to access all pages on your domain names.
Background information
The rules defined in this function allow requests from specific crawlers to the target domain name based on the Alibaba Cloud crawler library. The library is updated in real time based on analysis of the network traffic that flows through Alibaba Cloud, and it captures the characteristics of requests initiated by crawlers. It contains the crawler IP addresses of mainstream search engines, including Google, Baidu, Sogou, Bing, and Yandex.
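The idea of matching a request against a library of crawler IP addresses can be illustrated with a minimal sketch. This is not the WAF implementation: the `CRAWLER_LIBRARY` table and the `match_crawler` helper are hypothetical names, and the address ranges are reserved documentation ranges (RFC 5737) that stand in for real crawler addresses.

```python
import ipaddress

# Hypothetical stand-in for the crawler library: engine name -> CIDR ranges.
# The ranges below are reserved for documentation (RFC 5737); they are
# NOT real crawler addresses.
CRAWLER_LIBRARY = {
    "Google": ["192.0.2.0/24"],
    "Bing": ["198.51.100.0/25"],
    "Baidu": ["203.0.113.0/24"],
}


def match_crawler(client_ip, library=CRAWLER_LIBRARY):
    """Return the engine whose range contains client_ip, or None."""
    addr = ipaddress.ip_address(client_ip)
    for engine, cidrs in library.items():
        for cidr in cidrs:
            if addr in ipaddress.ip_network(cidr):
                return engine
    return None
```

In this sketch, a source address inside one of the listed ranges is attributed to that search engine, and any other address is treated as an unrecognized client. The real library is maintained by Alibaba Cloud and changes over time, so a static table like this one is only an approximation of the concept.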
Prerequisites
Subscription WAF instance: If your WAF instance runs the Pro, Business, or Enterprise edition, make sure that the Bot Management module is enabled.
Your website is added to WAF. For more information, see Tutorial.
Procedure
Log on to the WAF console. In the top navigation bar, select the resource group and the region in which the WAF instance is deployed. The region can be Chinese Mainland or Outside Chinese Mainland.
In the left-side navigation pane, choose .
In the upper part of the Website Protection page, select the domain name for which you want to configure protection from the Switch Domain Name drop-down list.
Click the Bot Management tab and find the Allowed Crawlers section. Then, turn on Status and click Settings.
In the Allowed Crawlers list, find the target rule by Intelligence Name and turn on Status. The default rules allow crawler requests only from the following search engines: Google, Bing, Baidu, Sogou, and Yandex. To allow requests from all search engine crawlers, enable the Legit Crawling Bots rule.
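The relationship between the per-engine rules and the Legit Crawling Bots rule can be modeled with a short sketch. It assumes that each default rule covers one search engine and that Legit Crawling Bots covers every recognized search engine crawler. The `AllowRule` class, the `default_rules` and `is_allowed` helpers, and the per-engine rule names are hypothetical and do not reflect the console's actual data model.

```python
from dataclasses import dataclass

DEFAULT_ENGINES = ["Google", "Bing", "Baidu", "Sogou", "Yandex"]


@dataclass
class AllowRule:
    intelligence_name: str  # label shown in the Intelligence Name column
    engines: frozenset      # engines covered; empty means any search engine
    enabled: bool = False   # the Status switch of the rule


def default_rules():
    """Build one rule per default engine plus the catch-all rule.

    The per-engine rule names are hypothetical placeholders.
    """
    rules = [AllowRule(f"{name} crawler (hypothetical name)", frozenset([name]))
             for name in DEFAULT_ENGINES]
    rules.append(AllowRule("Legit Crawling Bots", frozenset()))
    return rules


def is_allowed(engine, rules, feature_on=True):
    """Return True if an enabled rule covers the identified engine.

    engine is the search engine a request was attributed to, or None
    if the request was not recognized as a search engine crawler.
    feature_on models the Status switch of the Allowed Crawlers section.
    """
    if not feature_on or engine is None:
        return False
    return any(rule.enabled and (not rule.engines or engine in rule.engines)
               for rule in rules)
```

For example, enabling only the Google rule allows Google crawlers but not Yandex crawlers, whereas enabling Legit Crawling Bots also allows crawlers of search engines outside the default five. With the section-level Status switch turned off, no rule takes effect in this model.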