The bot management module of Web Application Firewall (WAF) provides a scenario-specific configuration feature that protects your business from malicious crawlers. You can use this feature to configure custom anti-crawler rules based on your business requirements.

Background information

Malicious crawlers come in many types, and their crawling methods keep changing to bypass the anti-crawler rules that website administrators configure. Therefore, fixed rules cannot block all malicious crawlers. The appropriate blocking method also depends on your business requirements, and security expertise is often required to deliver optimal protection.

If you need strong protection against malicious crawlers or have no security experts to configure anti-crawler rules, we recommend that you use the scenario-specific configuration feature that is provided by WAF. WAF provides IP address libraries of malicious crawlers and updates the IP address libraries of various public clouds and data centers in real time based on network-wide threat intelligence of Alibaba Cloud. This way, normal crawler requests are allowed and malicious crawler requests from the addresses in the IP address libraries are blocked.
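At its simplest, matching a request against an IP address library means checking whether the client IP address falls inside any of the library's CIDR ranges. The following Python sketch illustrates only that idea. The library names MALICIOUS_CRAWLER_LIBRARY and PUBLIC_CLOUD_LIBRARY, their contents, and the allow/block policy are hypothetical and do not describe how WAF stores or evaluates its libraries. The ranges are reserved documentation prefixes (RFC 5737), not real crawler or cloud addresses.

```python
import ipaddress

# Hypothetical IP address libraries (assumption). The ranges are RFC 5737
# documentation prefixes, not real malicious-crawler or cloud ranges.
MALICIOUS_CRAWLER_LIBRARY = [ipaddress.ip_network("203.0.113.0/24")]
PUBLIC_CLOUD_LIBRARY = [ipaddress.ip_network("198.51.100.0/24")]


def in_library(ip: str, library) -> bool:
    """Return True if the IP address falls inside any CIDR block of the library."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network; the check simply returns False.
    return any(addr in net for net in library)


def decide(ip: str) -> str:
    """Simplified, hypothetical policy: block library hits, allow everything else."""
    if in_library(ip, MALICIOUS_CRAWLER_LIBRARY) or in_library(ip, PUBLIC_CLOUD_LIBRARY):
        return "block"
    return "allow"


print(decide("203.0.113.7"))  # block
print(decide("192.0.2.10"))   # allow
```

A production system would also need to account for verified search engine crawlers and for libraries that change in real time; this sketch treats the libraries as static lists.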

Risks and characteristics of malicious crawlers

Normal crawler requests contain a crawler keyword such as xxspider in the User-Agent field and typically have the following characteristics: a low request rate, scattered URLs, and a wide time range. To check where a crawler request actually comes from, run a reverse nslookup or a tracert command on the source IP address of the request. For example, if you run a reverse nslookup on the IP address of a Baidu crawler, the command returns the hostname of the crawler, which you can use to confirm that the request really originates from Baidu.
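A common way to automate the reverse nslookup check is forward-confirmed reverse DNS: resolve the IP address to a hostname, check that the hostname belongs to the search engine, and then resolve that hostname back to confirm that it maps to the same IP address. The following Python sketch shows the technique. The function name verify_crawler_ip and the suffix list TRUSTED_SUFFIXES are hypothetical; confirm the official hostname patterns in each search engine's own documentation. The resolvers can be swapped out so that the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]

# Illustrative hostname suffixes (assumption): verify against official docs.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp", ".googlebot.com", ".google.com")


def verify_crawler_ip(
    ip: str,
    reverse_lookup: Resolver = socket.gethostbyaddr,
    forward_lookup: Resolver = socket.gethostbyname_ex,  # IPv4 only
    trusted_suffixes: Iterable[str] = TRUSTED_SUFFIXES,
) -> bool:
    """Forward-confirmed reverse DNS check for an IP that claims to be a crawler."""
    try:
        # Reverse (PTR) lookup, equivalent to a reverse nslookup.
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False  # no PTR record: the crawler cannot be verified
    hostname = hostname.lower().rstrip(".")
    if not hostname.endswith(tuple(trusted_suffixes)):
        return False
    try:
        # Forward lookup on the hostname returned by the PTR record.
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # A PTR record alone can be spoofed; the hostname must resolve back to the IP.
    return ip in forward_ips
```

Calling verify_crawler_ip with a real client IP address performs live DNS queries, so the result depends on the network and the DNS records at that moment.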

Malicious crawlers may send a large number of requests to a specific URL or port of a domain name within a short period of time. For example, HTTP flood attacks may be disguised as crawler traffic, or third parties may send crawler requests to collect sensitive information. A large volume of malicious requests can drive up CPU utilization, cause website access failures, and interrupt services.
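The "many requests to one URL within a specific period" pattern is often detected with a sliding-window counter keyed by client IP address and URL. The following Python sketch is a generic illustration of that technique, not the detection logic that WAF uses. The class name SlidingWindowCounter and the threshold values are hypothetical, and the sketch assumes that timestamps for each key arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SlidingWindowCounter:
    """Flag a (client IP, URL) pair that exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # One queue of request timestamps per (IP, URL) pair.
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def record(self, ip: str, url: str, timestamp: float) -> bool:
        """Record one request; return True if the pair is now over the threshold."""
        hits = self._hits[(ip, url)]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Six requests to the same URL within 0.5 seconds, with a limit of 5 per 10 seconds.
counter = SlidingWindowCounter(max_requests=5, window_seconds=10)
flags = [counter.record("203.0.113.7", "/api/price", t * 0.1) for t in range(6)]
print(flags)  # [False, False, False, False, False, True]
```

Keying on both the IP address and the URL matches the pattern described above: a normal crawler spreads its requests across many URLs over a long time, so no single key accumulates enough hits to be flagged.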

Prerequisites

A WAF instance that runs Pro Edition or higher is purchased, and the bot management module is enabled.

References

Configure anti-crawler rules for websites

Configure anti-crawler rules for apps

Examples of using the scenario-specific configuration feature