The bot management module of WAF is upgraded to provide the scenario-specific configuration feature. You can configure anti-crawler rules based on your business requirements. This feature protects your business from malicious crawlers. This topic describes how to configure anti-crawler rules for websites.

Prerequisites

If you use a subscription WAF instance that runs the Pro, Business, or Enterprise edition, make sure that the bot management module is enabled.

Background information

The scenario-specific configuration feature allows you to configure anti-crawler rules based on your business requirements. This feature can be used in combination with intelligent algorithms to identify crawler traffic, and it automatically handles the crawler traffic that matches the configured anti-crawler rules. After you configure anti-crawler rules, you can verify them in the test environment. Verification helps prevent adverse effects on your websites or apps, such as false positives and undesired protection results, that are caused by inappropriate rule configurations or incompatibility issues.

Configure anti-crawler rules for websites

  1. Log on to the Web Application Firewall console.
  2. In the top navigation bar, select the resource group to which the instance belongs and the region, Mainland China or International, in which the instance is deployed.
  3. In the left-side navigation pane, choose Protection Settings > Website Protection.
  4. In the upper part of the Website Protection page, select the domain name for which you want to configure anti-crawler rules.
  5. Click the Bot Management tab. In the Scenario-specific Configuration section, click Start to create your first anti-crawler rule.
    If you have created an anti-crawler rule, you can skip this step and click Add in the upper-right corner to create a rule.
  6. In the Configure Scenarios step, configure the basic information about the scenario in which you want to protect websites and click Next.
    Parameter description:
    • Scenario: Enter the type of scenario in which you want to protect websites. Examples: logon, registration, and order placement.
    • Service Type: Select Websites to protect web pages, HTML5 pages, and HTML5 apps.

      If you visit a protected website from an intermediate domain name, select Use Intermediate Domain Name. Then, select the intermediate domain name from the drop-down list.

    • Traffic Characteristics: Add match conditions for the requests to the websites that you want to protect by using the anti-crawler rule. For more information about the header fields in the match conditions, see Fields in match conditions.
  7. In the Configure Protection Rules step, configure the details about the anti-crawler rule and click Next.
    Parameter description:
    • Simple XSS Attack Blocking: If you enable this feature, WAF performs JavaScript validation on clients and blocks the traffic from non-browser tools that cannot run JavaScript code. This way, simple script-based attacks are blocked.
    • Intelligent Protection: If you enable this feature, the intelligent protection engine analyzes and automatically learns access traffic patterns. Then, a blacklist or protection rule is generated based on the analysis results and learned patterns. You can set the Protection Mode parameter to Monitor or Slider CAPTCHA. If you set the Protection Mode parameter to Monitor, the anti-crawler rule allows the traffic that matches the rule and records the traffic in security reports. If you set the Protection Mode parameter to Slider CAPTCHA, clients are required to pass slider CAPTCHA verification before they can access the protected websites.
    • Bot Threat Intelligence Feed: The threat intelligence feed of Alibaba Cloud is used to identify IP addresses that frequently crawl content from Alibaba Cloud users. The clients that use these IP addresses are required to pass slider CAPTCHA verification before they access the protected websites.
    • Data Center Blacklist: This feature blocks the IP addresses in the blacklists for data centers of Alibaba Cloud and other mainstream cloud providers. If you enable this feature, WAF can prevent these addresses from initiating access requests to the protected websites.
    • IP Address Throttling and Custom Session-based Throttling: If you enable these features, you can configure throttling conditions to filter abnormal requests. This way, HTTP flood attacks are mitigated.
      • IP Address Throttling: You can configure throttling conditions for IP addresses. If the number of requests from the same IP address within the specified time period exceeds the threshold, WAF applies a specified action to subsequent requests. You can also configure the period during which the specified action is performed. The action can be monitor, block, or slider CAPTCHA. You can add a maximum of three conditions. For more information, see Create a custom protection policy.
      • Custom Session-based Throttling: You can configure throttling conditions for sessions. If the number of requests from the same session within the specified time period exceeds the threshold, WAF applies a specified action to subsequent requests. You can also configure the period during which the specified action is performed. The action can be monitor, block, or slider CAPTCHA. For more information, see Create a custom protection policy.
  8. Optional: In the Verify Actions step, test the effectiveness of the anti-crawler rule.
    To skip this step, click Skip in the lower-left corner. If this is the first time that you configure an anti-crawler rule, we recommend that you complete this step before you publish the rule. This helps prevent false positives that are caused by incorrect configurations or incompatibility issues.
    Parameter description:
    • Test Public IP Address: Enter the public IP address of your test device, for example, a computer or mobile phone. The test of the anti-crawler rule takes effect only for the public IP address. The test does not affect your business.
      Notice: Do not enter the IP address that you obtain by running the ipconfig command. This command returns an internal IP address. To obtain the public IP address of your test device, click Alibaba Network Diagnose Tool and look up the local IP address on the page that appears. You can also search for your IP address by using your browser.
    • Test Action: Test the effectiveness of the protection actions that you configure in the anti-crawler rule in the production environment. WAF generates a test rule that takes effect only for the test IP address. The actions include JS verification, slider CAPTCHA, and block.

      After you click Start Test for an action, WAF immediately sends the anti-crawler rule to the test device. WAF also provides a demonstration diagram and a description of the test results. We recommend that you read both.

      After the test is complete, you can click I Have Completed Test to go to the next step. If the test result is abnormal, you can click Go Back to optimize the anti-crawler rule. Then, run the test again.

      For more information about the exceptions that may occur during a test and about the solutions to these exceptions, see FAQ.

  9. In the Preview and Publish Protection Rules step, confirm the content of the anti-crawler rule and click Publish.
    After the anti-crawler rule is published, the rule immediately takes effect.
    Note: If this is the first time that you create an anti-crawler rule, you cannot view the rule ID until you publish the rule. The rule ID is displayed on the Bot Management tab of the Security report page. You can use the ID of an anti-crawler rule to check for requests that match the rule in Log Service for WAF.
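
The throttling behavior described in step 7 (count requests per IP address or session within a time period, then apply an action for a configured duration once the threshold is exceeded) can be illustrated with a minimal Python sketch. This is not WAF's implementation. The class names, field names, and the "allow" result are hypothetical, and the sketch assumes that the request that pushes the count over the threshold is itself subject to the action.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class ThrottleRule:
    """Hypothetical rule shape; these are not actual WAF parameter names."""
    window_seconds: float  # time period over which requests are counted
    threshold: int         # maximum number of requests allowed in the window
    action: str            # "monitor", "block", or "captcha"
    action_seconds: float  # how long the action stays in effect once triggered


class Throttler:
    def __init__(self, rule):
        self.rule = rule
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests
        self._penalty_until = {}         # key -> time at which the action expires

    def check(self, key, now):
        """Return the action for a request from `key` (IP or session ID) at `now`."""
        # While the action period is active, keep applying the action.
        if self._penalty_until.get(key, 0.0) > now:
            return self.rule.action
        hits = self._hits[key]
        # Discard timestamps that have fallen out of the counting window.
        while hits and hits[0] <= now - self.rule.window_seconds:
            hits.popleft()
        hits.append(now)
        if len(hits) > self.rule.threshold:
            # Assumption: the request that exceeds the threshold is also actioned.
            self._penalty_until[key] = now + self.rule.action_seconds
            hits.clear()
            return self.rule.action
        return "allow"


# More than 3 requests in 10 seconds triggers "block" for 60 seconds.
rule = ThrottleRule(window_seconds=10, threshold=3, action="block", action_seconds=60)
throttler = Throttler(rule)
print([throttler.check("203.0.113.7", t) for t in (0, 1, 2, 3, 30, 63)])
# ['allow', 'allow', 'allow', 'block', 'block', 'allow']
```

The same structure covers both throttling types: pass the client IP address as the key for IP address throttling, or a session identifier for session-based throttling.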

FAQ

Error: No valid test requests are detected. See WAF documentation or contact us to analyze the possible causes.
  • Cause: The test request fails to be sent or is not sent to WAF.
    Solution: Verify that the test request is sent to the IP address to which the CNAME provided by WAF resolves.
  • Cause: The header field in the test request does not match the header fields that you configure in Traffic Characteristics in the anti-crawler rule.
    Solution: Modify the settings of Traffic Characteristics in the anti-crawler rule.
  • Cause: The source IP address of the test request is inconsistent with the public IP address that you specify in the anti-crawler rule.
    Solution: Use the correct public IP address. We recommend that you click Alibaba Network Diagnose Tool to query your public IP address.

Error: The test requests failed the verification. See WAF documentation or contact us to analyze the possible causes.
  • Cause: The test does not simulate real user access. For example, the debugging mode or automation tools are used.
    Solution: Simulate real user access during the test.
  • Cause: An incorrect service type is selected. For example, you select Websites for app protection scenarios.
    Solution: Change the value of the Service Type parameter.
  • Cause: An intermediate domain name is used but is not correctly configured in the anti-crawler rule.
    Solution: Select Use Intermediate Domain Name, and then select the intermediate domain name from the drop-down list.
  • Cause: Frontend incompatibility issues occur.
    Solution: Contact customer service in the DingTalk group or submit a ticket.

Error: No verification is triggered. See WAF documentation or contact us to analyze the possible causes.
  • Cause: The test rule is not sent.
    Solution: Perform the test several times until the anti-crawler test rule is sent.

Error: No valid test requests are detected or blocked. See WAF documentation or contact us to analyze the possible causes.
  • Cause: The test request fails to be sent or is not sent to WAF.
    Solution: Verify that the test request is sent to the IP address to which the CNAME provided by WAF resolves.
  • Cause: The header field in the test request does not match the header fields that you configure in Traffic Characteristics in the anti-crawler rule.
    Solution: Modify the settings of Traffic Characteristics in the anti-crawler rule.
  • Cause: The source IP address of the actual test request is inconsistent with the public IP address that you specify in the anti-crawler rule.
    Solution: Use the correct public IP address. We recommend that you click Alibaba Network Diagnose Tool to query your public IP address.
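
One likely reason for the IP address mismatch listed above is that an internal address, such as the one returned by ipconfig, was entered as the test IP address. As a quick sanity check before you start a test, the following sketch uses Python's standard ipaddress module to tell whether a value is a publicly routable address. The function name is_public_ip is hypothetical and is not part of WAF. The check only rules out internal, loopback, and reserved addresses; it does not confirm that the address is the one your test device actually uses to reach WAF.

```python
import ipaddress


def is_public_ip(address: str) -> bool:
    """Return True if `address` parses as a globally routable IP address."""
    try:
        ip = ipaddress.ip_address(address.strip())
    except ValueError:
        # Not a valid IPv4 or IPv6 address at all.
        return False
    # is_global is False for private, loopback, link-local, and reserved ranges.
    return ip.is_global


for candidate in ("192.168.1.10", "10.0.0.5", "8.8.8.8"):
    print(candidate, is_public_ip(candidate))
```

If the function returns False for the address that you entered, obtain the public IP address as described in the Verify Actions step and run the test again.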

What to do next

Go to the Bot Management tab of the Security report page and view the protection results and the details of the requests that match the anti-crawler rule. Then, optimize the anti-crawler rule based on the protection results.