The bot management module of Web Application Firewall (WAF) provides the scenario-specific configuration feature. This feature allows you to configure anti-crawler rules based on your business requirements and helps protect your business from malicious crawlers. This topic describes how to configure anti-crawler rules for websites.

Background information

The scenario-specific configuration feature works together with intelligent algorithms to identify crawler traffic more precisely, and it automatically handles the traffic that matches your anti-crawler rules. After you configure an anti-crawler rule, you can verify the rule in a test environment before you publish it. This helps you avoid false positives and undesired protection results that are caused by inappropriate rule configurations or compatibility issues.

Prerequisites

  • A subscription WAF instance is purchased. If the instance runs the Pro, Business, or Enterprise edition, the Bot Management module is enabled for it. For more information, see Purchase a WAF instance.
  • Your website is added to WAF.

    For more information, see Tutorial.

Procedure

  1. Log on to the WAF console.
  2. In the top navigation bar, select the resource group and region to which the WAF instance belongs. The region can be Mainland China or International.
  3. In the left-side navigation pane, choose Protection Settings > Website Protection.
  4. In the upper part of the Website Protection page, select the domain name for which you want to configure an anti-crawler rule.
  5. If you have not created an anti-crawler rule, click the Bot Management tab. In the Scenario-specific Configuration section, click Start to create an anti-crawler rule. If you have created an anti-crawler rule, click Add in the upper-right corner of the Bot Management tab to create an anti-crawler rule.
    Note You can create up to 50 anti-crawler rules for a domain name.
  6. In the Configure Scenarios step, configure the basic information about the website that you want to protect and click Next.
    Configure the following parameters:

    • Scenario: Specify the type of service scenario in which you want to protect the domain name. Examples: logon, registration, and order placement.
    • Service Type: Select Websites. WAF then protects web pages, HTML5 pages, and HTML5 apps.

      If the domain name of the website that you want to protect is accessed from a different domain name, you must select Use Intermediate Domain Name. Then, select the intermediate domain name from the drop-down list.

    • Traffic Characteristics: Add match conditions to identify traffic destined for the domain name of the website that you want to protect. To add a condition, you must specify the matching field, logical operator, and matching content. The matching field is a header field of HTTP requests. For more information about the matching fields, see Fields in match conditions. You can add up to five match conditions.

      Notice After you enter an IP address, you must press Enter.
  7. In the Configure Protection Rules step, configure detailed settings for the anti-crawler rule and click Next.
    Configure the following parameters:

    • Script-based Bot Block: If you turn on this switch, WAF performs JavaScript validation on clients. Traffic from non-browser tools that cannot run JavaScript code is blocked. This way, simple script-based attacks are blocked.
    • Dynamic Token Challenge: By default, the switch is turned off. If you turn on this switch, WAF performs signature verification on each request. Requests that fail signature verification are blocked. Signature Verification Exception is selected by default and cannot be deselected. It detects requests that do not contain signatures and requests that contain invalid signatures. You can also select Signature Timestamp Exception and WebDriver Attack.
    • Intelligent Protection: If you turn on this switch, the intelligent protection engine analyzes access traffic and performs machine learning. Then, a blacklist or a protection rule is generated based on the analysis results and learned patterns. You can set the Protection Mode parameter to Monitor or Slider CAPTCHA. If you set the Protection Mode parameter to Monitor, the anti-crawler rule allows the traffic that matches the rule and records the traffic in security reports. If you set the Protection Mode parameter to Slider CAPTCHA, clients are required to pass slider CAPTCHA verification before they can access the protected domain name.
    • Bot Threat Intelligence Feed: If you turn on this switch, the threat intelligence library of Alibaba Cloud is used to identify the IP addresses that are frequently used to crawl content from Alibaba Cloud users. The clients that use these IP addresses are required to pass slider CAPTCHA verification before they can access the protected domain name.
    • Data Center Blacklist: If you turn on this switch, you must select libraries from the drop-down list. WAF then blocks access requests that are sent to the protected domain name from IP addresses in the selected libraries. The libraries contain known malicious IP addresses from the data centers of Alibaba Cloud and other mainstream cloud providers.
    • IP Address Throttling: If you turn on this switch, you can configure throttling conditions to filter out the requests that are frequently initiated for crawling. This way, HTTP flood attacks are mitigated.

      You can configure throttling conditions for IP addresses. If the number of requests from the same IP address within the specified time period exceeds the threshold, WAF performs the specified action on subsequent requests. You can also configure the period during which the specified action is performed. The action can be Monitor, Block, or Captcha. You can configure up to three throttling conditions. For more information, see Create a custom protection policy.

    • Custom Session-based Throttling: If you turn on this switch, you can configure custom throttling conditions to filter out the requests that are frequently initiated for crawling. This way, HTTP flood attacks are mitigated.

      You can configure throttling conditions for sessions. If the number of requests from the same session within the specified time period exceeds the threshold, WAF performs the specified action on subsequent requests. You can also configure the period during which the action is performed. The action can be Monitor, Block, or Captcha. For more information, see Create a custom protection policy.

  8. Optional: In the Verify Actions step, test the effectiveness of the anti-crawler rule.
    To skip this step, click Skip in the lower-left corner. If this is the first time you configure an anti-crawler rule, we recommend that you complete this step before you publish the rule. This helps prevent the false positives that are caused by inappropriate configurations or compatibility issues.
    Test steps:
    1. Enter a public IP address: Enter the public IP address of your test device, such as a computer or mobile phone. The test rule takes effect only for this public IP address, so the test does not affect your business.
      Notice Do not enter the IP address that you obtain by running the ipconfig command. This command returns an internal IP address. If you want to obtain the public IP address of your test device, you can click Alibaba Network Diagnose Tool. On the page that appears, search for Local IP. The value of Local IP is the public IP address of your test device. You can also use a browser to search for the IP address of your test device.
    2. Select an action: Test the effectiveness of a protection action that you specify in the anti-crawler rule. WAF generates a test rule only for the specified IP address. The action can be JavaScript Validation, Dynamic Token-based Authentication, Slider CAPTCHA Verification, or Block Verification.

      After you click Start Test for an action, WAF immediately delivers the test rule to the test device. In the dialog box that appears, WAF provides the test procedure, expected result, and demonstration. We recommend that you carefully read them.

      After the test is complete, you can click I Have Completed Test to go to the next step. If the test result shows exceptions, you can click Go Back to optimize the anti-crawler rule. Then, perform the test again.

      For more information about the exceptions that may occur during a test and the solutions to these exceptions, see FAQ.

  9. In the Preview and Publish Protection Rules step, confirm the content of the anti-crawler rule and click Publish.
    After the anti-crawler rule is published, the rule immediately takes effect.
    Note If this is the first time you create an anti-crawler rule, you cannot view the rule ID until the rule is published. The rule ID is displayed on the Bot Management tab of the Security Report page. You can use the ID of an anti-crawler rule to check for requests that match the rule in Log Service for WAF.
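
The IP Address Throttling and Custom Session-based Throttling parameters in step 7 both count requests per key within a time period and apply an action for a configured duration after a threshold is exceeded. The following Python sketch illustrates that idea only. It is not how WAF implements throttling: the Throttle class, its parameter names, the sliding-window counting, and the choice to apply the action starting with the request that crosses the threshold are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class Throttle:
    """Hypothetical sliding-window throttle keyed by IP address or session ID."""

    def __init__(self, threshold, window_seconds, action, action_seconds):
        self.threshold = threshold            # requests allowed per time period
        self.window_seconds = window_seconds  # length of the time period
        self.action = action                  # "monitor", "block", or "captcha"
        self.action_seconds = action_seconds  # how long the action stays in effect
        self._hits = defaultdict(deque)       # key -> timestamps of recent requests
        self._penalty_until = {}              # key -> time at which the action ends

    def check(self, key, now=None):
        """Return the configured action for this request, or "allow"."""
        now = time.monotonic() if now is None else now

        # While the action period is running, every request gets the action.
        until = self._penalty_until.get(key)
        if until is not None and now < until:
            return self.action

        hits = self._hits[key]
        hits.append(now)
        # Discard timestamps that are older than the time period.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()

        # Assumption: the request that crosses the threshold starts the action.
        if len(hits) > self.threshold:
            self._penalty_until[key] = now + self.action_seconds
            hits.clear()
            return self.action
        return "allow"


if __name__ == "__main__":
    throttle = Throttle(threshold=3, window_seconds=10,
                        action="block", action_seconds=60)
    for second in range(5):
        print(second, throttle.check("client-a", now=second))
```

For IP address throttling, the key would be the client IP address. For session-based throttling, the key would be whatever value identifies the session, such as a cookie value.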

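Step 8 requires the public IP address of the test device, and an address returned by ipconfig is typically an internal one. Before you enter an address, you can check whether it is publicly routable. The following Python sketch uses the standard ipaddress module. The is_public_ip helper is a hypothetical name, and the check only classifies the address. It does not confirm that the address belongs to your test device.

```python
import ipaddress


def is_public_ip(text):
    """Return True if text parses as a globally routable IP address."""
    try:
        addr = ipaddress.ip_address(text.strip())
    except ValueError:
        # Not a valid IPv4 or IPv6 address.
        return False
    # is_global is False for private, loopback, link-local, and other
    # reserved ranges.
    return addr.is_global


if __name__ == "__main__":
    for candidate in ["192.168.1.10", "10.0.0.5", "8.8.8.8"]:
        print(candidate, is_public_ip(candidate))
```
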
FAQ

Error: No valid test requests are detected. See WAF documentation or contact us to analyze the possible causes.

  • Cause: The test request fails to be sent or is not sent to WAF.
    Solution: Make sure that the test request is sent to the IP address that maps the CNAME provided by WAF.
  • Cause: The header fields in the test request do not match the header fields that you specify for Traffic Characteristics in the anti-crawler rule.
    Solution: Modify the settings of Traffic Characteristics in the anti-crawler rule.
  • Cause: The source IP address of the test request is different from the public IP address that you specify in the anti-crawler rule.
    Solution: Use the correct public IP address. We recommend that you click Alibaba Network Diagnose Tool to obtain your public IP address.

Error: The test requests failed the verification. See WAF documentation or contact us to analyze the possible causes.

  • Cause: No real user access is simulated. For example, the debugging mode or automation tools are used.
    Solution: Simulate real user access during the test.
  • Cause: An incorrect service type is selected. For example, Websites is selected when you configure an anti-crawler rule for apps.
    Solution: Change the value of the Service Type parameter.
  • Cause: An intermediate domain name is used, but an incorrect intermediate domain name is selected in the anti-crawler rule.
    Solution: Select Use Intermediate Domain Name. Then, select the correct intermediate domain name from the drop-down list.
  • Cause: Compatibility issues occur in the frontend.
    Solution: Contact customer service in the DingTalk group or submit a ticket.

Error: No verification is triggered. See WAF documentation or contact us to analyze the possible causes.

  • Cause: No test rules are generated.
    Solution: Perform the test several times until the test rule is generated.

Error: No valid test requests are detected or blocked. See WAF documentation or contact us to analyze the possible causes.

  • Cause: The test request fails to be sent or is not sent to WAF.
    Solution: Make sure that the test request is sent to the IP address that maps the CNAME provided by WAF.
  • Cause: The header fields in the test request do not match the header fields that you configure for Traffic Characteristics in the anti-crawler rule.
    Solution: Modify the settings of Traffic Characteristics in the anti-crawler rule.
  • Cause: The source IP address of the test request is different from the public IP address that you specify in the anti-crawler rule.
    Solution: Use the correct public IP address. We recommend that you click Alibaba Network Diagnose Tool to obtain your public IP address.
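
Several of the causes in this FAQ come down to the test request not reaching WAF. One rough way to check is to compare the addresses that your domain name resolves to with the addresses that the CNAME provided by WAF resolves to. The following Python sketch does this with socket.gethostbyname_ex from the standard library. The function names and the host names in the example are hypothetical placeholders, and an overlap is only a hint: DNS answers can differ by resolver, region, and time.

```python
import socket


def resolve_ips(hostname, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses that hostname resolves to."""
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # Resolution failed (socket.gaierror is a subclass of OSError).
        return set()
    return set(addresses)


def points_to_waf(domain, waf_cname, resolver=socket.gethostbyname_ex):
    """Return True if domain and waf_cname share at least one resolved address."""
    domain_ips = resolve_ips(domain, resolver)
    waf_ips = resolve_ips(waf_cname, resolver)
    return bool(domain_ips & waf_ips)


if __name__ == "__main__":
    # Placeholder names: replace them with your domain name and the CNAME
    # that WAF assigned to it.
    print(points_to_waf("www.example.com", "waf-cname.example.net"))
```

If the function returns False, the test device may be resolving the domain name to the origin server or to a stale record, and the test request would bypass WAF.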

What to do next

Go to the Bot Management tab of the Security Report page and view the protection results and the details of the requests that match the anti-crawler rule. Then, optimize the anti-crawler rule based on the protection results.