Dynamic Content Delivery Network: Configure the bot management module

Last Updated: Jan 04, 2024

You can configure the bot management module to create anti-crawler rules for websites, HTML5 pages, and native iOS and Android apps.

Prerequisites

Configure anti-crawler rules for websites

If your web pages, HTML5 pages, or HTML5 apps are accessible from browsers, you can configure anti-crawler rules for the websites to protect your services from malicious crawlers.

  1. Log on to the DCDN console.

  2. In the left-side navigation pane, choose WAF > Protection Policies.

  3. On the Protection Policies page, click Create Policy.

  4. On the Create Policy page, configure the parameters. The following table describes the parameters.

    Section

    Parameter

    Description

    Policy Information

    Policy Type

    Select Bot Management.

    Policy Name

    The name of the protection policy. The name can be up to 64 characters in length and can contain letters, digits, and underscores (_).
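
The naming rule above can be checked before you submit the form. The following is a minimal sketch, assuming that "letters" means ASCII letters and that an empty name is rejected; neither assumption is stated in this topic, and `is_valid_policy_name` is a hypothetical helper, not part of any DCDN SDK.

```python
import re

# Assumption: "letters" means ASCII letters; the console may accept other scripts.
# Assumption: a name must contain at least one character.
_POLICY_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,64}")


def is_valid_policy_name(name: str) -> bool:
    """Return True if name is 1 to 64 characters of letters, digits, and underscores."""
    return _POLICY_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_policy_name("bot_policy_01")` passes, whereas a name that contains a hyphen or is 65 characters long does not.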

    Global Configurations

    Service Type

    Select Websites. This way, WAF protects web pages, HTML5 pages, and HTML5 apps.

    Web SDK Integration

    • Automatic Integration (Recommended)

      WAF provides Web SDK for JavaScript to improve protection performance and prevent incompatibility issues.

      If you enable automatic integration, WAF automatically references the SDK in the HTML pages of the website that you want to protect. Then, the SDK collects information such as browser information, probe signatures, and malicious behaviors. Sensitive information is not collected. WAF detects and blocks malicious crawlers based on the collected information.

    • Manual Integration

      If automatic integration is not supported, you can use manual integration.

    For more information, see Integrate the Web SDK into web applications.

    Traffic Characteristics

    You can add match conditions to identify traffic that is destined for the domain name of the website that you want to protect. To add a condition, you must specify the match field, logical operator, and match content. The match field is a header field of HTTP requests. For more information about match fields, see Match conditions.
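
To illustrate how a match condition combines a match field, a logical operator, and match content, the following sketch evaluates conditions against request headers. The operator names (`equals`, `contains`, `prefix`) and the rule that all conditions must hold are assumptions made for illustration; the operators that the console actually supports are listed in Match conditions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical operator names; not taken from the DCDN console.
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda value, content: value == content,
    "contains": lambda value, content: content in value,
    "prefix": lambda value, content: value.startswith(content),
}


@dataclass(frozen=True)
class MatchCondition:
    field: str     # match field: the name of an HTTP request header
    operator: str  # logical operator: a key of OPERATORS
    content: str   # match content


def matches(headers: Dict[str, str], conditions: List[MatchCondition]) -> bool:
    """Return True if the headers satisfy every condition (AND semantics, assumed)."""
    # Header names are case-insensitive, so compare them in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    for cond in conditions:
        value = normalized.get(cond.field.lower())
        if value is None or not OPERATORS[cond.operator](value, cond.content):
            return False
    return True
```

A request with `Host: www.example.com` and a `User-Agent` that contains `Mozilla` would match the two conditions `("Host", "equals", "www.example.com")` and `("User-Agent", "contains", "Mozilla")`.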

    Legitimate Bot Management

    Spider Whitelist

    After you enable this feature, the crawler library is dynamically updated and contains the crawler IP addresses of mainstream search engines, including Google, Baidu, Sogou, 360, Bing, and Yandex.

    After you select search engines, requests that are sent from the crawler IP addresses of the selected search engines are forwarded to the origin server, and the bot management module no longer checks these requests.

    Bot Characteristic Detection

    Script-based Bot Block (JavaScript)

    After you enable this feature, WAF performs JavaScript validation on clients. To prevent simple script-based attacks, traffic from non-browser tools that cannot run JavaScript is blocked.

    Advanced Bot Defense (Dynamic Token-based Authentication)

    After you enable this feature, WAF verifies the signature of each request. Requests that fail signature verification are blocked. Signature Verification Exception, which detects requests that do not contain signatures or that contain invalid signatures, is selected by default and cannot be cleared. You can also select Signature Timestamp Exception and WebDriver Attack.

    Bot Behavior Detection

    AI Intelligent Protection

    After you enable this feature, the intelligent protection engine analyzes access traffic and performs machine learning. Then, a blacklist or a protection rule is generated based on the analysis results and learned patterns.

    • Monitor: The anti-crawler rule allows traffic that matches the rule and records the traffic in security reports.

    • Slider CAPTCHA: Clients must pass slider CAPTCHA verification before the clients can access the website that is protected by WAF.

    Custom Throttling

    IP Address Throttling (Default)

    You can configure throttling conditions for IP addresses. If the number of requests from the same IP address within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), WAF performs the specified action on subsequent requests. The action can be specified by selecting Slider CAPTCHA, Block, or Monitor from the Action drop-down list. You can also configure the Throttling Interval (Seconds) parameter, which specifies the period during which the specified action is performed. You can configure up to three throttling conditions. For more information, see Custom rule parameters.
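
The interaction of Statistical Interval (Seconds), Threshold (Times), and Throttling Interval (Seconds) can be sketched as a per-IP sliding-window counter. This is an illustration of the behavior described above, not the WAF implementation; the class name `IpThrottle`, the sliding-window choice, and the `"Allow"` return value are assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class IpThrottle:
    """Counts requests per IP address and applies an action for a fixed period."""

    def __init__(self, interval_s: float, threshold: int, throttle_s: float, action: str):
        self.interval_s = interval_s  # Statistical Interval (Seconds)
        self.threshold = threshold    # Threshold (Times)
        self.throttle_s = throttle_s  # Throttling Interval (Seconds)
        self.action = action          # "Slider CAPTCHA", "Block", or "Monitor"
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._throttled_until: Dict[str, float] = {}

    def check(self, ip: str, now: float) -> str:
        """Return the configured action or "Allow" for a request at time now (seconds)."""
        if now < self._throttled_until.get(ip, 0.0):
            return self.action
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that are older than the statistical interval.
        while hits and hits[0] <= now - self.interval_s:
            hits.popleft()
        if len(hits) > self.threshold:
            # The count exceeded the threshold: act on subsequent requests.
            self._throttled_until[ip] = now + self.throttle_s
            hits.clear()
        return "Allow"
```

With `IpThrottle(interval_s=10, threshold=3, throttle_s=60, action="Block")`, the fourth request within 10 seconds exceeds the threshold, and requests from that IP address are then blocked for the next 60 seconds.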

    Custom Session Throttling

    You can configure throttling conditions for sessions. You can configure the Session Type parameter to specify the session type. If the number of requests from the same session within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), WAF performs the specified action on subsequent requests. The action can be specified by selecting Slider CAPTCHA, Block, or Monitor from the Action drop-down list. You can also configure the Throttling Interval (Seconds) parameter, which specifies the period during which the specified action is performed. You can configure up to three throttling conditions. For more information, see Custom rule parameters.

    Bot Threat Intelligence

    Bot Threat Intelligence Library

    The library contains the IP addresses of attackers that have sent multiple requests to crawl content from Alibaba Cloud users over a specific period of time.

    You can set the protection mode to Monitor or Slider CAPTCHA.

    Data Center Blacklist

    After you enable this feature, the IP addresses in the selected IP address libraries of data centers are blocked. If you use the source IP addresses of public clouds or data centers to access the website that you want to protect, you must add the IP addresses to the whitelist. For example, you must add the callback IP addresses of Alipay or WeChat and the IP addresses of monitoring applications to the whitelist. The data center blacklist supports the following IP address libraries: IP Address Library of Data Center-Alibaba Cloud, IP Address Library of Data Center-21Vianet, IP Address Library of Data Center-Meituan Open Services, IP Address Library of Data Center-Tencent Cloud, and IP Address Library of Data Center-Other.

    You can set the Actions parameter to Monitor, Slider CAPTCHA, or Block.
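
The reason trusted callback and monitoring addresses must be whitelisted can be seen in a small sketch in which the whitelist is checked before the data center blacklist. The CIDR ranges are documentation-only placeholder addresses, and `data_center_action` is a hypothetical helper; the real IP address libraries are maintained by the service.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_networks(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings such as "203.0.113.0/24" into network objects."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def data_center_action(
    client_ip: str,
    whitelist: List[Network],
    blacklist: List[Network],
    action: str = "Block",
) -> str:
    """Return "Allow" for whitelisted IPs, the action for blacklisted IPs, else "Allow"."""
    addr = ipaddress.ip_address(client_ip)
    # Whitelisted addresses (for example, payment callbacks) are never blocked.
    if any(addr in net for net in whitelist):
        return "Allow"
    if any(addr in net for net in blacklist):
        return action  # "Monitor", "Slider CAPTCHA", or "Block"
    return "Allow"
```

If `203.0.113.0/24` stands in for a data center range and `203.0.113.10/32` for a whitelisted callback address, the callback address is allowed while every other address in the range receives the configured action.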

    Fake Spider Blocking

    After you enable this feature, WAF blocks requests whose User-Agent header matches a search engine specified in the Legitimate Bot Management section, unless the client IP address is verified as a valid crawler IP address of that search engine. Requests from verified crawler IP addresses are allowed.
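
One way to read the Fake Spider Blocking behavior is as a two-part check: a request that claims a search engine User-Agent is allowed only if its source IP address belongs to that search engine. The sketch below assumes this reading. The crawler ranges are documentation-only placeholder addresses, and the `"Continue"` result for requests that do not claim to be spiders is an assumption.

```python
import ipaddress
from typing import Dict, List

# Hypothetical data: User-Agent tokens mapped to search engines, and placeholder
# crawler ranges. The real crawler library is maintained and updated by WAF.
SPIDER_UA_TOKENS: Dict[str, str] = {"googlebot": "Google", "bingbot": "Bing"}
SPIDER_NETWORKS: Dict[str, List[ipaddress.IPv4Network]] = {
    "Google": [ipaddress.ip_network("192.0.2.0/24")],
    "Bing": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify_spider(user_agent: str, client_ip: str) -> str:
    """Return "Allow" for a verified spider, "Block" for a fake one, else "Continue"."""
    ua = user_agent.lower()
    addr = ipaddress.ip_address(client_ip)
    for token, engine in SPIDER_UA_TOKENS.items():
        if token in ua:
            # The request claims to be a spider: verify the source IP address.
            if any(addr in net for net in SPIDER_NETWORKS[engine]):
                return "Allow"
            return "Block"
    return "Continue"  # Not a spider claim; other rules decide.
```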

    Protected Domain Names

    Select Association Mode

    • Add and replace the original associated policy: disassociate the associated policy and replace it with the current policy.

    • Add and keep the original associated policy: add the current policy and retain the associated policy.

    Protected Domain Names

    The domain names that you want to associate with the current protection policy.

    Note
    • You can associate a protected domain name with only one protection policy of the same policy type.

      If the domain name is associated with another protection policy of the same type, the domain name is associated with the current policy after you configure the current policy for the domain name.

    • You cannot configure bot protection for DCDN-accelerated domain names for which WebSocket is enabled. WebSocket content is encrypted, so attack characteristics cannot be detected. For more information about how to configure WebSocket, see Configure WebSocket.

  5. Click Create Policy.

    By default, the protection policy that you created is enabled.

Configure anti-crawler rules for apps

You can configure anti-crawler rules for your native iOS or Android apps to protect your services against crawlers. HTML5 apps are not native iOS or Android apps. To protect HTML5 apps, configure anti-crawler rules for websites.

  1. Log on to the DCDN console.

  2. In the left-side navigation pane, choose WAF > Protection Policies.

  3. On the Protection Policies page, click Create Policy.

  4. On the Create Policy page, configure the parameters. The following table describes the parameters.

    Section

    Parameter

    Description

    Policy Information

    Policy Type

    Select Bot Management.

    Policy Name

    The name of the protection policy. The name can be up to 64 characters in length and can contain letters, digits, and underscores (_).

    Global Configurations

    Service Type

    Select App to protect native iOS and Android apps.

    Web SDK Integration

    WAF provides the Anti-Bot SDK to improve protection performance for native Android and iOS apps. After the Anti-Bot SDK is integrated, the Anti-Bot SDK collects the risk characteristics of clients and generates security signatures in requests. WAF identifies and blocks requests that are identified as unsafe based on the signatures. To obtain the SDK package, click Obtain and Copy an AppKey and then submit a ticket. For more information, see Integrate the Anti-Bot SDK into Android apps or Integrate the Anti-Bot SDK into iOS apps.

    Traffic Characteristics

    You can add match conditions to identify traffic that is destined for the domain name of the website that you want to protect. To add a condition, you must specify the match field, logical operator, and match content. The match field is a header field of HTTP requests. For more information about match fields, see Match conditions. You can add up to five match conditions.

    Bot Characteristic Detection

    Invalid App Signature

    By default, Invalid App Signature is selected and cannot be cleared. This feature blocks requests that include invalid signatures or do not include signatures after the Anti-Bot SDK is integrated.

    Abnormal Device Behavior

    After you enable this feature, WAF detects and controls the requests from the devices that have abnormal behaviors. The following behaviors are considered abnormal:

    • Expired Signature: The signature expires. This behavior is selected by default.

    • Using Simulator: A simulator is used.

    • Using Proxy: A proxy is used.

    • Rooted Device: A rooted device is used.

    • Debugging Mode: The debugging mode is used.

    • Hooking: Hooking techniques are used.

    • Multiboxing: Multiple protected app processes run on the device at the same time.

    • Simulated Execution: User behavior simulation techniques are used.

    • Script Tools: An automatic script is used.

    Custom Signature Field

    Select Header, Parameter, or Cookie from the Field Name drop-down list and enter your custom signature in the Value field.

    If the custom signature is empty, contains special characters, or exceeds the length limit, you can hash the signature or process the signature by using other methods and enter the processing result in the Value field.
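
As a sketch of the hashing option, the following helper returns a signature unchanged when it is usable as entered and otherwise returns its SHA-256 hex digest. The 64-character limit and the definition of special characters (anything other than ASCII letters and digits) are assumptions, because this topic does not state them. `normalize_signature` is a hypothetical name.

```python
import hashlib

MAX_LEN = 64  # Assumed limit; the actual limit is not stated in this topic.


def normalize_signature(raw: str, max_len: int = MAX_LEN) -> str:
    """Return raw if it is non-empty, ASCII alphanumeric, and short enough.

    Otherwise, return the SHA-256 hex digest of raw, which is always
    64 lowercase hexadecimal characters.
    """
    if raw and raw.isascii() and raw.isalnum() and len(raw) <= max_len:
        return raw
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

The client must apply the same processing to the value that it sends so that the value in requests matches the value that you enter in the Value field.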

    Action

    Select Monitor or Block based on your business requirements.

    • Monitor: triggers alerts and does not block requests.

    • Block: blocks requests.

    Secondary Packaging Detection

    Requests that are sent from apps whose package names or signatures are not in the whitelists are considered secondary packaging requests. You can specify valid application packages.

    • Valid Package Name: Enter the valid application package name. Example: example.aliyundoc.com.

    • Signature: Contact Alibaba Cloud technical support to obtain the signature. This parameter is optional if the package signature does not need to be verified. In this case, WAF verifies only the package name.

    Note

    The value of Signature is not the signature of the application certificate.

    You can add up to five valid iOS or Android app packages and the package names must be unique.

    Select Monitor or Block based on your business requirements.
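
The whitelist logic described for Secondary Packaging Detection can be sketched as follows: a request is treated as secondary packaging if its package name is not whitelisted, or if a signature is configured for the package and the request signature differs. `PackageWhitelist` is a hypothetical class, and the signature strings are placeholders; real signature values are obtained from Alibaba Cloud technical support.

```python
from typing import Dict, Optional

MAX_PACKAGES = 5  # "You can add up to five valid iOS or Android app packages."


class PackageWhitelist:
    """Holds valid package names and, optionally, their expected signatures."""

    def __init__(self) -> None:
        self._packages: Dict[str, Optional[str]] = {}

    def add(self, package_name: str, signature: Optional[str] = None) -> None:
        """Add a valid package; signature=None means only the name is verified."""
        if package_name in self._packages:
            raise ValueError("package names must be unique")
        if len(self._packages) >= MAX_PACKAGES:
            raise ValueError("at most five valid packages can be added")
        self._packages[package_name] = signature

    def is_secondary_packaging(self, package_name: str, signature: Optional[str]) -> bool:
        """Return True if the request comes from a package that is not whitelisted."""
        if package_name not in self._packages:
            return True
        expected = self._packages[package_name]
        # Without a configured signature, only the package name is verified.
        return expected is not None and signature != expected
```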

    Throttling

    IP Address Throttling (Default)

    You can configure throttling conditions for IP addresses. If the number of requests from the same IP address within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), WAF performs the specified action on subsequent requests. The action can be specified by selecting Block or Monitor from the Action drop-down list. You can also configure the Throttling Interval (Seconds) parameter, which specifies the period during which the specified action is performed. You can configure up to three throttling conditions. For more information, see Custom rule parameters.

    Device Throttling

    You can configure throttling conditions for devices. If the number of requests from the same device within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), WAF performs the specified action on subsequent requests. The action can be specified by selecting Block or Monitor from the Action drop-down list. You can also configure the Throttling Interval (Seconds) parameter, which specifies the period during which the specified action is performed. You can configure up to three throttling conditions. For more information, see Custom rule parameters.

    Custom Session Throttling

    You can configure throttling conditions for sessions. You can configure the Session Type parameter to specify the session type. If the number of requests from the same session within the value specified by Statistical Interval (Seconds) exceeds the value of Threshold (Times), WAF performs the specified action on subsequent requests. The action can be specified by selecting Block or Monitor from the Action drop-down list. You can also configure the Throttling Interval (Seconds) parameter, which specifies the period during which the specified action is performed. You can configure up to three throttling conditions. For more information, see Custom rule parameters.

    Bot Threat Intelligence

    Bot Threat Intelligence Library

    The library contains the IP addresses of attackers that have sent multiple requests to crawl content from Alibaba Cloud users over a specific period of time.

    You can set the protection mode to Monitor or Slider CAPTCHA.

    Data Center Blacklist

    After you enable this feature, the IP addresses in the selected IP address libraries of data centers are blocked. If you use the source IP addresses of public clouds or data centers to access the website that you want to protect, you must add the IP addresses to the whitelist. For example, you must add the callback IP addresses of Alipay or WeChat and the IP addresses of monitoring applications to the whitelist. The data center blacklist supports the following IP address libraries: IP Address Library of Data Center-Alibaba Cloud, IP Address Library of Data Center-21Vianet, IP Address Library of Data Center-Meituan Open Services, IP Address Library of Data Center-Tencent Cloud, and IP Address Library of Data Center-Other.

    You can set the Actions parameter to Monitor, Slider CAPTCHA, or Block.

    Protected Domain Names

    Select Association Mode

    • Add and replace the original associated policy: disassociate the associated policy and replace it with the current policy.

    • Add and keep the original associated policy: add the current policy and retain the associated policy.

    Protected Domain Names

    The domain names that you want to associate with the current protection policy.

    Note

    You can associate a protected domain name with only one protection policy of the same policy type.

    If the domain name is associated with another protection policy of the same type, the domain name is associated with the current policy after you configure the current policy for the domain name.

  5. Click Create Policy.

    By default, the protection policy that you created is enabled.

Related API operations