After you add a website to Web Application Firewall (WAF), you can enable data risk control for the added website. Data risk control is used to protect crucial website services against attacks. These services include registrations, logons, campaigns, and forums. You can customize data risk control rules based on your business requirements.

Prerequisites

  • A WAF instance is purchased. The instance meets the following requirements:
    • The instance is billed on a subscription basis.
    • The instance is deployed in mainland China.
      Note Instances that are deployed outside mainland China do not support data risk control.
    • Bot Management is enabled. This feature is a value-added service.
  • Your website is added to WAF. For more information, see Add websites.

Notice WAF provides the scenario-specific configuration feature, which lets you configure anti-crawler rules based on your business requirements to precisely protect your business from malicious crawlers. If you want to protect your website against malicious crawlers, we recommend that you use the scenario-specific configuration feature. Both anti-crawler rules and data risk control rules can block malicious crawlers, so after you configure anti-crawler rules, you do not need to configure data risk control rules. Alibaba Cloud no longer updates or maintains the data risk control feature.

Background information

The data risk control feature is based on Alibaba Cloud big data. It uses industry-leading risk decision-making engines and integrates human-machine identification technologies to protect crucial services against attacks in various scenarios. To use data risk control, you only need to add your website to WAF. You do not need to configure your servers or clients.

Data risk control is suitable for a wide range of scenarios. These scenarios include spam user registration, SMS flood attacks, dictionary attacks, brute-force attacks, auto-purchase bots, promotion abuse, snatcher bots, vote manipulation, and spam.

The following figure shows how data risk control protects your website. For more information about the scenarios and protection effects of data risk control, see Examples.

Compatibility

Data risk control is suitable only for web pages and HTML5 environments. In some cases, the JavaScript plug-in that WAF inserts into web pages is incompatible with the code of those pages, which causes CAPTCHA verification errors. The following types of web pages may encounter compatibility issues:

  • Static web pages that visitors open directly by URL, and web pages that visitors reach through a redirect by using location.href, the window.open method, or the anchor tag <a>. Examples include HTML details pages, shared pages, website homepages, and documents.
  • Web pages that submit requests through custom code, such as custom form submissions, rewritten XMLHttpRequest (XHR) objects, and custom Ajax requests.
  • Web pages whose code uses hooks.

After you enable data risk control, we recommend that you set the Mode parameter to Warn and use data risk control together with the Log Service for WAF feature to test compatibility. For more information, see Overview.

To protect native apps, we recommend that you use the Anti-Bot SDK. For more information, see Configure application protection.

Procedure

  1. Log on to the Web Application Firewall console.
  2. In the top navigation bar, select the resource group to which the instance belongs and the region, Mainland China or International, in which the instance is deployed.
  3. In the left-side navigation pane, choose Protection Settings > Website Protection.
  4. In the upper part of the Website Protection page, select the domain name for which you want to configure data risk control.
  5. Click the Bot Management tab, find the Data Risk Control section, and then click Settings. Configure the following parameters:
    • Status: the switch that enables or disables data risk control. After you enable data risk control for a website, WAF inserts a JavaScript plug-in into specific or all web pages of the website. Response content of the web pages is returned to visitors without GZIP compression. No further configuration is required, even if your website uses non-standard ports.
      Note
      • You must enable data risk control before you can specify the Mode parameter and configure protection rules.
      • After data risk control is enabled, WAF checks the requests that are destined for your website. To let specific requests bypass the check, configure a whitelist for the Bot Management module. Requests that match a whitelist rule are not checked. For more information, see Configure a whitelist for Bot Management.
    • Mode: the mode of data risk control. Valid values:
      • Strict Interception: If WAF detects that your website is under attack, requests must pass strict multi-factor authentication.
      • Block: If WAF detects that your website is under attack, requests must pass multi-factor authentication.
      • Warn: If WAF detects that your website is under attack, requests are forwarded to your website, and the events related to the requests are recorded. You can view the details in risk reports.
        Note By default, the Mode parameter is set to Warn. In this mode, data risk control does not block requests, but WAF still inserts the JavaScript plug-in into static web pages to analyze client behavior.
  6. Add a data risk control rule.
    1. On the Data Risk Control page, click the Protection Request tab and click Add Protection Request.
    2. In the Add Protection Request dialog box, enter the URL that you want to protect in the Protection Request URL field.
      For more information, see Introduction to a protected URL.
    3. Click Confirm.
    A newly added URL takes effect in about 10 minutes. The URL then appears in the URL list, where you can modify or delete it based on your business requirements.
  7. Optional: Specify the web pages into which you want to insert the JavaScript plug-in.
    The code of some web pages may be incompatible with the JavaScript plug-in. In this case, we recommend that you insert the plug-in only into the pages that are compatible with it.
    Note If the JavaScript plug-in is inserted only into some pages, data risk control may fail to collect complete visitor behavior data, which reduces its effectiveness.
    1. On the Data Risk Control page, click the Insert JavaScript into Webpage tab.
    2. Select Insert JavaScript into Specific Webpage and click Add Webpage.
      Note You can add a maximum of 20 URL paths for the web pages.
    3. In the Add URL dialog box, enter the URL path of the web pages into which you want to insert the JavaScript plug-in and click Confirm. The URL path must start with a forward slash (/).
    After you add the URL path, data risk control inserts the JavaScript plug-in into all the web pages in the URL path.
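
The constraints from steps 5 through 7 can be captured in a small local model that checks a planned configuration before you enter it in the console. This is a minimal sketch under stated assumptions: DataRiskControlConfig, validate, and inserts_js_into are hypothetical names, not part of any WAF SDK or API, and the rule that a configured path also covers everything below it is an assumption based on the statement that the plug-in is inserted into all web pages in the URL path.

```python
from dataclasses import dataclass, field
from enum import Enum

MAX_JS_PATHS = 20  # the console allows at most 20 URL paths for JavaScript insertion


class Mode(Enum):
    STRICT_INTERCEPTION = "Strict Interception"
    BLOCK = "Block"
    WARN = "Warn"  # default mode: suspicious requests are recorded, not blocked


@dataclass
class DataRiskControlConfig:
    """Hypothetical local model of the console settings (not a WAF SDK type)."""
    enabled: bool = False
    mode: Mode = Mode.WARN
    protected_urls: list[str] = field(default_factory=list)
    # Empty list: insert the plug-in into all pages; otherwise only under these paths.
    js_paths: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means no problem was found."""
        problems = []
        if self.protected_urls and not self.enabled:
            problems.append("enable data risk control before adding protected URLs")
        if len(self.js_paths) > MAX_JS_PATHS:
            problems.append(f"at most {MAX_JS_PATHS} JavaScript paths are allowed")
        for path in self.js_paths:
            if not path.startswith("/"):
                problems.append(f"JavaScript path must start with '/': {path}")
        return problems

    def inserts_js_into(self, page_path: str) -> bool:
        """Assumption: a configured path covers itself and every page below it."""
        if not self.js_paths:
            return True
        for base in self.js_paths:
            prefix = base.rstrip("/") + "/"
            if page_path == base or page_path.startswith(prefix):
                return True
        return False
```

For example, DataRiskControlConfig(enabled=True, js_paths=["/shop"]) inserts the plug-in into /shop/item/1 but not into /shopping, because matching is done on whole path segments in this sketch.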

After data risk control is enabled, you can use Log Service for WAF to monitor the protection results. For more information, see View protection results.

Introduction to a protected URL

A protected URL is the endpoint that performs a service operation, which is different from the URL of the web page. For example, the following figure shows a registration page whose URL is www.abc.com/new_user. The endpoint that sends verification codes is www.abc.com/getsmscode, and the endpoint that completes the registration is www.abc.com/register.do.

In this example, add www.abc.com/getsmscode and www.abc.com/register.do as protected URLs so that WAF can protect them against SMS flood attacks and spam user registration. If you add www.abc.com/new_user as a protected URL, regular visitors must also pass CAPTCHA verification just to open the registration page, which impairs the user experience.

Precautions for protected URLs
  • Protected URLs support only an exact match and do not support a fuzzy match.

    For example, if you add www.test.com/test as a protected URL, data risk control filters only the requests that are sent to this URL. Data risk control does not filter the requests that are sent to the subdirectories of this URL.

  • You can protect an entire website directory by using a wildcard (*).

    For example, if you add www.abc.com/book/* as a protected URL, data risk control filters the requests that are sent to the web pages in all subdirectories of www.abc.com/book. However, we recommend that you do not use data risk control to protect the entire website. If you add www.abc.com/* as a protected URL, regular visitors must pass CAPTCHA verification even to visit the website homepage, which impairs the user experience.

  • Requests that are sent directly to a protected URL trigger CAPTCHA verification. Make sure that regular visitors do not request a protected URL directly. Otherwise, they must pass multi-factor authentication before they can access the protected URL.
  • Data risk control does not apply to API operations that are called directly by programs. API calls are machine actions and cannot pass the CAPTCHA verification of data risk control. However, if a regular visitor clicks a button on a web page that calls an API operation, data risk control still works.
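The matching rules above (exact match by default, a trailing /* for a whole directory) can be illustrated with a small matcher. This is a minimal sketch, not WAF's actual matching code: is_protected is a hypothetical helper, and it assumes that the query string is ignored and that www.abc.com/book/* matches only paths below /book/.

```python
from urllib.parse import urlsplit


def _host_and_path(url: str) -> tuple[str, str]:
    # Accept "www.abc.com/path" (as written in the console examples) or a full URL.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return parts.hostname or "", parts.path or "/"


def is_protected(rule: str, request_url: str) -> bool:
    """Hypothetical matcher: exact path match, or a directory rule ending in '/*'.

    Assumption: the query string of the request is ignored.
    """
    rule_host, rule_path = _host_and_path(rule)
    req_host, req_path = _host_and_path(request_url)
    if rule_host != req_host:
        return False
    if rule_path.endswith("/*"):
        prefix = rule_path[:-1]  # keep the trailing "/", for example "/book/"
        return req_path.startswith(prefix)
    return req_path == rule_path  # exact match only, no fuzzy matching
```

Applied to the registration example, the rules www.abc.com/getsmscode and www.abc.com/register.do protect the two service endpoints, while the page www.abc.com/new_user is not matched, so regular visitors can open it without verification.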

View protection results

You can use the Log Service for WAF feature to view the protection results.

  • Allowed requests
    The following figure shows the log of a request that passes the CAPTCHA verification of data risk control. The URL of the request contains a parameter whose name starts with u_a. WAF forwards the request to the origin server, and the origin server returns HTTP status code 200 to the visitor.
  • Blocked requests

    The following figure shows the log of a request that is blocked by data risk control. In most cases, such a request is sent directly to the protected URL and either does not contain a u_a parameter or contains a forged u_a parameter. WAF blocks the request, so the request log contains no response from the origin server.

After you enable Log Service for WAF, choose Advanced Search > URL Key Words and enter the URLs that are protected by data risk control. Then, you can monitor the status of data risk control and view the blocked requests. For more information, see Use full logs.
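If you export the logs, the two patterns described above can be counted with a short script. This is a minimal sketch under stated assumptions: the field names request_uri and upstream_status are hypothetical and do not necessarily match the actual Log Service for WAF field names, and any entry without an origin response is counted as blocked, which also includes requests blocked by other WAF modules.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def has_u_a_param(request_uri: str) -> bool:
    """True if any query parameter name starts with "u_a"."""
    query = urlsplit(request_uri).query
    return any(name.startswith("u_a") for name, _ in parse_qsl(query, keep_blank_values=True))


def classify(entry: dict) -> str:
    # Hypothetical field: upstream_status is None when the origin never answered.
    upstream = entry.get("upstream_status")
    if upstream is None:
        return "blocked"  # simplification: no origin response is treated as blocked
    if upstream == 200 and has_u_a_param(entry["request_uri"]):
        return "allowed"  # passed verification and was forwarded to the origin
    return "other"


def summarize(entries: list[dict], url_keyword: str) -> dict[str, int]:
    """Count outcomes for entries whose URI contains url_keyword, like a URL keyword search."""
    return dict(Counter(classify(e) for e in entries if url_keyword in e["request_uri"]))
```

For example, summarize(entries, "/register.do") gives a quick ratio of allowed to blocked registration requests for the protected URL.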

Examples

User Tom has a website whose domain name is www.abc.com. Regular visitors can register as website members at www.abc.com/register.html. Tom notices that attackers use malicious scripts to submit registration requests and create accounts, which they then use to participate in prize draws held by the website. These registration requests are highly similar to normal requests, and their rate stays at a normal level. As a result, the HTTP flood protection policy cannot identify these malicious requests.

Sample configurations

Tom adds the website to WAF and enables data risk control for the www.abc.com domain name. Because registration at www.abc.com/register.html is the most crucial service, Tom adds this URL as a protected URL.

Protection results

After the configurations take effect, data risk control inserts a JavaScript plug-in into all web pages of the website, including the homepage and subpages. This allows data risk control to monitor and analyze the behavior of each visitor to www.abc.com and determine whether that behavior is normal. Data risk control also uses the big data reputation library of Alibaba Cloud to determine whether a source IP address is malicious.

When a visitor sends a registration request to www.abc.com/register.html, WAF determines whether the visitor is an attacker based on the behavior data collected from the time the visitor arrives on the website until the visitor submits the registration request. For example, if a visitor submits a registration request directly without performing any other operations first, the request is identified as suspicious.
  • If data risk control determines that a request is from a regular visitor based on the past behavior of the visitor, the visitor can register accounts without verification.
  • If data risk control identifies a request as suspicious, or the source IP address has a record of sending malicious requests, CAPTCHA verification is triggered to verify the identity of the visitor. Only visitors who pass the verification can register accounts.
    • If CAPTCHA verification captures suspicious visitor behavior, such as the use of scripts to simulate real visitor behavior to pass CAPTCHA verification, data risk control uses other verification methods to verify the visitor identity until the visitor passes verification and is identified as a regular visitor.
    • If the visitor fails the verification, data risk control blocks the request.
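The decision flow in this example can be summarized as a toy function. This is a minimal sketch of the behavior described above, not WAF's actual risk engine, whose scoring is not public: decide, Verdict, and the "pass", "fail", and "suspicious" outcome labels are hypothetical, and treating an exhausted list of verification rounds as a block is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"  # the registration request is forwarded to the origin
    BLOCK = "block"  # the registration request is not forwarded


def decide(behavior_normal: bool, ip_flagged: bool, rounds: list[str]) -> Verdict:
    """Toy model of the registration example.

    behavior_normal: the behavior data collected by the plug-in looks like a regular visitor.
    ip_flagged: the source IP address has a record of sending malicious requests.
    rounds: outcome of each verification method tried, in order:
        "pass", "fail", or "suspicious" (passed, but the behavior looked scripted).
    """
    if behavior_normal and not ip_flagged:
        return Verdict.ALLOW  # regular visitor: registers without verification
    for outcome in rounds:
        if outcome == "pass":
            return Verdict.ALLOW
        if outcome == "fail":
            return Verdict.BLOCK
        # "suspicious": escalate to the next verification method
    return Verdict.BLOCK  # assumption: no conclusive pass means no registration
```

For example, a visitor with a flagged IP address whose first CAPTCHA attempt looks scripted but who then passes a second verification method is allowed to register: decide(True, True, ["suspicious", "pass"]) returns Verdict.ALLOW.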

During this process, data risk control is enabled for the entire website (www.abc.com) and inserts a JavaScript plug-in into all web pages of the website to analyze visitor behavior. However, protection and verification apply only to www.abc.com/register.html, to which visitors submit registration requests, and data risk control is triggered only after a registration request is submitted.