Microservices Engine: bot-detect

Last Updated: Aug 28, 2024

The bot-detect plug-in identifies web crawlers and prevents them from crawling websites. This topic describes how to configure the bot-detect plug-in.

Plug-in type

Security protection plug-in.

Fields

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| allow | array of string | No | - | The regular expressions that are used to match User-Agent request headers. If a User-Agent request header is matched, the request is allowed. |
| deny | array of string | No | - | The regular expressions that are used to match User-Agent request headers. If a User-Agent request header is matched, the request is blocked. |
| blocked_code | number | No | 403 | The HTTP status code that is returned if a request is blocked. |
| blocked_message | string | No | - | The HTTP response body that is returned if a request is blocked. |

Note

If neither the allow field nor the deny field is configured, the default crawler identification logic is executed. You can configure the allow field to allow requests that hit the default crawler identification logic. You can configure the deny field to add crawler identification rules on top of the default logic.
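
For reference, the following sketch combines all of the fields in one configuration. The regular expressions, status code, and message are illustrative values, not recommended settings:

allow:
- ".*Go-http-client.*"
deny:
- "spd-tools.*"
blocked_code: 429
blocked_message: "The request is identified as crawler traffic and is blocked."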

Configuration examples

Allow requests that hit the default crawler identification logic

allow:
- ".*Go-http-client.*"

If you do not configure the allow field, requests sent from the Go HTTP client library are identified as crawler requests by the default logic and are blocked. The preceding configuration allows these requests.
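
With the preceding allow rule in place, a request such as the following is allowed. The User-Agent value is an example of the header that the Go HTTP client library sends:

curl http://example.com -H 'User-Agent: Go-http-client/1.1'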

Add crawler identification logic

deny:
- "spd-tools.*"

The following requests are blocked based on the preceding configuration:

curl http://example.com -H 'User-Agent: spd-tools/1.1'
curl http://example.com -H 'User-Agent: spd-tools'
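
The deny field adds rules on top of the default crawler identification logic. A request whose User-Agent header does not match a deny rule or the default logic, such as the following request with a common browser User-Agent (the value is an example), is expected to be allowed:

curl http://example.com -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)'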

Enable security protection rules for specific routes or domain names

Save the route-a and route-b routes with empty configuration data.

Apply the following plug-in configurations to the *.example.com and test.com domain names:

allow:
- ".*Go-http-client.*"

Note
  • The route-a and route-b routes are the route names that are specified when the gateway routes are created. If a client request matches one of these routes, the default crawler identification logic is used for the matched route because its rule configuration is empty.

  • The *.example.com and test.com domain names are used to match domain names in requests. If a client request matches one of the domain names, the rules that are configured for the matched domain name take effect.

  • Rules that you configure take effect in sequence. If the first rule is matched, subsequent rules are ignored.
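
As an illustration of the preceding matching behavior (the host name and User-Agent value are examples), the following request matches the test.com rule and is allowed by the Go-http-client allow entry. The same request sent to a host that matches only route-a or route-b hits the empty route-level configuration, so the default crawler identification logic blocks it:

curl http://test.com -H 'User-Agent: Go-http-client/1.1'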