Microservices Engine: bot-detect

Last Updated: Oct 23, 2023

The bot-detect plug-in identifies web crawlers and prevents them from crawling websites. This topic describes how to configure the bot-detect plug-in.

Plug-in type

Security protection plug-in.

Fields

| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| allow | array of string | No | - | The regular expression that is used to match User-Agent request headers. If a User-Agent request header is matched, the request is allowed. |
| deny | array of string | No | - | The regular expression that is used to match User-Agent request headers. If a User-Agent request header is matched, the request is blocked. |
| blocked_code | number | No | 403 | The HTTP status code that is returned if a request is blocked. |
| blocked_message | string | No | - | The HTTP response body that is returned if a request is blocked. |

Note

If neither the allow field nor the deny field is configured, only the default crawler identification logic is applied. Configure the allow field to let through requests that the default logic would otherwise identify as crawler traffic. Configure the deny field to add crawler identification rules on top of the default logic.

Configuration examples

Allow requests that hit the default crawler identification logic

allow:
- ".*Go-http-client.*"

If you do not configure the allow field, requests sent by the Golang HTTP library (whose User-Agent header contains Go-http-client) are identified as crawler traffic and blocked.

Add crawler identification logic

deny:
- "spd-tools.*"

The following requests are blocked based on the preceding configuration:

curl http://example.com -H 'User-Agent: spd-tools/1.1'
curl http://example.com -H 'User-Agent: spd-tools'
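
The response returned for blocked requests can be customized with the blocked_code and blocked_message fields described in the Fields section. The following is a minimal sketch that combines them with the allow and deny patterns used in this topic. The status code 429 and the message text are illustrative values chosen for this example, not defaults or recommendations from the product.

```yaml
# Illustrative values only: 429 and the message text are assumptions for this sketch.
allow:
- ".*Go-http-client.*"     # let requests from Go HTTP clients through
deny:
- "spd-tools.*"            # additionally treat spd-tools as a crawler
blocked_code: 429          # returned instead of the default 403
blocked_message: "Crawler traffic is not allowed."
```

Under this configuration, a blocked request such as one with the header User-Agent: spd-tools/1.1 would be expected to receive status code 429 with the configured message as the response body.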

Enable security protection rules for specific routes or domain names

# Use the _rules_ field to configure fine-grained rules.
_rules_:
# Rule 1: match by route name
- _match_route_:
  - route-a
  - route-b
# Rule 2: match by domain name
- _match_domain_:
  - "*.example.com"
  - test.com
  allow:
  - ".*Go-http-client.*"

Note
  • In this example, route-a and route-b specified in _match_route_ are the route names that you configured when you created the gateway routes. If a request matches either route, this rule takes effect.

  • In this example, *.example.com and test.com specified in _match_domain_ are matched against the domain name of the request. If either domain name matches, this rule takes effect.

  • Rules in _rules_ are evaluated in the order in which they are listed. After the first matching rule takes effect, subsequent rules are ignored.
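
Because only the first matching rule applies, rule order matters when match conditions overlap. The following sketch illustrates this with two domain rules. The domain api.example.com is hypothetical, and the sketch assumes that a rule can carry the deny field in the same way that the rule in the preceding example carries the allow field.

```yaml
_rules_:
# Rule 1: the more specific domain is listed first so that the wildcard rule does not shadow it.
- _match_domain_:
  - api.example.com        # hypothetical domain used for illustration
  allow:
  - ".*Go-http-client.*"
# Rule 2: all other subdomains of example.com.
- _match_domain_:
  - "*.example.com"
  deny:
  - "spd-tools.*"
```

If the two rules were swapped, a request to api.example.com would match the wildcard rule first, and the allow pattern in the more specific rule would never be applied.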