The bot-detect plug-in identifies web crawlers and prevents them from crawling websites. This topic describes how to configure the bot-detect plug-in.
Plug-in type
Security protection plug-in.
Fields
| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| allow | array of string | No | - | The regular expressions that are used to match the User-Agent request header. If the header matches one of the expressions, the request is allowed. |
| deny | array of string | No | - | The regular expressions that are used to match the User-Agent request header. If the header matches one of the expressions, the request is blocked. |
| blocked_code | number | No | 403 | The HTTP status code that is returned when a request is blocked. |
| blocked_message | string | No | - | The HTTP response body that is returned when a request is blocked. |
If the allow and deny fields are not configured, the default crawler identification logic is executed. You can configure the allow field to allow requests that hit the default crawler identification logic, and configure the deny field to add extra crawler identification rules.
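For example, the two fields can be used together. The following sketch reuses the illustrative patterns from the examples in the next section: the default logic stays in effect, requests from the Go HTTP client library are allowed, and the spd-tools User-Agent is additionally blocked:
allow:
- ".*Go-http-client.*"
deny:
- "spd-tools.*"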
Configuration examples
Allow requests that hit the default crawler identification logic
allow:
- ".*Go-http-client.*"
If you do not configure the allow field, requests sent by the Go HTTP client library, which carries a User-Agent header such as Go-http-client/1.1 by default, are identified as crawlers by the default logic and are blocked. The preceding configuration allows these requests.
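For instance, a request like the following carries the default Go HTTP client User-Agent and is allowed once the preceding allow rule is configured. In this sketch, example.com is a placeholder for your own domain name:
curl http://example.com -H 'User-Agent: Go-http-client/1.1'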
Add crawler identification logic
deny:
- "spd-tools.*"
The following requests are blocked based on the preceding configuration:
curl http://example.com -H 'User-Agent: spd-tools/1.1'
curl http://example.com -H 'User-Agent: spd-tools'
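By default, blocked requests receive an HTTP 403 response. If you want to customize the blocked response, you can combine the deny rule with the blocked_code and blocked_message fields, as in the following sketch. The 429 status code and the message text are illustrative values, not defaults:
deny:
- "spd-tools.*"
blocked_code: 429
blocked_message: "the request is blocked by the bot-detect plug-in"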
Enable security protection rules for specific routes or domain names
Save the route-a and route-b routes with empty configuration data.
Apply the following plug-in configuration to the *.example.com and test.com domain names:
allow:
- ".*Go-http-client.*"
The route-a and route-b routes are the routes that you specify when you create the gateway routes. If a client request matches one of these routes, the built-in default rules are used to identify web crawlers, because the routes are saved with empty configuration data.
The *.example.com and test.com domain names are used to match the domain names of requests. If a client request matches one of these domain names, the rules that are configured for the matched domain name take effect.
The rules that you configure take effect in sequence. If the first rule is matched, subsequent rules are ignored.