The bot-detect plug-in identifies web crawlers and blocks their requests so that they cannot crawl your website. This topic describes how to configure the bot-detect plug-in.
Plug-in type
Security protection plug-in.
Fields
Name | Data type | Required | Default value | Description |
allow | array of string | No | - | The regular expressions that are used to match the User-Agent request header. If the header matches one of the expressions, the request is allowed. |
deny | array of string | No | - | The regular expressions that are used to match the User-Agent request header. If the header matches one of the expressions, the request is blocked. |
blocked_code | number | No | 403 | The HTTP status code that is returned if a request is blocked. |
blocked_message | string | No | - | The HTTP response body that is returned if a request is blocked. |
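To make the defaults in the table concrete, the following minimal Python sketch models the four fields and the response that is returned for a blocked request. The names `BotDetectConfig` and `blocked_response` are hypothetical and are not part of the plug-in. The empty body used when blocked_message is not set is an assumption, because the table does not state a default value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BotDetectConfig:
    """Hypothetical model of the plug-in fields listed in the table."""
    allow: List[str] = field(default_factory=list)  # optional, no default value
    deny: List[str] = field(default_factory=list)   # optional, no default value
    blocked_code: int = 403                         # default value from the table
    blocked_message: str = ""                       # assumption: empty body when unset


def blocked_response(cfg: BotDetectConfig) -> Tuple[int, str]:
    """Return the (status code, body) pair for a blocked request."""
    return cfg.blocked_code, cfg.blocked_message


if __name__ == "__main__":
    print(blocked_response(BotDetectConfig()))
    print(blocked_response(BotDetectConfig(blocked_code=429, blocked_message="bot detected")))
```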
If the allow and deny fields are not configured, only the default logic to identify crawlers is executed. You can configure the allow field to allow requests that hit the default crawler identification logic. You can configure the deny field to add crawler identification logic on top of the default logic.
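The interplay of the allow field, the deny field, and the default logic can be sketched in Python as follows. This is an illustration under stated assumptions, not the plug-in's implementation: `DEFAULT_CRAWLER_PATTERNS` is a hypothetical stand-in for the built-in patterns, which this topic does not list. The sketch also assumes that allow is checked before deny and that an expression must match the whole header value.

```python
import re
from typing import Sequence

# Hypothetical stand-in for the built-in crawler patterns. The real default
# list is not given in this topic.
DEFAULT_CRAWLER_PATTERNS = [r".*Go-http-client.*", r".*[Bb]ot.*"]


def _matches_any(patterns: Sequence[str], user_agent: str) -> bool:
    # Assumption: an expression must match the entire User-Agent value.
    return any(re.fullmatch(p, user_agent) for p in patterns)


def is_blocked(user_agent: str,
               allow: Sequence[str] = (),
               deny: Sequence[str] = ()) -> bool:
    """Return True if a request with this User-Agent would be blocked."""
    # Assumption: allow is evaluated first, so it can override the default logic.
    if _matches_any(allow, user_agent):
        return False
    # deny adds identification logic on top of the default logic.
    if _matches_any(deny, user_agent):
        return True
    # With neither field matching, only the default logic decides.
    return _matches_any(DEFAULT_CRAWLER_PATTERNS, user_agent)
```

With no fields configured, `is_blocked("Go-http-client/1.1")` is true in this sketch because of the placeholder default pattern. Passing `allow=[".*Go-http-client.*"]` makes it false.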
Configuration examples
Allow requests that hit the default crawler identification logic
allow:
- ".*Go-http-client.*"
If you do not configure the allow field, a request sent by the Go HTTP client library, whose User-Agent request header contains Go-http-client, is identified as a crawler request and is blocked.
Add crawler identification logic
deny:
- "spd-tools.*"
The following requests are blocked based on the preceding configuration:
curl http://example.com -H 'User-Agent: spd-tools/1.1'
curl http://example.com -H 'User-Agent: spd-tools'
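The two User-Agent values above can be checked against the deny expression with Python's `re` module. This topic does not state whether the plug-in requires the expression to match the whole header value or only part of it, so the sketch below reports both. The helper name `match_modes` and the third sample value are hypothetical.

```python
import re

DENY_PATTERN = r"spd-tools.*"  # expression from the deny example


def match_modes(user_agent: str) -> dict:
    """Report whether the value matches as a whole string and as a substring."""
    return {
        "full": re.fullmatch(DENY_PATTERN, user_agent) is not None,
        "partial": re.search(DENY_PATTERN, user_agent) is not None,
    }


# The last value is a hypothetical header that is not part of the example.
for ua in ("spd-tools/1.1", "spd-tools", "my-spd-tools/2.0"):
    print(ua, match_modes(ua))
```

Both values from the curl commands match in either mode. The hypothetical value `my-spd-tools/2.0` matches only as a substring, which is why expressions such as `.*Go-http-client.*` carry an explicit `.*` on both sides if whole-value matching is in effect.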
Enable security protection rules for specific routes or domain names
# Use the _rules_ field to configure fine-grained rules.
_rules_:
# Rule 1: match by route name
- _match_route_:
  - route-a
  - route-b
# Rule 2: match by domain name
- _match_domain_:
  - "*.example.com"
  - test.com
  allow:
  - ".*Go-http-client.*"
In this example, route-a and route-b specified in _match_route_ are the route names that you configure when you create the gateway routes. If one of the two routes is matched, this rule takes effect.
In this example, *.example.com and test.com specified in _match_domain_ are used to match domain names of requests. If one of the domain names is matched, this rule takes effect.
Rules in _rules_ are evaluated in the order in which they are configured. After a rule is matched, subsequent rules are ignored.
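The first-match behavior of _rules_ can be sketched in Python as follows. The `name` keys and the `select_rule` function are hypothetical and exist only for illustration. The use of `fnmatchcase` for the `*.example.com` wildcard is an assumption about how domain patterns are compared. This topic does not say which configuration applies when no rule matches, so the sketch returns `None` in that case.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# Rules mirror the example; the "name" keys are hypothetical labels.
RULES = [
    {"name": "rule-1", "_match_route_": ["route-a", "route-b"]},
    {"name": "rule-2", "_match_domain_": ["*.example.com", "test.com"]},
]


def select_rule(route: str, host: str, rules: List[dict] = RULES) -> Optional[str]:
    """Return the name of the first rule that matches the route or the domain."""
    for rule in rules:
        # A rule matches if the route name is one of its listed routes ...
        if route in rule.get("_match_route_", []):
            return rule["name"]
        # ... or if the request domain matches one of its domain patterns.
        if any(fnmatchcase(host, pat) for pat in rule.get("_match_domain_", [])):
            return rule["name"]
    return None  # no rule matched


if __name__ == "__main__":
    # route-a matches rule 1 first, so rule 2 is never consulted.
    print(select_rule("route-a", "www.example.com"))
    print(select_rule("route-c", "www.example.com"))
```

A request on route-a to www.example.com satisfies both rules, but only rule 1 is selected because it comes first.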