The bot-detect plug-in identifies web crawlers and prevents them from crawling websites. This topic describes how to configure the bot-detect plug-in.
Plug-in type
Security protection plug-in.
Fields
| Name | Data type | Required | Default value | Description |
| --- | --- | --- | --- | --- |
| allow | array of string | No | - | The regular expressions that are used to match the User-Agent request header. If the header matches one of the expressions, the request is allowed. |
| deny | array of string | No | - | The regular expressions that are used to match the User-Agent request header. If the header matches one of the expressions, the request is blocked. |
| blocked_code | number | No | 403 | The HTTP status code that is returned when a request is blocked. |
| blocked_message | string | No | - | The HTTP response body that is returned when a request is blocked. |
If the allow and deny fields are not configured, the default crawler identification logic is executed. You can configure the allow field to allow requests that hit the default crawler identification logic, and configure the deny field to add extra crawler identification rules.
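For example, the two fields can be used together. The following sketch reuses the illustrative patterns from the examples in the next section: the default logic stays in effect, requests from the Go HTTP client library are allowed, and the spd-tools User-Agent is additionally blocked:
allow:
- ".*Go-http-client.*"
deny:
- "spd-tools.*"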
Configuration examples
Allow requests that hit the default crawler identification logic
allow:
- ".*Go-http-client.*"
If you do not configure the allow field, requests sent by the Go HTTP client library, which carries a User-Agent header such as Go-http-client/1.1 by default, are identified as crawlers by the default logic and are blocked. The preceding configuration allows these requests.
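For instance, a request like the following carries the default Go HTTP client User-Agent and is allowed once the preceding allow rule is configured. In this sketch, example.com is a placeholder for your own domain name:
curl http://example.com -H 'User-Agent: Go-http-client/1.1'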
Add crawler identification logic
deny:
- "spd-tools.*"
The following requests are blocked based on the preceding configuration:
curl http://example.com -H 'User-Agent: spd-tools/1.1'
curl http://example.com -H 'User-Agent: spd-tools'
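By default, blocked requests receive an HTTP 403 response. If you want to customize the blocked response, you can combine the deny rule with the blocked_code and blocked_message fields, as in the following sketch. The 429 status code and the message text are illustrative values, not defaults:
deny:
- "spd-tools.*"
blocked_code: 429
blocked_message: "the request is blocked by the bot-detect plug-in"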
Enable security protection rules for specific routes or domain names
Save the route-a and route-b routes with empty configuration data.
Apply the following plug-in configuration to the *.example.com and test.com domain names:
allow:
- ".*Go-http-client.*"
The route-a and route-b routes are the routes that you specify when you create the gateway routes. If a client request matches one of these routes, the built-in default rules are used to identify web crawlers, because the routes are saved with empty configuration data.
The *.example.com and test.com domain names are used to match the domain names of requests. If a client request matches one of these domain names, the rules that are configured for the matched domain name take effect.
The rules that you configure take effect in sequence. If the first rule is matched, subsequent rules are ignored.