Configure the bot-detect plug-in to identify and prevent web crawlers - Microservices Engine

Automated crawlers and scraping tools generate unwanted traffic that can degrade API performance and expose sensitive data. The bot-detect plug-in inspects the User-Agent header of incoming requests, matches it against built-in crawler patterns, and blocks identified bots. You can customize the detection by allowing specific user agents or adding your own blocking rules.

Plug-in type: Security protection

How it works

When enabled on a route or domain, bot-detect evaluates each incoming request as follows:

The plug-in reads the User-Agent header from the request and checks it against the built-in default crawler detection rules. These rules match common bot signatures such as known spider names, scraper tools, and automated HTTP libraries (including the Go HTTP client).
If you configured an allow list, the plug-in checks whether the User-Agent matches any allow pattern. Matched requests bypass the default detection and pass through.
If you configured a deny list, the plug-in checks whether the User-Agent matches any deny pattern. Matched requests are blocked in addition to those caught by the default rules.
A blocked request receives an HTTP 403 status code by default. You can customize the status code and response body.

If neither allow nor deny is configured, only the built-in default crawler detection rules apply.
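The decision flow described above can be sketched in Python. This is an illustrative model, not the plug-in's implementation: it assumes the allow list is checked first, then the deny list, then the default rules, and that a pattern must match the whole header value. `DEFAULT_RULES` holds placeholder patterns, because the actual built-in patterns are not listed in this document.

```python
import re

# Placeholder stand-ins for the built-in rules (assumption: the real default
# patterns are not listed in this document).
DEFAULT_RULES = [r".*Go-http-client.*", r".*(spider|crawler|scrapy).*"]


def is_blocked(user_agent, allow=(), deny=(), default_rules=DEFAULT_RULES):
    """Return True if a request with this User-Agent would be blocked."""
    # Assumed order: an allow match bypasses every other check.
    if any(re.fullmatch(p, user_agent) for p in allow):
        return False
    # A deny match blocks the request.
    if any(re.fullmatch(p, user_agent) for p in deny):
        return True
    # Otherwise only the default crawler rules decide.
    return any(re.fullmatch(p, user_agent) for p in default_rules)
```

With no lists configured, `is_blocked("Go-http-client/1.1")` is `True` in this model. Passing `allow=[".*Go-http-client.*"]` turns the same call into `False`.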

Note

Rules are evaluated in order. Once a request matches the first applicable rule, later rules are skipped.

Fields

Name	Data type	Required	Default value	Description
`allow`	array of string	No	-	Regular expressions to match `User-Agent` headers. Matched requests bypass the default crawler detection and pass through.
`deny`	array of string	No	-	Regular expressions to match `User-Agent` headers. Matched requests are blocked, in addition to those caught by the default rules.
`blocked_code`	number	No	403	HTTP status code returned when a request is blocked.
`blocked_message`	string	No	-	HTTP response body returned when a request is blocked.
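To make the defaults in the table concrete, the following Python sketch loads a configuration from a dictionary. `BotDetectConfig` and `load_config` are hypothetical names used only for illustration. The empty-string default for `blocked_message` and the up-front regular expression check are assumptions, not documented plug-in behavior.

```python
import re
from dataclasses import dataclass, field


@dataclass
class BotDetectConfig:
    # Defaults follow the table: both lists are optional, blocked_code is 403.
    allow: list = field(default_factory=list)
    deny: list = field(default_factory=list)
    blocked_code: int = 403
    blocked_message: str = ""  # assumption: no body when the field is unset


def load_config(raw):
    """Build a BotDetectConfig from a dict and check that every pattern compiles."""
    cfg = BotDetectConfig(**raw)
    for pattern in cfg.allow + cfg.deny:
        re.compile(pattern)  # raises re.error for an invalid expression
    return cfg
```

For example, `load_config({"deny": ["spd-tools.*"]})` yields a configuration whose `blocked_code` is 403 and whose `allow` list is empty.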

Configuration examples

Allow a user agent blocked by default rules

The built-in crawler detection rules block Go HTTP client requests by default. To allow them, add a matching pattern to the allow field:

allow:
- ".*Go-http-client.*"

After you apply this configuration, requests with a User-Agent header containing Go-http-client pass through instead of being blocked.

Expected result: A request such as curl http://example.com -H 'User-Agent: Go-http-client/1.1' returns the normal response from the upstream service.

Block a custom user agent

To block requests from a tool not covered by the default rules, add a pattern to the deny field:

deny:
- "spd-tools.*"

Expected result: The following requests are blocked and receive a 403 response:

curl http://example.com -H 'User-Agent: spd-tools/1.1'
# HTTP/1.1 403 Forbidden

curl http://example.com -H 'User-Agent: spd-tools'
# HTTP/1.1 403 Forbidden
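The pattern can be tried out locally before it is deployed. The Python sketch below assumes that a pattern must match the entire User-Agent value. The leading and trailing `.*` in the allow example suggest this, but this document does not confirm it. Python's `re` module may also differ in detail from the regular expression engine that the plug-in uses. `ua_matches` is a hypothetical helper.

```python
import re


def ua_matches(pattern, user_agent):
    # Assumption: the whole header value must match, not just a substring.
    return re.fullmatch(pattern, user_agent) is not None


print(ua_matches("spd-tools.*", "spd-tools/1.1"))     # True
print(ua_matches("spd-tools.*", "spd-tools"))         # True
print(ua_matches("spd-tools.*", "my-spd-tools/2"))    # False: the prefix is not covered
print(ua_matches(".*spd-tools.*", "my-spd-tools/2"))  # True
```

If the tool name can appear anywhere in the header, wrapping the pattern in `.*` on both sides is the safer choice under this assumption.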

Customize the blocked response

To return a custom status code and message when a request is blocked:

deny:
- "spd-tools.*"
blocked_code: 444
blocked_message: "Request rejected by bot detection"

Expected result: Blocked requests receive an HTTP 444 response with the body Request rejected by bot detection.

Apply bot detection to specific routes or domains

You can scope bot detection to specific routes or domain names instead of applying it globally.

Step 1. Apply the plug-in to the routes route-a and route-b and save it with an empty configuration. Requests matching these routes use only the built-in default crawler detection rules.

Step 2. Apply a custom configuration to specific domain names. For example, apply the following to *.example.com and test.com:

allow:
- ".*Go-http-client.*"

Note

route-a and route-b are the route names that were specified when the gateway routes were created. Requests matching these routes use the built-in default rules.
*.example.com and test.com match domain names in incoming requests. Matching requests use the domain-level configuration.
Rules take effect in order. Once a request matches a rule, later rules are skipped.
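The first-match behavior can be modeled with a short Python sketch. The `RULES` structure and the `resolve_config` helper are hypothetical and only mirror the two steps above. `fnmatchcase` stands in for the gateway's wildcard domain matching, which may behave differently.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list mirroring Step 1 and Step 2, in evaluation order.
RULES = [
    {"routes": ["route-a", "route-b"], "config": {}},
    {"domains": ["*.example.com", "test.com"],
     "config": {"allow": [".*Go-http-client.*"]}},
]


def resolve_config(route, host, rules=RULES):
    """Return the config of the first rule matching the route or host, else None."""
    for rule in rules:
        if route in rule.get("routes", []):
            return rule["config"]
        if any(fnmatchcase(host, d) for d in rule.get("domains", [])):
            return rule["config"]
    return None  # no rule matched in this model
```

In this model, a request on route-a to api.example.com resolves to the empty route-level configuration, because the route rule comes first. The same host on any other route resolves to the domain-level configuration with the allow list.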