How to Solve the Occasional 404 Error After Enabling HTTP/2 in Envoy?

This article explains how to address the occasional 404 error that occurs after enabling HTTP/2 in Envoy by exploring its causes and presenting several potential solutions.

Background

Many Envoy-based gateways share a common issue: when HTTP/2 is enabled, clients occasionally receive a 404 error. The access logs show that, for these 404 requests, the domain name in the :authority header does not match the domain name in the Server Name Indication (SNI).

This issue is particularly likely to occur when using a wildcard certificate and configuring routes for multiple domains.

Related community issues:

• https://github.com/envoyproxy/envoy/issues/6767
• https://github.com/istio/istio/issues/13589
• https://github.com/projectcontour/contour/issues/1493

Causes

Why Are the :authority Header and SNI Inconsistent?

This issue stems from the client's connection reuse mechanism. Connection multiplexing is a core difference between HTTP/2 and HTTP/1. In browsers especially, maximizing connection reuse can significantly reduce page load times over TLS (setting aside head-of-line blocking). The HTTP/2 RFC (RFC 7540, Section 9.1.1) describes connection reuse as follows:

Connections that are made to an origin server, either directly or through a tunnel created using the CONNECT method (Section 8.3), MAY be reused for requests with multiple different URI authority components. A connection can be reused as long as the origin server is authoritative (Section 10.1). For TCP connections without TLS, this depends on the host having resolved to the same IP address.

Therefore, browsers such as Chrome reuse an HTTP/2 connection established for domain A to send requests for domain B when both of the following conditions hold:

• Domain B resolves to the same IP address as domain A.
• The certificate received when the connection to domain A was established is also valid for domain B: its wildcard Common Name (CN) matches domain B, or domain B is listed in its Subject Alternative Names (SAN).

When a request for domain B is sent over a connection established for domain A, the SNI still carries domain A while the :authority header carries domain B. This is exactly the mismatch that appears in the gateway logs.
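The two reuse conditions can be modeled with a short Python sketch. It is a simplified approximation of browser connection coalescing, not Chrome's actual logic; the names cert_covers, Connection and can_reuse, the IP addresses, and the DNS table are all hypothetical. It shows how a request for b.example.com ends up on a connection whose SNI is a.example.com.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


def cert_covers(pattern: str, host: str) -> bool:
    """True if a certificate name such as '*.example.com' is valid for host.

    Uses the usual TLS rule: '*' stands for exactly one left-most label.
    """
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        label, sep, rest = host.partition(".")
        return bool(label) and sep == "." and "." + rest == pattern[1:]
    return pattern == host


@dataclass
class Connection:
    sni: str               # host name sent in the TLS ClientHello
    ip: str                # address the connection was opened to
    cert_names: List[str]  # CN/SAN entries of the server certificate


def can_reuse(conn: Connection, host: str,
              resolve: Callable[[str], Optional[str]]) -> bool:
    """Simplified coalescing rule: same IP and a certificate valid for host."""
    same_ip = resolve(host) == conn.ip
    authoritative = any(cert_covers(name, host) for name in conn.cert_names)
    return same_ip and authoritative


# Hypothetical DNS answers: all three names point at the same gateway IP.
dns = {
    "a.example.com": "203.0.113.10",
    "b.example.com": "203.0.113.10",
    "c.other.com": "203.0.113.10",
}
conn = Connection(sni="a.example.com", ip="203.0.113.10",
                  cert_names=["*.example.com"])

print(can_reuse(conn, "b.example.com", dns.get))  # True: b rides on SNI a
print(can_reuse(conn, "c.other.com", dns.get))    # False: cert not valid for c
```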

Why Does the 404 Error Occur?

In Envoy gateways, SNI is usually mapped one-to-one to domain-based routing. When a connection matches SNI A, only the routes for domain A are available and there is no route for domain B, so a request for domain B on that connection returns 404.

Specifically, Envoy configurations are commonly organized so that each SNI has its own filter chain, and the HTTP connection manager (HCM) in that filter chain references its own, independent RDS route configuration.
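A minimal Python model of this per-SNI layout makes the 404 easy to reproduce. FILTER_CHAINS and route are hypothetical stand-ins for Envoy's filter chain selection and virtual host matching, not Envoy APIs; the point is only that the route table is chosen by SNI before the :authority header is ever consulted.

```python
# Hypothetical per-SNI layout: SNI -> that filter chain's own route table
# (virtual host domain -> upstream cluster).
FILTER_CHAINS = {
    "a.example.com": {"a.example.com": "service-a"},
    "b.example.com": {"b.example.com": "service-b"},
}


def route(sni: str, authority: str) -> str:
    """Pick the filter chain by SNI, then the virtual host by :authority."""
    routes = FILTER_CHAINS.get(sni)
    if routes is None:
        return "no matching filter chain"
    host = authority.partition(":")[0].lower()  # drop port; IPv6 literals ignored
    return routes.get(host, "404")


print(route("a.example.com", "a.example.com"))  # service-a
print(route("a.example.com", "b.example.com"))  # 404: SNI a's chain has no route for b
print(route("b.example.com", "b.example.com"))  # service-b
```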

Solutions

Solution 1: Reuse the same filter chain for domains with the same certificate

Strictly speaking, this issue is not a bug in Envoy, but rather a result of improper configuration organization. It can be resolved by reusing the same filter chain for domains that share the same certificate.
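As a sketch of Solution 1, assuming the control plane knows which certificate serves each domain, the domains can be grouped by certificate so that each group gets one filter chain carrying the routes for every domain in it. DOMAIN_CERTS, ROUTES, build_filter_chains and route_request are hypothetical names; in a real listener the server_names list would roughly correspond to filter_chain_match.server_names.

```python
from collections import defaultdict

# Hypothetical input: domain -> name of the certificate that serves it.
DOMAIN_CERTS = {
    "a.example.com": "wildcard-example-com",
    "b.example.com": "wildcard-example-com",
    "shop.other.com": "other-com",
}
# Hypothetical route table: domain -> upstream cluster.
ROUTES = {"a.example.com": "service-a", "b.example.com": "service-b",
          "shop.other.com": "shop"}


def build_filter_chains(domain_certs, routes):
    """Group domains sharing a certificate into one filter chain.

    Each chain matches all of its domains by SNI and carries the routes for
    all of them, so any SNI in the group can serve any :authority in it.
    """
    groups = defaultdict(list)
    for domain, cert in domain_certs.items():
        groups[cert].append(domain)
    return [
        {
            "certificate": cert,
            "server_names": sorted(domains),
            "routes": {d: routes[d] for d in domains},
        }
        for cert, domains in groups.items()
    ]


def route_request(chains, sni, authority):
    for chain in chains:
        if sni in chain["server_names"]:
            return chain["routes"].get(authority, "404")
    return "no matching filter chain"


chains = build_filter_chains(DOMAIN_CERTS, ROUTES)
print(route_request(chains, "a.example.com", "b.example.com"))  # service-b
```

The trade-off noted below follows directly from this grouping: the chain's identity depends on the certificate, so a certificate change forces the chain to be rebuilt.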

However, this solution has two main drawbacks:

• When the certificate for a domain is updated, the filter chain must be rebuilt, which can interrupt downstream connections.
• It increases control-plane complexity. For example, with the Gateway API, it is no longer possible to keep a one-to-one mapping between filter chains and the Listeners of a Gateway.

Solution 2: Use HTTP 421 status code

A common approach is to use a Lua filter that returns 421 when the :authority header does not match the SNI, for example:

              "@type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
              inlineCode: |
                function envoy_on_request(request_handle)
                  local sni = request_handle:streamInfo():requestedServerName()
                  local authority = request_handle:headers():get(":authority")
                  -- Skip requests without SNI (e.g. plaintext) or without :authority.
                  if sni == nil or sni == "" or authority == nil then
                    return
                  end
                  sni = string.lower(sni)
                  -- Drop an optional port ("a.example.com:443") and normalize case.
                  local host = string.lower((string.gsub(authority, ":%d+$", "")))
                  local matched
                  if string.sub(sni, 1, 2) == "*." then
                    -- Wildcard: plain suffix comparison, no Lua patterns ("-" and "." are magic in patterns).
                    local suffix = string.sub(sni, 2)
                    matched = #host > #suffix and string.sub(host, -#suffix) == suffix
                  else
                    matched = (host == sni)
                  end
                  if not matched then
                    -- Tell the client this connection is not authoritative for the host.
                    request_handle:respond({[":status"] = "421"}, "Misdirected Request")
                  end
                end

This approach also follows the recommendation in the HTTP/2 RFC (RFC 7540):

In some deployments, reusing a connection for multiple origins can result in requests being directed to the wrong origin server. For example, TLS termination might be performed by a middlebox that uses the TLS Server Name Indication (SNI) [TLS-EXT] extension to select an origin server. This means that it is possible for clients to send confidential information to servers that might not be the intended target for the request, even though the server is otherwise authoritative. A server that does not wish clients to reuse connections can indicate that it is not authoritative for a request by sending a 421 (Misdirected Request) status code in response to the request (see Section 9.1.2).

This solution also has two main drawbacks:

• It gives up the benefit of connection reuse. For scenarios that rely on HTTP/2 connection reuse to speed up page loading, this may be unacceptable.
• Some clients do not handle the 421 status code correctly. In China in particular, many hybrid Android apps built on older versions of Chromium reuse connections across domains but do not reconnect upon receiving a 421 response, which directly causes business errors.
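The trade-offs of the 421 approach can be illustrated with a toy client model in Python. Client, strict_gateway and the retry policy are hypothetical simplifications: a client that honors 421 recovers by opening a second connection (losing reuse), while a client that ignores 421, like the older Chromium-based apps mentioned above, surfaces the error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A "server" here is a function (sni, host) -> HTTP status code.
Server = Callable[[str, str], int]


def strict_gateway(sni: str, host: str) -> int:
    """Gateway using the 421 approach: reject hosts that differ from the SNI."""
    return 200 if sni == host else 421


@dataclass
class Client:
    """Toy HTTP/2 client that coalesces requests onto open connections."""
    honors_421: bool
    connections: Dict[str, str] = field(default_factory=dict)  # SNI -> connection

    def connect(self, host: str) -> str:
        self.connections[host] = host  # a new TLS connection with SNI = host
        return host

    def pick_connection(self, host: str) -> str:
        if host in self.connections:
            return host
        if self.connections:  # assume the coalescing conditions already hold
            return next(iter(self.connections))
        return self.connect(host)

    def send(self, host: str, server: Server) -> int:
        status = server(self.pick_connection(host), host)
        if status == 421 and self.honors_421:
            # Retry on a dedicated connection whose SNI matches the host.
            status = server(self.connect(host), host)
        return status


modern = Client(honors_421=True)
legacy = Client(honors_421=False)
for client in (modern, legacy):
    client.send("a.example.com", strict_gateway)

print(modern.send("b.example.com", strict_gateway))  # 200, but over a second connection
print(len(modern.connections))                       # 2: connection reuse is lost
print(legacy.send("b.example.com", strict_gateway))  # 421 surfaces as an error
```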

Solution 3: Share route configurations among filter chains

1. Based on RDS (Route Discovery Service)

If all HTTPS filter chains share the same RDS route configuration, the issue is solved. However, this can produce an excessively large RDS resource that cannot benefit from incremental-update mechanisms such as delta xDS. In addition, any change to the RDS resource changes its checksum, causing Envoy to reload the entire route configuration. Re-parsing the configuration and regenerating the routing data structures then consumes significant CPU on the main thread.
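The cost of a single shared resource can be sketched by hashing the configuration. In this hypothetical Python example, a SHA-256 digest over JSON stands in for a resource version or checksum (in practice versioning is driven by the control plane); changing one of 1,000 virtual hosts changes the digest of the shared resource, whereas per-host resources isolate the change to a single entry.

```python
import hashlib
import json


def digest(obj) -> str:
    """Stable hash of a config object (stand-in for a resource checksum)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]


# Hypothetical route data: 1,000 virtual hosts, one cluster each.
virtual_hosts = {f"svc{i}.example.com": {"cluster": f"svc{i}"} for i in range(1000)}

# One shared route configuration: every virtual host lives in a single resource.
before_shared = digest(virtual_hosts)
# Per-host resources: each virtual host is versioned independently.
before_split = {name: digest(vh) for name, vh in virtual_hosts.items()}

virtual_hosts["svc7.example.com"]["cluster"] = "svc7-canary"  # change one host

changed_shared = digest(virtual_hosts) != before_shared
changed_split = [n for n, vh in virtual_hosts.items() if digest(vh) != before_split[n]]
print(changed_shared)  # True: the whole shared resource must be re-sent and re-parsed
print(changed_split)   # ['svc7.example.com']: only one resource changed
```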

2. Based on VHDS (Virtual Host Discovery Service)

The current VHDS implementation works on demand: if the requested domain is not found in the current route configuration, Envoy pulls its configuration from the xDS server. As a result, data-plane traffic directly drives requests to the control plane. A high volume of requests for unknown domains (which end up as 404s) can put heavy load on the control plane, and the availability of the control plane then directly affects the availability of the data plane.
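The load pattern described above can be shown with a toy on-demand lookup in Python. OnDemandRoutes is a hypothetical simplification of VHDS-style loading (batching and the real update flow are not modeled): known hosts are cached after one fetch, but every request for an unknown host triggers another control-plane fetch.

```python
class OnDemandRoutes:
    """Toy model of on-demand virtual host loading (VHDS-like)."""

    def __init__(self, control_plane):
        self.control_plane = control_plane  # hypothetical: host -> cluster
        self.cache = {}
        self.fetches = 0

    def route(self, host):
        if host in self.cache:
            return self.cache[host]
        self.fetches += 1  # a data-plane request reaches the control plane
        cluster = self.control_plane.get(host)
        if cluster is None:
            return "404"  # nothing to cache, so the next miss fetches again
        self.cache[host] = cluster
        return cluster


routes = OnDemandRoutes({"a.example.com": "service-a"})
for _ in range(3):
    routes.route("a.example.com")  # one fetch, then cache hits
for i in range(1000):
    routes.route(f"random{i}.example.com")  # every unknown host costs a fetch
print(routes.fetches)  # 1001
```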

3. Based on SRDS (Scoped Route Discovery Service)

Envoy currently supports partitioning the route configuration into scopes based on the value of a specific header. The original design targeted routing traffic differently based on cookies, but it can be extended to scope routes by domain name. The key points of the extension are:

• Support for wildcard domains and prefix matching.
• Different ports may apply different logic to the same domain. For example, port 80 might enforce a redirect.

Below is an excerpt showing how Higress extends ScopedRoutes in its Envoy API definition (protobuf):

// [#next-free-field: 6]
message ScopedRoutes {
  option (udpa.annotations.versioning).previous_message_type =
      "envoy.config.filter.network.http_connection_manager.v2.ScopedRoutes";
...
...
      message HostValueExtractor {
        option (udpa.annotations.versioning).previous_message_type =
            "envoy.config.filter.network.http_connection_manager.v2.ScopedRoutes.ScopeKeyBuilder."
            "FragmentBuilder.HostValueExtractor";

        // The maximum number of host superset recomputes. If not specified, defaults to 100.
        google.protobuf.UInt32Value max_recompute_num = 1;
      }

      message LocalPortValueExtractor {
        option (udpa.annotations.versioning).previous_message_type =
            "envoy.config.filter.network.http_connection_manager.v2.ScopedRoutes.ScopeKeyBuilder."
            "FragmentBuilder.LocalPortValueExtractor";
      }


      oneof type {
        option (validate.required) = true;

        // Specifies how a header field's value should be extracted.
        HeaderValueExtractor header_value_extractor = 1;

        // Extract the fragment value from the :authority header, and support recompute with the wildcard domains,
        // e.g. ``www.example.com`` can be recomputed with ``*.example.com``, then ``*.com``, then ``*``.
        HostValueExtractor host_value_extractor = 101;

        // Extract the fragment value from local port of the connection.
        LocalPortValueExtractor local_port_value_extractor = 102;
      }
    }

    // The final(built) scope key consists of the ordered union of these fragments, which are compared in order with the
    // fragments of a :ref:`ScopedRouteConfiguration<envoy_v3_api_msg_config.route.v3.ScopedRouteConfiguration>`.
    // A missing fragment during comparison will make the key invalid, i.e., the computed key doesn't match any key.
    repeated FragmentBuilder fragments = 1 [(validate.rules).repeated = {min_items: 1}];
  }
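The host recompute behavior described in the HostValueExtractor comment, combined with a local-port fragment, can be sketched in Python as follows. host_candidates, find_scope and the scope names are hypothetical, and the key order (port, host) as well as the exact way max_recompute_num is counted are assumptions rather than Higress's implementation.

```python
from typing import Dict, Iterator, Optional, Tuple


def host_candidates(host: str, max_recompute_num: int = 100) -> Iterator[str]:
    """Yield the host, then broader wildcards:
    www.example.com -> *.example.com -> *.com -> *
    """
    labels = host.lower().split(".")
    yield host.lower()
    for i in range(1, len(labels) + 1):
        if i > max_recompute_num:  # assumption: each wildcard step is one recompute
            return
        rest = labels[i:]
        yield ("*." + ".".join(rest)) if rest else "*"


def find_scope(scopes: Dict[Tuple[int, str], str], port: int, authority: str,
               max_recompute_num: int = 100) -> Optional[str]:
    """Scope key = (local port, host fragment); exact host first, then wildcards."""
    host = authority.rsplit(":", 1)[0]  # drop port; IPv6 literals ignored
    for candidate in host_candidates(host, max_recompute_num):
        scope = scopes.get((port, candidate))
        if scope is not None:
            return scope
    return None


scopes = {
    (443, "*.example.com"): "example-https-routes",
    (80, "*.example.com"): "example-redirect-routes",  # e.g. port 80 only redirects
    (443, "*"): "default-https-routes",
}
print(find_scope(scopes, 443, "www.example.com"))  # example-https-routes
print(find_scope(scopes, 80, "www.example.com"))   # example-redirect-routes
print(find_scope(scopes, 443, "shop.other.org"))   # default-https-routes
```

Keying on the port as well as the host is what lets the same domain resolve to different route scopes on ports 80 and 443.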

Security Considerations for Shared Routing Configurations

Different filter chains may enforce different authentication policies. For example, some may require client certificate authentication (mTLS), while others may use IP-based RBAC (Role-Based Access Control).

When all filter chains share the same route configuration, exposing every route to every filter chain indiscriminately is insecure: a request arriving over a filter chain without mTLS could reach routes that are meant to be accessible only with a client certificate.

One possible solution is for the control plane to detect this risk. When it finds that a domain must only be accessed through the authentication of a specific filter chain, it can apply corresponding protections.

For example, Higress adds an allow_server_names field to VirtualHost. When mTLS is enabled, it can be configured so that the virtual host is accessible only to requests whose SNI matches one of the listed server names; other requests receive 421 Misdirected Request.

// [#protodoc-title: HTTP route components]
// * Routing :ref:`architecture overview <arch_overview_http_routing>`
// * HTTP :ref:`router filter <config_http_filters_router>`

// The top level element in the routing configuration is a virtual host. Each virtual host has
// a logical name as well as a set of domains that get routed to it based on the incoming request's
// host header. This allows a single listener to service multiple top level domain path trees. Once
// a virtual host is selected based on the domain, the routes are processed in order to see which
// upstream cluster to route to or whether to perform a redirect.
// [#next-free-field: 24]
message VirtualHost {
  option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.route.VirtualHost";
...
...
  // If non-empty, a list of server names (such as SNI for the TLS protocol) is used to determine
  // whether this request is allowed to access this VirtualHost. If not allowed, 421 Misdirected Request will be returned.
  //
  // The server name can be matched with wildcard domains, e.g. ``www.example.com`` can be matched with
  // ``www.example.com``, ``*.example.com`` and ``*.com``.
  //
  // Note that partial wildcards are not supported, and values like ``*w.example.com`` are invalid.
  //
  // This is useful when exposing all virtual hosts to arbitrary HCM filters (such as when using SRDS), and you want to make
  // mTLS-protected routes invisible to requests with different SNIs.
  //
  // .. attention::
  //
  //   See the :ref:`FAQ entry <faq_how_to_setup_sni>` on how to configure SNI for more
  //   information.
  repeated string allow_server_names = 101;
}
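The allow_server_names check can be sketched in Python, following the matching rules stated in the comment above (exact names, "*.suffix" wildcards that may span several labels, no partial wildcards). name_matches and check_virtual_host are hypothetical helpers, and returning None to mean "continue routing" is an assumption of this sketch.

```python
from typing import List, Optional


def name_matches(allowed: str, server_name: str) -> bool:
    """'www.example.com' is matched by itself, '*.example.com' and '*.com'."""
    allowed, server_name = allowed.lower(), server_name.lower()
    if allowed.startswith("*."):
        return server_name.endswith(allowed[1:])  # suffix may span several labels
    if "*" in allowed:
        # This sketch only accepts exact names and "*.suffix" forms.
        raise ValueError(f"unsupported wildcard: {allowed}")
    return allowed == server_name


def check_virtual_host(allow_server_names: List[str], sni: str) -> Optional[int]:
    """Return None to continue routing, or 421 if the SNI is not allowed."""
    if not allow_server_names:  # empty list: no restriction
        return None
    if any(name_matches(name, sni) for name in allow_server_names):
        return None
    return 421


mtls_vhost = ["secure.example.com"]
print(check_virtual_host(mtls_vhost, "secure.example.com"))  # None: allowed
print(check_virtual_host(mtls_vhost, "www.example.com"))     # 421: arrived via another SNI
```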

Is There a Security Risk?

Traditional HTTP proxy software such as Apache HTTPD does not support routing when the :authority header and SNI are inconsistent. Nginx, however, was one of the earliest gateways to support it. Some users raised this in the Nginx community, arguing that it poses a security risk: https://trac.nginx.org/nginx/ticket/1694

The Nginx maintainers responded at the time, clearly stating that there is no security risk:

In theory, you are right. SNI was designed to be used with the only one name, and requesting different names over a connection which uses SNI is not correct. Quoting RFC 6066: "If the server_name is established in the TLS session handshake, the client SHOULD NOT attempt to request a different server name at the application layer." But in practice, SPDY introduced so-called "connection reuse", which effectively uses a connection with an established SNI for request to different application-level names. And it is followed by HTTP/2 connection reuse, which does the same: a HTTP/2 client can request a different host over an already established connection. The 421 (Misdirected Request) status code, also introduced by HTTP/2 RFC, is expected to be used only when such a connection reuse is not possible due to server limitations. In nginx, 421 is returned when a client tries to request a server protected with client SSL certificates over a connection established to a different server.

Technology keeps evolving. When RFC 6066 was published, HTTP/2 multiplexing did not exist, so a client sending requests for different domains over a connection that used SNI was considered incorrect.

However, as web technologies have developed, front-end pages carry richer content and issue more concurrent requests, raising performance and latency demands on both clients and servers. SPDY and HTTP/2 emerged to meet these needs. Because TLS connections are expensive to establish, connection reuse should be maximized whenever possible. If the server's HTTPS certificate already proves that it is authoritative for the other domains, sending requests for those domains over the same connection is entirely secure.

Gateways should therefore accommodate such reasonable client behavior. From a user-experience perspective, connection reuse speeds up page rendering and API access. More broadly, fewer TLS handshakes mean less computation for the same secure transmission, which is more energy-efficient and environmentally friendly.

Alibaba Cloud Native Community
