Using NGINX as an HTTPS forward proxy server
Introduction: NGINX was designed primarily as a reverse proxy server, but as it has evolved it has also become a viable option for forward proxying. Forward proxying itself is not complicated; the main problem to solve is how to proxy encrypted HTTPS traffic. This article introduces two schemes for using NGINX as a forward proxy for HTTPS traffic, along with their usage scenarios and main problems.
Classification of HTTP/HTTPS Forward Proxy
The following classification of forward proxies serves as background for the rest of the article:
Classification by client perception
Common proxy: The client must manually set the proxy address and port in the browser or in system environment variables. With Squid, for example, the client is configured with the Squid server's IP address and port 3128.
Transparent proxy: The client needs no proxy settings, and the role of the proxy is transparent to the client. An example is a Web Gateway device on the network path of a corporate network.
Classification by whether the proxy decrypts HTTPS
Tunnel proxy: also called a pass-through proxy. The proxy server only relays HTTPS traffic over TCP, and does not decrypt or inspect the content of the traffic it proxies. The client performs the TLS/SSL handshake directly with the destination server. The NGINX proxy approaches discussed in this article fall into this category.
Man-in-the-Middle (MITM) proxy: The proxy server decrypts HTTPS traffic. It completes the TLS/SSL handshake with the client using a self-signed certificate, and completes a normal TLS handshake with the destination server, so two TLS/SSL sessions are established on the client-proxy-server path. Charles is one example.
Note: In this case, the client actually receives the proxy server's own self-signed certificate during the TLS handshake. Certificate chain verification fails by default, so the client must trust the root CA certificate behind the proxy's self-signed certificate, which means the client is aware of the process. To make the proxy unnoticeable to the client, the self-signed root CA certificate must be pushed to the clients, which is feasible within an enterprise's internal environment.
Why do forward proxies need special handling for HTTPS traffic?
When acting as a reverse proxy, the proxy server usually terminates the encrypted HTTPS traffic before forwarding it to the backend instance. The encryption, decryption and authentication of HTTPS traffic take place between the client and the reverse proxy server.
When acting as a forward proxy, the proxy server processes traffic in which the HTTP request is encrypted and encapsulated in TLS/SSL, so it cannot see the domain name that the client wants to access in the request URL, as shown in the following figure. Therefore, proxying HTTPS traffic requires some special handling compared to HTTP.
NGINX Solutions
In terms of the classification above, NGINX proxies HTTPS in tunnel (pass-through) mode: it neither decrypts nor inspects the upper-layer traffic. There are two solutions, one at Layer 7 and one at Layer 4, described below.
HTTP CONNECT Tunnel (Layer 7 Solution)
Historical background
As early as 1998, in the SSL era before TLS was officially born, Netscape, which led the development of the SSL protocol, published an Internet-Draft on tunneling SSL traffic through web proxies. The core idea is to use an HTTP CONNECT request to establish an HTTP CONNECT tunnel between the client and the proxy. The CONNECT request must specify the destination host and port that the client wants to access. The original image in the draft is as follows:
The whole process is illustrated by the diagram in HTTP: The Definitive Guide:
1: The client sends an HTTP CONNECT request to the proxy server.
2: The proxy server uses the host and port in the HTTP CONNECT request to establish a TCP connection with the destination server.
3: The proxy server returns an HTTP 200 response to the client.
4: The HTTP CONNECT tunnel between the client and the proxy server is now established. HTTPS traffic that reaches the proxy server is relayed over TCP directly to the remote destination server. The proxy server only relays the HTTPS traffic and does not need to decrypt it.
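The proxy's part in steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the code of any particular proxy: the helper names `parse_connect_request` and `tunnel_established_response` and the host `www.example.com` are assumptions made for the example, and the TCP connection of step 2 and the byte relaying of step 4 are left out.

```python
from typing import Tuple


def parse_connect_request(raw: bytes) -> Tuple[str, int]:
    """Return (host, port) from the request line of an HTTP CONNECT request."""
    # The request line is everything up to the first CRLF, for example
    # b"CONNECT www.example.com:443 HTTP/1.1".
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, target, _version = request_line.split(" ")
    if method != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")
    # CONNECT carries its target in authority form, "host:port".
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"malformed CONNECT target: {target}")
    return host, int(port)


def tunnel_established_response() -> bytes:
    """Step 3: the 200 reply that tells the client the tunnel is ready."""
    return b"HTTP/1.1 200 Connection Established\r\n\r\n"


# Step 1: what the client sends to the proxy (hypothetical destination).
request = (
    b"CONNECT www.example.com:443 HTTP/1.1\r\n"
    b"Host: www.example.com:443\r\n"
    b"\r\n"
)

# Step 2 would open a TCP connection to this host and port.
host, port = parse_connect_request(request)
print(host, port)  # www.example.com 443
```

The point of the sketch is that the destination host and port arrive in plaintext in the CONNECT request line, so the proxy learns where to connect without touching the encrypted traffic that follows.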
NGINX ngx_http_proxy_connect_module module
Because NGINX is designed as a reverse proxy server, it does not officially support the HTTP CONNECT method. However, thanks to the modularity and extensibility of NGINX, @chobits of Alibaba provides the ngx_http_proxy_connect_module module, which adds support for the HTTP CONNECT method so that NGINX can be extended into a forward proxy.
Environment setup
The following uses CentOS 7 as an example.
1) Install
For a fresh installation, follow the normal NGINX installation steps together with the installation steps for this module (https://github.com/chobits/ngx_http_proxy_connect_module): after applying the patch for the corresponding NGINX version, add the module's parameters when running configure.
NGINX stream (Layer 4 Solution)
Since the upper-layer traffic is only relayed transparently, can NGINX act as a "Layer 4 proxy" that transparently relays every protocol above TCP/UDP? The answer is yes. NGINX has officially supported the ngx_stream_core_module module since version 1.9.0. The module is not built by default; it must be enabled by adding the --with-stream option when running configure.
Problem
Proxying HTTPS traffic at the TCP level with NGINX stream inevitably runs into the problem described earlier in this article: the proxy server cannot obtain the destination domain name that the client wants to access. The information available at the TCP level is limited to IP addresses and ports, so the domain name cannot be obtained there. To learn the destination domain name, the proxy must be able to parse upper-layer packets. The NGINX stream method is therefore not strictly a Layer 4 proxy; it relies on some upper-layer capabilities.
ngx_stream_ssl_preread_module module
Without decrypting the traffic, the only way to obtain the domain name that HTTPS traffic is destined for is the SNI (Server Name Indication) extension in the Client Hello, the first message of the TLS/SSL handshake. Since version 1.11.5, NGINX has officially provided this capability through the ngx_stream_ssl_preread_module module, which is mainly used to extract the SNI and ALPN information from the Client Hello message. For a Layer 4 forward proxy, the ability to extract the SNI from the Client Hello is critical; without it, the NGINX stream solution does not work. This also introduces a limitation: every client must include the SNI field in the TLS/SSL handshake, otherwise the NGINX stream proxy has no way of knowing which destination domain name the client wants to access.
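To make the idea of reading the SNI without decryption concrete, here is a minimal Python sketch of that kind of parsing. It is not how ngx_stream_ssl_preread_module is implemented. The helpers `build_client_hello` and `extract_sni` are hypothetical, the hand-built record is only illustrative (it is not a usable handshake), and the parser assumes the whole Client Hello arrives in a single, well-formed buffer.

```python
import struct
from typing import Optional


def build_client_hello(server_name: Optional[str]) -> bytes:
    """Build a minimal, illustrative Client Hello record for the parser below."""
    extensions = b""
    if server_name is not None:
        name = server_name.encode("ascii")
        entry = b"\x00" + struct.pack("!H", len(name)) + name  # name_type 0 = host_name
        sni = struct.pack("!H", len(entry)) + entry            # server_name_list
        extensions += struct.pack("!HH", 0, len(sni)) + sni    # extension type 0 = SNI
    body = (
        b"\x03\x03"                            # client_version
        + bytes(32)                            # random (zeros, illustration only)
        + b"\x00"                              # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"   # one cipher suite
        + b"\x01\x00"                          # one compression method: null
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body  # type 1 = Client Hello
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a Client Hello record, or None if absent."""
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        return None  # not a TLS handshake record carrying a Client Hello
    pos = 5 + 4                                          # record and handshake headers
    pos += 2 + 32                                        # client_version + random
    pos += 1 + record[pos]                               # session_id
    pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
    pos += 1 + record[pos]                               # compression_methods
    if pos + 2 > len(record):
        return None  # no extensions block at all
    ext_end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
    pos += 2
    while pos + 4 <= ext_end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0:  # server_name extension
            # Skip the list length (2 bytes) and name_type (1 byte).
            name_len = struct.unpack_from("!H", record, pos + 3)[0]
            return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


print(extract_sni(build_client_hello("www.example.com")))  # www.example.com
print(extract_sni(build_client_hello(None)))               # None
```

The second call shows the limitation described above: when the client sends no SNI extension, there is simply nothing in the handshake from which the destination domain name can be read.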
Common problems
1) The client manually sets a proxy, and access fails
A Layer 4 forward proxy relays upper-layer HTTPS traffic transparently and does not need HTTP CONNECT to establish a tunnel, which means the client does not need to configure an HTTP(S) proxy. What happens if an HTTP(S) proxy is set manually on the client anyway? We can test this by using curl -x to point the client at the forward proxy server and looking at the result:
# curl https://www.baidu.com -svo /dev/null -x 39.105.196.164:443
* About to connect() to proxy 39.105.196.164 port 443 (#0)
* Trying 39.105.196.164...
* Connected to 39.105.196.164 (39.105.196.164) port 443 (#0)
* Establish HTTP proxy tunnel to www.baidu.com:443
> CONNECT www.baidu.com:443 HTTP/1.1
> Host: www.baidu.com:443
> User-Agent: curl/7.29.0
> Proxy-Connection: Keep-Alive
>
* Proxy CONNECT aborted
* Connection #0 to host 39.105.196.164 left intact
The output shows that the client first tries to establish an HTTP CONNECT tunnel with NGINX. Because NGINX only relays traffic transparently in this mode, the CONNECT request is forwarded directly to the destination server. The destination server does not accept the CONNECT method, so curl eventually reports "Proxy CONNECT aborted" and access fails.
2) The client does not send SNI, and access fails
As mentioned above, a key requirement for using NGINX stream as a forward proxy is that ngx_stream_ssl_preread_module can extract the SNI field from the Client Hello. If the client does not send the SNI field, the proxy server cannot determine the destination domain name, and access fails.
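Whether a client sends SNI depends on how its TLS library is used. As one illustration, the Python sketch below captures the Client Hello that the standard ssl module would send with and without a server host name, using in-memory buffers so that no network connection is made. The helper name `client_hello_bytes` and the host `www.example.com` are assumptions for the example, and certificate verification is switched off only so that the no-host-name case is accepted.

```python
import ssl
from typing import Optional


def client_hello_bytes(server_hostname: Optional[str]) -> bytes:
    """Return the first handshake bytes (the Client Hello) a Python TLS client sends."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Verification is disabled only so that server_hostname=None is allowed;
    # nothing is connected or trusted in this sketch.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client has written its Client Hello and waits for a reply
    return outgoing.read()


hello_with_sni = client_hello_bytes("www.example.com")
hello_without_sni = client_hello_bytes(None)

# The host name travels in plaintext only when SNI is sent.
print(b"www.example.com" in hello_with_sni)     # True
print(b"www.example.com" in hello_without_sni)  # False
```

A client that behaves like the second case gives a Layer 4 proxy nothing to route on, which is the failure described in this item.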