Where did that request really come from?

There is a web application equivalent of the old “the call is coming from inside the house” trope. It happens when an application tells me that a local, virtual IP address is suspect and that I should really check it out, possibly ban it.

Obviously, what’s going on here is a misunderstanding. The behaviour in question is not actually coming from a local IP, it’s just getting forwarded from it and not informing the application correctly.

Applications behind a reverse proxy typically do not need to know the ‘real’ origin of requests because the reverse proxy will take the request from the client and hand it over to the application. Once a reply has been generated it will find its way back to the client by the same route in reverse order. The application does not communicate directly with the client.

The IP address of the client is useful in some regards, though, mainly as a measure against abuse. Thousands of login attempts come from the same IP address? Probably a brute force attempt. Thousands of comments come from the same IP address? Probably spam. Or a really popular Tor/VPN endpoint. Or both. In analytics it’s a simple way to tell whether two page views are one user browsing the site or two different visitors.

The way to identify the client’s IP address is generally to tell the reverse proxy to pass on the client address in some specially designated HTTP headers. It is then up to the upstream application to make good use of that information. If the application is not programmed to look at those headers, no amount of header stuffing is going to help, as we will see shortly.

This post is going to be exclusively focused on issues on the server side that result from misunderstandings about where the request originated. These issues are easy to distinguish from the type of problems that result from bad or missing information about the connection between client and reverse proxy, things that often get addressed with headers like X-Forwarded-Host, X-Forwarded-Port, and X-Forwarded-Proto. The latter will typically result in issues with links being in the wrong protocol (http/https), the client being sent redirects to local IP addresses or even “localhost”, etc. These are problems that are clearly visible to the client.

The types of problems that come from bad request origin information will be less transparent to the client. At most I get a message from the server saying I cannot log in because I have tried too many times – despite not having tried even once!

In this post I’m going to look at various web applications and how they handle request origin information and what happens when things aren’t configured just right for that particular application. I will be using Nginx as my reverse proxy, though I’m sure the lessons learned can be applied equally well to other reverse proxies.

In effect, this is at long last a follow-up to a post about proxy_pass I wrote in 2018 in which I wrote: “I have avoided the X-Forwarded-For header so far because it is not really relevant to the main purpose of [the theme of that post]”. As in 2018 the approach here will be experimental – let’s see what works and what doesn’t – only, I hope, slightly more competent and less blundering.

What headers are relevant for this purpose?

There is one official header that should be used by a proxy to indicate where the request came from, one de facto standard header and one that’s in some use but has virtually no documentation. They are in that order:

Forwarded: Forwarded should replace the unofficial headers but clearly hasn’t and doesn’t look like it is going to anytime soon. It includes a lot more information than either of the others about the request that was made of the reverse proxy. Because Nginx cannot supply correctly structured values for the Forwarded header out of the box I’m not going to try to use it. Also I have never seen it recommended or used by default.
X-Forwarded-For: The “de-facto standard header for identifying the originating IP address of a client connecting to a web server through an HTTP proxy”, according to Mozilla. An X-Forwarded-For value is a comma-separated list that tells the story of the proxies the request has gone through. Each successive proxy should append the address it received the request from to the list. In my case, at least for now, there will only ever be a single IP address (no commas). Which makes it in practice similar to…
X-Real-IP: So unofficial that it doesn’t have a page on Mozilla’s site. In practice, the value is the single remote address seen by the last reverse proxy to have handled the request.

In effect: If there is an application that knows how to use X-Forwarded-For that’s what I will feed it. If I need a single IP address for a hacky solution, I will use X-Real-IP because I don’t know how to say “pick the first one in the list” in PHP.
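For illustration, here is what “pick the first one in the list” looks like as a minimal sketch in Python rather than PHP. The function name first_forwarded_ip is made up for this example, the addresses are documentation placeholders, and the sketch assumes the header value is the usual comma-separated list:

```python
def first_forwarded_ip(header_value: str) -> str:
    """Return the left-most entry of an X-Forwarded-For value."""
    # Entries are separated by commas, usually followed by a space.
    return header_value.split(",")[0].strip()


# A value that has passed through one extra proxy, and a single-address value.
print(first_forwarded_ip("203.0.113.7, 10.0.5.1"))
print(first_forwarded_ip("203.0.113.7"))
```

The left-most entry is whatever the first hop claimed, so it is only as trustworthy as every proxy between the client and the application.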

Just to be clear: These headers can be added at any kind of proxy but are only really useful with reverse proxies since with ordinary proxies the point is to obfuscate origin. Any VPN that used these headers to tell the destination what IP it was acting on behalf of would be a very bad VPN and a very soon-to-be-out-of-business VPN.

Setting headers in Nginx

I won’t go into detail about how to use and set proxy-relevant headers in Nginx; for that I heartily recommend Justin Ellingwood’s article on the subject on DigitalOcean (and basically everything he wrote for them on Nginx). I will just note that I will be setting X-Forwarded-For and X-Real-IP in the following manner on my reverse proxy:

proxy_set_header    X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header    X-Real-IP $remote_addr;

Syntax in short: Nginx instruction, header name, Nginx variable. The last line is the simplest: Whatever IP the reverse proxy is currently getting a request from, that’s the value of the X-Real-IP header, when it forwards the request. The first one only differs in that it takes that same value and adds it to whatever value the X-Forwarded-For header already has. In Python parlance: The current remote address gets appended to the header. I could set it to $remote_addr instead, just like X-Real-IP, though that is not according to spec.
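To stay in Python parlance, the behaviour of $proxy_add_x_forwarded_for can be sketched roughly as below. This is an illustration of the documented behaviour, not Nginx’s actual code; the function name and the example addresses are placeholders of my own:

```python
from typing import Optional


def proxy_add_x_forwarded_for(incoming: Optional[str], remote_addr: str) -> str:
    """Mimic what Nginx puts in $proxy_add_x_forwarded_for."""
    # No X-Forwarded-For on the incoming request: the value is just the
    # address the proxy sees on the other end of the connection.
    if not incoming:
        return remote_addr
    # Otherwise the proxy keeps the existing value and appends its own view,
    # separated by a comma.
    return f"{incoming}, {remote_addr}"


print(proxy_add_x_forwarded_for(None, "203.0.113.7"))
print(proxy_add_x_forwarded_for("198.51.100.9", "203.0.113.7"))
```

Note that whatever the client put in the header survives on the left-hand side; only the last entry is something the proxy itself observed.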

Enough theory, let’s look at what this all means in practice.

WordPress comment moderation or the case of the hardcoded remote address

Here is a comment awaiting moderation.

As is clearly evident, the comment moderation system is not getting the right IP address. WordPress runs in a docker container and the address shown is that of the WordPress docker network’s gateway and not Dopey’s actual IP address.

Let’s look at some headers to see if we can figure out why that might be:

    HOST: brokkr.net\r\n
    X-Forwarded-Host: brokkr.net\r\n
    X-Forwarded-Proto: https\r\n
    X-Forwarded-Port: 443\r\n
    X-Forwarded-For: 94.147.x.x\r\n

Nginx is clearly configured to set the whole suite of X-Forwarded-* headers and they all look correct and proper, though the only concern here is that X-Forwarded-For should be letting WordPress know which is the real IP address. So why is WordPress not using that information? Could it be that it prefers X-Real-IP? I will add that to my nginx config (in the server’s ‘/’ location context) like so:

proxy_set_header    X-Real-IP $remote_addr;

Note that using the nginx variable $remote_addr here is different from using the similarly named variable in a PHP application handling the request. The meaning is the same: the IP address on the other end of the connection. However, with an incoming request, the remote address from the application’s point of view is the reverse proxy whereas the remote address from the reverse proxy’s point of view is the client (or whatever proxy the client is using to connect).

I then check the headers anew to see if it’s working:

    HOST: brokkr.net\r\n
    X-Forwarded-Host: brokkr.net\r\n
    X-Forwarded-Proto: https\r\n
    X-Forwarded-Port: 443\r\n
    X-Forwarded-For: 94.147.x.x\r\n
    X-Real-IP: 94.147.x.x\r\n

It is. But writing a new comment shows me that WordPress still is not taking the hint. The IP address listed is still the docker network gateway.

It seems odd but as far as I can tell WordPress is simply hardcoded to use the PHP REMOTE_ADDR variable as the originating IP and ignore all the customary X-headers.

So how do I fix this? From what I have dug up there are various hacks. Basically, it boils down to a simple choice: Do I want to fake the remote address in WordPress or do I want to fake it in the web server? In the name of science I will try both approaches.

WordPress config hack

In the first approach I can add a line in the WordPress config file (wp-config.php) that resets the REMOTE_ADDR variable to use another value:

$_SERVER['REMOTE_ADDR'] = $_SERVER['HTTP_X_FORWARDED_FOR'];

Here it is set to the value of a variable that is clearly derived from the X-Forwarded-For header. And in my testing that does work. Still, I am sceptical that it will always work. Looking at my intercepted headers above, X-Forwarded-For is a single IP address value, same as X-Real-IP. As mentioned previously, the syntax, however, clearly says that it can be a comma-separated list of IPs, where each successive proxy is added to the original client. And using $proxy_add_x_forwarded_for as the value for X-Forwarded-For explicitly instructs Nginx to follow this syntax and rule. So I’m going to try using the guaranteed single value X-Real-IP header instead:

$_SERVER['REMOTE_ADDR'] = $_SERVER['HTTP_X_REAL_IP'];

And that also works (of course provided that I also set X-Real-IP in the reverse proxy).

Nginx RealIP Module

The other approach is a little more sophisticated but I don’t know if it’s objectively better by any meaningful metric. I can tell the web server to trust that the reverse proxy is telling me the truth about where the request originated – and then instruct it to use the reverse proxy’s information in the variables that it makes available to the PHP processor.

Using the realip module of Nginx (that is included in the Debian builds of Nginx), I add these two lines to my Nginx web server (in the server context, though it can also be set on a location basis):

    set_real_ip_from    10.0.5.1;
    real_ip_header      X-Real-IP;

The web server sees traffic from the reverse proxy on the gateway at 10.0.5.1. When requests come from there, web server Nginx will tell all who ask that the request really came from the value in the header named by the directive real_ip_header. In this case the X-Real-IP header, here set by reverse proxy Nginx.
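The decision the web server makes here can be sketched in a few lines of Python. This is only an illustration of the logic as I understand it, not the realip module’s implementation; effective_remote_addr is a hypothetical name, and falling back to the connection address when the header is missing or malformed is an assumption of the sketch:

```python
import ipaddress

# Assumption: the only peer we trust is the reverse proxy on the gateway.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.5.1/32")]


def effective_remote_addr(peer_addr, headers, header_name="X-Real-IP"):
    """Return the address the application should see as REMOTE_ADDR."""
    peer = ipaddress.ip_address(peer_addr)
    # Only believe the header if the connection comes from a trusted proxy.
    if not any(peer in net for net in TRUSTED_PROXIES):
        return peer_addr
    claimed = (headers.get(header_name) or "").strip()
    try:
        ipaddress.ip_address(claimed)
    except ValueError:
        # Missing or malformed header: keep the real connection address.
        return peer_addr
    return claimed


print(effective_remote_addr("10.0.5.1", {"X-Real-IP": "203.0.113.7"}))
print(effective_remote_addr("198.51.100.9", {"X-Real-IP": "203.0.113.7"}))
```

The second call shows the point of set_real_ip_from: a request that does not come from the trusted proxy does not get to choose its own address.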

And that also seems to work:

Using a WordPress plugin I can say that this does indeed work the way I expected. As far as WordPress knows, the REMOTE_ADDR is now 94.147.x.x. Not really sure which approach is better, though.

Header confusion

I also came across a suggestion that the reverse proxy could simply set the remote address that it could see (that of the client) as a header in the forwarded request.

proxy_set_header REMOTE_ADDR $remote_addr;

I see multiple things wrong with this.

First: I cannot find any documentation for a REMOTE_ADDR header. Second: REMOTE_ADDR is the name of a PHP variable (and in lowercase an Nginx variable) – maybe someone is confusing concepts? Third: I’m no PHP expert but the documentation that I have found indicates that REMOTE_ADDR is an entry in the $_SERVER array:

$_SERVER is an array containing information such as headers, paths, and script locations. The entries in this array are created by the web server.
https://www.php.net/manual/en/reserved.variables.server.php

The web server in question here is not the reverse proxy but whatever software the reverse proxy is forwarding requests to – in my case that just happens to be another instance of Nginx. That instance will obviously determine for itself what the remote address is, rather than blindly trust-copy it from a header.

Fourth and finally: It doesn’t work. The reverse proxy absolutely lets me set the header. There are no hard and fast rules governing what headers are allowed; an application will just ignore any headers that it isn’t interested in. And WordPress certainly ignores this made-up header. Quite rightly so.
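There is also a naming reason why a request header cannot overwrite REMOTE_ADDR. Under the CGI convention (RFC 3875), which is where names like HTTP_X_FORWARDED_FOR in $_SERVER come from, request headers are exposed with an HTTP_ prefix. A small Python sketch of that mapping, with a function name of my own choosing and assuming the web server passes the header on at all:

```python
def cgi_variable_name(header_name: str) -> str:
    """Map an HTTP request header name to its CGI-style variable name."""
    # Upper-case the name, turn hyphens into underscores, prefix with HTTP_.
    return "HTTP_" + header_name.upper().replace("-", "_")


print(cgi_variable_name("X-Forwarded-For"))
print(cgi_variable_name("X-Real-IP"))
print(cgi_variable_name("REMOTE_ADDR"))
```

So even in the best case a header called REMOTE_ADDR would end up as HTTP_REMOTE_ADDR, a different entry from the REMOTE_ADDR that the web server fills in itself.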

To recap

When an application does not make use of X-Forwarded-For or X-Real-IP headers despite them being available and correct, I need to reset the value of REMOTE_ADDR, either by overriding it locally in the application’s setup or by having the web server “lie” to it.

Nextcloud brute force protection and logging

One good reason to pick the realip module based approach in the last example is that it can be applied when dealing with other applications. Or simply because the logic inherent in it – sure, we will use your X-Forwarded-For header but only if you tell us to trust it – can be recognised in other applications.

Take the official Nextcloud documentation on using a reverse proxy.

Set the trusted_proxies parameter… to define the servers Nextcloud should trust as proxies… A reverse proxy can define HTTP headers with the original client IP address, and Nextcloud can use those headers to retrieve that IP address.
https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/reverse_proxy_configuration.html

Basically it is a replication of the nginx realip module solution from before, except it’s “in-house”, i.e. Nextcloud itself will act as the arbiter of trusting the X-* header, rather than having the web server do it. The logic, though, is the exact same. The Nextcloud documentation also makes it clear that the setup is needed for brute force protection and what can happen if you get it wrong:

If you are behind a reverse proxy or load balancer it is important you make sure it is setup properly… Otherwise it can happen that Nextcloud actually starts throttling all traffic coming from the reverse proxy or load balancer.
https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/bruteforce_configuration.html

So does it work? I start from the point of delivering the X-Forwarded-For header to my Nextcloud instance (which is what Nextcloud says it looks for by default) but not having set trusted_proxies.

Nextcloud does not make it easy to check requests: increasing the log level to debug will give you everything, but what I’m looking for will not be easy to spot in a huge JSON-formatted file. There is, however, an official audit plugin (“app”) that generates a simple request overview log file, one line per authentication request, called audit.log in the data folder. The plugin is referred to as “admin_audit” in the official documentation. In the “Nextcloud app store” (is that really what we’re calling it?) the plugin can be found at the top of the list in the “App bundles” section under Enterprise bundle as “Auditing / logging”.

I downloaded and enabled it (no need to pick the entire bundle), and set loglevel in config.php to 1.

  'loglevel' => 1,

Then I generated some traffic and had a look.

{"reqId":"VEw1JIFz5xt3MuUNPOOJ","level":1,"time":"2021-11-07T11:12:54+00:00","remoteAddr":"10.0.2.1","user":"admin","app":"admin_audit","method":"PROPFIND","url":"/remote.php/caldav/calendars/admin/it/","message":"Login successful: \"admin\"","userAgent":"Evolution/3.36.5","version":"21.0.4.1"}
{"reqId":"F2AumRUbnTzFQBf8VbIR","level":1,"time":"2021-11-07T11:12:54+00:00","remoteAddr":"10.0.2.1","user":"admin","app":"admin_audit","method":"PROPFIND","url":"/remote.php/caldav/calendars/admin/personal/","message":"Login successful: \"admin\"","userAgent":"Evolution/3.36.5","version":"21.0.4.1"}

Again the remoteAddr is the docker gateway. What if I set that gateway as a trusted proxy and make sure to pass on the X-Forwarded-For header from the reverse proxy? In config.php I set:

  'trusted_proxies' => 
  array (
    0 => '10.0.2.1',
  ),

I wasn’t sure that this would work – or at least show up in the logs – because the log says “remoteAddr”. Nextcloud could easily use the value of the X-Forwarded-For header for brute force protection internally while maintaining that the “remoteAddr” of the request is still the REMOTE_ADDR variable in the SERVER array. However, it does show up which is gratifying:

{"reqId":"EdTnjk8D0gund5VWvXmt","level":1,"time":"2021-11-07T11:36:33+00:00","remoteAddr":"94.147.x.x","user":"admin","app":"admin_audit","method":"REPORT","url":"/remote.php/dav/calendars/admin/contact_birthdays/","message":"Login successful: \"admin\"","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0","version":"21.0.4.1"}
{"reqId":"mSegvZYZClGHbzriVXh5","level":1,"time":"2021-11-07T11:36:33+00:00","remoteAddr":"94.147.x.x","user":"admin","app":"admin_audit","method":"REPORT","url":"/remote.php/dav/calendars/admin/personal/","message":"Login successful: \"admin\"","userAgent":"Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0","version":"21.0.4.1"}

Again I have some reservations about using X-Forwarded-For, which can hold a list of addresses, but unlike with WordPress this is not a hack but the recommended setup and the default header. So I’m guessing that Nextcloud knows what to do if X-Forwarded-For does show up as a comma-separated list, rather than a single value.
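Since audit.log is one JSON object per line, a quick tally of the remoteAddr values shows at a glance whether the gateway address is still creeping in. A minimal Python sketch; the field name comes from the log lines above, while the function name and the sample lines are placeholders made up for the example:

```python
import json
from collections import Counter


def count_remote_addrs(lines):
    """Count how often each remoteAddr occurs in JSON-per-line log entries."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        addr = entry.get("remoteAddr")
        if addr:
            counts[addr] += 1
    return counts


sample = [
    '{"remoteAddr": "10.0.2.1", "app": "admin_audit"}',
    '{"remoteAddr": "203.0.113.7", "app": "admin_audit"}',
    '{"remoteAddr": "203.0.113.7", "app": "admin_audit"}',
]
for addr, hits in count_remote_addrs(sample).most_common():
    print(addr, hits)
```

Against a real file the same function can be fed an open file handle, since iterating over a file yields its lines.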

Thankfully, I have had very few issues with brute force attempts on my Nextcloud, and currently all is quiet on that front. Unlike the constant and persistent attempts to log in to my mail server, it appears that Nextcloud “hacking” is more “drive-by” by nature. So I can’t really see any direct benefit right now. I suspect though, that this fix is a requirement for using any of the plugins that restrict login based on IP address (Geo-IP restriction plugins, whitelisting plugins, etc.).

Tiny Tiny RSS: What can you trust?

Looking at how Tiny Tiny RSS (or tt-rss for short, a popular self-hosted news reader) handles request origins does not add much to the discussion because it uses the same headers as we have seen so far and it doesn’t do anything interesting with them. It is, however, an interesting case study in the somewhat arbitrary nature of headers, variables and the question of what you can trust on the internet.

Awareness of anything other than REMOTE_ADDR in tt-rss started only earlier this year with a patch that added knowledge of various headers to a set of variables. The kick-off was someone on the tt-rss forums noting…

I have TT-RSS setup behind a reverse-proxy, and TT-RSS logs failed logins with the Docker internal IP address instead of the public one shared by the reverse-proxy Traefik… In more details, the headers X-Real-Ip isn’t used. Nor is RemoteAddr.
https://community.tt-rss.org/t/not-using-x-real-ip/4150

The user refers to RemoteAddr because their Traefik setup (quoted in the forum post) looks like it’s set up to do something similar to the Nginx realip module hack from earlier. Not that I am a Traefik expert.

fox, the developer, acknowledges the issue and goes on to note that they would prefer to use X-Real-IP because…

X-Real-IP seems to be the standard for last client IP address (while X-Forwarded-For being a chain of IP addresses of which only the last one could be trusted – i would prefer not to bother with).
https://community.tt-rss.org/t/not-using-x-real-ip/4150

Which I guess is true if you assume that every reverse proxy in a line of proxies would overwrite the header with the last known-to-be-true value, the remote address that they had received the request from. My assumption, at that point, had been that you would want to preserve the knowledge of that first client but obviously I cannot trust that.

In the physical world, if someone handed me a letter to give to someone else, telling me it was from the queen herself, I would be a fool to lend my credibility to that claim. I would not know if that person was telling me the truth. Handing over the letter to the recipient, I should either say “This weirdo handed me this letter and claimed – would you believe it – that it’s from her royal majesty to you!” (X-Forwarded-For) or “I got this letter from this guy on the street. He says it’s for you.” (X-Real-IP).
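The “only the last one could be trusted” idea can be turned into a rule of thumb: read the list from the right and stop at the first address that is not one of my own proxies. A hedged Python sketch of that rule follows; the function name, the trusted networks and the addresses are all assumptions for illustration, and this is not how any of the applications in this post necessarily implement it:

```python
import ipaddress


def client_from_forwarded_for(header_value, peer_addr, trusted_networks):
    """Pick the right-most address that is not one of our own proxies."""
    trusted = [ipaddress.ip_network(n) for n in trusted_networks]

    def is_trusted(addr):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # garbage is never trusted
        return any(ip in net for net in trusted)

    # If the connection itself is not from a trusted proxy, ignore the header.
    if not is_trusted(peer_addr):
        return peer_addr
    hops = [h.strip() for h in header_value.split(",") if h.strip()]
    # Walk from the most recent hop towards the oldest claim.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop was one of ours (or the header was empty).
    return hops[0] if hops else peer_addr


# The client claims to be 1.2.3.4; my proxy saw 203.0.113.7.
print(client_from_forwarded_for("1.2.3.4, 203.0.113.7", "10.0.5.1", ["10.0.5.1/32"]))
```

The spoofed left-most entry is ignored because it was never vouched for by a proxy I control. The sketch does not validate the hop it returns, which a real implementation would want to do.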

I do not recall seeing the issue myself and I’m currently running a version of tt-rss that has the patch. So what if anything is the effect of setting X-Real-IP or X-Forwarded-For by itself? Well, it makes the correct IP address show up in the logs:

172.23.0.1 - - [07/Nov/2021:13:17:07 +0100] "GET /backend.php?op=pref_feeds&method=getfeedtree&mode=2 HTTP/1.1" 200 7425 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0" "94.147.x.x"
172.23.0.1 - - [07/Nov/2021:13:17:07 +0100] "POST /backend.php HTTP/1.1" 200 20 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0" "94.147.x.x"

The log format is a bit strange. The request is logged using REMOTE_ADDR but the correct IP is clearly noted at the end. The log as viewed in the web ui clearly shows what’s going on:

It’s also worth noting that tt-rss seemingly does nothing more with the info than log it. Either that or I’m unable to produce enough attempts manually to trigger its brute force protections. Maybe I’ll return to this armed with a bruteforcing tool just to see what happens. Or maybe I’ll just discover the limits of my VPS provider’s patience, who knows.

Conclusion

Setting either X-Forwarded-For or X-Real-IP is in itself no guarantee that an application will understand that it’s behind a reverse proxy and act accordingly and intelligently. But it is a start and I cannot see a downside to setting both by default in my Nginx reverse proxy. It should, however, be backed up by checking logs and by either application configuration or some local web server tricks.

I haven’t touched on how these headers impact various analytics solutions as I’m saving that for a later post where I’ll also touch on GDPR and cookie law stuff and hopefully put my own house in order in that regard.

Another thing that I want to explore further is a question raised by both the Nextcloud documentation and a poster on the tt-rss forums. Here’s the Nextcloud people:

This parameter [trusted_proxies] provides protection against client spoofing, and you should secure those servers as you would your Nextcloud server… Incorrectly setting [X-Forwarded-For] may allow clients to spoof their IP address as visible to Nextcloud, even when going through the trusted proxy!
https://docs.nextcloud.com/server/latest/admin_manual/configuration_server/reverse_proxy_configuration.html

The benefits to bad guys of spoofing their origin are not in question. It is an obvious way to evade brute force protections or hide one’s spammer nature. And while I have some notion, I do feel like I need to explore more exactly how to spoof and how to protect against spoofers. To be continued.

2022 addendum

I am currently in the process of switching password management to vaultwarden (a self-hosted Rust implementation of Bitwarden aimed at home users) and I thought I would throw their way of doing things into the mix because it seems wonderfully simple and straightforward.

I’m running the vaultwarden docker image in a container. Settings are set from environment variables, “injected” either from an .env file or as settings in the docker-compose file. Here’s the relevant section:

## Client IP Header, used to identify the IP of the client, defaults to "X-Real-IP"
## Set to the string "none" (without quotes), to disable any headers and just use the remote IP
# IP_HEADER=X-Real-IP

As the comments clearly state, you can just pick whatever standard header you want – or even invent one yourself – or just use remote ip if you’re not using reverse proxies. And it’s a standard setting, not an obscure hack from StackOverflow. Seems like the way to go. Hopefully more applications adopt this.
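The idea of a configurable header is easy to sketch. The Python below is not vaultwarden’s code (that is written in Rust); it only mirrors the behaviour described in the comments above. The function name is hypothetical, and falling back to the connection address when the header is absent is an assumption of the sketch:

```python
import os


def client_ip(headers, peer_addr, env=os.environ):
    """Resolve the client IP from a configurable header, IP_HEADER style."""
    header_name = env.get("IP_HEADER", "X-Real-IP")
    # The literal string "none" disables header lookup entirely.
    if header_name.lower() == "none":
        return peer_addr
    # Header names are case-insensitive, so compare in lower case.
    lookup = {name.lower(): value for name, value in headers.items()}
    value = lookup.get(header_name.lower())
    return value.strip() if value else peer_addr


request_headers = {"X-Real-IP": "203.0.113.7"}
print(client_ip(request_headers, "172.23.0.1", env={}))
print(client_ip(request_headers, "172.23.0.1", env={"IP_HEADER": "none"}))
```

The appeal of the design is that the trust decision is made once, in configuration, rather than being hardcoded or bolted on afterwards.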