Redirecting traffic from an nginx reverse proxy to a docker container, I needed to add some forwarding information to the http headers – and check that it had been added. Enter tshark (cue the ominous cellos), the command line version of Wireshark.
Wire-/tshark are general purpose packet analyzers so the challenge here is to avoid casting too wide a net: I don’t want all the network traffic on my host, just the http headers and just those coming in and out of one particular container.
My first idea was to put tshark inside the container so as to be able to inspect the requests as they arrived. This turned out to be wholly unnecessary: Docker network interfaces do not exist in isolation from the host, they’re right there on the host, as anybody who has played with docker networking on a GNOME desktop environment can attest (the network section of the settings widget becomes a bit crowded). This means that packets arriving at the bridge of the subnet the container lives in can be inspected from the host.
First I will zoom in on the particular docker bridge interface of interest so as to exclude traffic from other containers. How do I know which of the many network interfaces to focus on? Not particularly elegant, but to summarize: 1) check the docker network to see which local ip address its gateway is using, 2) find that ip address in a list of network interfaces. In other words:
docker network inspect name_of_docker_network
# find or grep the line that says "Gateway" and note the ip
ip addr | grep ip_address_from_above
# note the interface name at the end of the line
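The two manual steps above can probably be scripted. What follows is a minimal sketch rather than a tested recipe for every setup: it assumes the network has a single IPv4 gateway, that `docker network inspect` is given a Go template via `-f`, and that `ip -o -4 addr show` prints one address per line with the interface name in the second column. The helper name `iface_for_ip` is made up for illustration.

```shell
# Hypothetical helper: print the name of the interface that carries the
# IPv4 address given as $1. Expects `ip -o -4 addr show` output on stdin,
# where column 2 is the interface and column 4 is address/prefix.
iface_for_ip() {
    awk -v want="$1" '{ split($4, addr, "/"); if (addr[1] == want) print $2 }'
}

# Intended use (commented out because it needs a running docker daemon):
# gateway=$(docker network inspect -f '{{range .IPAM.Config}}{{.Gateway}}{{end}}' name_of_docker_network)
# ip -o -4 addr show | iface_for_ip "$gateway"
```

Reading the address list from stdin keeps the helper independent of docker, so it can be tried on saved `ip` output first.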
Then I ask tshark to show me traffic on that interface:
tshark -i br-0349e1f24c7a
This device services a subnet consisting of three containers – a web server, a php processor and a database – so there’s plenty of chatter. A lot of tcp handshakes, a lot of MySQL queries, etc.
66 9.131120165 10.0.5.4 → 10.0.5.2 TCP 1058 45666 → 9000 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=992 TSval=1393167508 TSecr=462643214
67 9.131140843 10.0.5.2 → 10.0.5.4 TCP 66 9000 → 45666 [ACK] Seq=1 Ack=993 Win=30976 Len=0 TSval=462643214 TSecr=1393167508
68 9.149405416 10.0.5.2 → 10.0.5.3 TCP 74 54586 → 3306 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3422045482 TSecr=0 WS=128
69 9.149453570 10.0.5.3 → 10.0.5.2 TCP 74 3306 → 54586 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=793168209 TSecr=3422045482 WS=128
70 9.149477587 10.0.5.2 → 10.0.5.3 TCP 66 54586 → 3306 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=3422045482 TSecr=793168209
71 9.149666681 10.0.5.3 → 10.0.5.2 MySQL 180 Server Greeting proto=10 version=5.5.5-10.3.8-MariaDB-1:10.3.8+maria~bionic
72 9.149696241 10.0.5.2 → 10.0.5.3 TCP 66 54586 → 3306 [ACK] Seq=1 Ack=115 Win=29312 Len=0 TSval=3422045482 TSecr=793168209
73 9.149745278 10.0.5.2 → 10.0.5.3 MySQL 176 Login Request user=wp_user db=
74 9.149763702 10.0.5.3 → 10.0.5.2 TCP 66 3306 → 54586 [ACK] Seq=115 Ack=111 Win=29056 Len=0 TSval=793168209 TSecr=3422045482
Narrowing it down somewhat I tell tshark to focus on http traffic and nothing else. This also causes tshark to be a lot more detailed about the packets (similar to asking for verbose mode):
tshark -i br-0349e1f24c7a -O http
Now I do get to see the occasional detailed http headers because of the verbose mode but I’m also still seeing the rest of the traffic. Looking at the ports a lot of it seems to be coming from the PHP FPM (port 9000) and MariaDB (port 3306) containers. Let’s cut that out:
tshark -i br-0349e1f24c7a -O http -f "tcp port 80 or tcp port 443"
“or tcp port 443” is not really relevant or useful here – encryption is applied outside of the docker container and if I was using encryption all the way into the docker container I wouldn’t be able to study the headers. Keeping it helps me remember the syntax of what tshark terms capture filters and it does no harm.
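Since the goal is traffic to and from one particular container, the capture filter could also be narrowed by address. This is only a sketch: it assumes the web server container is the 10.0.5.4 seen in the listings, and `capture_filter` is a hypothetical helper that merely assembles the filter string from the `host` and `tcp port` primitives of the capture filter syntax.

```shell
# Hypothetical helper: build a capture filter that keeps only traffic
# to or from one host ($1) on one TCP port ($2).
capture_filter() {
    printf 'host %s and tcp port %s' "$1" "$2"
}

# Show the resulting filter string.
capture_filter 10.0.5.4 80; echo

# Possible use (commented out because it needs capture privileges):
# tshark -i br-0349e1f24c7a -O http -f "$(capture_filter 10.0.5.4 80)"
```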
The capture filter clears things up a lot. However I’m still getting the entire and detailed tcp three way handshake: a packet with the SYN flag set from the client, followed by a SYN-ACK packet from the server, followed by an ACK from the client:
Frame 1: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0
Ethernet II, Src: 02:42:60:54:6a:1c (02:42:60:54:6a:1c), Dst: 02:42:0a:00:05:04 (02:42:0a:00:05:04)
Internet Protocol Version 4, Src: 10.0.5.1, Dst: 10.0.5.4
Transmission Control Protocol, Src Port: 34412, Dst Port: 80, Seq: 0, Len: 0
Frame 2: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0
Ethernet II, Src: 02:42:0a:00:05:04 (02:42:0a:00:05:04), Dst: 02:42:60:54:6a:1c (02:42:60:54:6a:1c)
Internet Protocol Version 4, Src: 10.0.5.4, Dst: 10.0.5.1
Transmission Control Protocol, Src Port: 80, Dst Port: 34412, Seq: 0, Ack: 1, Len: 0
Frame 3: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface 0
Ethernet II, Src: 02:42:60:54:6a:1c (02:42:60:54:6a:1c), Dst: 02:42:0a:00:05:04 (02:42:0a:00:05:04)
Internet Protocol Version 4, Src: 10.0.5.1, Dst: 10.0.5.4
Transmission Control Protocol, Src Port: 34412, Dst Port: 80, Seq: 1, Ack: 1, Len: 0
It does make the headers a bit harder to find so I would rather leave these out. tshark has another filtering capability called “display filters” (as opposed to “capture filters”), which is more about what gets shown than about network fundamentals like ports and protocols. Think grep and suchlike. I add these using the -Y flag (older tutorials use -R but this seems to have changed somewhat).
tshark -i br-0349e1f24c7a -O http -f "tcp port 80 or tcp port 443" -Y "http.request || http.response"
“||” will be recognisable to Bash (and probably many other scripting languages) fans as an or-test, so the frame should match either http.request or http.response.
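If the only thing to verify is whether a forwarding header arrived, a display filter can be combined with tshark's field output instead of the verbose view. The sketch below rests on assumptions: that the proxy sets X-Forwarded-For (the post does not say which headers were added), and that the installed tshark exposes it as the field `http.x_forwarded_for`. The function name `show_forwarded` is invented for the example.

```shell
# Hypothetical helper: print one tab-separated line per HTTP request with
# method, host, URI and the X-Forwarded-For header. $1 is the interface.
show_forwarded() {
    tshark -i "$1" -f "tcp port 80" -Y "http.request" \
        -T fields -e http.request.method -e http.host \
        -e http.request.uri -e http.x_forwarded_for
}

# Example (commented out because it needs capture privileges):
# show_forwarded br-0349e1f24c7a
```

An empty last column would suggest the header is missing from that request.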
Now I’m getting somewhere. Of course, whenever the server actually delivers the goods, the packets fill up with data. A lot of (gzipped) data. Which looks like this:
Frame 6: 5699 bytes on wire (45592 bits), 5699 bytes captured (45592 bits) on interface 0
Ethernet II, Src: 02:42:0a:00:05:04 (02:42:0a:00:05:04), Dst: 02:42:60:54:6a:1c (02:42:60:54:6a:1c)
Internet Protocol Version 4, Src: 10.0.5.4, Dst: 10.0.5.1
Transmission Control Protocol, Src Port: 80, Dst Port: 35102, Seq: 1, Ack: 591, Len: 5633
Hypertext Transfer Protocol
HTTP/1.1 200 OK\r\n
[Expert Info (Chat/Sequence): HTTP/1.1 200 OK\r\n]
[HTTP/1.1 200 OK\r\n]
[Severity level: Chat]
[Group: Sequence]
Request Version: HTTP/1.1
Status Code: 200
[Status Code Description: OK]
Response Phrase: OK
Server: nginx/1.14.0\r\n
Date: Sun, 02 Sep 2018 11:17:15 GMT\r\n
Content-Type: text/html; charset=UTF-8\r\n
Transfer-Encoding: chunked\r\n
Connection: close\r\n
Vary: Accept-Encoding\r\n
X-Powered-By: PHP/7.2.8\r\n
X-Pingback: https://tgt.madsmi.de/xmlrpc.php\r\n
Link: <https://tgt.madsmi.de/wp-json/>; rel="https://api.w.org/"\r\n
Link: <https://tgt.madsmi.de/?p=1>; rel=shortlink\r\n
Content-Encoding: gzip\r\n
\r\n
[HTTP response 1/1]
[Time since request: 0.193809688 seconds]
[Request in frame: 4]
HTTP chunked response
Data chunk (5220 octets)
Chunk size: 5220 octets
Data (5220 bytes)
0000 1f 8b 08 00 00 00 00 00 00 03 d5 3c d9 72 db 46 ...........<.r.F
0010 b6 cf d6 57 b4 e0 8a 48 c6 04 40 52 0b b5 90 74 ...W...H..@R...t
0020 39 96 3d e3 ca 32 53 96 33 a9 29 cb e5 6a 00 4d 9.=..2S.3.)..j.M
0030 b0 45 6c 01 40 4a 1c 5b 9f 73 f3 0d f3 ee 1f bb .El.@J.[.s......
0040 e7 f4 82 8d 14 a5 c4 ca bd 33 8e 44 a2 bb 4f 9f .........3.D..O.
0050 3e 5b 9f a5 1b ca 68 f7 fc 6f 2f df fd f3 ef af >[....h..o/.....
0060 c8 2c 0f 83 c9 ce 08 bf 48 40 23 7f 6c 78 d4 3c .,......H@#.lx.<
0070 ff de 20 6e 40 b3 6c 6c 44 b1 79 95 19 08 c1 a8 .. n@.llD.y.....
The data is part of the packets I want to inspect but it’s not actually of any interest, especially with the compression, which renders it incomprehensible (image data would obviously be incomprehensible even without compression). I don’t know of any way for tshark to filter it out, so I resort to using some regex magic with grep:
tshark -i br-0349e1f24c7a -O http -f "tcp port 80 or tcp port 443" -Y "http.request || http.response" | grep -v -E '^[0-9a-f]{4}'
The -v flag inverts the match so grep shows me everything but what the regex matches. The -E flag is for extended regular expressions. What are extended regular expressions?
The Extended Regular Expressions or ERE flavor standardizes a flavor similar to the one used by the UNIX egrep command. “Extended” is relative to the original UNIX grep, which only had bracket expressions, dot, caret, dollar and star.
Regular-Expressions.info: https://www.regular-expressions.info/posix.html
The expression matches any line that starts (^) with four ({4}) hexadecimal characters (numbers and the letters a through f).
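The effect of the pattern can be checked without capturing anything. Here is a small sketch that feeds a few lines modelled on the output above through the same grep; the wrapper name `strip_hex_dump` is made up, and the pattern is quoted so the shell passes it to grep untouched.

```shell
# Hypothetical wrapper around the grep used above: drop every line that
# starts with four lowercase hexadecimal characters.
strip_hex_dump() {
    grep -v -E '^[0-9a-f]{4}'
}

# Two header-like lines and two hex dump lines, as in the listing above.
printf '%s\n' \
    'Content-Encoding: gzip' \
    '0000 1f 8b 08 00 00 00 00 00' \
    'Data (5220 bytes)' \
    '0010 b6 cf d6 57 b4 e0 8a 48' | strip_hex_dump
# Prints:
# Content-Encoding: gzip
# Data (5220 bytes)
```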
I can tell where the grep line has removed data lines by the presence of a line saying “Data (x bytes)” in the chunked response:
HTTP chunked response
Data chunk (4542 octets)
Chunk size: 4542 octets
Data (4542 bytes)
Data: 1f8b0800000000000003cd3b6b6fdbb8b29f9b5fc1a8d8d8...
[Length: 4542]
Chunk boundary: 0d0a
End of chunked encoding
The end result should be a lot easier to inspect if all you’re interested in is http headers.
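To check specifically for forwarding information, the cleaned-up output could be piped through one more grep. This is a sketch under assumptions: the header names in the pattern (X-Forwarded-*, X-Real-IP, Forwarded) are common choices rather than the ones confirmed in this setup, `forwarding_headers` is a hypothetical helper, and the sample header values are invented.

```shell
# Hypothetical helper: keep only lines that look like forwarding headers.
# -i ignores case, since http header names are case-insensitive.
forwarding_headers() {
    grep -i -E '^[[:space:]]*(X-Forwarded-[A-Za-z]+|X-Real-IP|Forwarded):'
}

# Sample input; the header values are invented for this demonstration.
printf '%s\n' \
    'Host: tgt.madsmi.de' \
    'X-Forwarded-For: 203.0.113.7' \
    'X-Forwarded-Proto: https' \
    'Connection: close' | forwarding_headers
```

With real traffic the helper would go at the end of the tshark pipeline, after the grep that removes the hex dump.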
A lot of credit for this capture must be given to various answers from the SuperUser question How do I return just the Http header from tshark?
“Shark diving in South Florida” © Chase Baker, Unsplash license.