tea shark

Inspecting HTTP headers with tshark

While redirecting traffic from an nginx reverse proxy to a Docker container, I needed to add some forwarding information to the HTTP headers. So I figured I had better start wrapping my head around what HTTP headers actually are, how they look and how my nginx settings affect them. Enter tshark, the command-line version of Wireshark.

Wireshark and tshark are general-purpose packet analyzers, so the challenge here is to avoid casting too wide a net: I don’t want all the network traffic on my host, just the HTTP headers, and just those going in and out of one particular container.

My first idea was to put tshark inside the container so as to be able to inspect the requests as they arrived. This turned out to be wholly unnecessary: Docker network interfaces do not exist in isolation from the host. They’re right there on the host, as anybody who has played with Docker networking on a GNOME desktop environment can attest (the network section of the settings widget becomes a bit crowded). This means that packets arriving at the bridge of the subnet the container lives in can be inspected from the host.

First I will zoom in on the particular docker bridge interface of interest so as to exclude traffic from other containers:

tshark -i br-0349e1f24c7a
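As a rough sketch of how to find that interface name: for user-defined bridge networks, Docker by default names the host-side bridge "br-" followed by the first 12 characters of the network ID. The helper name bridge_for_network_id and the network name my_network below are placeholders of mine, and the naming scheme does not apply if the bridge name was overridden when the network was created.

```shell
# Hypothetical helper: map a Docker network ID to its Linux bridge name.
# Assumes Docker's default naming: "br-" + first 12 characters of the ID.
bridge_for_network_id() {
    printf 'br-%.12s\n' "$1"
}

# Usage (needs Docker; "my_network" is a placeholder network name):
#   bridge_for_network_id "$(docker network inspect -f '{{.Id}}' my_network)"
#
# To cross-check, list the interfaces tshark can capture on:
#   tshark -D
```

If the printed name shows up in the tshark -D list, it can be passed to -i as above.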

This device services a subnet consisting of three containers – a web server, a php processor and a database – so there’s plenty of chatter. A lot of tcp handshakes, a lot of MySQL queries, etc.

   66 9.131120165     10.0.5.4 → 10.0.5.2     TCP 1058 45666 → 9000 [PSH, ACK] Seq=1 Ack=1 Win=29312 Len=992 TSval=1393167508 TSecr=462643214                                                                                           
   67 9.131140843     10.0.5.2 → 10.0.5.4     TCP 66 9000 → 45666 [ACK] Seq=1 Ack=993 Win=30976 Len=0 TSval=462643214 TSecr=1393167508                                                                                                  
   68 9.149405416     10.0.5.2 → 10.0.5.3     TCP 74 54586 → 3306 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=3422045482 TSecr=0 WS=128                                                                                      
   69 9.149453570     10.0.5.3 → 10.0.5.2     TCP 74 3306 → 54586 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=793168209 TSecr=3422045482 WS=128                                                                   
   70 9.149477587     10.0.5.2 → 10.0.5.3     TCP 66 54586 → 3306 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=3422045482 TSecr=793168209                                                                                                    
   71 9.149666681     10.0.5.3 → 10.0.5.2     MySQL 180 Server Greeting proto=10 version=5.5.5-10.3.8-MariaDB-1:10.3.8+maria~bionic                                                                                                     
   72 9.149696241     10.0.5.2 → 10.0.5.3     TCP 66 54586 → 3306 [ACK] Seq=1 Ack=115 Win=29312 Len=0 TSval=3422045482 TSecr=793168209                                                                                                  
   73 9.149745278     10.0.5.2 → 10.0.5.3     MySQL 176 Login Request user=wp_user db=                                                                                                                                                  
   74 9.149763702     10.0.5.3 → 10.0.5.2     TCP 66 3306 → 54586 [ACK] Seq=115 Ack=111 Win=29056 Len=0 TSval=793168209 TSecr=3422045482     

Narrowing it down somewhat, I tell tshark to focus on HTTP with -O http. This prints the full detail tree for the HTTP layer of each packet (similar to the verbose -V mode, but limited to the listed protocols) and only a one-line summary for every other layer:

tshark -i br-0349e1f24c7a -O http

Now I do get to see the occasional detailed HTTP headers because of the verbose mode, but I’m also still seeing the rest of the traffic. Looking at the ports, a lot of it seems to be coming from the PHP-FPM (port 9000) and MariaDB (port 3306) containers. Let’s cut that out:

tshark -i br-0349e1f24c7a -O http -f "tcp port 80 or tcp port 443"

“or tcp port 443” is not really relevant or useful here: encryption is applied outside of the Docker container, and if I were using encryption all the way into the container I wouldn’t be able to study the headers. Keeping it helps me remember the syntax of what tshark terms capture filters, and it does no harm.
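To make the capture filter syntax a little easier to remember, here is a small sketch that assembles the "tcp port X or tcp port Y" expression from a list of ports. The function name port_filter is made up for this illustration; the resulting string is just the same kind of filter that is passed to -f above.

```shell
# Hypothetical helper: build a capture filter matching any of the given TCP ports.
port_filter() {
    filter=""
    for port in "$@"; do
        # Join the individual port tests with "or".
        if [ -n "$filter" ]; then
            filter="$filter or "
        fi
        filter="${filter}tcp port $port"
    done
    printf '%s\n' "$filter"
}

# Usage:
#   tshark -i br-0349e1f24c7a -O http -f "$(port_filter 80 443)"
```

Called as port_filter 80 443, this prints the filter used in the command above.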

The capture filter clears things up a lot. However, I’m still getting the entire and detailed TCP three-way handshake: a packet with the SYN flag set from the client, followed by a SYN-ACK packet from the server, followed by an ACK from the client:

Frame 1: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0                                                                                                                                                       
Ethernet II, Src: 02:42:60:54:6a:1c (02:42:60:54:6a:1c), Dst: 02:42:0a:00:05:04 (02:42:0a:00:05:04)                                                                                                                                     
Internet Protocol Version 4, Src: 10.0.5.1, Dst: 10.0.5.4                                                                                                                                                                               
Transmission Control Protocol, Src Port: 34412, Dst Port: 80, Seq: 0, Len: 0                                                                                                                                                            
                                                                                                                                                                                                                                        
Frame 2: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface 0                                                                                                                                                       
Ethernet II, Src: 02:42:0a:00:05:04 (02:42:0a:00:05:04), Dst: 02:42:60:54:6a:1c (02:42:60:54:6a:1c)                                                                                                                                     
Internet Protocol Version 4, Src: 10.0.5.4, Dst: 10.0.5.1                                                                                                                                                                               
Transmission Control Protocol, Src Port: 80, Dst Port: 34412, Seq: 0, Ack: 1, Len: 0                                                                                                                                                    
                                                                                                                                                                                                                                        
Frame 3: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface 0                                                                                                                                                       
Ethernet II, Src: 02:42:60:54:6a:1c (02:42:60:54:6a:1c), Dst: 02:42:0a:00:05:04 (02:42:0a:00:05:04)                                                                                                                                     
Internet Protocol Version 4, Src: 10.0.5.1, Dst: 10.0.5.4                                                                                                                                                                               
Transmission Control Protocol, Src Port: 34412, Dst Port: 80, Seq: 1, Ack: 1, Len: 0  

It does make the headers a bit harder to find, so I would rather leave these out. tshark has a second filtering mechanism called “display filters” (as opposed to “capture filters”). Where capture filters decide which packets are captured at all, based on network fundamentals like ports and protocols, display filters decide which of the captured packets are shown. Think grep and suchlike. I add these using the -Y flag (older tutorials use -R, which newer versions of tshark reserve for read filters in two-pass mode).

tshark -i br-0349e1f24c7a -O http -f "tcp port 80 or tcp port 443" -Y "http.request || http.response"

“||” will be recognisable to fans of Bash (and probably many other scripting languages) as a logical or, so a frame is shown if it matches either http.request or http.response.
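Display filter field names can also be used to pick out individual values. The following is only a sketch of a different output style, not what the rest of this post uses: it asks tshark for a handful of fields (-T fields with one -e per field) instead of the full detail tree, giving one tab-separated line per request or response. The wrapper name http_summary is my own invention.

```shell
# Hypothetical wrapper: one summary line per HTTP request or response.
# $1 is the interface to capture on, e.g. the Docker bridge.
http_summary() {
    tshark -i "$1" \
        -f "tcp port 80 or tcp port 443" \
        -Y "http.request || http.response" \
        -T fields \
        -e frame.number -e ip.src -e ip.dst \
        -e http.request.method -e http.host -e http.request.uri \
        -e http.response.code
}

# Usage:
#   http_summary br-0349e1f24c7a
```

Fields that do not apply to a given packet (the response code on a request, for instance) are simply left empty in that line.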

Now I’m getting somewhere. Of course, whenever the server actually delivers the goods, the packets fill up with data. A lot of (gzipped) data. Which looks like this:

Frame 6: 5699 bytes on wire (45592 bits), 5699 bytes captured (45592 bits) on interface 0                                                                                                                                               
Ethernet II, Src: 02:42:0a:00:05:04 (02:42:0a:00:05:04), Dst: 02:42:60:54:6a:1c (02:42:60:54:6a:1c)                                                                                                                                     
Internet Protocol Version 4, Src: 10.0.5.4, Dst: 10.0.5.1                                                                                                                                                                               
Transmission Control Protocol, Src Port: 80, Dst Port: 35102, Seq: 1, Ack: 591, Len: 5633                                                                                                                                               
Hypertext Transfer Protocol                                                                                                                                                                                                             
    HTTP/1.1 200 OK\r\n                                                                                                                                                                                                                 
        [Expert Info (Chat/Sequence): HTTP/1.1 200 OK\r\n]                                                                                                                                                                              
            [HTTP/1.1 200 OK\r\n]                                                                                                                                                                                                       
            [Severity level: Chat]                                                                                                                                                                                                      
            [Group: Sequence]                                           
        Request Version: HTTP/1.1                                       
        Status Code: 200                                                
        [Status Code Description: OK]                                   
        Response Phrase: OK                                             
    Server: nginx/1.14.0\r\n                                            
    Date: Sun, 02 Sep 2018 11:17:15 GMT\r\n                             
    Content-Type: text/html; charset=UTF-8\r\n                          
    Transfer-Encoding: chunked\r\n                                      
    Connection: close\r\n                                               
    Vary: Accept-Encoding\r\n                                           
    X-Powered-By: PHP/7.2.8\r\n                                         
    X-Pingback: https://tgt.madsmi.de/xmlrpc.php\r\n                    
    Link: <https://tgt.madsmi.de/wp-json/>; rel="https://api.w.org/"\r\n
    Link: <https://tgt.madsmi.de/?p=1>; rel=shortlink\r\n
    Content-Encoding: gzip\r\n
    \r\n
    [HTTP response 1/1]
    [Time since request: 0.193809688 seconds]
    [Request in frame: 4]
    HTTP chunked response
        Data chunk (5220 octets)
            Chunk size: 5220 octets
            Data (5220 bytes)

0000  1f 8b 08 00 00 00 00 00 00 03 d5 3c d9 72 db 46   ...........<.r.F
0010  b6 cf d6 57 b4 e0 8a 48 c6 04 40 52 0b b5 90 74   ...W...H..@R...t
0020  39 96 3d e3 ca 32 53 96 33 a9 29 cb e5 6a 00 4d   9.=..2S.3.)..j.M
0030  b0 45 6c 01 40 4a 1c 5b 9f 73 f3 0d f3 ee 1f bb   .El.@J.[.s......
0040  e7 f4 82 8d 14 a5 c4 ca bd 33 8e 44 a2 bb 4f 9f   .........3.D..O.
0050  3e 5b 9f a5 1b ca 68 f7 fc 6f 2f df fd f3 ef af   >[....h..o/.....
0060  c8 2c 0f 83 c9 ce 08 bf 48 40 23 7f 6c 78 d4 3c   .,......H@#.lx.<
0070  ff de 20 6e 40 b3 6c 6c 44 b1 79 95 19 08 c1 a8   .. n@.llD.y.....

The data is part of the packets I want to inspect, but it’s not actually of any interest, especially with the compression, which renders it incomprehensible (image data would obviously be incomprehensible even without compression). I don’t know of any way for tshark to filter it out, so I resort to using some regex magic with grep:

tshark -i br-0349e1f24c7a -O http -f "tcp port 80 or tcp port 443" -Y "http.request || http.response" | grep -v -E '^[0-9a-f]{4}'

The -v flag inverts the match, so grep shows me everything but what the regex matches. The -E flag is for extended regular expressions. What are extended regular expressions?

The Extended Regular Expressions or ERE flavor standardizes a flavor similar to the one used by the UNIX egrep command. “Extended” is relative to the original UNIX grep, which only had bracket expressions, dot, caret, dollar and star.

Regular-Expressions.info: https://www.regular-expressions.info/posix.html

The expression matches any line that starts (^) with four ({4}) hexadecimal characters (numbers and the letters a through f).
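To see the expression at work without a live capture, it can be tried on a few lines copied from the output above. This is just an illustration: the variable name sample is mine, and the pattern is single-quoted so the shell hands it to grep unchanged.

```shell
# A few sample lines in the shape tshark prints (shortened from the output above).
sample='    Content-Encoding: gzip\r\n
            Data (5220 bytes)

0000  1f 8b 08 00 00 00 00 00 00 03 d5 3c d9 72 db 46   ...........<.r.F
0010  b6 cf d6 57 b4 e0 8a 48 c6 04 40 52 0b b5 90 74   ...W...H..@R...t'

# Drop every line that starts with four hexadecimal characters.
printf '%s\n' "$sample" | grep -v -E '^[0-9a-f]{4}'
```

The two hex dump lines are dropped, while the indented header and "Data" lines pass through because they start with spaces rather than hexadecimal characters.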

I can tell where the grep line has removed data lines by the presence of a line saying “Data (x bytes)” in the chunked response:

    HTTP chunked response
        Data chunk (4542 octets)
            Chunk size: 4542 octets
            Data (4542 bytes)

                Data: 1f8b0800000000000003cd3b6b6fdbb8b29f9b5fc1a8d8d8...
                [Length: 4542]
            Chunk boundary: 0d0a
        End of chunked encoding

The result is pure http headers goodness. Enjoy.


Some amount of credit for this capture must be given to various answers from the SuperUser question How do I return just the Http header from tshark? 


  

Mad Hatters Tea Party, Martin Garwood, Amanda Elzer flickr photo by Eva Rinaldi shared under a Creative Commons (CC BY-SA 2.0) license
