Squeezing the sponge: How to implement a buffering SMTP Handler in Python

…when he needs what you have gleaned, it is but squeezing you,
and, sponge, you shall be dry again.
– Hamlet

One of the first things I implement in most Python projects is some quick stream logging so as not to dot too many print statements across the code. At some point file logging usually  follows as a feature so it can run by itself and I can check up on it from time to time.

Recently I got ambitious and added email logging to poca, my podcast client project. Which isn’t difficult as the logging module has an SMTPHandler built-in that you just stick on your logger and now your logs are airborne. It’s almost xkcd-like. The way it works though is that an email is sent off instantly every time something is logged, just like stream logging only over port 25. This was not what I wanted. To be specific I was looking for something that

  • Would gather up log entries as my script runs it course
  • At the end of the script it would evaluate if there was anything worthwhile sending
  • If so, send it; if not, save, don’t send

The way to do, as suggested by the logging module author himself, is to subclass the BufferingHandler which does a fine job of soaking up the entries and keeping them until its time to release them. He has written an example of how to do this himself and put it on Github.

I had, however, a further requirement. When a podcast feed fails, it’s not necessarily cause for alarm or notifying the user. Maybe the server’s in maintenance mode for a few minutes, maybe builders unplugged the cable for an hour, maybe you did. You only really need email logging to tell you when a podcast feed consistently fails because then it’s probably either a software failure or the feed is offline, either of which requires user intervention. Vinay Sajip’s script doesn’t allow for that. Also, using unicode is easier if we keep a clear distinction between body and headers which Sajip’s approach seems to fudge a bit.

My requirements meant that I would have to not just buffer entries in memory but save them over time and only release them when there was a sufficient number (and of sufficient severity) to worry about.

Here’s the gist of it (sorry).

Thr first part is subclassing the BufferingHandler. BufferingHandler’s trick is to delay logging until a certain number or severity of records is reached – or the handler is prompted to ‘flush’ the records manually. There is no set destination – stdout, a file, networking – for BufferingHandler. By subclassing BufferingHandler and overriding the flush function we get to determine where the logged records go.

class BufferSMTPHandler(handlers.BufferingHandler):
    '''SMTPHandler that send one email per flush'''
    def __init__(self, email, paths):
        handlers.BufferingHandler.__init__(self, capacity=5)
        self.state = get_state(path)
        self.buffer = self.state.buffer
        self.email = email
        smtp_formatter = logging.Formatter("%(asctime)s %(message)s",
                                           datefmt='%Y-%m-%d %H:%M')
        self.setFormatter(smtp_formatter)

We amend the BufferingHandler’s init in two main ways: First, we preload the handler’s buffer (i.e. the saved store of records, the logging term for a logged entry). This is done by a function that un-pickles a previously pickled buffer (or if there is no previously saved buffer, just returns an empty list). The great thing about pickle is that you don’t need to know what the exact nature of what you’re picking. Much like real-world pickling, I imagine: Just shove it in a jar, douse it in vinegar and put a lid on it. I won’t cover the pickle function here as it will get too detailed and there are lots of pickle tutorials out there. It’s fairly simple and all we need to pass to it is the path of the file where we saved the buffer. Secondly, we store the email dictionary that contains all the need smtp settings (host, port, toaddr, fromaddr, etc.)

We also preset the capacity of the BufferingHandler. In the original class this is the number of records that the handler will soak up before it automatically releases them. We will slightly alter this behavior later but we still make use of the setting. In real applications this is probably set by way of a variable.

Next, we override the flush function. This is the part of the handler that determines where the records go when they are released. So in this case it is where we format the records into an email and carry out the SMTP routine.

    def flush(self):
        '''Flush if we exceed threshold; otherwise save the buffer'''
        if not self.buffer:
            self.outcome = Outcome(None, 'Buffer was empty')
            return
        if len(self.buffer) < self.capacity:
            self.outcome = Outcome(None, 'Buffer no sufficiently full')
            self.save()
            return
        body = str()
        for record in self.buffer:
            body = body + self.format(record) + "\r\n"
        msg = MIMEText(body.encode('utf-8'), _charset="utf-8")
        msg['From'] = self.email['fromaddr']
        msg['To'] = self.email['toaddr']
        msg['Subject'] = Header("POCA log")
        if self.email['starttls']:
            try:
                smtp = smtplib.SMTP(self.email['host'], 587, timeout=10)
                ehlo = smtp.ehlo()
            except (ConnectionRefusedError, socket.gaierror, socket.timeout) \
                    as error:
                self.outcome = Outcome(False, str(error))
                self.save()
                return
            smtp.starttls()
            try:
                smtp.login(self.email['fromaddr'], self.email['password'])
            except smtplib.SMTPAuthenticationError as error:
                self.outcome = Outcome(False, str(error))
                self.save()
                return
        else:
            try:
                smtp = smtplib.SMTP(self.email['host'], 25, timeout=10)
                ehlo = smtp.ehlo()
            except (ConnectionRefusedError, socket.gaierror, socket.timeout) \
                    as error:
                self.outcome = Outcome(False, str(error))
                self.save()
                return
        try:
            smtp.sendmail(self.email['fromaddr'], [self.email['toaddr']],
                          msg.as_string())
            self.outcome = Outcome(True, "Succesfully sent email")
        except (smtplib.SMTPException, socket.timeout) as error:
            self.outcome = Outcome(False, str(error))
            self.save()
            return
        smtp.quit()
        self.buffer = []
        self.save()
    def save(self):
        '''Save the buffer for some other time (it either isn't full yet or we
           can't empty it)'''
        self.state.buffer = self.buffer
        self.state.save()
        self.buffer = []

As I said in the opening, there are cases where I don’t want my handler to send off an email. If the buffer is empty, we just skip the rest. If the buffer isn’t empty but below capacity, we also skip. I want to accumulate a certain amount of records before sending them off.  However, if we have gathered some records but not enough, we put the new buffer (old, retrieved records plus fresh ones) back into the jar, pickle it and reset the buffer for good measure.

The eagle-eyed may notice that the second catch (buffer is below capacity) should also get the first one (buffer is empty). So why do we need both? The thing is that however I activate the handler’s flush method – by allowing it to reach capacity, by clsoing the handler or by manually flushing – the method is actually run not once, but three times (insert mandatory germophobe joke here). I have tried building far simpler subclasses and the same goes for those. I have no explanation for this. In the case where the buffer is half full and we pickle it and reset the handler’s buffer, we risk pickling an empty buffer on the second and third run. So we catch those at the outset and simply skip everything.

If we do have something to send, we start the whole process of formatting the contents into an email. Lopping through the records we add them one by one before handing it over to the MIMEText class. MIMEMultipart is for when you need to include both a text and an html version of your email. Since I’m using this for logging, text is perfectly adequate.

The configuration in the ’email’ dictionary (which I’m just realising is very poorly named considering it’s also the name of a module we use) allows for a binary starttls-or-no-starttls choice. If you go with STARTTLS we create an SMTP connection to port 587, run the STARTTLS method and login. If not, we connect to port 25 and assume we’re authorized to start gabbing away. The latter option is probably not useful to most people but if you have a local SMTP server and your trust your mynetworks, then it’s a pretty easy setup.

In either case we guard against (some) errors. socket.gaierror tends to pop up if DNS is not working or you have no connection at all. ConnectionRefusedError is for trying to connect to a bad port. SMTPAuthenticationError is obviously for bad credentials, whereas SMTPException is simply smtplib’s catchall exception, the class all other exceptions are subclassed from and so should get most if not all SMTP related issues. I guess it’s not generally a good idea to attempt to hedge your bets this way, but I cannot possibly foresee everything that can go wrong with an SMTP connection and we make sure to keep whatever message we get by saving it into the Outcome named tuple. Finally, we attempt to guard against timeouts  by catching socket.timeout errors.

In all cases of failure, we save the buffer as we did not get to properly empty it out. This means, of course, that if the internet connection goes and (podcast connection) errors start piling up, all the error records will be saved for when the connection comes back on, as the email attempts will fail too. Once the machine is online again you’ll get one massive email detailing all the failed attempts. Still better than a thousand one-liners.

This could obviously be hardened a lot more but I think it’s decently well padded for the majority of use cases. Also it’s hard to find a hundred differently configured SMTP servers to test against.

    def shouldFlush(self, record):
        '''Returns false to stop automatic flushing (we flush on close)'''
        return False

Finally we override the shouldFlush method. This method is used by BufferingHandler whenever it receives a record to determine if it’s time to flush the buffer. By default it will trigger when the amount of records in buffer reaches the set capacity or a record with a sufficiently high loglevel is encountered.

I’ve set it to always return False, so it will never trigger an automatic flush. You can flush the buffer manually by calling the flush method from a script or a console. In combination with the check in the flush method to see if the buffer has met or exceeded capacity, this turns ‘capacity’ into more of a ‘threshold’: Rather than automatically spilling over when capacity is met, we run a manual flush and that only sends email if we’re at or over the threshold.

The result is a fairly flexible buffering handler that could also be used for networking or popping up desktop notifications or similar instances where accumulated  notifications are preferable to a constant stream.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.