Article / 2nd Nov 2011

Introducing Mantrid

Mantrid is the first of hopefully many pieces of Epio's infrastructure that we're open-sourcing.

Back when we were first developing Epio, we started off using HAProxy as our load balancer. HAProxy is a fine piece of software, and performed quite well for a while, but it quickly started having issues as we got more sites.

The main issue was that Epio has at least two (and often more) hostname-matching rules for each app we host, and we have rather a large number of apps: thousands at the time we're talking about, and more these days. HAProxy has two problems at this scale. One, it's not terribly efficient at matching domains (it appears to do a linear scan through several thousand regular expressions). Two, the state of our sites changes constantly, so we were having to reload it every ten to fifteen seconds. This worked surprisingly well (HAProxy has a graceful restart mode), but still meant things were a little delicate.
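To illustrate why the matching strategy matters at this scale, here is a minimal sketch of a dictionary-based hostname lookup. It is not necessarily how Mantrid does it; the `ROUTES` table, the addresses, and the `match_host` function are hypothetical names made up for this example. The point is that the cost depends on the number of labels in the hostname rather than on the number of rules.

```python
# Hypothetical routing table: hostname -> backend address.
# The hostnames and addresses are invented for illustration.
ROUTES = {
    "example.com": "10.0.0.1:8000",
    "blog.example.com": "10.0.0.2:8000",
}


def match_host(hostname, routes=ROUTES):
    """Return the backend for hostname, falling back to parent domains.

    Each attempt is a single dict lookup, so a request for a.b.example.com
    costs at most four lookups no matter how many rules are loaded.
    """
    parts = hostname.lower().split(".")
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in routes:
            return routes[candidate]
    return None
```

With a table like this, adding or removing a site is just a dictionary update, so there is nothing to reload when the set of hostnames changes.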

Thus, one day in May, when the HAProxy setup was having a particularly bad day, I sat down and wrote a simple Eventlet-based load balancer. We tested and deployed it that week, and much to our surprise, the latency on requests dropped. Over the following months, we built on that code, integrating lazy-loading for applications (so if you hit an application that was disabled, the load balancer would hold the request while the application started) and statistics logging (to measure bandwidth).

However, that load balancer, while it performed admirably for over six months, was heavily tied into the core Epio codebase and our set of common libraries and utilities. Thus, I began the work of separating it out and polishing it off, and Mantrid was born.

The key features for us are, obviously, the fact that we can change the host configuration at runtime, and the ability to "spin" requests until applications are loaded. The latter is, incidentally, useful even for a single-host site; if you're performing a database upgrade, migration, or server restart, you can do it with zero dropped connections with something like:

mantrid-client set spin true
# ... do restart ...
mantrid-client set proxy true backends=
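The idea behind "spinning" can be sketched in a few lines. This is an illustration of the technique only, not Mantrid's code: Mantrid is built on Eventlet, whereas this sketch uses the standard library's `threading` module so that it runs on its own, and the `SpinningHost` class, its method names, and the timeout value are all assumptions.

```python
import threading


class SpinningHost:
    """Hypothetical stand-in for the routing state of one hostname."""

    def __init__(self):
        self._ready = threading.Event()
        self._backend = None

    def set_spin(self):
        # Stop routing; requests arriving from now on wait in handle().
        self._ready.clear()
        self._backend = None

    def set_proxy(self, backend):
        # Resume routing and wake every request that is waiting.
        self._backend = backend
        self._ready.set()

    def handle(self, request, timeout=5.0):
        # Hold the request until a backend is set or the timeout expires.
        if not self._ready.wait(timeout):
            return "503 timed out waiting for backend"
        return "proxied %s to %s" % (request, self._backend)


# Usage: a request that arrives mid-restart is held, then served.
host = SpinningHost()
host.set_spin()
results = []
worker = threading.Thread(target=lambda: results.append(host.handle("GET /")))
worker.start()
# ... do restart ...
host.set_proxy("127.0.0.1:8000")
worker.join()
```

The client sees a slow response rather than a refused connection, which is what makes the restart look seamless from the outside.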

We deployed Mantrid into production on Epio around two weeks ago, and it's performed flawlessly ever since. We have it exposed directly to the internet on port 80, and also to our SSL-terminating Nginx on another port.

We'd love for people to try it out and report bugs, or tell us their experiences; our goal is to get it running with a latency and throughput cost close to that of other software load balancers (in part by running it under PyPy, something we designed it to do).

(Note: Several people have asked me what we do if Mantrid crashes, as it's configured at runtime. The answer is simple: it persists its configuration to a state file, which it loads on startup.)
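For the curious, the state-file approach can be sketched roughly as follows. The JSON format, the `save_state` and `load_state` names, and the shape of the `hosts` dictionary are assumptions made for illustration, not a description of Mantrid's actual file format. The sketch writes to a temporary file and renames it into place, so a crash partway through a save cannot leave a half-written state file behind.

```python
import json
import os
import tempfile


def save_state(path, hosts):
    """Write the host configuration to path without exposing partial writes."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(hosts, handle)
    os.replace(tmp_path, path)


def load_state(path):
    """Return the saved configuration, or an empty one on first start."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}
```

Calling `save_state` after every runtime change and `load_state` once at startup is enough for the process to come back with the configuration it had before it stopped.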