Introducing Mantrid
Mantrid is the first of hopefully many pieces of Epio's infrastructure that we're open-sourcing.
Back when we were first developing Epio, we started off using HAProxy as our load balancer. HAProxy is a fine piece of software, and performed quite well for a while, but it quickly started having issues as we got more sites.
The main issue was that Epio has at least two (and often more) hostname-matching rules for each app we host, and we host a large number of apps: thousands at the time we're talking about, and more these days. HAProxy has two problems at this scale. First, it's not terribly efficient at matching domains (it appears to do a linear scan through several thousand regular expressions). Second, the state of our sites changes constantly, so we were having to reload HAProxy every ten to fifteen seconds. This worked surprisingly well (HAProxy has a graceful restart mode), but still meant things were a little delicate.
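To illustrate the alternative to a linear scan over regular expressions, here is a minimal sketch of a dictionary-based hostname lookup. It is not Mantrid's actual matcher; the names `build_host_table` and `lookup_host`, the `*.` wildcard syntax, and the rule that a wildcard matches any depth of subdomain are all assumptions made for the example.

```python
def build_host_table(rules):
    """Index hostname rules into exact names and wildcard ("*.example.com") suffixes."""
    exact, wildcard = {}, {}
    for pattern, app in rules:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # Store the suffix without the leading "*." (assumed wildcard syntax).
            wildcard[pattern[2:]] = app
        else:
            exact[pattern] = app
    return exact, wildcard


def lookup_host(table, hostname):
    """Return the app for hostname, or None if no rule matches.

    The cost depends on the number of labels in the hostname,
    not on the number of rules in the table.
    """
    exact, wildcard = table
    hostname = hostname.lower().rstrip(".")
    if hostname in exact:
        return exact[hostname]
    labels = hostname.split(".")
    # Try progressively shorter suffixes: a.b.example.com -> b.example.com -> example.com -> com
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in wildcard:
            return wildcard[suffix]
    return None


# Hypothetical rules: two for one app, one for another.
table = build_host_table([
    ("myhost.com", "app-1"),
    ("*.myhost.com", "app-1"),
    ("other.example.org", "app-2"),
])
```

With a table like this, adding or removing a rule at runtime is a single dictionary update rather than a reload of the whole rule set.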
Thus, one day in May, when the HAProxy setup was having a particularly bad day, I sat down and wrote a simple Eventlet-based load balancer. We tested and deployed it that week, and much to our surprise, the latency on requests dropped. Over the following months, we built on that code, integrating lazy-loading for applications (so if you hit an application that was disabled, the balancer would hold the request while the application started) and statistics logging (to measure bandwidth).
However, that load balancer, while it performed admirably for over 6 months, was heavily tied into the core Epio codebase, and our set of common libraries and utilities. Thus, I began the work of separating it out and polishing it off, and Mantrid was born.
The key features for us are, obviously, the fact we can change the host configuration at runtime, and the ability to "spin" requests until applications are loaded. This feature is, incidentally, also useful even for a single-host site; if you're performing a database upgrade, migration, or server restart, you can do it with zero dropped connections with something like:
mantrid-client set myhost.com spin true
# ... do restart ...
mantrid-client set myhost.com proxy true backends=1.2.3.4:8000
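The idea behind "spinning" can be sketched as a loop that holds a request until the backend is ready or a timeout expires. This is only an illustration of the technique, not Mantrid's code: `spin_until_ready`, its parameters, and the default timeout are hypothetical, and in an Eventlet-based server the sleep would presumably be a cooperative one such as `eventlet.sleep` rather than `time.sleep`.

```python
import time


def spin_until_ready(is_ready, timeout=10.0, interval=0.1,
                     sleep=time.sleep, clock=time.monotonic):
    """Hold a request until is_ready() returns True or timeout seconds pass.

    Returns True if the backend became ready (the request can be proxied),
    False if the timeout expired (the caller should return an error page).
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        # Wait a little before checking again instead of busy-looping.
        sleep(interval)


# Usage: a stand-in readiness check that succeeds on the third poll.
polls = {"count": 0}

def backend_is_up():
    polls["count"] += 1
    return polls["count"] >= 3

ok = spin_until_ready(backend_is_up, timeout=1.0, interval=0.0)
```

From the client's point of view the connection simply takes longer to answer, which is why a restart behind a spinning host drops no connections.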
We deployed Mantrid into production on Epio around two weeks ago, and it's performed flawlessly ever since. We have it exposed directly to the internet on port 80, and also to our SSL-terminating Nginx on another port.
We'd love for people to try it out and report bugs, or tell us their experiences; our goal is to get it running with a latency and throughput cost close to that of other software load balancers (in part by running it under PyPy, something we designed it to do).
(Note: Several people have asked me what we do if Mantrid crashes, as it's configured at runtime. The answer is simple - it persists its configuration to a state file, which it loads on startup.)
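As a rough sketch of how runtime configuration can survive a crash, the following writes the host table to a state file atomically and reads it back on startup. The JSON format, the shape of the `hosts` dictionary, and the function names are assumptions for the example (written for Python 3); Mantrid's real state file may look quite different.

```python
import json
import os
import tempfile


def save_state(path, hosts):
    """Write the host configuration atomically, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(hosts, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        # Rename over the old file only once the new one is fully on disk.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path):
    """Load the saved configuration, or start empty if there is no state file yet."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}
```

Calling `save_state` after every configuration change means a restarted process picks up exactly where the previous one left off.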

comments
What was the rationale for writing a load-balancer from scratch instead of writing a patch for HAProxy and getting it upstream?
The way I think about this is that we have open source software stacks and when an issue turns up it's worth it over the long term to improve the stack rather than treating it as a black box and working around the problem.
It sounds like the results made it worth it for you, but how do you feel about going your own way vs. contributing to the existing stack?
This seems nice. I have a few questions, excuse my ignorance:
1. Is it possible to protect the REST interface with a password?
2. What method of scheduling does it use?
3. Can it serve static files? I know people advise other things, but I have my reasons.
Thanks
Good stuff. We ended up writing a load balancer/proxy for TCP connections for the screen sharing solution in dimdim (http://dimdim.com). We used Twisted Python for a pattern-matching load balancer with failover and rerouting.
Existing solutions like Zeus (which we used) and HAProxy really were threadbare when it came to routing stateful TCP connections. Zeus did have TrafficScript, but it was painful to use beyond trivial routing.
Stefan: The rationale was that we wanted something quite different to anything out there - we'd have had to add live config, holding back of connections, and a better host searcher to HAProxy, and at that point it was quicker to develop something in-house, at least initially.
I don't feel bad about not contributing to existing stacks as that was never really an option; it would have taken too long and required learning a codebase and language neither of us was too familiar with.
volta:
1. No, it should be protected by networking. A password for that was surplus to requirements, and we'd have had to then protect it against brute force attacks, timing attacks, etc.
2. Purely random, equally weighted. We'll add more scheduling in future, but our backends are all identical, so that's worked well for us so far.
3. It could in theory serve static files via the `static` action, but only one per hostname; it doesn't do any path-based routing.
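For reference, purely random, equally weighted scheduling as described in answer 2 amounts to very little code. This is a sketch of the technique rather than Mantrid's source; `pick_backend` and the backend address format are made up for the example.

```python
import random


def pick_backend(backends, rng=random):
    """Pick one backend uniformly at random; every backend has equal weight."""
    if not backends:
        raise ValueError("no backends configured")
    return rng.choice(backends)


# Hypothetical backend list for a single hostname.
backends = ["1.2.3.4:8000", "1.2.3.5:8000"]
chosen = pick_backend(backends)
```

Because no per-backend state is kept, this works well when all backends are identical, as noted above.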
I hope you would be able to write 1 or 2 pages of documentation for novice users like me soon. A simple working example using 2 simple backends would suffice I think. Thanks
Maybe a google group? I am getting this error on Ubuntu 11.10
2011-11-03 13:58:26 - INFO: Using configuration file /home/projects/test/mantrid.conf
2011-11-03 13:58:27 - INFO: Dropped to GID 4321
2011-11-03 13:58:27 - INFO: Dropped to UID 4321
2011-11-03 13:58:27 - INFO: Listening for requests on ('::', 80)
2011-11-03 13:58:27 - INFO: Listening for management on ('::', 8042)
2011-11-03 13:58:45 - ERROR: Traceback (most recent call last):
  File "/home/projects/test/env/local/lib/python2.7/site-packages/mantrid/loadbalancer.py", line 322, in handle
    headers = headers,
  File "/home/projects/test/env/local/lib/python2.7/site-packages/mantrid/actions.py", line 62, in handle
    with open(os.path.join(os.path.dirname(__file__), "static", "%s.http" % self.type)) as fh:
IOError: [Errno 2] No such file or directory: '/home/projects/test/env/local/lib/python2.7/site-packages/mantrid/static/no-hosts.http'
@volta: I've updated the documentation to add a simple guide (see http://mantrid.readthedocs.org/en/latest/guides/simple.html), and uploaded a fixed release for your traceback.
I'd recommend filing issues on GitHub if you have any problems.
Great! Thanks Andrew, no more traceback. There is one thing though: the admin dashboard on port 8042 displays a "file not found" error, with no other clues as to why.
@volta: That's because it's not an admin dashboard, it's an API endpoint. There's nothing at the / path; you're supposed to interact with it via mantrid-client.