Django Diaries / 16th Jun 2016

The Philosophy of Channels

It's been a while since my last blog post about Channels, and a lot has happened in the meantime - the API has filled out and stabilised, features like backpressure have come along, and the situation for backends is looking a lot better, especially once the local-and-remote combination layer matures some more.

The other thing that has happened, however, is confusion and worry over the direction Channels is headed, and what that direction means for Django and Python overall. A lot of the development of Channels has been about addressing my own worries over its direction, and picking the right set of tradeoffs, sometimes between two equally valid options.

I've not been as proactive as I could have been at communicating my reasoning and long-term vision for what Channels could do; I'm hoping this blog post will rectify some of that. Let me take you through the specific set of problems I'm looking to tackle, why I chose to design things the way I did, and what I see as the path forwards.

It's not just about WebSockets

A lot of people's base reaction to Channels is two-fold; first, to see it as only being a way to get WebSocket support (that is the thing that spurred on development, but not the only reason for it; more on that later), and second, to then say that trying to solve WebSocket protocol handling via a message-passing distributed system is overkill.

They're right on that second point; Python's async capabilities are getting ever better, and it's easy enough to use one of the existing libraries (such as Autobahn) to write a WebSocket handling server in a few hours. You'd probably need to standardise an interface so this server can talk to the rest of your project, but that's not particularly difficult.

This is, indeed, the route I first took, and how the very early versions of Channels (then called django-onair) worked. However, as I developed it out and started pondering how to run it at a decent scale, the real problem became clear.

You see, WebSocket protocol handling isn't the hard problem, in my opinion; it's actually using those sockets in a larger project. Most uses of sockets are event-driven; you send data down the socket when something happens externally - be it a model being saved, an external system changing, or just another message on another WebSocket.

All these different sources of events can happen at different places in your deployed project. If you go down the traditional path of running a bunch of servers, each with a webserver and Python code, you quickly realise that you need some way to communicate between them; WebSocket handling is one thing, but being able to broadcast to groups of sockets when things happen is actually how you write applications.

Imagine a large-scale chat server where different people are logged into different physical machines; how will your processes broadcast out incoming chat messages on a socket to everyone else in that chatroom, on all the other servers? Where do you track who's in the room? What do you do about dead sessions?
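To make that concrete, here's a minimal, single-process sketch of what a channel layer's group machinery has to do - names like `GroupRegistry` are illustrative, not part of the Channels API, and the hard part is doing exactly this when the members live on different machines:

```python
from collections import defaultdict, deque

class GroupRegistry:
    """Toy in-process model of channel-layer groups: each socket has a
    reply channel (here, a queue), and a group is a named set of channels."""

    def __init__(self):
        self.channels = defaultdict(deque)   # channel name -> queued messages
        self.groups = defaultdict(set)       # group name -> member channel names

    def group_add(self, group, channel):
        self.groups[group].add(channel)

    def group_discard(self, group, channel):
        self.groups[group].discard(channel)  # e.g. on disconnect or dead session

    def group_send(self, group, message):
        # Broadcast: deliver a copy of the message to every member's channel.
        for channel in self.groups[group]:
            self.channels[channel].append(message)

    def receive(self, channel):
        return self.channels[channel].popleft() if self.channels[channel] else None


registry = GroupRegistry()
registry.group_add("chat-room-1", "websocket.send!aaa")  # Alice's socket
registry.group_add("chat-room-1", "websocket.send!bbb")  # Bob's socket
registry.group_send("chat-room-1", {"text": "hello, room"})
```

In a real deployment those two reply channels may well terminate on different physical servers, which is why the registry has to be a shared, network-transparent service rather than a Python dict in one process.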

What about a system where you send notifications to users when their profile is viewed - those views are likely happening on a different server, so how do you get the view event from that server over to the one where your user's WebSocket is terminated?

This is the hard problem that Channels really aims to solve; not just WebSocket protocol handling, but the problem of building complex applications around WebSockets. Distributed systems and messaging are a really tough problem area, and I believe it's the sort of thing that benefits a lot from a few, shared, polished solutions, rather than a rough guide on how to tie async code together.

Encouraging async

One of the things Channels does is run Django in a synchronous fashion, and it encourages you to write all your message handlers the same way; it just runs this code in a tight worker loop, and discourages you from doing any blocking operations that would stall that loop.
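Conceptually that worker loop is very small. Here's a hedged sketch of the idea - the routing table and the `chat_receive` consumer are illustrative stand-ins, not the actual Channels internals:

```python
from queue import Queue, Empty

def chat_receive(message):
    # A synchronous consumer: do quick, non-blocking work and return.
    return "echo: " + message["text"]

# Map channel names to synchronous consumer callables.
routing = {"websocket.receive": chat_receive}

def worker_loop(queue, max_messages):
    """Tight loop: pull (channel, message) pairs off the layer and
    dispatch each one to the consumer routed for that channel."""
    results = []
    for _ in range(max_messages):
        try:
            channel, message = queue.get(timeout=0.1)
        except Empty:
            break
        consumer = routing.get(channel)
        if consumer is not None:
            results.append(consumer(message))
    return results

q = Queue()
q.put(("websocket.receive", {"text": "hi"}))
outputs = worker_loop(q, max_messages=1)
```

A slow, blocking consumer stalls this whole loop for every message behind it, which is why blocking work belongs in its own process rather than inline.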

The problem is, people seem to think that's the only way intended for you to write code against Channels. It's not; Channels is meant to make messaging easier between sync and async programs in general, letting you choose the best tool for the job (and I would argue that in a lot of simple business logic cases you probably want synchronous code, as it's a lot easier to write and maintain).

In fact, Channels makes it easier than ever to write fully-asynchronous applications and have them communicate with the rest of your Python project; that's all the WebSocket interface server (Daphne) is, after all. Want async URL fetching? IoT communication? Outgoing sockets? You can write the async code as you normally would, and then thanks to Channels, keep the rest of your code in a familiar synchronous framework and communicate back and forth with your specialist code.

Once you have a project running using Channels, it's easier than ever to add more async code as its own process to handle some new task, with a clearly-defined, existing solution for communicating with your other async and sync processes. And because everything is built on a single, shared design and platform, the community experience, documentation, write-ups, and case studies of those who have gone before you all contribute too.

More Protocols

Of course, this all ties back into the idea of Channels as not being about WebSockets; it's a general cross-process eventing system for Django (and, hopefully, Python at large). WebSockets are one of the protocols specified to run over it, but work is already underway on interface servers for Slack (letting you tie in chat integration to a server cluster) and email (allowing you to write consumers easily against incoming email alongside your HTTP and WebSocket code).

Message format specifications also help alternative implementations; much like there are many WSGI servers, the message formats allow any number of ASGI-compatible HTTP or WebSocket servers to exist, even running alongside each other in the same system.
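Those message formats are just structured dicts sent over named channels. A `websocket.receive` message, for instance, looks roughly like this - simplified here, so check the ASGI spec for the exact set of keys - and the point is that any compliant interface server produces the same shape:

```python
# A simplified ASGI-style WebSocket receive message: any compliant
# interface server would place a dict shaped like this onto the
# "websocket.receive" channel, and any worker process can consume it.
message = {
    "reply_channel": "websocket.send!a1b2c3",  # process-specific route back to this socket
    "path": "/chat/room-1/",
    "text": "hello, room",                     # or "bytes" for binary frames
}

# A consumer replies by sending onto the reply channel, not by holding
# a socket object - the socket may be terminated in another process.
reply = {"text": "ack: " + message["text"]}
```

Because servers and workers only agree on dict shapes and channel names, you can swap one ASGI server for another, or run several side by side, without the consumers noticing.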

Some protocols don't need the broadcast functionality that WebSockets do, especially if they don't have stateful connections, and good channel layer design can keep their messages routed within a single server; while channel layers are meant to be cross-process and network-transparent, that doesn't mean they have to route every message through a central place. The Channels layout was designed to allow messages that can be handled locally to be distinguished from those that must be sent elsewhere.

In fact, with the recent addition of the RedisLocalChannelLayer in the asgi_redis package, you can run servers in a standard socket-terminator and worker pair, and the channel layer will keep as much as it can local to the machine, only traversing over the network when it needs to find a specific terminated socket to send things down to another user, or for group broadcast.
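Switching to it is a settings change. A sketch of what that might look like - the Redis host and the routing module name are placeholders for your own project, with only the backend path coming from asgi_redis:

```python
# settings.py - hypothetical values; "myproject.routing.channel_routing"
# and the Redis host are stand-ins for your own project's names.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgi_redis.RedisLocalChannelLayer",
        "CONFIG": {
            "hosts": [("redis.example.com", 6379)],
        },
        "ROUTING": "myproject.routing.channel_routing",
    },
}
```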

Distributed systems are hard

At its core, then, Channels is solving a distributed systems problem: that of communication and broadcast. Distributed systems are an area where there's no perfect solution; you always have to pick tradeoffs. Choosing between at-least-once and at-most-once delivery is one key example; the CAP "theorem" about distributed databases is the result of others.

Channels picks a certain set of these, aimed to be the best fit for the uses and protocols that are around the Web today, especially WebSockets. Dropping frames and closing the socket is preferred to duplicating frames, for example; clients can reconnect, but having a distributed deduplication system for actions is hard unless you make everything idempotent. I'll hopefully get another post up detailing exactly what the tradeoffs I've picked are and what the alternatives would imply, but each of them is chosen for a reason.
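One of those tradeoffs fits in a few lines of code: a channel with a fixed capacity that refuses new messages rather than buffering without bound. This is a toy model - the exception name mirrors Channels' ChannelFull idea, and the capacity number is arbitrary:

```python
from collections import deque

class ChannelFull(Exception):
    """Raised instead of buffering forever: backpressure, not silent loss."""

class BoundedChannel:
    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = deque()

    def send(self, message):
        # At-most-once with backpressure: the sender finds out immediately
        # that the system is overloaded, rather than the message being
        # queued indefinitely or delivered twice after a retry.
        if len(self.messages) >= self.capacity:
            raise ChannelFull()
        self.messages.append(message)

    def receive(self):
        return self.messages.popleft() if self.messages else None


chan = BoundedChannel(capacity=2)
chan.send({"text": "frame 1"})
chan.send({"text": "frame 2"})
try:
    chan.send({"text": "frame 3"})   # over capacity: rejected, not buffered
    overflowed = False
except ChannelFull:
    overflowed = True
```

The alternative - unbounded queues with at-least-once retries - pushes the cost onto consumers, which then need deduplication or fully idempotent handlers.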

It's never going to work for everyone; that's an unachievable goal. Its purpose, instead, is to be the solution for the 90%; something that isn't always perfect, but generally does what you want, and where the tradeoffs are offset by the massive advantage of a single common platform and the shared code and community that enables. It's much like Django, which cannot be a perfect web framework - it can't solve every problem that every developer has - but we can solve 90% of all the problems developers have in a consistent manner, and have a standard and design pattern that encourages re-use and familiarity.

The ASGI API that Channels is built on is deliberately very slim; it specifies the minimum it needs to so you can get a consistent experience across different channel layer backends, while leaving a lot up to the backend, and thus a decent amount of flexibility in how you transport messages. As you get bigger, your needs will change and specialise; the channel layer abstraction is there to allow you to try and grow inside it as long as possible, being flexible while still presenting the same basic API you were developing on when you started; channels, groups, send, and receive.

I don't expect any "top 100" site to run an unmodified ASGI channel layer, just like they wouldn't run a standard Django installation; as you get bigger and your needs specialise, what you want is a solution that leaves space for you to slowly and reliably replace it, and my goal with the design of ASGI is that, even once you remove all of the Channels code, you're left with an abstraction and design that will work with many more specialised examples of distributed systems and events. Just like core Django itself, it lets you heavily modify it and replace parts while you grow, and gets out of your way once you no longer need it.

This, then, is the philosophy of Channels - a solution that is not intended to be a panacea, but instead to be a common base to help with developing applications that span multiple servers and deal with stateful protocols like WebSockets. Smaller teams, agencies, and medium size sites can use it without many changes; larger projects will likely need to specialise a channel layer backend and maybe some of the handling, but can still benefit from the developer familiarity that following the abstraction and patterns provides.

Looking Ahead

With that in mind, what is the path ahead for Channels and ASGI? WebSocket projects themselves are in their relative infancy - very few have been deployed at any appreciable scale yet, let alone using Channels - and so we have a way to go no matter what. Still, sites are already using Channels in production, and the feedback I've had about it has been pretty much all positive, so we're on a good path to maturity on that front.

Daphne itself is heavily based on Twisted code for HTTP handling, and Autobahn for WebSocket handling - two libraries with a strong history of stability - while ASGI is based on our experience and research into scaling eventing systems inside Eventbrite, my previous experiences with distributed messaging, industry research and case studies, and talking to others handling similar problems. It's as solid a baseline as you can reach in a situation where there's no successful open-source example to easily follow.

The feedback I got during the proposal process for putting Channels into Django 1.10 (it did not get in before the deadline, but will still be developed as an external app with 1.10) was valuable; some of the more recent changes and work, such as backpressure and the local-and-remote Redis backend, are based on feedback from that process, and I imagine more tweaks will emerge as more things get deployed on Channels.

That said, I think the fundamental design abstraction - distributed message-passing - is a solid one, and a sensible API to build Django applications against in future as the needs and complexity of Web applications grows beyond simple request-response handling. This is a space where Django and Python have the opportunity to help lead the way in structuring and running these sorts of applications.

I'm also interested in taking the ASGI message formats and common interface standard into the PEP process, but before that I'm going to reach out to other web frameworks to make sure that it's something that truly works across framework boundaries, as always intended, and to try and work out potential issues in real-world scenarios.

I'm unsure quite what the future holds for Channels - the ideal would be for it to open up Django and Python as the solution to a much greater class of problems than people currently use them for, bringing the positive points of the language, framework and communities to a growing audience of developers faced with writing these large, stateful-protocol systems. It might also be that it ends up just being the WebSocket management layer for Django, but even for that purpose, it's important to get it designed well.

I hope this has illuminated some of the missing context and plans behind Channels; community feedback is incredibly important to this whole process, and so if this helped, or if you still have more questions, please get in touch and let me know. It's important that everyone understands both the implementation and the context of the problem it solves - one is nothing without the other - and I hope that, going forwards, we can have a clear idea of what they both mean together.