The first iteration of the dieselweb.org website was created in 2009 for programmers. The domain has seen several iterations by other owners. We are providing 2009-2010 archived content from the original site.
diesel is a framework for writing network applications using asynchronous I/O in Python.
It uses Python's generators to provide a friendly syntax for coroutines and continuations. It performs well and handles high concurrency with ease.
An HTTP/1.1 implementation is included as an example, which can be used for building web applications.
Here's how easy it is to write a simple echo server:
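A sketch of such an echo server, reconstructed from the Application, Service, and until primitives described in the documentation; the exact signatures (the Service port argument, the handler taking the remote address) are assumptions:

```python
# A sketch, not diesel's confirmed API: reconstructed from the
# documented Application/Service/until primitives.
from diesel import Application, Service, until

def handle_echo(remote_addr):
    while True:
        # Wait for a full line from the client; diesel hands back
        # everything up to and including the sentinel.
        line = yield until('\r\n')
        # Yielding a string asks diesel to write it to the socket.
        yield "you said: %s" % line

app = Application()
app.add_service(Service(handle_echo, 8000))
app.run()
```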
You can find a nice overview of the nitty-gritty details in the documentation.
Currently, we are developing diesel primarily to meet the requirements of ShopTalk, a web-based group chat application for companies. However, we have been using the library for several years now on other projects. It has been tested with many applications, both HTTP and otherwise. We've found that it makes writing asynchronous applications a breeze.
We are releasing diesel as open source now, because we sense that the community is becoming more interested in asynchronous applications due to the rise in popularity of Comet. The open source community can benefit from diesel's accessible API, and diesel can benefit from the testing and contributions of the community.
We will be actively answering questions on the mailing list. Feel free to join in the discussion and send us your questions and comments.
The diesel source is hosted on bitbucket. Head over there to grab the source and fork the repository. The source repository also contains a copy of the documentation.
The underlying asynchronous library is only one piece of the puzzle. Once this foundation is in place, it becomes much easier to build more useful asynchronous software. In fact, we already have.
We will be releasing the other components of our stack as they become more mature. You can follow us on Twitter if you'd like to be kept apprised of future diesel releases.
diesel is a framework for writing network applications using asynchronous I/O.
What is Asynchronous I/O?
The basic decision network applications need to make when it comes to concurrency is what to do about waiting on data to arrive or to be ready to be written when multiple connections are involved. The problem can be best explained using the recv() syscall. recv() is the way that most network applications retrieve data off a socket; it is passed a socket file descriptor, and it blocks until data is available--then the data is returned.
With more than one socket, however, problems arise if you ignore the needs of concurrency. recv() is socket-specific, so your entire program blocks waiting on data to arrive on socket A. Meanwhile, sockets B, C, and D all have data waiting to be processed by the application. Oh well.
Many applications solve this by using multiple worker threads (or processes), which are passed a socket from a central, dispatching thread. Each worker thread "owns" exactly one socket at a time, so it can feel free to call recv(socket) and wait whenever appropriate. The operating system's scheduler will run other threads for processing data on other sockets until the original thread has data ready.
Asynchronous I/O takes a different approach. Typically, some system call is invoked that blocks on many sockets at the same time, and which returns information about any file descriptors that are ready for reading or writing right now. This allows the program to know that, for particular sockets, data is on the input buffer and recv() will return immediately. In these applications, since (theoretically) nothing will ever block on an I/O syscall for an individual socket, only one thread is necessary.
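The readiness-notification idea can be seen with plain Python and the classic select() call (no diesel involved); select() blocks on many descriptors at once and reports which are ready right now:

```python
# Self-contained illustration of readiness notification with select().
import select
import socket

a, b = socket.socketpair()  # a connected pair of sockets, for demonstration

# Nothing has been sent yet, so neither socket is readable:
# select() with a zero timeout returns immediately with empty lists.
readable, _, _ = select.select([a, b], [], [], 0)
assert readable == []

b.send(b"hello")

# Now select() reports that socket `a` has data on its input buffer,
# so recv() is guaranteed to return immediately instead of blocking.
readable, _, _ = select.select([a, b], [], [], 0)
assert a in readable
data = a.recv(1024)

a.close()
b.close()
```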
Advantages of Asynchronous I/O
The memory overhead of each connection is typically much lower than in the other approaches, which makes it ideally suited to situations where socket concurrency numbers into the hundreds or thousands. No fork() or spawn() needs to be invoked to handle a surge of connections, and no thread pool management needs to take place. Additionally, switching between activity on sockets doesn't require the operating system's scheduler or a context switch between threads. All these factors add up: daemons written using asynchronous I/O are typically the definitive performance champions in their category.
Also, because there is often only one thread running in an application, no complex and expensive locking needs to occur on shared data structures; a routine executing against shared data is guaranteed not to be interrupted in the middle of a series of operations.
Disadvantages of Asynchronous I/O
The inertia of existing code and developer preferences is a challenge. Most well-known client libraries block, so you can pretty much toss them out the window (or sandbox them on a thread, killing most of the aforementioned advantages). And blocking style "feels" more natural and intuitive to most programmers than async does. It's usually perceived as easier to write, and especially, read.
Threaded or multi-processed approaches are also better poised to take advantage of multiple cores. You're already involving the OS in scheduling, and you're already (hopefully) locking your shared data correctly, so running your programs "automatically" across multiple cores is possible. There are ways to do this with async approaches, of course, but they're arguably more explicit.
Finally, handlers within async applications must be good neighbors, and give up control back to the main loop within a reasonable amount of time, or else they can block all other processing from occurring. CPU-intensive operations devoted to individual sockets can be problematic.
In writing diesel, we had to choose between a difficult installation process (pyevent/libevent) or supporting only a specific, but common, platform. We chose the latter.
Currently, diesel is built on Python 2.6's epoll support in the standard library. This means that diesel requires Python 2.6 running on a Linux system. We aren't opposed to adding support for more systems in the future, but right now, that's exactly what you need.
The good news is, it doesn't require anything other than the standard library.
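For a quick look at the stdlib epoll support diesel is built on (this is plain Python, Linux-only, not diesel code): you register file descriptors with an epoll object and poll() reports which are ready.

```python
# Self-contained demonstration of the standard library's epoll interface.
import select
import socket

a, b = socket.socketpair()

ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)  # tell us when `a` is readable

b.send(b"ping")

# poll() returns (fd, eventmask) pairs for every ready descriptor.
events = ep.poll(timeout=1)
ready_fds = [fd for fd, mask in events]
assert a.fileno() in ready_fds

data = a.recv(1024)

ep.unregister(a.fileno())
ep.close()
a.close()
b.close()
```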
Provided you have setuptools installed, you can install using the standard python-cheeseshop route:
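Something along these lines, assuming the package is published on the cheeseshop under the name diesel:

```shell
# package name on the cheeseshop assumed to be "diesel"
easy_install diesel
```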
Examples and Docs
We do recommend you get the source, however, which contains lots of useful examples and a copy of this documentation.
The latest source and docs can always be downloaded from bitbucket: http://bitbucket.org/boomplex/diesel/
So diesel does network applications async-style. That we've covered.
What's unusual (and we think, awesome) about it is its preservation of the "blocking" feel of synchronous applications by (ab)use of Python's generators.
How does it Work?
Let's dive in...
Every "thread" of execution is managed by a generator. These generators are expected to yield special tokens that diesel knows how to process. Let's take a look at a simple generator that uses the sleep token:
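A sketch of the two loops the walkthrough below steps through. Here sleep is a local stand-in for diesel's real sleep token (defined in-line so the sketch runs on its own); in an actual diesel program it would come from diesel itself:

```python
# Stand-in for diesel's sleep token: the event hub would read this
# as "register a timer and wake me up in `seconds` seconds".
def sleep(seconds):
    return ('sleep', seconds)

def print_every_second():
    while True:
        print("hi! 1 second")
        yield sleep(1.0)

def print_every_two_seconds():
    while True:
        print("hi! 2 seconds")
        yield sleep(2.0)
```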
Let's imagine that both these loops were run at the same time within diesel; here's an examination of what would go on from diesel's perspective:
- print_every_second() is scheduled
- A sleep token is yielded, requesting a wakeup in 1 second
- A one second timer is registered with the diesel event hub
- Are there any other loops to run? Yes, so:
- print_every_two_seconds() is scheduled
- A sleep token is yielded, requesting a wakeup in 2 seconds
- A two second timer is registered with the diesel event hub
- Are there any other loops to run? No, so:
- The main event hub loop waits until the timer that fires the soonest is ready (1s)
- Timers are processed to see what needs to be scheduled
- Run any scheduled loops... and so on.
Take a minute to recognize what's going on here: we're running cooperative loops that appear to be using easily-read, blocking, threaded behavior--but they're actually running within one process!
Hopefully that provides a sense of what diesel is doing and how generators can turn async into blocking-ish routines. Now let's put internals aside and focus on how to use diesel.
The truth is the example above wasn't a full diesel application; here's what a runnable version would look like:
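A reconstruction based on the Application and Loop classes described below; the exact method names (add_loop, run) are assumptions:

```python
# A sketch of a complete diesel application; method names are assumed.
from diesel import Application, Loop, sleep

def print_every_second():
    while True:
        print("hi! 1 second")
        yield sleep(1.0)

def print_every_two_seconds():
    while True:
        print("hi! 2 seconds")
        yield sleep(2.0)

app = Application()
app.add_loop(Loop(print_every_second))
app.add_loop(Loop(print_every_two_seconds))
app.run()
```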
Still, not too bad.
Every diesel app has exactly one Application instance. This class represents the main event hub as well as all the Loop and Service instances it schedules.
Loops and Services
- A Loop is an arbitrary routine that will be first scheduled when the app starts, as we've seen above
- A Service represents a TCP service listening on a bound socket; a new connection-handling loop will be created and first scheduled every time an incoming connection is made
We've seen basic Loops. Let's try a Service:
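A sketch reconstructed from the description that follows; the names Application, Service, and until come from this document, but the precise signatures are assumptions:

```python
# A sketch of an echo Service, reconstructed from the prose below.
from diesel import Application, Service, until

def handle_echo(remote_addr):
    while True:
        # Tell diesel which sentinel to wait for on the socket's stream;
        # it "returns" everything up to and including the sentinel.
        line = yield until('\r\n')
        # Yielding a string is a request to write it to the socket.
        yield "you said: %s" % line

app = Application()
app.add_service(Service(handle_echo, 8000))
app.run()
```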
Having seen the Loop example, it's probably not too difficult to figure out what's going on here. We create a Service listening on port 8000 and add it to our diesel Application. When someone connects, handle_echo takes over. The first thing this connection-handling loop does is yield an until token to diesel, letting diesel know what sentinel it wants to wait for on the connected socket's stream. As soon as it's available on the input buffer, diesel "returns" the string to the generator, up to and including the sentinel. Finally, the handling loop yields a string, which diesel interprets as a request to write data on the connected socket. And the whole thing repeats.
If the generator ever ends (StopIteration is raised, in Python-speak), the connection will be closed.
Here's what the client side of this looks like:
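A hypothetical client for the echo service above. The Client class is taken from the description below, but its interface (the constructor arguments, the echo method's call pattern) is an assumption here, not diesel's confirmed API:

```python
# A speculative sketch; Client's exact interface is an assumption.
from diesel import Application, Loop, Client, until

class EchoClient(Client):
    # Hypothetical method: send a line, then wait for the echoed reply.
    def echo(self, message):
        yield message + '\r\n'       # yielding a string writes it
        reply = yield until('\r\n')  # wait for the server's response
        yield reply

def do_echo():
    client = EchoClient('localhost', 8000)
    reply = yield client.echo('hi there')
    print(reply)

app = Application()
app.add_loop(Loop(do_echo))
app.run()
```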
diesel supports writing network protocol clients, too. Client objects, however, aren't managed by the main event hub exactly the same way Loops and Services are. Instead, they provide an API to other network resources that Loops and Services can utilize.