Blocking IO in Gunicorn Gevent Workers

As part of the Python Platform Team here at Wayfair, I help support Wayfair’s 100+ Python engineers and data scientists. We’re the go-to team when it comes to better leveraging Python within the Wayfair Tech ecosystem.

During recent project work, we investigated a production issue with a Flask service that would stop responding under moderate load. In this article, I’ll walk through how blocking IO in Gevent-based Python apps can cause serious and non-intuitive performance issues. Beware!

Background

The above-mentioned Flask service has six servers behind a load balancer; each server runs four Gunicorn Gevent workers. This means a total of 24 processes are available to handle requests, each with its own Gevent event loop.

The workload these servers handle is completely IO bound; most of the response time is spent either reading from the database or writing to Apache Kafka. We would expect this configuration to handle thousands of concurrent requests. However, when the service was put under moderate load, response times would slowly rise and the service would eventually stop responding. If the load was reduced, the service would quickly return to responding normally.

Confusingly, the service would stop responding when the servers were only at ~30% of their total CPU capacity. We would have expected this kind of failure only once close to 100% of their CPU capacity was in use. The Gunicorn Gevent documentation states that each worker has an event loop where the worker can handle thousands of requests per second, and we were nowhere near that load limit.

We had a hunch that the Gevent event loop was being blocked, which would cause the issues we were seeing. Gevent monkey-patches the Python standard library, altering it to use asynchronous non-blocking IO rather than synchronous blocking IO. When you make an IO request in a synchronous Python program, the entire program waits (“blocks”) until a response is returned. But what happens when you want to make hundreds or thousands of blocking calls? Then you’d spend most of your time waiting for blocking IO to return.

Looking into alternatives

There are a few different ways to get around this in Python; the traditional approach is threading. Threading spawns multiple OS threads that can each wait on blocking IO independently. Although threading works well, managing threads can add a lot of complexity to your code.
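To make the threading approach concrete, here is a minimal sketch using only the standard library. The fake_blocking_io function is a hypothetical stand-in for a real blocking call (it just sleeps), and the 50 ms delay is an arbitrary assumption, not a measurement from our service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_blocking_io(delay: float = 0.05) -> float:
    """Hypothetical stand-in for a blocking call such as a database read."""
    time.sleep(delay)  # the calling thread waits here, as it would on a socket
    return delay


def run_sequential(n: int) -> float:
    """Make n blocking calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_blocking_io()
    return time.perf_counter() - start


def run_threaded(n: int) -> float:
    """Make n blocking calls on n OS threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each call waits on its own thread, so the waits overlap.
        list(pool.map(lambda _: fake_blocking_io(), range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(10):.2f}s")
    print(f"threaded:   {run_threaded(10):.2f}s")
```

The sequential version pays for every wait in turn, while the threaded version overlaps them. The cost is that any state shared between the threads now has to be protected, which is where the extra complexity comes from.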

Gevent is an alternative to threading: instead of spawning threads, it runs a libev event loop and monkey-patches Python to replace most blocking IO with non-blocking IO. This allows you to write Python in a synchronous style without the complexity of managing threads, while also reaping the benefits of non-blocking IO.

Gevent achieves this concurrency using greenlets. Each greenlet is a “lightweight pseudo-thread”, and greenlets work cooperatively via an event loop, yielding control to one another while they wait for IO. Looking into Gunicorn’s GeventWorker, you can see where it runs Gevent’s monkey patch on startup to replace blocking IO with non-blocking IO.

Gevent and Gunicorn try their best to monkey-patch blocking IO in the Python standard library, but they can’t control external C dependencies. This becomes a serious issue in web apps; if your event loop is blocked waiting for a C library’s IO, you can’t respond to any requests, even though you have plenty of system resources available.

Looking into the Gevent documentation, they even call this out explicitly:

“The greenlets all run in the same OS thread and are scheduled cooperatively. This means that until a particular greenlet gives up control, (by calling a blocking function that will switch to the Hub), other greenlets won’t get a chance to run. This is typically not an issue for an I/O bound app, but one should be aware of this when doing something CPU intensive, or when calling blocking I/O functions that bypass the libev event loop.”
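The effect described in that quote can be reproduced in a few lines. The sketch below is an analogy rather than Gevent itself: it uses the standard library’s asyncio event loop so that it runs without extra dependencies, and the handler, serve, and measure names are made up for illustration. In Gevent the switch happens implicitly inside patched functions instead of at an explicit await, but the scheduling is cooperative in the same way.

```python
import asyncio
import time


async def handler(blocking: bool, delay: float = 0.05) -> None:
    """Simulate one request that waits on IO for `delay` seconds."""
    if blocking:
        # Like IO inside a C library: never yields, so the whole loop stalls.
        time.sleep(delay)
    else:
        # Like patched IO: yields to the loop while waiting.
        await asyncio.sleep(delay)


async def serve(n: int, blocking: bool) -> float:
    """Run n simulated requests on one event loop; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(blocking) for _ in range(n)))
    return time.perf_counter() - start


def measure(n: int, blocking: bool) -> float:
    """Convenience wrapper that starts a fresh event loop for each run."""
    return asyncio.run(serve(n, blocking))


if __name__ == "__main__":
    print(f"cooperative waits: {measure(10, blocking=False):.2f}s")
    print(f"blocking waits:    {measure(10, blocking=True):.2f}s")
```

With cooperative waits, the ten simulated requests overlap and finish in roughly the time of one. With blocking waits, they run strictly one after another, even though the CPU is idle the whole time.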

Getting our thinking caps on

In the Flask app we were troubleshooting, we were using the confluent-kafka client to write data to Kafka. This client uses the C library librdkafka under the hood. We suspected that librdkafka was not respecting the Gevent event loop and was preventing the greenlets from yielding.

To test this, we set up the Flask app in a local container and wrote a harness that used 100 threads to send data to the Kafka endpoint concurrently. It turned out that once four requests were in flight (which was also the number of Gevent workers), the container would stop responding to all further requests. The app could only handle four concurrent requests at a time, which was the smoking gun revealing that the Gevent event loop was getting blocked.
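Our actual harness isn’t reproduced here, but the idea can be sketched with the standard library. In the sketch below, FakeService is a hypothetical stand-in for the containerized app: it simulates workers that can each serve only one request at a time, instead of making real HTTP calls. The class, its parameters, and the 50 ms latency are all assumptions for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeService:
    """Hypothetical stand-in for the app: one request at a time per worker."""

    def __init__(self, workers: int = 4, latency: float = 0.05) -> None:
        self._slots = threading.Semaphore(workers)
        self._latency = latency
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0

    def handle_request(self) -> None:
        with self._slots:  # a blocked worker cannot pick up another request
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            time.sleep(self._latency)  # simulated blocking write
            with self._lock:
                self._in_flight -= 1


def run_harness(service: FakeService, requests: int = 100, threads: int = 100) -> float:
    """Fire `requests` calls from `threads` client threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(service.handle_request) for _ in range(requests)]
        for future in futures:
            future.result()  # re-raise any error from a client thread
    return time.perf_counter() - start


if __name__ == "__main__":
    service = FakeService()
    elapsed = run_harness(service)
    print(f"peak concurrent requests: {service.peak_in_flight}, elapsed: {elapsed:.2f}s")
```

Against a healthy Gevent worker we would expect the number of requests in flight to approach the number of client threads. A ceiling equal to the worker count, as simulated here, is the symptom we observed.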

Looking at the Kafka code in the app, we noticed that we were calling producer.flush() to wait for the message to be written to Kafka and to call the registered callback. Behind the scenes, producer.flush() calls producer.poll() in a loop until the producer has no more messages to process.

This should usually work fine; ideally we would wait on non-blocking IO and yield back to the event loop to process other requests. However, we suspected that this precise call was actually blocking execution and preventing the greenlet from yielding.

To test this hypothesis, we changed the call to producer.poll(timeout=0), which would attempt to call any Python callbacks, but wouldn’t wait if there weren’t any available to process. With this change, the app could easily handle hundreds of concurrent requests. We then tried setting the timeout on poll to values greater than 0 and re-ran the testing harness. Doing so transported us back to blocking IO land, where we could only serve four concurrent requests. It seemed that producer.poll() with a non-zero timeout was causing blocking IO in librdkafka, which was blocking the Gevent event loop.

We deployed a fix which changed the code to use producer.poll(timeout=0) instead of producer.flush(). Response times on the routes that wrote to Kafka dropped from 200ms to 5ms. The Flask servers started behaving normally and were able to handle thousands of concurrent requests.
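The shape of that change looks roughly like the sketch below. It is not our production code: send_event, on_delivery, and RecordingProducer are hypothetical names, and RecordingProducer is a test double that exists only so the sketch runs without a Kafka broker. In a real app, the producer argument would be a confluent-kafka Producer, whose produce, poll, and flush methods are the ones mirrored here.

```python
from typing import Any, Callable, List, Optional, Tuple


def on_delivery(err: Optional[Any], msg: Any) -> None:
    """Delivery report callback in the (err, msg) form confluent-kafka uses."""
    if err is not None:
        print(f"delivery failed: {err}")


def send_event(producer: Any, topic: str, payload: bytes) -> None:
    """Queue a message, then serve any ready callbacks without waiting."""
    producer.produce(topic, value=payload, on_delivery=on_delivery)
    # Before the fix, this line was producer.flush(), which waits for delivery.
    producer.poll(0)  # timeout of 0: run ready callbacks and return at once


class RecordingProducer:
    """Hypothetical test double that records calls instead of talking to Kafka."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def produce(
        self,
        topic: str,
        value: bytes = b"",
        on_delivery: Optional[Callable[..., None]] = None,
    ) -> None:
        self.calls.append(("produce", (topic, value)))

    def poll(self, timeout: float = -1) -> int:
        self.calls.append(("poll", timeout))
        return 0  # number of callbacks served; none in this double


if __name__ == "__main__":
    producer = RecordingProducer()
    send_event(producer, "events", b"hello")
    print(producer.calls)
```

The trade-off is that the request handler no longer waits for Kafka to confirm the write. The delivery report for a given message may only be served on a later poll call, so any error handling has to live in the callback rather than in the request path.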

Conclusion

So, full disclosure for readers: we didn’t fix this issue per se; we merely stopped calling the method that was blocking the Gevent event loop. To put a permanent fix in place, we should find a way to write messages to Kafka that is either pure Python or built on C dependencies that respect the libev event loop’s non-blocking IO.

In short, beware of blocking IO in Gevent applications! It causes non-intuitive and difficult-to-diagnose performance issues. In this case it was clear that there was a serious issue, but if the Gevent event loop were blocked less drastically, it would be very difficult to even notice the problem, let alone identify its cause. When dealing with external C dependencies that handle IO, be thoughtful about whether they are compatible with Gevent.

Have you had similar issues with external C dependencies or Gevent? In what creative ways is your team leveraging Python in these types of scenarios? Let us know in the comments below.