Zero‑Downtime Deployments: Mastering Gunicorn & uWSGI Reloads with Load Balancers

This guide explains how to achieve rolling, zero‑downtime restarts for Python web services by leveraging Gunicorn and uWSGI reload mechanisms, load‑balancer tricks, and custom health‑check scripts that keep traffic flowing even when applications take up to a minute to start.

Python Programming Learning Circle

When deploying a new version, the simplest approach is to restart the application or all services with administrator privileges, but this often leads to a flood of HTTP 503 errors while the services are coming back up.

Both Gunicorn and uWSGI support reloading without closing the listening socket, so requests are only delayed rather than refused. This works well if the application starts quickly, but many real-world apps need up to a minute to become ready, which is too long to keep clients waiting on the socket.

Gunicorn reloads when its master process receives kill -HUP $PID, which replaces all worker processes at once and causes a noticeable pause while the new workers load the application. uWSGI offers a chained reload that restarts one worker at a time, but it lacks good Tornado support.
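As a rough illustration, the same signal can be sent from Python, for example from a deploy script. This is a minimal sketch that assumes the Gunicorn master was started with a pid file (the --pid option); the helper names read_pid and reload_master are hypothetical.

```python
import os
import signal


def read_pid(pidfile):
    """Return the PID stored in a pid file (a single integer, optional whitespace)."""
    with open(pidfile) as fh:
        return int(fh.read().strip())


def reload_master(pid):
    """Send SIGHUP to the Gunicorn master, the equivalent of `kill -HUP $PID`."""
    os.kill(pid, signal.SIGHUP)
```

A deploy script would call something like reload_master(read_pid(path)), where path is whatever was passed to --pid.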

Using a load balancer

A common technique is to remove a single server from the load balancer pool, upgrade or restart it, then add it back. In our setup we use HAProxy to manage sockets, and we deploy to all nodes simultaneously rather than one by one.

To take a node out of the pool, we can have it answer the health check with a temporary 404 so the load balancer marks it as down. This adds a small extra delay: each server has to fail two health checks five seconds apart, on top of the time the web process needs to recover.
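One way to produce a temporary 404 on the health check is a flag file that the deploy script creates before the restart and removes afterwards. The sketch below is an assumption about how this could look, not the setup described in this article: the flag file location and the name make_health_app are hypothetical.

```python
import os

# Hypothetical flag file; a deploy script would create it to drain the node.
MAINTENANCE_FLAG = "/tmp/app-maintenance"


def make_health_app(flag_path=MAINTENANCE_FLAG):
    """Build a tiny WSGI app that answers 404 while flag_path exists, 200 otherwise."""

    def health_app(environ, start_response):
        if os.path.exists(flag_path):
            status, body = "404 Not Found", b"draining\n"
        else:
            status, body = "200 OK", b"ok\n"
        headers = [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    return health_app
```

Keeping the switch outside the application process means the node can be drained and restored without touching the running workers.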

Gunicorn reload ++

Gunicorn automatically restarts failed worker processes, so one option is to kill each worker in turn and wait between kills until every child has been replaced. This works, but if startup time varies significantly you either wait longer than necessary or risk a brief outage.

Because Gunicorn supports server hooks written in Python, a tiny hook can notify a restart manager when a worker is ready. At the time of writing, Gunicorn did not ship the exact hook we needed, but adding it required only a small code change that should be part of the next release.

The advantage of this approach is that a single socket can be shared by multiple processes; a restart reduces capacity by only 1/N, allowing traffic to continue flowing without long client‑side waits.

The typical restart process looks like this:

<code>for child_pid of gunicorn-master:
    kill child_pid
    wait for app startup
</code>

My first implementation used a shell script and nc to listen for a UDP packet sent by the application when it finished starting. Although integrating this process manager into the shell environment was a bit more involved than expected, it worked reliably.

The restart script should be invoked with the Gunicorn master PID, e.g., masterrestart.sh $PID:

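<code>#!/bin/sh
# Usage: masterrestart.sh MASTER_PID
# Restart Gunicorn workers one at a time, waiting for each replacement to report ready.
echo "Killing children of $1"
children=$(pgrep -P "$1")
for child in $children
do
    echo "Killing $child"
    kill "$child"
    # Wait up to 60 seconds for the new worker's readiness message on UDP port 4012.
    response=$(timeout 60 nc -w 0 -ul 4012)
    if [ "$response" != '200 OK' ]; then
        echo 'BROKEN'
        exit 1
    fi
done
</code>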
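The same loop can also be sketched in Python, which avoids depending on a particular nc variant. This is a minimal sketch under stated assumptions, not the script used in this article: the function names are hypothetical, pgrep is assumed to be installed, and the port matches the shell script. It binds the UDP socket before killing any worker so a fast worker cannot report in before the listener exists.

```python
import os
import signal
import socket
import subprocess

READY_ADDR = ("127.0.0.1", 4012)  # same port as the shell script
READY_MESSAGE = b"200 OK"


def child_pids(master_pid):
    """List the direct children of master_pid using pgrep, as the shell script does."""
    result = subprocess.run(
        ["pgrep", "-P", str(master_pid)], capture_output=True, text=True
    )
    return [int(pid) for pid in result.stdout.split()]


def wait_for_ready(sock, timeout=60.0):
    """Return True if one datagram equal to READY_MESSAGE arrives before the timeout."""
    sock.settimeout(timeout)
    try:
        data, _addr = sock.recvfrom(1024)
    except socket.timeout:
        return False
    return data.strip() == READY_MESSAGE


def rolling_restart(master_pid, timeout=60.0):
    """Kill workers one by one, waiting for each replacement to report ready."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(READY_ADDR)  # bind first so no notification is missed
    try:
        for pid in child_pids(master_pid):
            os.kill(pid, signal.SIGTERM)
            if not wait_for_ready(sock, timeout):
                return False  # stop early, leaving the remaining workers untouched
        return True
    finally:
        sock.close()
```

Stopping at the first failed or missing notification mirrors the 'BROKEN' exit in the shell version: the remaining old workers keep serving traffic.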

We add a post_worker_init hook to the Gunicorn configuration so the worker notifies the restart script as soon as it is ready:

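<code>import socket

def post_worker_init(worker):
    # Gunicorn server hook: called just after a worker has initialized the application.
    _send_udp('200 OK\n')

def _send_udp(message):
    # Send the status line as a single UDP datagram to the restart script.
    udp_ip = "127.0.0.1"
    udp_port = 4012
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # sendto() requires bytes on Python 3, so encode the text first.
    sock.sendto(message.encode('ascii'), (udp_ip, udp_port))
    sock.close()
</code>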

For a WSGI application we can also expose a status endpoint to verify readiness:

<code>from werkzeug.wrappers import Request, Response

@Request.application
def application(request):
    resp = Response('Hello World!')
    if request.path == '/_status':
        resp.status = '200 OK'
    else:
        resp.status = '404 Not Found'
    return resp
</code>

Alternatively, the post_worker_init hook can perform an internal request to /_status and send the response status via UDP:

<code>def post_worker_init(worker):
    env = {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': '/_status',
    }
    def start_response(*args, **kwargs):
        _send_udp(args[0])
    worker.wsgi(env, start_response)
</code>

Be careful not to run too many health-check requests in this hook: if post_worker_init raises an error, the worker exits and the application cannot start, which can also mask the underlying cause, such as a database connection issue.

With this approach, even a one‑minute startup time can be handled via a rolling restart that never stops the application or drops existing connections.
