
How One Engineer Runs a Full SaaS on Kubernetes with Minimal Effort

This article details how a solo engineer built and operated a SaaS platform on AWS using Kubernetes, covering infrastructure overview, automatic DNS, TLS, load balancing, CI/CD rollouts, autoscaling, caching, secret management, monitoring, logging, error tracking, and cost‑effective operations.


Kubernetes is an open‑source system for deploying and managing containerized applications at scale; on AWS it typically runs across clusters of Amazon EC2 instances, handling deployment, maintenance, and scaling.

The story describes how a Costa‑Rican engineer used Kubernetes at a startup to handle load balancing, cron‑job monitoring, and alerting, keeping a one‑person company running smoothly.

AWS simplifies running Kubernetes with the managed Amazon Elastic Kubernetes Service (EKS), providing scalable, highly‑available virtual machines and community‑supported integrations.

Overall Architecture Overview

The infrastructure can serve multiple projects; the author uses Panelbear as a concrete example. The SaaS ingests a large volume of requests from around the world and stores the data efficiently for real‑time queries, while the business itself is still early in its lifecycle.

After several iterations, the stack consists of a Django monolith, Postgres for the app database, ClickHouse for analytics, Redis for caching, Celery for background tasks, all running on a managed EKS cluster.

Automatic DNS, SSL, Load Balancing

Traffic enters the private cluster via an ingress-nginx controller, which routes requests to services and applies rate‑limiting and other traffic‑shaping rules. The example uses a Django app served by Uvicorn.

<code>apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  namespace: example
  name: example-api
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/limit-rpm: "5000"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    external-dns.alpha.kubernetes.io/cloudflare-proxied: "true"
spec:
  tls:
  - hosts:
    - api.example.com
    secretName: example-api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      - path: "/"
        backend:
          serviceName: example-api
          servicePort: http</code>

Automatic Rollout and Rollback

When a new Docker image is pushed, the flux component syncs the cluster to the latest image and triggers an incremental rollout, for example when a new tag like this appears in the registry:

<code>panelbear/panelbear-webserver:6a54bb3</code>

Horizontal Autoscaling

The app scales based on CPU/memory usage; Kubernetes packs workloads onto nodes and adds or removes nodes as needed, scaling the Panelbear API pods from 2 up to 8 replicas.
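The Horizontal Pod Autoscaler's documented core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured bounds. A small sketch of that rule, using the 2–8 replica range mentioned above (the helper name is ours, not Kubernetes code):

```python
import math

def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 8) -> int:
    """Approximate the HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric), clamped to bounds."""
    desired = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 2 replicas at 90% CPU against a 60% target scales up to 3
```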

CDN Static Asset Caching

Ingress rules include the cloudflare-proxied: "true" annotation to route traffic through Cloudflare. Application responses set standard HTTP cache headers, e.g.:

<code># Cache this response for 5 minutes
response["Cache-Control"] = "public, max-age=300"</code>

Static files are served directly from the container using Whitenoise, avoiding separate uploads to Nginx/CloudFront/S3.
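Per the WhiteNoise documentation, enabling it in Django amounts to a middleware entry and a storage backend in settings.py; a trimmed sketch (the rest of the middleware stack is elided):

```python
# Excerpt of a Django settings.py wired for WhiteNoise (abbreviated)
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise should sit directly after SecurityMiddleware
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ... remaining middleware ...
]

# Serve compressed, cache-busted static files straight from the container
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
```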

Application Data Caching

Results of expensive computations, Django model lookups, and rate‑limit counters are cached, for example for 15 minutes via a decorator:

<code>@cache(ttl=60 * 15)
def has_enough_capacity(site: Site) -> bool:
    """
    Returns True if the Site has enough capacity to accept
    incoming events, or False if it has already gone over its
    plan limits and the grace period is over.
    """</code>
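The article doesn't show the @cache decorator's internals; a minimal sketch of how such a TTL cache could work, using an in-process dict where the real implementation presumably uses Redis:

```python
import functools
import time

def cache(ttl: int):
    """Memoize a function's results for `ttl` seconds.
    In-process sketch; a production version would store entries in Redis."""
    def decorator(fn):
        store = {}  # (function name, args) -> (value, expiry timestamp)

        @functools.wraps(fn)
        def wrapper(*args):
            key = (fn.__name__, args)
            hit = store.get(key)
            if hit is not None:
                value, expires_at = hit
                if time.monotonic() < expires_at:
                    return value  # still fresh, skip recomputation
            value = fn(*args)
            store[key] = (value, time.monotonic() + ttl)
            return value
        return wrapper
    return decorator

@cache(ttl=60 * 15)
def expensive_lookup(x: int) -> int:
    return x * 2  # stand-in for a heavy computation
```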

Per‑Endpoint Rate Limiting

Django Ratelimit, backed by Redis, enforces per‑endpoint limits; requests over the limit receive HTTP 429.
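django-ratelimit handles this declaratively, but the idea behind it is a counter per client per time window. A fixed-window sketch (class and method names are illustrative, not the library's API; the library keeps its counters in a shared cache such as Redis):

```python
import time
from collections import defaultdict
from typing import Optional

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.counters = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if this request still fits in the current window."""
        now = time.monotonic() if now is None else now
        bucket = (key, int(now // self.window))
        self.counters[bucket] += 1
        return self.counters[bucket] <= self.limit
```

A view using something like this would return HTTP 429 whenever allow() is False.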

Application Management

Django’s built‑in admin panel assists with customer support. Access is restricted to staff and protected with 2FA. Security emails are sent on new logins.

Scheduled Jobs

Cron‑style jobs run via Celery workers and Celery beat, using Redis as the task queue. Monitoring uses Healthchecks.io, Cronitor, or CronHub, with alerts sent to Slack.

<code>def some_hourly_job():
    # Task logic
    ...
    # Ping monitoring service once task completes
    TaskMonitor(
        name="send_quota_depleted_email",
        expected_schedule=timedelta(hours=1),
        grace_period=timedelta(hours=2),
    ).ping()</code>
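Registering such a job with Celery beat comes down to a schedule entry like the following (the task path and entry name are illustrative; in a real project the dict is assigned to app.conf.beat_schedule on the Celery app):

```python
# Hypothetical Celery beat schedule entry for the hourly job above
beat_schedule = {
    "some-hourly-job": {
        "task": "panelbear.tasks.some_hourly_job",  # dotted task path (assumed)
        "schedule": 60 * 60,  # run every hour (seconds)
    },
}
```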

App Configuration

All settings are driven by environment variables, e.g.:

<code>INVITE_ONLY = env.bool("INVITE_ONLY", default=False)</code>
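The env helper above comes from a settings library (environs, by the look of it); its boolean parsing can be approximated in plain Python (the helper name is ours):

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean setting from an environment variable; common
    truthy spellings ("1", "true", "yes", "on") map to True."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")

INVITE_ONLY = env_bool("INVITE_ONLY", default=False)
```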

ConfigMaps in Kubernetes inject these variables into containers.

<code>apiVersion: v1
kind: ConfigMap
metadata:
  namespace: panelbear
  name: panelbear-webserver-config
data:
  INVITE_ONLY: "True"
  DEFAULT_FROM_EMAIL: "The Panelbear Team <[email protected]>"
  SESSION_COOKIE_SECURE: "True"
  SECURE_HSTS_PRELOAD: "True"
  SECURE_SSL_REDIRECT: "True"</code>

Encryption

Secrets are sealed with kubeseal and decrypted only inside the cluster. Plaintext values like these are encrypted before ever being committed to the repo:

<code>DATABASE_CONN_URL='postgres://user:pass@my-rds-db:5432/db'
SESSION_COOKIE_SECRET='this-is-supposed-to-be-very-secret'</code>

DNS‑Based Service Discovery

Kubernetes automatically creates DNS records for services, enabling containers to reach each other via URLs like redis://redis.weekend-project.svc.cluster.local:6379.

Version‑Controlled Infrastructure

All infrastructure lives in a monorepo with Docker, Terraform, and Kubernetes manifests, enabling one‑command creation or destruction of the entire stack.

<code># Cloud resource Terraform example
resource "aws_s3_bucket" "panelbear_app" {
  bucket = "panelbear-app"
  acl    = "private"
  tags = {
    Name        = "panelbear-app"
    Environment = "production"
  }
  lifecycle_rule {
    id      = "backups"
    enabled = true
    prefix  = "backups/"
    expiration { days = 30 }
  }
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default { sse_algorithm = "AES256" }
    }
  }
}</code>

Logging

Logs are streamed to stdout and collected by Kubernetes; tools like stern help tail logs across pods.

Monitoring and Alerting

Initially using self‑hosted Prometheus and Grafana, the author later migrated to New Relic, exporting application metrics via the django‑prometheus library.

<code># django-prometheus exposes app-level metrics via the underlying
# Prometheus client library
from prometheus_client import Counter

my_counter = Counter("my_counter_total", "Description of the counter")
my_counter.inc()  # increment on some application event</code>

Error Tracking

Sentry aggregates exceptions from the Django app, providing context for each error. Alerts are routed to a Slack #alerts channel.
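Wiring Sentry into a Django app is a few lines in settings.py; a sketch of the documented sentry_sdk setup, not the author's exact configuration (the DSN is a placeholder and the sample rate is our choice):

```python
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[DjangoIntegration()],
    traces_sample_rate=0.1,   # sample 10% of transactions (illustrative)
    send_default_pii=False,   # avoid sending user PII by default
)
```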

Sentry error aggregation
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation, accompanying you through your operations career as we grow together.
