
Encountering Nginx 502 Errors? A Step‑by‑Step Guide to Fast Troubleshooting

Nginx 502 Bad Gateway is one of the most frequent operational issues. This article outlines a systematic, layered approach to it: check the Nginx error logs and backend service status first, then network connectivity, resource limits, timeout settings, and permission problems. It provides concrete commands, example scenarios, and preventive measures for quickly identifying and resolving the root cause.

MaGe Linux Operations

Problem background

Nginx returns 502 Bad Gateway when it receives a client request successfully but gets no valid response from the upstream service: the connection is refused, reset, or closed before a complete response arrives. The error therefore points to the link between Nginx and the backend, not to a fault inside Nginx itself.

Common root‑cause categories

Backend service unavailable: crashed processes, OOM kills, ports not listening, firewall blocks, or services that start but do not respond.

Resource exhaustion: insufficient memory, CPU saturation, full disks, exhausted file descriptors, or other system limits.

Misconfiguration: overly short proxy_connect_timeout, proxy_send_timeout, or proxy_read_timeout, incorrect health‑check intervals, or wrong load‑balancing settings. (When one of Nginx's own proxy timeouts expires, the client usually sees 504 Gateway Timeout; 502 appears when the backend gives up and closes the connection first.)

Permission / socket issues: mismatched run‑user between Nginx and the backend, wrong Unix socket permissions, or SELinux/AppArmor blocks.

Network problems: different network zones, VPC security‑group changes, Docker network misconfiguration, or Kubernetes service discovery failures.

Troubleshooting path and core commands

Step 1 – Inspect Nginx error log

Default location:

/var/log/nginx/error.log
tail -n 100 /var/log/nginx/error.log
tail -f /var/log/nginx/error.log

Typical entries: connect() failed (111: Connection refused) or upstream prematurely closed connection.
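
To see which of these messages dominates, the error log can be tallied by failure type. The following is a minimal sketch, not a complete parser: the function name summarize_upstream_errors is hypothetical, and the pattern covers only a handful of common upstream messages.

```shell
#!/usr/bin/env bash
# Count upstream-related messages in an Nginx error log, most frequent first.
# Usage: summarize_upstream_errors [logfile]   (defaults to /var/log/nginx/error.log)
summarize_upstream_errors() {
  local log="${1:-/var/log/nginx/error.log}"
  # Extract only the known failure phrases, then tally identical ones.
  grep -oE 'connect\(\)( to [^ ]+)? failed \([0-9]+: [^)]*\)|upstream prematurely closed connection|upstream timed out|no live upstreams' "$log" \
    | sort | uniq -c | sort -rn
}
```

A log dominated by "Connection refused" points at a backend that is not listening, while "prematurely closed" suggests the backend accepted the request and then died or aborted it.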

Step 2 – Analyse access‑log status distribution

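# These commands assume the default "combined" log format, where field 9 is the status code.

# Count 502 occurrences (matching the status field avoids hits on a 502-byte response size)
awk '$9 == 502' /var/log/nginx/access.log | wc -l

# Show recent 502 sources (client IP, path, referer; the combined format has no host field)
awk '$9 == 502 {print $1, $7, $11}' /var/log/nginx/access.log | tail -n 50

# Top 20 request lines causing 502
awk -F'"' '$3 ~ /^ 502 / {print $2}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20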
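
It often helps to know when the 502s happened, since a burst usually lines up with a deploy, a restart, or a traffic spike. A minimal sketch, assuming the default combined log format (status in field 9, timestamp in field 4); the function name count_502_per_minute is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<dd/Mon/yyyy:HH:MM> <count>" for every minute that contains 502 responses.
# Usage: count_502_per_minute [logfile]   (defaults to /var/log/nginx/access.log)
count_502_per_minute() {
  local log="${1:-/var/log/nginx/access.log}"
  # $4 looks like "[10/Oct/2024:13:55:36"; characters 2..18 are the date down to the minute.
  awk '$9 == 502 { hits[substr($4, 2, 17)]++ }
       END { for (m in hits) print m, hits[m] }' "$log" | sort -k1,1
}
```

The final sort is a plain string sort, so the order is chronological within one day but not across month boundaries.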

Step 3 – Verify network connectivity

# Test backend port
curl -v http://127.0.0.1:8080/api 2>&1

# Measure response time
curl -w "\nTime: %{time_total}s\n" http://127.0.0.1:8080/api

# Check listening sockets
ss -tlnp | grep :8080
netstat -tlnp | grep :8080
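
When curl is not installed on the Nginx host, or the backend does not speak HTTP, a raw TCP probe still tells whether the port accepts connections. The sketch below relies on bash's /dev/tcp redirection and the coreutils timeout command; check_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Probe a TCP port and print "open", "timeout", or "refused".
# Usage: check_port <host> <port> [seconds]
check_port() {
  local host="$1" port="$2" wait_s="${3:-3}" rc=0
  # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null || rc=$?
  case "$rc" in
    0)   echo "open" ;;
    124) echo "timeout" ;;   # no answer at all: often a firewall drop or routing problem
    *)   echo "refused" ;;   # connection rejected or host unreachable: nothing accepts on that port
  esac
  return "$rc"
}
```

For example, check_port 127.0.0.1 8080 printing "refused" matches the connect() failed (111: Connection refused) entry in the Nginx error log, whereas "timeout" points toward the network rather than the service.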

Step 4 – Check backend service status

PHP‑FPM

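systemctl status php-fpm
ps aux | grep "[p]hp-fpm"
# PHP-FPM on port 9000 speaks FastCGI, not HTTP, so plain curl cannot query it.
# With pm.status_path = /status enabled in the pool config, use cgi-fcgi (from the fcgi tools):
SCRIPT_NAME=/status SCRIPT_FILENAME=/status REQUEST_METHOD=GET cgi-fcgi -bind -connect 127.0.0.1:9000
cat /var/log/php-fpm/www-error.log
grep -E "^pm(\.[a-z_]+)? *=" /etc/php-fpm.d/www.conf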

Java (Tomcat/Jetty)

ps aux | grep "[j]ava"
tail -n 100 /var/log/tomcat/catalina.out
# Requires the Tomcat manager app and a user with the manager-status role
curl -s -u <user>:<password> http://127.0.0.1:8080/manager/status

Node.js / Python

ps aux | grep -E '[n]ode|[p]ython'
# pgrep -d, joins multiple PIDs with commas, which ps -p accepts
ps -o pid,etime,cmd -p "$(pgrep -d, -f 'node server.js')"
tail -f /var/log/myapp/application.log

Step 5 – Examine server resource usage

# CPU
top -b -n 1 | head -20

# Memory
free -h

# Disk
df -h

# Process count
ps aux | wc -l

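# Open file descriptors
# System-wide: allocated handles, unused handles, maximum
cat /proc/sys/fs/file-nr
cat /proc/sys/fs/file-max
# Rough upper bound only: lsof also lists memory maps and repeats entries per process
lsof 2>/dev/null | wc -l
# Soft limit of the current shell (a service may run with a different limit)
ulimit -n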
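
The system-wide numbers say little about a single service, and ulimit -n only reports the limit of the current shell. A per-process view can be read from /proc. This is a minimal Linux-only sketch; fd_usage is a hypothetical helper, and reading another user's /proc/<pid>/fd normally requires root.

```shell
#!/usr/bin/env bash
# Print how many file descriptors a process holds and its soft limit.
# Usage: fd_usage <pid>
fd_usage() {
  local pid="$1" used limit
  # Every entry under /proc/<pid>/fd is one open descriptor.
  used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
  # The "Max open files" row lists the soft limit in column 4 and the hard limit in column 5.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid=$pid used=$used soft_limit=$limit"
}
```

A typical use is a loop such as: for pid in $(pgrep nginx); do fd_usage "$pid"; done. A worker whose used count sits close to its soft limit is a likely source of intermittent 502s under load.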

Step 6 – Review Nginx timeout configuration

proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;

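Increase these values if backend processing takes longer; 60s is already the default for all three. Note that when one of these timeouts expires, Nginx logs "upstream timed out" and normally returns 504 rather than 502, so for a 502 also check the backend's own timeouts (for example PHP‑FPM's request_terminate_timeout), which make the backend close the connection first.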

Typical scenarios and fixes

Scenario 1 – All PHP‑FPM workers died

Symptoms: every PHP page returns 502; the error log shows connect() failed, or no live upstreams when an upstream block is used.

# Verify PHP‑FPM is running
systemctl status php-fpm
ps aux | grep php-fpm

# Inspect PHP‑FPM error logs
cat /var/log/php-fpm/error.log
cat /var/log/php-fpm/www-error.log

# Restart if missing
systemctl start php-fpm
systemctl status php-fpm -l
journalctl -u php-fpm -n 50

Common causes:

Memory leak – set pm.max_requests = 500 to recycle workers.

Wrong socket/port – ensure listen = /var/run/php-fpm/php-fpm.sock matches Nginx fastcgi_pass.

Socket permission – set listen.owner = nginx, listen.group = nginx, listen.mode = 0660 and restart.

Scenario 2 – Backend Java / Node.js / Python crash

# OOM detection
dmesg -T | grep -i "out of memory"
dmesg -T | grep -i "killed process"

# Service status
systemctl status myapp
journalctl -u myapp -n 100 | grep -E "Exit|error|fail"

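If the JVM itself reports java.lang.OutOfMemoryError, increase the heap. If the kernel OOM killer ended the process, raise the container or host memory limit instead (or lower the heap so that it fits), because a larger heap alone makes a kernel OOM kill more likely: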

export JAVA_OPTS="-Xms512m -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError"
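
To confirm which process the kernel actually killed, the dmesg output can be reduced to PID and process name. A minimal sketch, assuming the usual "Killed process <pid> (<name>)" wording of kernel OOM messages; oom_victims is a hypothetical helper that reads log text on stdin.

```shell
#!/usr/bin/env bash
# Read kernel log text on stdin and print "<pid> <name>" for every OOM-killed process.
oom_victims() {
  sed -nE 's/.*[Kk]illed process ([0-9]+) \(([^)]+)\).*/\1 \2/p'
}
```

Usage would look like dmesg -T | oom_victims | sort | uniq -c. If the backend's process name never appears, the crash was probably not a kernel OOM kill and the application log is the better lead.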

Database connection timeout can also trigger 502; test with:

mysql -h db-server -u appuser -p -e "SELECT 1"
grep -E "spring.datasource|spring.jpa|connection-pool" /app/config/application.yml

Scenario 3 – Memory exhaustion and OOM Killer

# Memory check
free -m
ps aux --sort=-%mem | head -15
dmesg -T | grep -i "out of memory"
dmesg -T | grep -i "killed process"

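# Release page cache (temporary, run as root; the kernel already reclaims page cache under pressure, so this rarely prevents an OOM kill)
sync && echo 3 > /proc/sys/vm/drop_caches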

# Restart heavy services
systemctl restart myapp

Long‑term: add swap, tune oom_score_adj for critical processes.
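
Before tuning oom_score_adj it is worth checking the current values. The sketch below reads them from /proc on Linux; show_oom_scores is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the kernel's OOM score and the configured adjustment for each given PID.
# Usage: show_oom_scores <pid>...
show_oom_scores() {
  local pid
  for pid in "$@"; do
    # Skip PIDs that vanished or are not readable.
    [ -r "/proc/$pid/oom_score" ] || continue
    printf '%s score=%s adj=%s\n' "$pid" \
      "$(cat "/proc/$pid/oom_score")" "$(cat "/proc/$pid/oom_score_adj")"
  done
}
```

A higher score means the process is killed earlier. As root, writing a negative value such as -500 to /proc/<pid>/oom_score_adj lowers the risk for that process until it restarts; for systemd services the OOMScoreAdjust= unit setting makes the change persistent.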

Scenario 4 – Backend response timeout

# Measure latency
time curl -v http://backend:8080/api/heavy

# Inspect Nginx timeout settings (nginx -T dumps the full effective configuration, including nginx.conf and all includes)
nginx -T 2>/dev/null | grep -E "timeout"

# Check Java slow‑query logs
grep -E "slowquery|slow query" /var/log/myapp/application.log

# Database performance
mysql -h db -u app -p -e "SHOW FULL PROCESSLIST;"

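Increase the Nginx timeouts (for example to 120 s) and optimise backend processing. If the error log shows upstream timed out, the client usually received 504; a 502 in this scenario means the backend or an intermediate proxy closed the connection first, so raise its timeout as well.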

Scenario 5 – Unix socket permission problem

# Socket permissions
ls -la /var/run/php-fpm/

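# Verify run users (the master processes usually run as root; the worker users are what matter)
ps -eo user,cmd | grep "[p]hp-fpm: pool" | awk '{print $1}' | sort -u
ps -eo user,cmd | grep "[n]ginx: worker" | awk '{print $1}' | sort -u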

# Nginx error log for permission
tail -n 20 /var/log/nginx/error.log | grep -i permission

# Fix socket config (in the PHP-FPM pool file, e.g. /etc/php-fpm.d/www.conf)
listen = /var/run/php-fpm/php-fpm.sock
listen.owner = nginx
listen.group = nginx
listen.mode = 0660

# Apply the change
systemctl restart php-fpm
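
A correct socket mode is not enough if a parent directory blocks the Nginx user, because every directory on the path needs the execute bit for that user. The sketch below lists permissions for the socket and each parent directory using GNU stat; path_perms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print mode, owner:group and name for a path and each of its parent directories.
# Usage: path_perms /var/run/php-fpm/php-fpm.sock
path_perms() {
  local p="$1"
  while :; do
    stat -c '%A %U:%G %n' "$p"
    # Stop at the filesystem root (or at "." for relative paths).
    if [ "$p" = "/" ] || [ "$p" = "." ]; then break; fi
    p=$(dirname "$p")
  done
}
```

Reading the output from the bottom up shows the first directory the Nginx worker user cannot traverse. On SELinux systems the file modes can all look right while access is still denied, so the audit log remains worth a look.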

Scenario 6 – Docker / Kubernetes network issue

# Docker Compose status
docker-compose ps
docker-compose logs -f backend
docker-compose exec nginx sh -c "curl -v http://backend:8080/health"

docker network ls
docker network inspect <network_name>

# Kubernetes checks
kubectl get pods -n <namespace>
kubectl logs -n <namespace> <pod-name>
kubectl get endpoints -n <namespace>
kubectl exec -it -n <namespace> nginx-pod -- sh -c "curl -v http://backend-service:8080/health"
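# -A 10 prints the lines below each heading; without it only the headings themselves match
kubectl describe pod -n <namespace> <pod-name> | grep -A 10 -E "^(Conditions|Events):"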

Fix container start‑order with health checks or adjust network drivers.
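
One common way to handle start order is to make the Nginx container wait until the backend answers its health check before starting. A minimal retry loop, as a sketch: wait_for is a hypothetical helper, and the health URL in the usage note is taken from the commands above.

```shell
#!/usr/bin/env bash
# Run a command until it succeeds, up to <retries> times with <delay> seconds between attempts.
# Usage: wait_for <retries> <delay> <command> [args...]
wait_for() {
  local retries="$1" delay="$2" attempt
  shift 2
  for (( attempt = 1; attempt <= retries; attempt++ )); do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    # No need to sleep after the final failed attempt.
    if (( attempt < retries )); then sleep "$delay"; fi
  done
  echo "gave up after $retries attempt(s)" >&2
  return 1
}
```

In an entrypoint script this could be used as wait_for 30 2 curl -fsS http://backend:8080/health before launching Nginx. A retry loop only covers startup; the backend can still disappear later, so upstream failure handling remains necessary.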

Scenario 7 – Connection‑limit exhaustion under high load

# Nginx connections
ss -s

# Backend connection count
ss -ant | grep :8080 | wc -l

# System file‑descriptor limit
ulimit -n

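# Increase limits temporarily (affects only the current shell and its children)
ulimit -n 65535

# Persist limits for login sessions ( /etc/security/limits.conf )
cat >> /etc/security/limits.conf <<'EOF'
* soft nofile 65535
* hard nofile 65535
nginx soft nofile 65535
nginx hard nofile 65535
EOF

# limits.conf is applied by PAM and does not affect systemd services;
# for those, run "systemctl edit nginx" and add under [Service]: LimitNOFILE=65535

# Restart Nginx
systemctl restart nginx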

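# Nginx worker settings
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    # Counts client and upstream connections together; in practice leave headroom below
    # worker_rlimit_nofile, since log files and listening sockets also need descriptors
    worker_connections 65535;
    use epoll;
    multi_accept on;
}

# Backend thread pool (Spring Boot example)
# Spring Boot 2.3+ names this property server.tomcat.threads.max; max-threads is the older name
server:
  tomcat:
    max-threads: 800
    max-connections: 16384
    accept-count: 200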
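
These numbers are easier to judge with a quick capacity estimate. When Nginx proxies, each client request occupies two of a worker's connections (one to the client, one to the upstream), so the usable client concurrency is roughly half of the configured total. A back-of-the-envelope sketch; max_proxied_clients is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Estimate how many concurrent proxied clients an Nginx configuration can hold.
# Usage: max_proxied_clients <worker_processes> <worker_connections>
max_proxied_clients() {
  local workers="$1" connections="$2"
  # Each proxied request uses one client connection plus one upstream connection.
  echo $(( workers * connections / 2 ))
}
```

With 4 workers and worker_connections 65535 the estimate is 131070 clients, far more than a backend limited to 16384 connections accepts, so in a setup like the one above the backend pool rather than Nginx is the likely bottleneck.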

Prevention measures

Health‑check configuration

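upstream backend {
    # max_fails / fail_timeout are passive checks: 3 failed attempts within 30s mark a server unavailable for 30s
    server backend1:8080 max_fails=3 fail_timeout=30s;
    server backend2:8080 max_fails=3 fail_timeout=30s;
    # Upstream keepalive only takes effect if the proxying location also sets
    # proxy_http_version 1.1 and proxy_set_header Connection ""
    keepalive 32;
}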

Monitoring & alerting (Prometheus + Grafana)

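# Nginx stub_status endpoint
server {
    listen 8888;
    server_name localhost;
    stub_status on;
    allow 127.0.0.1;
    deny all;
}

# Prometheus scrape config
# stub_status output is plain text, not Prometheus format, so Prometheus scrapes an exporter
# that reads it (for example nginx-prometheus-exporter, which listens on 9113 by default)
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']

# Alert rule for high 502 rate
# stub_status has no per-status counters; nginx_http_requests_total{status=...} assumes an
# exporter that provides them (for example a log-based or VTS exporter)
groups:
  - name: nginx_502_alert
    rules:
      - alert: High502Rate
        expr: sum(rate(nginx_http_requests_total{status="502"}[5m])) / sum(rate(nginx_http_requests_total[5m])) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 502 error rate too high"
          description: "502 responses exceed 10% of requests, current ratio: {{ $value }}"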

Enhanced logging

log_format main '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" upstream_addr: $upstream_addr upstream_status: $upstream_status request_time: $request_time upstream_response_time: $upstream_response_time';
access_log /var/log/nginx/access.log main;
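
With the upstream fields in the access log, 502s can be attributed to individual backend servers. A minimal sketch that assumes exactly the log_format shown above (status in field 9, the literal label upstream_addr: followed by the address); upstreams_returning_502 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count 502 responses per upstream address, most affected backend first.
# Usage: upstreams_returning_502 [logfile]   (defaults to /var/log/nginx/access.log)
upstreams_returning_502() {
  local log="${1:-/var/log/nginx/access.log}"
  awk '$9 == 502 {
         # Find the "upstream_addr:" label and count the value that follows it.
         for (i = 1; i < NF; i++)
           if ($i == "upstream_addr:") hits[$(i + 1)]++
       }
       END { for (a in hits) print hits[a], a }' "$log" | sort -rn
}
```

If one address collects nearly all 502s, the problem is local to that backend; an even spread points at something shared, such as the database or the network. When Nginx retries on another server, upstream_addr holds a comma-separated list and only its first entry is counted here.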

Verification after fix

# Functional curl checks
for path in "/" "/api/users" "/api/products" "/health"; do
  status=$(curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1$path)
  if [[ "$status" == "200" || "$status" == "301" || "$status" == "302" ]]; then
    echo "[PASS] $path -> $status"
  else
    echo "[FAIL] $path -> $status"
  fi
done

# Simulate concurrency
for i in {1..100}; do curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1/api/users & done
wait

# Ensure no new 502 in access log
grep " 502 " /var/log/nginx/access.log | wc -l
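
A raw count includes the 502s from before the fix, so what matters is the difference against a baseline taken beforehand. A minimal sketch, assuming the combined log format with the status in field 9; count_502 and new_502_since are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Count 502 responses in an access log (prints 0 when there are none).
count_502() {
  awk '$9 == 502 { n++ } END { print n + 0 }' "${1:-/var/log/nginx/access.log}"
}

# Print how many 502s were logged since a previously recorded baseline count.
# Usage: new_502_since <baseline> [logfile]
new_502_since() {
  local baseline="$1" log="${2:-/var/log/nginx/access.log}"
  echo $(( $(count_502 "$log") - baseline ))
}
```

Record before=$(count_502) ahead of the verification run and call new_502_since "$before" afterwards; anything above 0 means the fix is incomplete. The comparison is only valid while the log has not been rotated in between.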

Summary

The workflow starts with the Nginx error and access logs, then checks network connectivity, backend liveness, and server resources, and ends with a review of the timeout configuration. Root causes fall into five categories, each with specific commands and remediation steps. Implementing proactive health checks, comprehensive monitoring, detailed logging, and regular disaster‑recovery drills prevents recurrence and shortens MTTR for future 502 incidents.
