
Efficient Nginx Log Analysis Using GoAccess and Practical Case Studies

This article explains why Nginx logs are critical, compares various log‑analysis tools, provides detailed installation and configuration steps for GoAccess, discusses selection criteria, and shares real‑world case studies that demonstrate how to extract valuable system and business insights from massive access logs.

LOFTER Tech Team

Abstract

Nginx logs are essential but often overlooked; handling massive logs can be daunting. This article demonstrates how to efficiently analyze Nginx access logs using the GoAccess tool, combined with practical testing scenarios and strategies.

Background

As product daily active users increase, Nginx log volume grows. Access logs record every client request and contain valuable business, semantic, and behavioral information useful for spider analysis, traffic evaluation, bandwidth distribution, request source characteristics, and more.

Industry Common Methods

Embedding JavaScript or SDKs for client‑side tracking.

Using stream processing or offline analysis of Nginx access logs.

Each method has trade-offs: client-side tracking is simple to deploy but misses AJAX requests and crawler traffic, while stream/offline processing offers more flexibility at a higher infrastructure cost.

Tool Comparison

1. AWK

AWK processes input line by line, so memory usage stays flat even on very large files. Common commands include counting total requests, unique IPs, top URLs, and so on.

# Total requests
wc -l access.log | awk '{print $1}'
# Unique client IPs ($1 is the remote address in the default combined format)
awk '{print $1}' access.log | sort | uniq | wc -l
# Top 5 requested URLs ($7 is the request path in the default combined format)
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -5

Advantages: powerful text processing, flow control, math operations. Disadvantages: no visual output, complex syntax for advanced analysis.

2. Request‑log‑analyzer

A Ruby gem that parses Rails, Nginx, Apache, MySQL, and PostgreSQL logs, providing page-visit counts and source analysis. It is no longer maintained.

3. Nginx_log_analysis

Built on Ngx_Lua, it ships logs via UDP to InfluxDB and supports clustering, regex-based URI analysis, upstream-time merging, PV statistics, and various reports.

4. Rhit

Reads Nginx logs (including gzipped ones), displays results in console tables, and processes millions of lines per second, but offers no visual interaction.

5. GoAccess

Open‑source, terminal‑based log analyzer that provides real‑time HTML, JSON, CSV reports. Supports custom log formats, GeoIP, B+Tree storage, and extensive statistics.

Tool Selection Considerations

High‑frequency basic metrics (HTTP code, response time) are often covered by existing monitoring platforms.

Business‑level metrics (traffic, spider analysis) can be sampled weekly or per release.

Goal is to reduce massive logs to manageable subsets for focused analysis.

Prefer tools that can pre‑aggregate data and produce visual reports.
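As a sketch of the "reduce massive logs to a subset" step, the following shell snippet extracts one day's entries for focused analysis. The sample log is generated inline as a stand-in for a real access.log, and the field positions assume the default combined format:

```shell
# Build a tiny sample log (placeholder for a real access.log)
printf '%s\n' \
  '1.2.3.4 - - [12/Jan/2022:10:00:01 +0800] "GET /api/feed HTTP/1.1" 200 512' \
  '5.6.7.8 - - [13/Jan/2022:09:30:12 +0800] "GET /api/feed HTTP/1.1" 200 512' \
  '1.2.3.4 - - [12/Jan/2022:11:15:43 +0800] "GET /img/a.png HTTP/1.1" 404 0' \
  > access.log

# Keep only entries for 12/Jan/2022; the date sits inside field $4,
# e.g. [12/Jan/2022:10:00:01
awk '$4 ~ /12\/Jan\/2022/' access.log > subset.log

wc -l < subset.log   # 2 of the 3 sample lines match
```

The resulting subset.log can then be fed to GoAccess or further AWK passes without touching the full log.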

GoAccess Detailed Introduction

GoAccess is a fast terminal log analyzer that can generate real‑time HTML dashboards, supporting Apache/Nginx logs and many other formats.

Installation / Usage

$ sudo apt install libncursesw5-dev libgeoip-dev libmaxminddb-dev
$ wget https://tar.goaccess.io/goaccess-1.5.5.tar.gz
$ tar -xzvf goaccess-1.5.5.tar.gz
$ cd goaccess-1.5.5/
$ ./configure --enable-utf8 --enable-geoip=mmdb
$ make
# make install
$ goaccess --version
GoAccess - 1.5.5

Configure goaccess.conf to match Nginx log_format, e.g.:

log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%^" "%v" %^ %T "%^"
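For reference, one plausible Nginx log_format that lines up field-by-field with the GoAccess directive above might look like the following. This is an assumption for illustration, not a canonical mapping; verify the order against your actual Nginx configuration:

```nginx
# Hypothetical log_format; field order mirrors the GoAccess directive above.
# %h=$remote_addr, %d:%t=$time_local, %r=$request, %s=$status,
# %b=$body_bytes_sent, %R=$http_referer, %v=$host, %T=$request_time
log_format goaccess '$remote_addr - [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$host" $upstream_addr '
                    '$request_time "$http_x_forwarded_for"';
```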

Typical analysis command:

$ goaccess -f /path/to/access.log -p /etc/goaccess.conf -o /path/to/report.html --real-time-html --daemonize

When daemonized, GoAccess serves live report updates over a WebSocket (port 7890 by default); alternatively, schedule the command via cron for periodic HTML reports.
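For the periodic-report approach, a crontab entry along these lines regenerates the HTML report hourly. All paths here are placeholders; adjust them to your deployment:

```shell
# Hypothetical crontab entry: rebuild the HTML report at minute 0 of every hour.
0 * * * * goaccess -f /var/log/nginx/access.log -p /etc/goaccess.conf -o /var/www/html/report.html
```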

Result Report Data Analysis – Metric Interpretation

1. Site‑wide Requests

Shows hits, visitors, total traffic, and response time metrics.

2. URL‑level Statistics

Provides hits, visitors, traffic, and latency per URL.

3. Static Resource Statistics

Analyzes requests for static files.

4. Domain‑level Statistics

Aggregates data per domain.

5. Time‑dimension Statistics

Shows request distribution over time.

6. 404 Anomalies

Identifies missing resource requests.
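To triage 404s outside of GoAccess, a quick AWK pass over the log works as well. The sample data below is generated inline, and the field positions assume the default combined format:

```shell
# Generate a small sample log (placeholder for a real access.log)
printf '%s\n' \
  '1.2.3.4 - - [12/Jan/2022:10:00:01 +0800] "GET /old/page HTTP/1.1" 404 0' \
  '1.2.3.4 - - [12/Jan/2022:10:00:02 +0800] "GET /old/page HTTP/1.1" 404 0' \
  '5.6.7.8 - - [12/Jan/2022:10:00:03 +0800] "GET /api/feed HTTP/1.1" 200 512' \
  > access.log

# $9 is the status code and $7 the request path in the combined format;
# list the paths that returned 404, most frequent first.
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn
```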

7. Request IP Sources

Highlights high‑frequency IPs for further investigation.
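A complementary command-line check flags client IPs above a request threshold for manual review. The sample log is built inline and the threshold of 2 is arbitrary; tune it to your traffic:

```shell
# Tiny sample log standing in for a real access.log
printf '%s\n' \
  '9.9.9.9 - - [12/Jan/2022:10:00:01 +0800] "GET / HTTP/1.1" 200 100' \
  '9.9.9.9 - - [12/Jan/2022:10:00:02 +0800] "GET / HTTP/1.1" 200 100' \
  '9.9.9.9 - - [12/Jan/2022:10:00:03 +0800] "GET / HTTP/1.1" 200 100' \
  '8.8.8.8 - - [12/Jan/2022:10:00:04 +0800] "GET / HTTP/1.1" 200 100' \
  > access.log

# Count requests per client IP and print only the IPs above the threshold.
awk '{count[$1]++} END {for (ip in count) if (count[ip] > 2) print ip, count[ip]}' access.log
```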

8. Browser Sources

Breaks down traffic by user‑agent.

9‑14. Additional dimensions (pages, sites, response codes, geographic regions, file types, TLS versions)

Provide deeper insight into traffic patterns.

Daily Practice Case Studies

Case 1: Evaluating Client Request Reasonableness

Discovered iOS client sending 2.7× more requests than Android for certain APIs, leading to resource waste; after fixing the client, the issue was mitigated.

Case 2: Spider Data Analysis

Detected abnormal 503 spikes caused by foreign IP ranges belonging to a crawler; identified affected endpoints and mitigated impact.

Case 3: HTTP Code Analysis

Found excessive 499 responses from iOS due to aggressive request triggering; after adding debounce logic, the 499 rate dropped significantly.

Case 4: Network Protocol Optimization Performance Measurement

Measured the effect of upgrading to HTTP/2 and Brotli across Android, iOS, and Web by comparing unique visitors and requested files before and after the change.
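The before/after comparison in this case can be approximated directly from raw logs. A minimal sketch, using inline sample logs and unique client IPs as a rough proxy for GoAccess's "visitors" metric:

```shell
# Two tiny sample logs standing in for pre- and post-upgrade traffic.
printf '%s\n' \
  '1.1.1.1 - - [01/Jan/2022:10:00:00 +0800] "GET / HTTP/1.1" 200 100' \
  '2.2.2.2 - - [01/Jan/2022:10:00:01 +0800] "GET / HTTP/1.1" 200 100' > before.log
printf '%s\n' \
  '1.1.1.1 - - [01/Feb/2022:10:00:00 +0800] "GET / HTTP/2.0" 200 100' \
  '2.2.2.2 - - [01/Feb/2022:10:00:01 +0800] "GET / HTTP/2.0" 200 100' \
  '3.3.3.3 - - [01/Feb/2022:10:00:02 +0800] "GET / HTTP/2.0" 200 100' > after.log

# Unique client IPs per period, a rough stand-in for per-period visitor counts.
for f in before.log after.log; do
  printf '%s: %s unique IPs\n' "$f" "$(awk '{print $1}' "$f" | sort -u | wc -l)"
done
```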

Conclusion

As the author of "Big Data" put it, whoever can better capture, understand, and analyze data will stand out in the next wave of competition.

Tags: monitoring, operations, nginx, log analysis, goaccess
Written by the LOFTER Tech Team, the technical sharing and discussion channel of NetEase LOFTER.
