How to Build a Resilient High‑Traffic Website: Domains, CDN, Monitoring, and Security

This guide outlines practical steps for building a highly available, secure, and scalable website. It covers domain strategy, CDN deployment, image caching, data-center selection, monitoring, attack mitigation, redundancy, server configuration, database replication, test environments, disaster-recovery planning, and high-concurrency testing.

Efficient Ops

May 28, 2024

1. Domain Strategy

Register multiple domains (a primary domain plus promotional ones) from GoDaddy for stability, enable the registrar's domain protection, and delegate DNS management to Cloudflare, DNSPod, or ZNDNS. You can also self-host a DNS server so that DNS changes take effect faster.

2. CDN Deployment

Buy a CDN service (e.g., Cloudflare) and point your domain at it. The CDN caches content globally, forwards traffic to your origin servers, and can absorb DDoS attacks of at least 200 Gbps.

3. Image Server

Deploy image caching servers in mainland China to improve load speed; Nginx itself can act as an image cache.

4. Server Location

Select data centers close to your users; for high-bandwidth needs, consider US locations. Measure ping latency from locations across the country (e.g., using chinaz) and choose providers that offer strong DDoS protection, reliability, and responsive support.
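When comparing candidate locations by latency, a small helper can pull the average round-trip time out of `ping` output. This is a minimal sketch: the function name `avg_rtt` and the host names in the usage comment are hypothetical, and it assumes the summary line printed by Linux (`rtt min/avg/max/mdev = ...`) or BSD (`round-trip min/avg/max/stddev = ...`) versions of `ping`.

```shell
# avg_rtt: read `ping` output on stdin and print the average RTT in ms.
# Assumes a summary line such as
#   rtt min/avg/max/mdev = 10.1/12.3/15.0/1.2 ms
avg_rtt() {
    awk -F'=' '/min\/avg\/max/ {
        split($2, parts, "/")      # parts[1]=min, parts[2]=avg, parts[3]=max
        gsub(/ /, "", parts[2])
        print parts[2]
    }'
}

# Usage (hypothetical hosts):
#   for host in candidate-a.example.com candidate-b.example.com; do
#       printf '%s %s ms\n' "$host" "$(ping -c 5 "$host" | avg_rtt)"
#   done
```

Running the loop from several vantage points gives a rough ranking only; it does not replace a nationwide test service.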

5. Homepage Hosting

Use a cloud host for an advertising-style homepage that links to the game front-end. Put the shield server behind a CDN or in a data center that does not require ICP filing (备案), so that mandatory Chinese ICP filing does not apply.

6. Monitoring System

Implement real-time monitoring of server health and attack indicators: ship logs to a central syslog server, graph metrics with Cacti, and set up alerts for traffic spikes or abnormal log patterns.
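One simple way to spot a traffic spike is to count requests per minute in the access log and flag any minute above a threshold. The sketch below is illustrative only: the function name `busy_minutes` and the threshold are hypothetical, and it assumes timestamps in the style of the Nginx "combined" log format, for example `[28/May/2024:10:15:32 +0800]`.

```shell
# busy_minutes THRESHOLD: read access-log lines on stdin and print each minute
# whose request count exceeds THRESHOLD, as "<dd/Mon/yyyy:HH:MM> <count>".
# Assumes timestamps like [28/May/2024:10:15:32 +0800].
busy_minutes() {
    awk -v limit="$1" '
        match($0, /\[[0-9][0-9]\/[A-Za-z]+\/[0-9]+:[0-9][0-9]:[0-9][0-9]/) {
            minute = substr($0, RSTART + 1, RLENGTH - 1)   # drop the leading "["
            count[minute]++
        }
        END {
            for (minute in count)
                if (count[minute] > limit)
                    print minute, count[minute]
        }
    ' | sort    # sorted only to make the output order stable
}

# Usage (log path and threshold are assumptions):
#   tail -n 100000 /var/log/nginx/access.log | busy_minutes 6000
```

Any non-empty output can then be forwarded to whatever alerting channel the team already uses.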

7. Attack Defense

Small-scale attacks can be mitigated with Nginx and iptables. Large-scale DDoS attacks require DDoS-protected ("high-defense") data centers with at least 200 Gbps of mitigation capacity, together with IP blocking at the carrier level.
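For the small-scale case, one common pattern is to count connections per remote IP and block the heaviest offenders with iptables. The following is a sketch under stated assumptions: `block_rules` is a hypothetical helper, the limit is arbitrary, and it expects Linux `netstat -ant` column layout. It only prints the rules so they can be reviewed before anything is applied.

```shell
# block_rules LIMIT: read `netstat -ant` output on stdin and print one
# iptables DROP rule per remote IP holding more than LIMIT connections to
# local port 80. Rules are only printed, never applied.
block_rules() {
    awk -v limit="$1" '
        $1 ~ /^tcp/ && $4 ~ /:80$/ && $6 != "LISTEN" {
            remote = $5
            sub(/:[0-9]+$/, "", remote)    # strip the remote port
            conns[remote]++
        }
        END {
            for (ip in conns)
                if (conns[ip] > limit)
                    print "iptables -I INPUT -s " ip " -j DROP"
        }
    ' | sort
}

# Usage (limit of 100 is an assumption):
#   netstat -ant | block_rules 100
```

Printing instead of executing keeps a human in the loop, which matters because a proxy or NAT gateway can legitimately hold many connections from one address.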

8. Redundancy

Design for at least double the expected concurrent load (e.g., capacity for 2,000 users against a 1,000-user peak) so the site can absorb traffic spikes.
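The doubling rule translates into a simple sizing calculation. In this sketch, `servers_needed` is a hypothetical helper and the per-server capacity of 300 users is an assumed figure that would normally come from load testing.

```shell
# servers_needed PEAK_USERS HEADROOM PER_SERVER
# Prints how many servers cover PEAK_USERS * HEADROOM concurrent users when
# each server handles PER_SERVER users (rounded up).
servers_needed() {
    target=$(( $1 * $2 ))
    echo $(( (target + $3 - 1) / $3 ))
}

servers_needed 1000 2 300   # 2,000-user target at 300 users per server: prints 7
```

Rounding up matters here: 2,000 / 300 is about 6.7, and six servers would fall short of the target.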

9. Server Hardware

Equip servers with three NICs (user traffic, internal communication, SSH management), multiple IPs per NIC, RAID-1 disks, dual CPUs, and dual power supplies, so that no component is a single point of failure.

10. Database Architecture

Implement master‑slave replication with off‑site backups; configure Nginx upstream clustering; separate front‑end and back‑end services onto different machines.
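Replication is only useful if the replica keeps up, so it is worth checking its status regularly. The sketch below assumes MySQL and the field names printed by `SHOW SLAVE STATUS\G` (`Slave_IO_Running`, `Slave_SQL_Running`, `Seconds_Behind_Master`); the source does not name a database, and newer MySQL releases use `SHOW REPLICA STATUS` with `Replica_*` names instead. `replication_ok` is a hypothetical helper.

```shell
# replication_ok MAX_LAG: read `SHOW SLAVE STATUS\G` output on stdin.
# Succeeds only if both replication threads run and the lag is at most
# MAX_LAG seconds. Assumes MySQL field names (Slave_IO_Running, ...).
replication_ok() {
    awk -F': *' -v max_lag="$1" '
        { gsub(/^ +/, "", $1) }                  # field names are right-aligned
        $1 == "Slave_IO_Running"      { io  = $2 }
        $1 == "Slave_SQL_Running"     { sql = $2 }
        $1 == "Seconds_Behind_Master" { lag = $2 }
        END {
            healthy = (io == "Yes" && sql == "Yes" && lag != "NULL" && lag != "" && lag + 0 <= max_lag)
            exit (healthy ? 0 : 1)
        }
    '
}

# Usage (assumes client credentials are already configured):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_ok 30 || echo "replication problem"
```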

11. Test Environments

Maintain three test setups: developer workstation, LAN test environment, and internet test environment, plus production; use SVN or Git for code management and ensure LAN stability with dedicated network gear.

12. Core and Shield Servers

Confirm that the shield server and the core server can ping each other, which verifies basic network reachability between them.
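This check is easy to script so it can run from cron or a monitoring agent. A minimal sketch, in which `check_reachable` and the address in the usage comment are hypothetical:

```shell
# check_reachable HOST: send three ICMP echo requests and report the result.
check_reachable() {
    if ping -c 3 "$1" >/dev/null 2>&1; then
        echo "reachable: $1"
    else
        echo "UNREACHABLE: $1"
        return 1
    fi
}

# Usage on the shield server (hypothetical core address):
#   check_reachable 10.0.0.10
```

A successful ping shows only that ICMP gets through; it does not prove that the application port on the core server is open.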

13. Operations Staff

At least two ops engineers (or one engineer plus a manager) should maintain documented procedures, stay on 24-hour standby, and coordinate coverage between themselves so that no formal shift rotation is needed.

14. Large‑Scale Architecture Team

For extensive infrastructures, establish a dedicated core data‑center staffed with engineers covering databases, networking, security, storage, and coordination.

15. Linux System Optimization

Tune Nginx according to CPU usage (for example, by matching the number of worker processes to the available CPU cores) and set per-process CPU and memory limits.

16. Security Hygiene

Rotate all passwords (especially domain and email accounts) every three months.
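A rotation policy is easier to keep when something checks password age automatically. The sketch below treats three months as 90 days; `rotation_due` is a hypothetical helper that takes Unix timestamps, and how the last-changed time is recorded is left as an assumption.

```shell
# rotation_due LAST_CHANGED_EPOCH NOW_EPOCH [MAX_DAYS]
# Succeeds (exit 0) when the password is at least MAX_DAYS old (default 90,
# approximating the three-month interval).
rotation_due() {
    max_days=${3:-90}
    age_days=$(( ($2 - $1) / 86400 ))
    [ "$age_days" -ge "$max_days" ]
}

# Usage (last_changed is assumed to come from your own records):
#   rotation_due "$last_changed" "$(date +%s)" && echo "rotate this password"
```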

17. LAN Stability

Provide at least 10 Mbps bandwidth, dual network cables, and a mobile Wi‑Fi hotspot for staff devices.

18. Operations Tools

Standardize tooling: SQLyog for databases, SecureCRT (CRT) for SSH, KeePass for password vaults, and WinSCP for file transfers. Set aside time each day to study new technologies and read English-language documentation.

19. Disaster‑Recovery Plan

Develop and rehearse a standby plan for critical failures, perform regular backup restoration drills, and ensure backups are usable.
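A restoration drill can be partly automated by unpacking each backup into a scratch directory and checking that files actually come out. This sketch assumes backups are gzip-compressed tar archives, which the source does not specify; `restore_drill` and the path in the usage comment are hypothetical.

```shell
# restore_drill ARCHIVE: restore a gzip-compressed tar backup into a scratch
# directory and report whether it unpacked cleanly and produced files.
restore_drill() {
    archive=$1
    scratch=$(mktemp -d) || return 2
    if tar -xzf "$archive" -C "$scratch" 2>/dev/null &&
       [ -n "$(find "$scratch" -type f | head -n 1)" ]; then
        echo "OK: $archive restored"
        status=0
    else
        echo "FAIL: $archive could not be restored"
        status=1
    fi
    rm -rf "$scratch"
    return $status
}

# Usage (hypothetical backup path):
#   restore_drill /backup/site-latest.tar.gz
```

Unpacking cleanly is a necessary check, not a sufficient one: a full drill would also load the restored data into a test database and confirm the application starts against it.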

20. Server Security Configuration

Apply comprehensive security settings covering user accounts, applications, system, and file permissions.
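As one concrete example on the file-permission side, world-writable files are a common finding in a permissions audit. A minimal sketch, where `world_writable` is a hypothetical helper and the directory in the usage comment is an assumption:

```shell
# world_writable DIR: list regular files under DIR that any user can write to.
world_writable() {
    find "$1" -type f -perm -0002 | sort
}

# Usage (hypothetical web root):
#   world_writable /var/www
```

Each file listed should either have its mode tightened or a documented reason for staying writable.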

21. High‑Concurrency Testing

Simulate 2,000 concurrent users to evaluate behavior under load, use the results to choose the best-performing IP addresses and data-center locations, and invest in additional capacity where the tests show it is needed.
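The source does not name a load-testing tool. One option is ApacheBench (`ab`), whose `-n` and `-c` flags set the total request count and the concurrency level. The sketch below condenses its report into one line so that runs against different IPs or data centers can be compared; `ab_summary` and the URL are hypothetical.

```shell
# ab_summary: read ApacheBench (`ab`) output on stdin and print the failed
# request count and mean throughput on one line.
ab_summary() {
    awk -F': *' '
        $1 == "Failed requests"     { failed = $2 }
        $1 == "Requests per second" { split($2, f, " "); rps = f[1] }
        END { printf "failed=%s rps=%s\n", failed, rps }
    '
}

# Usage (hypothetical URL): 20,000 requests, 2,000 at a time
#   ab -n 20000 -c 2000 http://test.example.com/ | ab_summary
```

At this concurrency the client machine may need a higher open-file limit (`ulimit -n`), and the test should target a test environment rather than production.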

22. Knowledge Sharing

Share all operational information (passwords, configuration steps) among at least two team members to build a cohesive, skilled team.

23. Logging Practices

Record every operation with a timestamp, assess the risk before making a change, and document the mitigation steps.

24. Ops Philosophy

Focus on availability, monitoring and alerting, capacity planning, process standardization, knowledge management, and automation.

25. Post-Launch Ops Tasks

Post-launch work includes version upgrades, service monitoring, usage statistics, routine inspections, incident response, scaling, security hardening, and ops-dev work.

26. Connection Count Example

<code>
# $ip is the server's listening address; count all TCP connections to port 80
netstat -ant | grep "$ip:80" | wc -l
# Count only connections in the ESTABLISHED state
netstat -ant | grep "$ip:80" | grep EST | wc -l
</code>
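The same `netstat -ant` output can be broken down by connection state in one pass, which helps distinguish normal load (mostly ESTABLISHED) from something like a SYN flood (many SYN_RECV). This is a sketch assuming Linux `netstat` column layout; `conn_states` is a hypothetical helper.

```shell
# conn_states PORT: read `netstat -ant` output on stdin and print the number
# of TCP connections to local PORT in each state, as "<STATE> <count>".
conn_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { states[$6]++ }
        END { for (s in states) print s, states[s] }
    ' | sort
}

# Usage:
#   netstat -ant | conn_states 80
```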
