
Methodology for Internet Architecture Technical Review and Capacity/Performance Evaluation

This article presents a methodology for reviewing internet‑scale system architectures. It focuses on non‑functional quality attributes such as performance, availability, scalability, security, and maintainability, and provides review guidelines, metric checklists, and a classic case study for capacity and performance planning.

Architecture Digest

1 Background

In the IT industry, fundamental technical skills are akin to the "inner kung fu" of Shaolin, while frameworks represent the "sword techniques". Enterprise‑level development emphasizes complex business logic and high reusability, whereas internet development focuses on decomposing responsibilities and optimizing non‑functional qualities like high availability, performance, scalability, security, stability, and maintainability.

This article offers a basic methodology for internet‑oriented technical reviews, helping developers and architects evaluate how well a system meets functional and non‑functional requirements.

2 Goals

2.1 Overview of Non‑Functional Quality Requirements

Reference technical review indicators to ensure system architecture satisfies user and system non‑functional demands.

Core Non‑Functional Qualities:

| Core Quality | Description |
| --- | --- |
| High performance | High efficiency and cost‑effectiveness |
| Availability | Continuous availability, reduced downtime, error recovery, reliability |
| Scalability | Vertical and horizontal scaling |
| Extensibility | Pluggable design, component reuse |
| Security | Data security, encryption, circuit breaking, attack resistance |

Other Non‑Functional Qualities:

| Other Quality | Description |
| --- | --- |
| Observability | Fast detection, location, and resolution of issues |
| Testability | Canary releases, previews, mocks, decomposition |
| Robustness | Fault tolerance, recoverability |
| Maintainability | Easy maintenance, monitoring, operation, expansion |
| Reusability | Portability, decoupling |
| Usability | Operability |

2.2 Specific Indicators for Non‑Functional Requirements

The indicators are divided into four parts: application servers, databases, caches, and message queues.

2.2.1 Application Server

The application server is the entry point; its traffic determines the load on databases, caches, and queues. Key metrics include peak requests per second and response time.

Consider the following metrics:

| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Load‑balancing strategy | Daily request volume | Whether requests contain large objects |
| 2 | High‑availability strategy | Peak per‑interface traffic | GC collector selection and configuration |
| 3 | I/O model (NIO/BIO) | Average response time | |
| 4 | Thread‑pool model | Maximum response time | |
| 5 | Thread‑pool size | Concurrent users | |
| 6 | Mixed business deployment | Request size | |
| 7 | | Network card I/O traffic | |
| 8 | | Disk I/O load | |
| 9 | | Memory usage | |
| 10 | | CPU usage | |
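As a rough illustration of how daily request volume translates into the peak QPS an application tier must sustain, here is a minimal sketch. The 2‑hour peak window, 50% peak traffic share, 5× redundancy, and 5 000 QPS per node are assumptions borrowed from the case study later in this article; tune them for your own system.

```python
import math

def peak_qps(daily_requests: int, peak_window_hours: float = 2.0,
             peak_share: float = 0.5, redundancy: float = 5.0) -> float:
    """QPS the tier must sustain: the share of daily traffic that lands
    in the peak window, spread over that window, times redundancy."""
    return daily_requests * peak_share / (peak_window_hours * 3600) * redundancy

def app_servers_needed(daily_requests: int, per_node_qps: int = 5_000) -> int:
    """Node count, assuming a per-node peak of 5,000 QPS."""
    return math.ceil(peak_qps(daily_requests) / per_node_qps)

# 14M requests/day -> ~972 QPS raw peak, ~4,861 QPS with 5x redundancy
print(round(peak_qps(14_000_000)))     # 4861
print(app_servers_needed(14_000_000))  # 1
```

The same two-step pattern (raw peak from a traffic-concentration assumption, then multiply by a redundancy factor) recurs in every sizing calculation below.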

2.2.2 Database

Based on application traffic, calculate required QPS, TPS, and daily data volume to size the database.

Consider the following metrics:

| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Replication model | Current data volume | Whether queries use indexes |
| 2 | Failover strategy | Daily data growth (estimated) | Presence of large‑data queries |
| 3 | Disaster‑recovery strategy | Read peak per second | Multi‑table joins and index usage |
| 4 | Archiving strategy | Write peak per second | Pessimistic vs. optimistic locking, row‑level locks |
| 5 | Read‑write separation | Transaction volume | Transaction consistency level |
| 6 | Sharding strategy | | JDBC datasource type, connection count |
| 7 | Cache static/semi‑static data | | Enable JDBC diagnostic logging |
| 8 | Cache penetration protection | | Stored‑procedure usage |
| 9 | Cache invalidation & warm‑up | | Sharding strategy for partitioned tables |
| 10 | Cache invalidation & warm‑up | | Implementation method for horizontal sharding (client, proxy, NoSQL) |
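The database sizing implied by these metrics can be sketched as a small calculator. The per‑port limits (1 000 read QPS, 700 write TPS, 50 million rows per table) come from the reference standards in section 4.3, and rounding the table count up to a power of two mirrors the case study's sharding choice; treat all of them as placeholders for your own benchmarks.

```python
import math

# Reference limits from section 4.3 (typical x86 hardware; adjust as needed).
READ_QPS_PER_PORT = 1_000
WRITE_TPS_PER_PORT = 700
ROWS_PER_TABLE = 50_000_000

def db_sizing(read_peak_qps: float, write_peak_tps: float, total_rows: int,
              redundancy: float = 5.0) -> dict:
    """Ports and shard-table count needed for the given peaks and volume."""
    return {
        "read_ports":  math.ceil(read_peak_qps * redundancy / READ_QPS_PER_PORT),
        "write_ports": math.ceil(write_peak_tps * redundancy / WRITE_TPS_PER_PORT),
        # Round the table count up to a power of two for even hash sharding.
        "tables": 2 ** math.ceil(math.log2(total_rows * redundancy / ROWS_PER_TABLE)),
    }

# Member-address figures from the case study: ~1,000 reads/s,
# ~400 writes/s, ~3.5 billion rows.
print(db_sizing(1_000, 400, 3_500_000_000))
# {'read_ports': 5, 'write_ports': 3, 'tables': 512}
```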

2.2.3 Cache

Evaluate cache size and access peaks based on hot data proportion.

Consider the following metrics:

| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Replication model | Cache size | Cold‑hot data ratio |
| 2 | Failover | Number of cached items | Possibility of cache penetration |
| 3 | Persistence strategy | Expiration time | Presence of large objects |
| 4 | Eviction strategy | Data structure | Use of cache for distributed locks |
| 5 | Thread model | Read peak per second | Support for cache scripting |
| 6 | Warm‑up method | Write peak per second | Avoidance of race conditions |
| 7 | Sharding hash strategy | | Cache sharding method (client, proxy, cluster) |
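Cache memory can be estimated the same way: hot‑data share of total items times average object size, checked against the 32 GB‑per‑port limit from section 4.3. The hot ratio and average item size below are hypothetical inputs for illustration.

```python
import math

def cache_ports(total_items: int, hot_ratio: float, avg_item_bytes: int,
                gb_per_port: int = 32, redundancy: float = 5.0) -> int:
    """Cache ports needed to hold the hot working set with redundancy."""
    hot_bytes = total_items * hot_ratio * avg_item_bytes * redundancy
    return math.ceil(hot_bytes / (gb_per_port * 2**30))

# e.g. 200M members, 20% of address records hot, ~1 KB per cached record
print(cache_ports(200_000_000, 0.2, 1024))  # 6
```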

2.2.4 Message Queue

Calculate required message‑queue capacity and throughput based on application traffic.

Consider the following metrics:

| # | Deployment Structure | Capacity & Performance | Other |
| --- | --- | --- | --- |
| 1 | Replication model | Daily data increment | Consumer thread‑pool model |
| 2 | Failover | Message expiration | Sharding strategy |
| 3 | Persistence strategy | Read peak per second | Reliable delivery |
| 4 | | Write peak per second | |
| 5 | | Message size | |
| 6 | | Average latency | |
| 7 | | Maximum latency | |
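Message‑queue node counts follow from the read and write peaks against per‑node throughput. The 30 000 read QPS and 5 000 write TPS per node are the Kafka reference values from section 4.3; the peaks in the example are illustrative.

```python
import math

def kafka_nodes(read_peak: float, write_peak: float,
                read_per_node: int = 30_000, write_per_node: int = 5_000,
                redundancy: float = 5.0) -> int:
    """Nodes needed: whichever of read or write throughput is the bottleneck."""
    return max(math.ceil(read_peak * redundancy / read_per_node),
               math.ceil(write_peak * redundancy / write_per_node))

# Case-study order flow: ~1,000 msgs/s produced and consumed at peak
print(kafka_nodes(read_peak=1_000, write_peak=1_000))  # 1
```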

3 Technical Review Outline

The outline helps architects organize thoughts and produce an implementable design.

3.1 Current Situation

Business Background

Project name

Business description

Technical Background

Architecture description

Current system capacity (average calls)

Current peak calls

3.2 Requirements

Business Requirements

Items to be refactored

New functional requirements

Performance Requirements

Estimated average system load

Estimated peak load

Other non‑functional qualities (e.g., security, scalability)

3.3 Solution Description

Solution 1

The solution must consider all metrics from the technical review checklist to satisfy non‑functional quality demands.

Overview – one‑sentence highlight (e.g., dual‑write, master‑slave, sharding, scaling, archiving)

Detailed description – include diagrams if needed (middleware architecture, logical architecture, data architecture, fault handling, disaster recovery, gray‑release)

Performance evaluation – baseline data and resource estimation

Pros and cons – quantified advantages and disadvantages

Solution 2

Similar structure, tailored to alternative trade‑offs.

3.4 Solution Comparison

Compare alternatives and justify the chosen one.

3.5 Risk Assessment

Identify risks and propose mitigation or rollback strategies.

3.6 Workload Estimation

Detail tasks for development, testing, and deployment; present a simple task‑plan table.

4 Classic Capacity & Performance Case Study

4.1 Background

The logistics system has two priority quality demands: maintaining members' frequently used addresses, and asynchronously generating logistics orders while polling third parties for shipment status.

4.2 Target Data Volume

Use a leading e‑commerce platform as a reference: 200 million members (growing by 50 000 per day) and 14 million orders per day during promotions.

4.3 Evaluation Standards

General Standards

Capacity calculated with 5× redundancy.

Address data retained for 30 years; logistics orders for 3 years.

Third‑party query interface: 5 000 QPS.

MySQL

Read: 1 000 QPS per port.

Write: 700 TPS per port.

Single table capacity: 50 million rows.

Redis

Read: 40 000 QPS per port.

Write: 40 000 TPS per port.

Memory per port: 32 GB.

Kafka

Read: 30 000 QPS per node.

Write: 5 000 TPS per node.

Application Server

Peak request rate: 5 000 QPS.

4.4 Solution

Solution 1 – Maximum Performance

Designed for peak traffic of a top‑tier e‑commerce site.

Requirement 1 – Member Frequent Addresses

Read QPS: (14 M × 0.5) / 2 h ≈ 1 000/s; with 5× redundancy → 5 000 QPS, requiring 5 read ports.

Write TPS: (14 M × 0.2 + 50 000) / 2 h ≈ 400/s; with 5× redundancy → 2 000 TPS, requiring 3 write ports.

Data volume: (200 M + 50 000 × 365 × 30) × 5 ≈ 3.5 billion rows; with 5× redundancy → 17.5 billion rows, needing about 350 tables (rounded up to 512).

Design result: 4 ports × 32 databases × 4 tables per DB, master‑8‑slave configuration.

Requirement 2 – Logistics Orders & Records

Read QPS ≈ 250/s (address lookup) → 2 500 QPS with redundancy → 3 read ports.

Write TPS ≈ 1 000/s (order creation) + 1 200/s (record insertion) = 2 200/s; with 5× redundancy → 11 000 TPS, requiring 16 write ports.

Data volume: 46 billion rows (orders + records) → 230 billion with redundancy, needing 4 096 tables.

Design result: 16 ports × 32 databases × 8 tables per DB, master‑16‑slave.

Message queue: a single Kafka node with one processing machine suffices; it can scale horizontally if needed.

Application servers: 2–3 nodes to handle the combined read/write peaks.

Solution 2 – Minimal Resources

Assumes current traffic is low: a single database instance with one port can handle the load, while the design retains sharding and scaling hooks for future growth.

Design results:

Member addresses: 1 port × 32 DB × 16 tables, master‑1‑slave.

Logistics orders/records: 1 port × 128 DB × 32 tables, master‑1‑slave.

4.5 Summary

The minimal‑resource solution is preferred: current traffic is modest, so it saves cost, yet it keeps sharding hooks for future scaling and allows cache and message‑queue components to be activated later.

5 Performance Evaluation Reference Standards

Values are based on typical x86 servers; adjust according to actual hardware.

Capacity is calculated with 5× redundancy.

Sharding typically stores 30 years of data.

Third‑party query interface: 5 000 QPS.

Average DB row size ≈ 1 KB.
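The member‑address arithmetic from Solution 1 can be re‑derived with a short script. The ×5 multiplier on row count follows the formula in the text; interpreting it as addresses per member is an assumption, as are the 2‑hour peak window and the 50%/20% read/write shares of daily orders.

```python
import math

PEAK_WINDOW_S = 2 * 3600   # assume traffic concentrates in a 2-hour window
REDUNDANCY = 5             # 5x capacity redundancy, per the standards above

# Peaks: 50% of 14M daily orders trigger an address read,
# 20% trigger a write, plus 50,000 new members per day.
read_peak = 14_000_000 * 0.5 / PEAK_WINDOW_S               # ~972/s -> ~1,000/s
write_peak = (14_000_000 * 0.2 + 50_000) / PEAK_WINDOW_S   # ~396/s -> ~400/s

# Ports at the reference limits: 1,000 read QPS and 700 write TPS per port.
read_ports = math.ceil(read_peak * REDUNDANCY / 1_000)
write_ports = math.ceil(write_peak * REDUNDANCY / 700)

# Rows: existing members plus 30 years of growth, times the text's x5 factor;
# with redundancy, ~374 raw tables at 50M rows each, rounded up to 512.
rows = (200_000_000 + 50_000 * 365 * 30) * 5
tables = 2 ** math.ceil(math.log2(rows * REDUNDANCY / 50_000_000))

print(read_ports, write_ports, tables)  # 5 3 512
```

The output matches the Solution 1 design of 5 read ports, 3 write ports, and 4 × 32 × 4 = 512 tables.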
6 Conclusion

This article outlines a methodology for assessing the non‑functional qualities of internet‑scale systems, provides a detailed review checklist, and demonstrates a classic capacity‑and‑performance case study to help architects design, evaluate, and scale high‑concurrency systems. All data are based on the author's experience on a specific platform and serve as a methodological reference rather than a one‑size‑fits‑all solution.

Tags: Backend, performance, architecture, operations, Capacity Planning, non-functional requirements
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.