
Deep Dive into IO Optimization and System Performance Tuning

This article explains how to evaluate storage performance by analyzing host ports, storage systems, and backend disks, discusses IO aggregation penalties, business workload models, RAID and cache impacts, and provides practical guidance for accurate performance assessment and tuning.

Architects' Tech Alliance

In practical projects, it is necessary to analyze host ports, storage systems, and backend disks end‑to‑end according to real business requirements in order to provide configurations that truly meet those needs.

The author has summarized a performance‑evaluation methodology into an e‑book titled IO Knowledge and System Performance Deep Tuning, which is available for a small fee via the original link.

The sections below discuss the typical factors that affect accurate performance assessment.

1. IO Aggregation Write Penalty

When IO is aggregated to full‑stripe size, no pre‑reads are needed and the RAID write penalty is avoided; otherwise, read‑modify‑write pre‑reads and parity calculation increase the number of IO operations. For example, a RAID5 (4D+1P) small write must read the old data and old parity, then write the new data and new parity, expanding one host IO into four backend IOs.

When writing a full stripe, the four data blocks are written together with a single parity write, expanding four host IOs into five backend IOs, which is far more efficient.
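The IO expansion described above can be sketched as a small calculation. This is a minimal illustration, assuming a RAID5 layout with 4 data disks and 1 parity disk; the function name and signature are ours, not from the article.

```python
def backend_ios(host_writes: int, stripe_width: int = 4, full_stripe: bool = False) -> int:
    """Backend IOs generated by host writes on RAID5 (stripe_width data disks + 1 parity).

    Small write: read old data + read old parity + write new data + write new parity = 4 IOs each.
    Full-stripe write: write every data block plus one parity block, with no pre-reads.
    """
    if full_stripe:
        stripes = host_writes // stripe_width  # host_writes must cover whole stripes
        return stripes * (stripe_width + 1)
    return host_writes * 4

print(backend_ios(1))                    # one small write -> 4 backend IOs
print(backend_ios(4, full_stripe=True))  # one full stripe -> 5 backend IOs
```

The contrast is the whole point: 4 backend IOs per host IO for a small write versus 5 backend IOs for 4 host IOs when the stripe is full.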

Storage’s ability to merge IO depends on host‑side IO ordering and the merging capabilities of the storage path (cache, block devices, disks). Random small IOs typical of database workloads often cannot be fully merged.

2. Business Model Scenarios

Common workload models include OLTP, OLAP, VDI, and SPC‑1. SPC‑1 is a standard random‑IOPS model used when the actual business type is unknown. The following list summarizes the basic IO characteristics of each model.

OLTP: small reads and writes per transaction, many concurrent users, latency 10–20 ms, 8 KB random IO, read/write ratio ≈3:2.

OLAP: read‑heavy, complex queries lasting hours, large sequential IO (≈512 KB), >90 % reads.

VDI: startup and login storms cause bursty read‑ or write‑intensive loads, latency around 10 ms, mixed small IO sizes.

SPC‑1: random IO workload designed to measure IOPS, read/write ratio ≈4:6, IO size 4 KB, with pronounced hot‑spot regions.

3. Impact of Parity on Performance

For sequential writes, RAID5 (4D+1P) adds one parity IO for every four data IOs, consuming bandwidth without benefiting the host workload. For sequential reads of full‑stripe data, only the data disks are accessed, not the parity disks.

An example calculation shows that a system with 96 × 600 GB 15K‑rpm SAS disks in RAID6 (4D+2P) can deliver a maximum effective write bandwidth of 1920 MB/s, limited by parity overhead.
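The arithmetic behind that figure can be reconstructed as follows. The key idea is that parity writes consume raw disk bandwidth but carry no host data, so only D/(D+P) of the raw rate is useful. The per‑disk sustained rate of 30 MB/s below is our assumption, chosen because it is consistent with the article's stated 1920 MB/s result; the article does not give the per‑disk figure.

```python
def effective_write_bw(n_disks: int, per_disk_mbps: float,
                       data_disks: int, parity_disks: int) -> float:
    """Effective sequential write bandwidth for a parity RAID layout.

    Parity IOs consume raw bandwidth without delivering host data, so the
    usable fraction is data_disks / (data_disks + parity_disks).
    """
    raw = n_disks * per_disk_mbps
    return raw * data_disks / (data_disks + parity_disks)

# 96 disks in RAID6 (4D+2P); 30 MB/s per disk is an assumed sustained rate.
print(effective_write_bw(96, 30, 4, 2))  # 1920.0
```

With 4D+2P, one third of the raw bandwidth goes to parity, which is exactly the overhead the article attributes the limit to.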

4. Read/Write Ratio Influence

The proportion of read to write IO directly affects cache strategy, RAID level choice, and LUN configuration; write‑heavy workloads consume more storage resources.

Write paths are longer due to cache mirroring and may involve additional reliability mechanisms (e.g., protection against the RAID write hole), while reads benefit from cache hits and generally have lower latency.

5. RAID Level Performance Impact

Different RAID algorithms impose varying write penalties; for the same number of disks, RAID10, RAID5, and RAID6 deliver different effective performance depending on the workload’s read/write mix.

RAID6 offers the best reliability (it survives any two simultaneous disk failures), followed by RAID10 and RAID5; capacity efficiency for the same disk count ranks roughly in the reverse order.
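The interaction between write penalty and read/write mix can be made concrete with the standard backend‑IOPS formula: each read costs one backend IO, each write costs the RAID level's penalty. This is a generic sketch, not the article's own model; the disk count, per‑disk IOPS, and read ratio below are illustrative assumptions.

```python
# Classic write penalties: RAID10 = 2 (mirror), RAID5 = 4, RAID6 = 6.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def host_iops(n_disks: int, disk_iops: float, read_ratio: float, raid: str) -> float:
    """Maximum host IOPS the disk group can sustain for a given read/write mix.

    Each host read costs 1 backend IO; each host write costs `penalty` backend IOs.
    """
    penalty = WRITE_PENALTY[raid]
    backend_per_host_io = read_ratio + (1 - read_ratio) * penalty
    return n_disks * disk_iops / backend_per_host_io

# Assumed example: 24 disks at 180 IOPS each, 60 % reads.
for level in WRITE_PENALTY:
    print(level, round(host_iops(24, 180, 0.6, level)))
```

Run with these assumptions, RAID10 sustains roughly twice the host IOPS of RAID6 on this write‑heavy‑enough mix, which is why the same disk count delivers different effective performance at different RAID levels.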

6. Sequential vs. Random Characteristics

Sequential IO outperforms random IO on mechanical disks and, to a lesser degree, on modern SSDs, especially for small‑IO IOPS. Caches can prefetch sequential streams, while random small IO suffers low cache hit rates.

7. IO Size Influence

Small IO is measured by IOPS, large IO by bandwidth. SPC‑1 evaluates random small‑IO IOPS, while SPC‑2 focuses on bandwidth under heavy sequential loads. Larger random IO reduces IOPS, especially on HDDs.
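The two metrics are linked by a simple identity: bandwidth = IOPS × IO size. A quick sketch (the IOPS figures are illustrative assumptions, not measurements) shows why small IO is quoted in IOPS and large IO in MB/s:

```python
def bandwidth_mbps(iops: float, io_size_kb: float) -> float:
    """Bandwidth in MB/s implied by an IOPS figure at a given IO size."""
    return iops * io_size_kb / 1024

# Small random IO: high IOPS, modest bandwidth.
print(bandwidth_mbps(20000, 8))   # 20k x 8 KB  -> 156.25 MB/s
# Large sequential IO: far fewer IOs saturate the pipe.
print(bandwidth_mbps(1000, 512))  # 1k x 512 KB -> 500.0 MB/s
```

This is also why SPC‑1 (4 KB random) is an IOPS benchmark while SPC‑2 (large sequential) is a bandwidth benchmark: the same identity, dominated from opposite ends.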

8. Cache Impact

Cache accelerates writes through write‑back (asynchronous batch destaging), write‑hit merging, and IO aggregation. For reads, a cache hit reduces latency dramatically; full‑hit scenarios achieve the maximum possible IOPS.
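The read‑side effect can be quantified with an expected‑latency calculation. The latency figures below (0.2 ms for a cache hit, 8 ms for an HDD miss) are illustrative assumptions, not numbers from the article:

```python
def avg_read_latency(hit_rate: float, cache_ms: float, disk_ms: float) -> float:
    """Expected read latency given a cache hit rate.

    latency = hit_rate * cache_latency + (1 - hit_rate) * disk_latency
    """
    return hit_rate * cache_ms + (1 - hit_rate) * disk_ms

# Assumed: 0.2 ms on a hit, 8 ms on an HDD miss.
print(avg_read_latency(0.9, 0.2, 8.0))  # ~0.98 ms at a 90 % hit rate
print(avg_read_latency(1.0, 0.2, 8.0))  # full hit: pure cache latency
```

Even a 90 % hit rate leaves average latency dominated by the 10 % of misses, which is why the full‑hit case is the IOPS ceiling.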

For more detailed analysis, refer to the original article or obtain the compiled e‑book "IO Knowledge and System Performance Deep Tuning".

Tags: caching, io-optimization, RAID, IOPS, storage performance, workload modeling
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
