
Will Data Be Lost When a Process Crashes During File Write?

This article examines the conditions under which data may be lost when a Linux process crashes while writing a file, explaining page cache behavior, the roles of stdio versus system calls, dirty page handling, write‑back mechanisms, and strategies such as fflush, fsync, and direct I/O to ensure data integrity.

Deepin Linux

When a process crashes unexpectedly, a natural question is whether the data it had already prepared for writing actually reaches the file. The answer depends on where that data was at the moment of the crash: in the process's own buffers, in the kernel's page cache, or on the storage device. Understanding this matters for data integrity, for reasoning about file systems and I/O, and for overall system stability.

1. Problem Introduction and Background

In the digital era, file creation and storage are essential for individuals and enterprises. When a process crashes while writing a file, data loss can lead to wasted time, effort, and even severe business or legal consequences, especially in finance, scientific research, and other critical domains.

2. Related Functions Analysis

2.1 Page Cache Details

Page Cache is a memory area managed by the Linux kernel, composed of pages (typically 4 KB). It can be inspected with cat /proc/meminfo, which shows fields such as Buffers, Cached, and SwapCached. These counters are related approximately by Buffers + Cached + SwapCached = Active(file) + Inactive(file) + Shmem + SwapCached. The two sides are usually close rather than exactly equal, because the counters are sampled while the system is running and a few page types are accounted differently.
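The relationship between these counters can be checked with a few lines of code. The following is a minimal sketch in Python, not a definitive tool; the helper names parse_meminfo and cache_sums are hypothetical, and the only assumption about the file format is the usual "Name: value kB" layout of /proc/meminfo.

```python
def parse_meminfo(text):
    """Return {field_name: integer value} for each 'Name: value [kB]' line."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_sums(fields):
    """Return both sides of the page cache relationship, in kB."""
    left = fields["Buffers"] + fields["Cached"] + fields["SwapCached"]
    right = (fields["Active(file)"] + fields["Inactive(file)"]
             + fields["Shmem"] + fields["SwapCached"])
    return left, right
```

On a live system, passing the contents of /proc/meminfo to parse_meminfo and then to cache_sums yields two totals that should be close to each other; a small gap is expected.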

Page Cache resides in kernel space and caches file contents to accelerate disk access. When a read request arrives, the kernel first checks the cache; if the data is present, it is served from memory, otherwise it is fetched from disk and stored in the cache.

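When a write request occurs, the kernel writes to the cache, marks the page as dirty, and adds it to the dirty list. Dirty pages are flushed back to disk later, either periodically, when dirty-page thresholds are exceeded, or when the application explicitly requests it, which brings the on-disk data back in sync with the cached data.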

Page Cache can be generated by two mechanisms:

Buffered I/O (standard I/O)

Memory‑mapped I/O (mmap)

In Buffered I/O, user‑space buffers are copied to kernel buffers (Page Cache). In Memory‑mapped I/O, the Page Cache pages are directly mapped into the user address space, allowing the application to read/write the cached pages without an extra copy.
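The difference between the two paths can be sketched as follows. This is an illustrative example only: the function names are hypothetical, Python's mmap module stands in for the mmap() system call, and the data is assumed to be non-empty because an empty file cannot be mapped.

```python
import mmap
import os


def write_via_buffered_io(path, data):
    """Buffered I/O: data is copied from a user buffer into the page cache."""
    with open(path, "wb") as f:
        f.write(data)  # lands in a user-space buffer, pushed to the kernel on close


def write_via_mmap(path, data):
    """Memory-mapped I/O: the process stores directly into page cache pages."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, len(data))           # the file must cover the mapping
        with mmap.mmap(fd, len(data)) as m:   # shared mapping by default on Unix
            m[:] = data                       # dirties the mapped pages, no write() call
            m.flush()                         # msync(): request write-back of the mapping
    finally:
        os.close(fd)
```

Both functions leave the same bytes in the file; what differs is whether the data is copied into the page cache by a system call or stored into it directly through the mapping.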

2.2 Memory Management and Reclamation

Linux uses a page-reclaim mechanism based on the LRU (Least Recently Used) algorithm, maintaining two doubly-linked lists: active and inactive. Frequently accessed pages stay in the active list, while less used pages move to the inactive list and are the first candidates for reclamation when memory pressure arises.

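Reclamation happens in two ways. The background daemon kswapd is woken when free memory in a zone drops below the low watermark and reclaims pages until the high watermark is reached. If free memory keeps falling and reaches the min watermark, the allocating process itself has to reclaim pages before its allocation can proceed (direct reclaim). The three watermarks (min, low, high) exist per memory zone; they are not set individually but are derived by the kernel from /proc/sys/vm parameters such as min_free_kbytes and watermark_scale_factor.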
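The per-zone watermarks can be read from /proc/zoneinfo, a file the text above does not mention and which is brought in here only for illustration. The sketch below assumes the common layout in which each zone starts with a "Node N, zone NAME" line followed by indented "min", "low", and "high" lines giving values in pages; the function name parse_watermarks is hypothetical.

```python
def parse_watermarks(zoneinfo_text):
    """Return {"nodeN/ZONE": {"min": pages, "low": pages, "high": pages}}."""
    zones = {}
    current = None
    for line in zoneinfo_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 4 and tokens[0] == "Node" and tokens[2] == "zone":
            # Header line such as "Node 0, zone   Normal"
            current = "node" + tokens[1].rstrip(",") + "/" + tokens[3]
            zones[current] = {}
        elif (current is not None and len(tokens) == 2
              and tokens[0] in ("min", "low", "high")
              and tokens[1].isdigit()):
            # Keep the first occurrence in each zone: the watermark block
            zones[current].setdefault(tokens[0], int(tokens[1]))
    return zones
```

Feeding it the contents of /proc/zoneinfo should give one entry per zone, with min below low and low below high.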

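Dirty page thresholds control when write-back starts. Once dirty pages exceed dirty_background_ratio (default 10 %) or dirty_background_bytes, background write-back begins. Once they exceed dirty_ratio (20 % in the upstream kernel, although distributions and tuning profiles may set other values) or dirty_bytes, the writing process itself is throttled until pages are flushed. The ratios are percentages of the memory available for caching (free plus reclaimable pages), not of total RAM, and in each pair only one form is active at a time: setting the bytes value makes the corresponding ratio read as 0, and vice versa.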
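How a ratio or a byte value turns into an actual limit can be sketched as below. The helper names are hypothetical, and the figure for dirtyable memory is passed in by the caller because the kernel's exact accounting of free and reclaimable pages is not reproduced here.

```python
def read_vm_knob(name, base="/proc/sys/vm"):
    """Read one integer tunable, e.g. read_vm_knob("dirty_ratio")."""
    with open(base + "/" + name) as f:
        return int(f.read())


def dirty_limit_bytes(ratio_percent, limit_bytes, dirtyable_bytes):
    """Return the effective threshold in bytes.

    A non-zero *_bytes setting takes precedence; otherwise the ratio is
    applied to the amount of dirtyable memory supplied by the caller.
    """
    if limit_bytes:
        return limit_bytes
    return dirtyable_bytes * ratio_percent // 100
```

For example, with 8 GiB of dirtyable memory, dirty_background_ratio = 10 and dirty_background_bytes = 0, background write-back would start at roughly 0.8 GiB of dirty pages.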

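Write-back is performed by kernel threads: historically bdflush, later pdflush, and in current kernels per-device flusher threads that run as kernel workers. Because write-back is asynchronous, there is always a window in which dirty pages exist only in memory. If the system loses power before they are persisted, data loss may occur.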

2.3 Buffering Mechanisms and Disk Write Flow

The stdio library supports three buffering modes: full buffering, line buffering, and unbuffered. With buffering enabled, the data flow is:

User-space buffer receives data.

When the buffer is full, when fflush or fclose is called, or when the process exits normally, data is copied to the kernel buffer (Page Cache).

To guarantee persistence on the physical disk, an explicit fsync() (POSIX) or FlushFileBuffers() (Windows) must be invoked.
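The three stages above can be sketched as follows. This is a minimal illustration, assuming that Python's buffered file object plays the role of the stdio buffer: flush() corresponds to fflush, and os.fsync wraps the fsync() system call. The function name durable_write is hypothetical.

```python
import os


def durable_write(path, data):
    """Write data and return only after the kernel reports it persisted."""
    with open(path, "wb") as f:
        f.write(data)          # stage 1: data sits in the user-space buffer
        f.flush()              # stage 2: user buffer -> kernel page cache
        os.fsync(f.fileno())   # stage 3: page cache -> storage device
```

Skipping stage 2 before stage 3 would be a bug: fsync only covers data the kernel already has, so anything still in the user-space buffer would not be persisted.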

If a process crashes before fflush or fclose, whatever is still in the user-space buffer is lost, because that buffer is part of the process's memory and disappears with it. If those functions were called but the machine powers off before the kernel flushes the dirty pages, the data may still be lost.

3. Different Scenarios Analysis

3.1 Using stdio Library

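Without fflush or fclose, whatever is still in the user-space buffer is lost on a crash; only data that was already pushed to the kernel, for example because the buffer filled up, survives. After fflush or fclose, the data is in the kernel cache. It then survives a process crash as long as the kernel keeps running, but a power outage before write-back can still cause loss.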

3.2 Using System Calls (write)

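When write() returns successfully, the number of bytes it reports is already in the kernel cache (a short write is possible, so the return value should be checked). A process crash without a power failure typically does not lose that data, because the kernel retains it and writes it back later. However, a power loss can still result in loss of dirty pages that have not been flushed.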
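The two scenarios in this section can be reproduced with a small experiment. The sketch below is an illustration under stated assumptions, not a benchmark: a child process writes a short payload and is then killed with SIGKILL to imitate a crash, Python's buffered file object stands in for stdio, and os.write wraps the write() system call. The function name and the mode strings are hypothetical.

```python
import os
import signal
import tempfile


def bytes_surviving_crash(mode, payload=b"hello\n"):
    """Return how many bytes are in the file after the writer is killed.

    mode: "buffered" (no flush), "flush" (flush before the crash),
          or "write" (direct write() system call).
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    pid = os.fork()
    if pid == 0:  # child: write, then die without any cleanup
        try:
            if mode == "write":
                out = os.open(path, os.O_WRONLY)
                os.write(out, payload)       # straight into the page cache
            else:
                f = open(path, "wb")
                f.write(payload)             # stays in the user-space buffer
                if mode == "flush":
                    f.flush()                # pushed to the page cache
            os.kill(os.getpid(), signal.SIGKILL)
        finally:
            os._exit(1)                      # fallback, also skips buffer flushing
    os.waitpid(pid, 0)
    try:
        return os.path.getsize(path)
    finally:
        os.unlink(path)
```

With a payload smaller than the buffer, "buffered" leaves an empty file, while "flush" and "write" leave the full payload, since the kernel still holds the data after the process is gone. The experiment says nothing about power loss, which would need fsync to guard against.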

4. Summary

Whether data is lost when a process crashes during file write depends on several factors:

With stdio, data still held in the user-space buffer is lost if the process crashes before fflush or fclose; calling them moves the data into the kernel, where it is protected unless a power failure occurs before the kernel flushes it.

With system calls, data is safer after write() returns, but power loss can still cause loss.

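Direct I/O (O_DIRECT) bypasses the page cache, so written data does not wait there as dirty pages. This suits workloads such as databases that manage their own caching, but it requires the buffer address, file offset, and transfer length to be aligned, and it makes error handling more complex. Direct I/O on its own is not a durability guarantee: the device's write cache and the file's metadata may still need fsync or the O_SYNC / O_DSYNC flags.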
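The alignment requirement is the part of Direct I/O that most often trips people up, so a short sketch may help. Everything here is an assumption made for illustration: the 4096-byte block size, the helper names, and the use of an anonymous mmap as a convenient way to get a page-aligned buffer in Python. Some file systems (tmpfs, for example) reject O_DIRECT, and a production version would also loop on short writes.

```python
import mmap
import os

BLOCK = 4096  # assumed block size; the real value depends on device and file system


def round_up(n, block=BLOCK):
    """Round n up to the next multiple of block."""
    return (n + block - 1) // block * block


def aligned_buffer(data, block=BLOCK):
    """Copy data into a page-aligned, zero-padded buffer of aligned length."""
    size = round_up(max(len(data), 1), block)
    buf = mmap.mmap(-1, size)      # anonymous mapping: page-aligned, zero-filled
    buf[:len(data)] = data
    return buf


def direct_write(path, data, block=BLOCK):
    """Write data with O_DIRECT, then trim the padding and sync."""
    buf = aligned_buffer(data, block)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
    fd = os.open(path, flags, 0o644)
    try:
        written = os.write(fd, buf)    # aligned address, offset 0, aligned length
        os.ftruncate(fd, len(data))    # remove the zero padding
        os.fsync(fd)                   # still needed for metadata and device cache
        return written
    finally:
        os.close(fd)
        buf.close()
```

The padding-then-truncate step is one simple way to store a payload whose length is not a multiple of the block size; databases usually avoid it by making their on-disk pages block-sized in the first place.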

fflush vs. fsync

fflush flushes the user‑level stdio buffer to the kernel; fsync forces the kernel buffer to be written to the physical disk, also updating metadata.

fsync vs. fdatasync

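fsync writes both file data and all associated metadata to the storage device. fdatasync persists the file data plus only the metadata needed to read that data back (for example, a changed file size), and skips updates such as timestamps. This can give better performance when the remaining metadata does not need to be durable immediately.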
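A typical place where the choice matters is an append-only log. The sketch below is illustrative only: append_record and its sync_metadata parameter are hypothetical names, and os.fsync and os.fdatasync are Python's wrappers for the two system calls.

```python
import os


def append_record(path, record, sync_metadata=False):
    """Append one record, persist it, and return the resulting file size."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        if sync_metadata:
            os.fsync(fd)        # data and all metadata, including timestamps
        else:
            os.fdatasync(fd)    # data and the metadata needed to read it back
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Because an append changes the file size, fdatasync still has to persist that size; the saving comes from not forcing out timestamp updates on every record.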
