How to Resolve Longhorn PV Mount Failures with fsck and e2fsck Upgrade
After a Longhorn component crash caused PersistentVolume mount errors in Kubernetes, this guide walks through diagnosing the issue, upgrading e2fsck to handle newer filesystem features, running fsck to repair the volume, and finally recreating the pod to restore normal operation.
When a Longhorn component unexpectedly restarts, the PersistentVolumes (PVs) created by Longhorn may fail to mount, producing errors like "fsck found errors on device" and "UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY".
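Pod events are where these failures first surface. As a minimal sketch (the namespace, pod name, and event text are taken from the example that follows), you can filter the <code>kubectl describe</code> output for <code>FailedMount</code> warnings; the demo below runs the same filter over a captured sample of the events:

```shell
#!/bin/sh
# Sketch: narrow a pod description down to its mount-failure events.
# Against a live cluster you would pipe kubectl through the same filter:
#   kubectl -n devops describe pod nexus3-84c8b98cb-rshlv | grep 'FailedMount'
filter_mount_failures() {
  grep 'FailedMount'
}

# Demo on a captured sample of the pod events:
out="$(filter_mount_failures <<'EOF'
Normal   Scheduled    99s  default-scheduler  Successfully assigned devops/nexus3-84c8b98cb-rshlv to node-02
Warning  FailedMount  78s  kubelet            MountVolume.SetUp failed for volume "pvc-9784831a-3130-4377-9d44-7e7129473b90"
EOF
)"
echo "$out"
```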
First, inspect the pod events to see the mount failure details:
<code>Events:
Type     Reason       Age  From               Message
----     ------       ---  ----               -------
Normal   Scheduled    99s  default-scheduler  Successfully assigned devops/nexus3-84c8b98cb-rshlv to node-02
Warning  FailedMount  78s  kubelet            MountVolume.SetUp failed for volume "pvc-9784831a-3130-4377-9d44-7e7129473b90": rpc error: code = Internal desc = 'fsck' found errors on device /dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90 but could not correct them: fsck from util-linux 2.31.1
/dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90 contains a file system with errors, check forced.
/dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90: Inodes that were part of a corrupted orphan linked list found.
/dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.</code>The error suggests running <code>fsck</code> on the device, but the bundled <code>e2fsck</code> version is too old to understand the filesystem's metadata checksum (<code>metadata_csum</code>) feature.
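As an assumption not stated in the article, support for <code>metadata_csum</code> arrived around e2fsprogs 1.43, which is why the 1.42.9 build refuses to touch the volume. A minimal version-comparison sketch using <code>sort -V</code>:

```shell
#!/bin/sh
# Sketch: check whether the installed e2fsck is new enough for metadata_csum.
# Assumption (not from the article): support landed in e2fsprogs 1.43.
required="1.43"
# On the node you would detect the real version, e.g.:
#   have="$(e2fsck -V 2>&1 | awk 'NR==1 {print $2}')"
have="1.42.9"   # the version reported in the failing run
oldest="$(printf '%s\n%s\n' "$required" "$have" | sort -V | head -n1)"
if [ "$oldest" = "$have" ] && [ "$have" != "$required" ]; then
  echo "e2fsck $have is too old: need >= $required for metadata_csum"
else
  echo "e2fsck $have is new enough"
fi
```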
<code># fsck.ext4 -cvf /dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90
e2fsck 1.42.9 (28-Dec-2013)
/dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90 has unsupported feature(s): metadata_csum
e2fsck: Get a newer version of e2fsck!</code>Upgrade <code>e2fsck</code> by building a newer e2fsprogs package:
<code># wget https://distfiles.macports.org/e2fsprogs/e2fsprogs-1.45.6.tar.gz
# tar -zxvf e2fsprogs-1.45.6.tar.gz
# cd e2fsprogs-1.45.6/
# ./configure
# make
# cp e2fsck/e2fsck /sbin/e2fsck  # the built binary lives in the e2fsck/ subdirectory; replace the system e2fsck
</code>Run <code>fsck</code> again with the updated tool:
<code># fsck.ext4 -cvf /dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90
e2fsck 1.45.6 (20-Mar-2020)
Checking for bad blocks (read-only test): done
/dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90: Updating bad block inode.
...
***** FILE SYSTEM MODIFIED *****
2372 inodes used (0.72%, out of 327680)
81056 blocks used (6.18%, out of 1310720)
...
</code>After the repair, verify the pod status again. The previous pod still shows mount failures, but deleting and recreating the pod allows the PV to attach correctly:
<code># kubectl -n devops delete pod nexus3-5c9c5545d9-nmfjg
pod "nexus3-5c9c5545d9-nmfjg" deleted
# kubectl -n devops describe pod nexus3-5c9c5545d9-bm9dk
... (output shows SuccessfulAttachVolume and no mount errors) ...
</code>In summary, the steps to resolve the Longhorn PV mount issue are:
1. Identify the mount error via <code>kubectl describe pod</code> and the pod events.
2. Run <code>fsck.ext4</code> on the affected device; note whether the bundled e2fsck version is outdated.
3. Download, compile, and install a newer e2fsprogs (e2fsck) that supports the filesystem's features.
4. Re-run <code>fsck</code> to repair inode, block, and inode-bitmap inconsistencies.
5. Delete the failing pod and let Kubernetes recreate it so the repaired PV can be mounted successfully.
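When wrapping these steps in a script, it is more robust to branch on fsck's exit status than to scrape its text output. A minimal sketch, assuming the bit-mask exit codes documented in the e2fsck(8) man page (0 clean, 1 errors corrected, 2 corrected but reboot needed, 4 errors left uncorrected):

```shell
#!/bin/sh
# Sketch: interpret e2fsck's exit status when automating the repair.
# The codes are OR-ed together, so test them as bits (per e2fsck(8)).
explain_fsck_exit() {
  code="$1"
  [ $((code & 4)) -ne 0 ] && { echo "errors left uncorrected"; return; }
  [ $((code & 2)) -ne 0 ] && { echo "errors corrected, reboot needed"; return; }
  [ $((code & 1)) -ne 0 ] && { echo "errors corrected"; return; }
  [ "$code" -eq 0 ] && echo "clean" || echo "operational/usage error"
}

# On the node you would run, e.g.:
#   fsck.ext4 -fy /dev/longhorn/pvc-9784831a-3130-4377-9d44-7e7129473b90
#   explain_fsck_exit $?
explain_fsck_exit 1    # prints: errors corrected
```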
Ops Development Stories
Maintained by a like-minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.