Hi All,
Our Kubernetes master node has 2 etcd volumes. They were working fine for almost 2 years.
One of the volumes became read-only 2 weeks ago due to file system errors:
/dev/nvme1n1 on /mnt/master-vol-id1 type ext4 (rw,relatime,data=ordered)
/dev/nvme2n1 on /mnt/master-vol-id2 type ext4 (ro,relatime,data=ordered)
tune2fs showed that the filesystem had errors:
FS Error count: 4
First error time: Thu Jul 8 12:10:00 2021
First error function: ext4_journal_check_start
First error line #: 56
First error inode #: 0
First error block #: 0
Last error time: Wed Jul 21 16:57:50 2021
Last error function: ext4_remount
Last error line #: 4964
Last error inode #: 0
Last error block #: 0
Checksum type: crc32c
Checksum: 0xfb3bfe96
From searching online, I found that the solution is to unmount the filesystem and run fsck.
I unmounted the filesystem, but when I ran fsck it reported that the device was in use:
# fsck.ext4 /dev/nvme2n1
e2fsck 1.43.4 (31-Jan-2017)
/dev/nvme2n1 is in use.
e2fsck: Cannot continue, aborting.
The fuser and lsof commands show that nothing is using it:
# fuser -cu /dev/nvme2n1
# lsof /dev/nvme2n1
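Since fuser and lsof report nothing, one further check that may be relevant is whether another mount namespace (for example a container's) still has the device mounted, which could keep it busy. The following is a minimal sketch, assuming the Linux /proc/<pid>/mountinfo layout; find_mount_holders is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: print the PIDs whose mount namespace still lists
# the given device as a mount source.
# $1 = device path, $2 = proc root (defaults to /proc; overridable for testing)
find_mount_holders() {
    dev="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/mountinfo; do
        # Skip entries that vanished or are not readable by this user
        [ -r "$f" ] || continue
        # In mountinfo the mount source appears as a space-delimited field
        if grep -q -- " $dev " "$f" 2>/dev/null; then
            pid=${f%/mountinfo}   # strip the trailing file name
            pid=${pid##*/}        # keep only the PID directory name
            echo "$pid"
        fi
    done
}
```

Running `find_mount_holders /dev/nvme2n1` as root would list any such PIDs; empty output suggests that no namespace visible under /proc still has the device mounted.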
The filesystem can be mounted read-only but cannot be remounted read-write:
# mount -o ro /dev/nvme2n1 /etcd-events/
# mount -o remount,rw /dev/nvme2n1 /etcd-events/
mount: cannot remount /dev/nvme2n1 read-write, is write-protected
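The "write-protected" message might mean that the kernel has flagged the block device itself as read-only, not just the filesystem; that is an assumption, not something the output above confirms. A minimal sketch of a check that reads the device's `ro` attribute in sysfs follows; is_block_ro is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the kernel marks a block device read-only.
# $1 = device name without /dev/ (e.g. nvme2n1)
# $2 = sysfs block root (defaults to /sys/block; overridable for testing)
# Returns 0 if read-only, 1 if writable, 2 if the attribute cannot be read.
is_block_ro() {
    name="$1"
    sys_root="${2:-/sys/block}"
    ro_file="$sys_root/$name/ro"
    [ -r "$ro_file" ] || return 2
    # The attribute holds "1" for a read-only device and "0" otherwise
    [ "$(cat "$ro_file")" = "1" ]
}
```

For example, `is_block_ro nvme2n1 && echo "device is read-only"`. The same flag can also be read with `blockdev --getro /dev/nvme2n1`.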
Is there any solution? I do not want to reboot the system, as that would cause an outage.