Filesystem mounting ro with heavy NVMe I/O

Krait@discuss.tchncs.de · edit-2 1 year ago

Atemu · 1 year ago

Boot a live ISO with the flags recommended in the kernel message and do some tests on the bare drives. That way you won’t have the filesystem and subsequently the rest of the system giving out on you while you’re debugging.

Krait@discuss.tchncs.de · 1 year ago

> Boot a live ISO with the flags recommended in the kernel message and do some tests on the bare drives. That way you won’t have the filesystem and subsequently the rest of the system giving out on you while you’re debugging.

Which tests are you referring to exactly? I have read about badblocks, for example, and that it is not much use for SSDs in general: bad-block remapping happens automatically in the drive’s controller, so bad blocks stay invisible to the OS. SMART values look great for both drives: about 20 TBW on the Samsung drive, and a lot less on the Kioxia drive.

Atemu · 1 year ago

I’d start by generating some synthetic workloads such as writing some sequential data to it and then reading it back a few times.
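A minimal sketch of such a synthetic workload, in case it helps: it writes reproducible pseudo-random data sequentially, forces it out with fsync, then reads it back several times and compares SHA-256 digests. The function names (`write_sequential`, `read_back`, `stress`), the 1 MiB chunk size and the default of three read passes are illustrative assumptions, not something from this thread. Pointing it at a raw device node instead of a scratch file overwrites whatever is on that device. Reads may also be served from the page cache unless the data set is larger than RAM, so small runs mostly exercise the write path.

```python
import hashlib
import os
import random
import tempfile

CHUNK = 1024 * 1024  # 1 MiB per write (arbitrary choice)


def _chunks(seed, total_bytes, chunk_size):
    """Yield reproducible pseudo-random chunks adding up to total_bytes."""
    rng = random.Random(seed)
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield rng.getrandbits(8 * n).to_bytes(n, "little")
        remaining -= n


def write_sequential(path, total_bytes, seed=0, chunk_size=CHUNK):
    """Write the data front to back, fsync it, and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for chunk in _chunks(seed, total_bytes, chunk_size):
            f.write(chunk)
            digest.update(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data past the page cache
    return digest.hexdigest()


def read_back(path, total_bytes, chunk_size=CHUNK):
    """Read total_bytes sequentially and return the SHA-256 of what was read."""
    digest = hashlib.sha256()
    remaining = total_bytes
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # short read: the digest will not match
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def stress(path, total_bytes, read_passes=3, seed=0):
    """Write once, read back read_passes times; True if every pass matches."""
    expected = write_sequential(path, total_bytes, seed)
    return all(read_back(path, total_bytes) == expected
               for _ in range(read_passes))


if __name__ == "__main__":
    # Harmless demo against a temporary file; swap in a path on the
    # drive under test (and a much larger size) for a real run.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "stress.bin")
        print("all passes matched:", stress(target, 8 * CHUNK))
```

If the drive drops out mid-run, the script will most likely die with an `OSError` rather than return `False`, and the moment it does is the point to look for in the kernel log.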

badblocks concerns partial failure of the device where (usually) just a few blocks misbehave while the rest remains accessible. The failure mode seen here is that the entire drive becomes inaccessible and it’s likely not due to the drive itself but how it’s connected.

If synthetic loads fail to reproduce the error, I’d put a filesystem on it and perhaps copy over some real data. Put some load on it that mimics a real system, to try to get it to fail without the OS actually being run off the drive.

Krait@discuss.tchncs.de · 1 year ago

Thanks, I’ll try that. I loaded the drive using dd a couple of times, and that did bring the system down a couple of times. I was writing to the filesystem though, while the system was booted.

Atemu · 1 year ago

Did you boot with the kernel flags from the log?

Could you show the dmesg from the point onwards when the drive dropped out?

Krait@discuss.tchncs.de · 1 year ago

I did, yes, but to no avail. The dmesg output I posted is from after the drive was remounted ro, and is the best I could get. After some time, the system stops responding completely.

Atemu · 1 year ago

Your system stops responding even when it’s booted from a live ISO rather than from those drives?
