Hello everyone. I have a system with a Ryzen 9 7950X, 32 GB of 6400 MHz DDR5 RAM, a 1 TB primary SSD with Windows 11 and Linux installed, a Gainward RTX 4080 graphics card, and an Asus Prime X670-P WiFi motherboard. I also have a 1 TB SSD and 2 TB HDDs mounted. 2-3 months ago, I started getting crashes on both of my OSes, and over time they became more frequent. I bought a brand-new SSD for the OS installation, but after a while the crashes started again. I can't read any error message on Windows, since the BSOD stays on screen for only 2-3 seconds before the system restarts. After a restart, I sometimes get a “no Bootable device found” error at the boot stage. When the crash happens on Linux, the dmesg output looks as if the whole SSD disconnected, and it shows I/O error messages for the root partition as well. I replaced the primary SSD 1 month ago, and the errors still persist. I sent the motherboard in for service, and no issues were found. The BIOS has also been updated and reset. When I run the PC from live Linux media, however, I get no issues. What else can I do? What could be causing this? Thanks in advance.

  • L7HM77@sh.itjust.works
    2 months ago
    Rambling Story

    Once, I had an El Cheapo and very questionable SATA SSD fail on my system. Had similar symptoms, Windows would hang and crash at random, becoming more frequent over time. Found out while digging through Windows logs and troubleshooting, that the system would crash when trying to access the drive via the file explorer, because the drive would disconnect. The SSD seemed to fail slowly, but I was using it as a faster workspace and saving everything to an HDD, so I never looked into the possibility of a failing drive until the system wouldn’t boot. Removing the drive cured everything. I should probably note that the failed SSD wasn’t the boot drive, it was used strictly for data, so the OS wasn’t being unmounted directly. I think the drive itself was shorting out some of the SATA pins, scrambling the whole bus.

    Several years later, on the Linux side of things, I found out that fstab can prevent booting if a storage device is missing. On a fresh install, fstab had been auto-configured to treat an external drive enclosure as a critical component. I'm not sure what the error messages would look like if an internal data drive mounted as critical disconnected on a running system, but I would assume Linux would halt even if no processes were running from the drive.
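To make the fstab point concrete, here is a minimal sketch that lists fstab entries not marked `nofail` (or `noauto`), which are the ones a boot would normally wait on if the device is missing. The function name and the sample entries are made up for illustration; only the `nofail` and `noauto` mount options come from the standard fstab format. Root and swap are skipped on the assumption that you would not want those marked optional.

```python
def entries_without_nofail(fstab_text):
    """Return (device, mountpoint) pairs for entries lacking nofail/noauto."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # A usable entry needs at least device, mountpoint, type, options.
        if len(fields) < 4:
            continue
        device, mountpoint, fstype, options = fields[:4]
        # Assumption: root and swap are left alone.
        if mountpoint == "/" or fstype == "swap":
            continue
        opts = options.split(",")
        if "nofail" not in opts and "noauto" not in opts:
            flagged.append((device, mountpoint))
    return flagged


# Hypothetical fstab contents, not taken from any real system.
SAMPLE = """\
# <device>      <mountpoint>  <type>  <options>        <dump> <pass>
UUID=aaaa-root  /             ext4    defaults         0 1
UUID=bbbb-data  /mnt/data     ext4    defaults         0 2
UUID=cccc-ext   /mnt/backup   ext4    defaults,nofail  0 2
"""

if __name__ == "__main__":
    for device, mountpoint in entries_without_nofail(SAMPLE):
        print(f"{mountpoint} ({device}) is not marked nofail")
```

To check a real system, pass the contents of `/etc/fstab` instead of `SAMPLE`. Anything it reports is only a candidate for adding `nofail`, not a diagnosis.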

    I’m not sure what the symptoms would have been if my SSD had failed while running Linux. My gut says it would look similar to your Linux dmesg output, with the boot drive disconnecting or becoming inaccessible and I/O errors following.
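If it helps to pull the storage-related lines out of a long dmesg dump, a small filter like the one below is one way to do it. This is only a sketch: the keyword list is my own guess at useful substrings, real kernel messages differ by driver and kernel version, and the demo text is made up rather than real log output.

```python
# Assumed keywords (lowercase); adjust them to whatever your own dmesg shows.
KEYWORDS = ("i/o error", "nvme", "sata", "read-only")


def storage_lines(log_text, keywords=KEYWORDS):
    """Return the lines of log_text that contain any keyword, case-insensitively."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append(line)
    return hits


if __name__ == "__main__":
    # Made-up placeholder text, not real kernel output.
    demo = "unrelated line\nplaceholder: I/O error on some device\nanother line\n"
    for line in storage_lines(demo):
        print(line)
```

On a real system you could save the log with `dmesg > log.txt` (this may need root) and pass the file contents to `storage_lines`.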

    I’ve also had a system with an AMD processor fail to boot, but that one wouldn’t even POST. Fixed that one by finally reseating the CPU. Turns out that’s a common issue with some AMD CPUs using the AM4 socket, found a lot of complaints online for that one after the fact.

    Since your system runs fine from a live USB, and you’ve already replaced the M.2 drive, I would try running the system without any SATA drives installed, and try to force a crash until you feel confident the issue is gone.

    If the problems still persist, then I would look at getting a cheap fresh HDD and a new SATA cable, installing a temporary OS, and running the test again.

    If it STILL crashes, I would look at removing all unnecessary hardware from the motherboard and slowly testing each stage as you rebuild.

    • mathias_freireOP
      2 months ago

      I unplugged the SATA cables last night and booted from a Windows USB to install it, and the SSD disconnected again mid-install :) The SSD disconnects somehow, and if that happens while running the OS installed on it, it causes a crash. When running from USB, there is no crash. It’s not the HDD, not the memory or CPU, and not the SSD (it’s brand new). I’m down to the motherboard at this point.

    • mathias_freireOP
      2 months ago

      It’s 1 year old and still under warranty. 800 watt gold+. It has had no problems so far. I thought it might be the cause, but the SSD showing as disconnected even after a restart (not every time) makes me think it’s either the SSD or the motherboard. But I’m still not sure of anything.

      • CarbonatedPastaSauce@lemmy.world
        2 months ago

        Is any part of the OS installed on the HDDs? If not, disconnect them and see if the OS is stable.

        I’d also recommend running memory tests on your RAM just to eliminate that as a factor. You could also decrease the clock speed on your memory for a while and see if that helps.

        After all that, if the issue persists, try disconnecting all storage and running Linux from a USB drive. If the system is stable, at least you know it’s something in the storage, but if it keeps crashing, you’ll know the storage errors are a red herring.

        • mathias_freireOP
          2 months ago

          Memtests showed nothing. I also ran the SSD check in the BIOS settings, still nothing. I will try again. As for live Linux over USB, it’s stable even with storage connected. I also have an installed system that may run into issues.