I’m testing a new CPU (Ryzen 5 5500GT) in my PC, and I’m seeing:

  • Sporadic kernel panics during boot.
  • Random .ko.zst kernel module files (a different one each boot) failing zstd decompression with a checksum error.
  • Random shared libraries (.so files) failing to resolve a symbol, causing programs to crash or fail to start.
  • I started a sequential stress-ng run at 5 s per stressor, and it hung after about a dozen stressors. Ctrl-C couldn’t stop it, and ps stopped working too. 😅

Funny thing is, other than that, the system runs fine (when it boots, that is).

Switched back to my old CPU (that’s the only change in the machine) and all of these things stopped.

So the new CPU is defective, correct? Just double-checking that I’m not missing anything else.

I reset the BIOS between CPU swaps and left it at defaults. Could default settings make a CPU act like this?

Edit: cooling is good. All temps (chipset, CPU, etc.) are in the 30s °C at idle, and the CPU went up to 75 °C under stress. I have a tower cooler (Scythe Kotetsu) with a 120 mm fan.

I’m also adding some voltage readings I took from sensors while the problematic CPU was installed:

Vcore: 840mV
+3.3V: 3.31V
+12.0V: 12.10V
+5.0V: 5.01V
VSOC: 780mV
VDDP: 900mV
DRAM: 1.21V
3VSB: 3.29V
VBAT: 3.26V

  • just_another_person@lemmy.world · 25 days ago

    1. Check if the CPU is overheating. You didn’t mention anything about cooling.
    2. Are you sure your installed RAM’s frequency is compatible with the new CPU?

    • Markaos@discuss.tchncs.de · 25 days ago

      I don’t think overheating would cause random corruption. An overheating CPU should throttle, and shut down if the temperature still gets too high while throttled, but it should never produce an incorrect result. And the RAM will surely run at the standard DDR4-2133 speed on default settings, since OP says they reset the BIOS to defaults between CPU swaps.

      • lemmyvore@feddit.nl (OP) · 25 days ago

        RAM is indeed at 2133 MHz, and cooling is great: I have a tower cooler (Scythe Kotetsu Mark II), idle temps are in the low 30s °C, and the stress-test temperature peaked at 76 °C.

  • lemming741@lemmy.world · 25 days ago

    I had a 3700X that would sometimes lock up under light usage. It passed every stress test and could idle for days. I swapped the RAM, PSU, and motherboard with no effect. It’s possible the microcode and firmware updates mentioned here could have fixed it, but I got another CPU and all my problems went away. Worth the $200 for me.

    • lemmyvore@feddit.nl (OP) · 25 days ago

      It’s a pain in the butt to swap CPUs one more time, but that may pale in comparison to convincing the shop that a core is bad when the faults are intermittent. 🤪

  • catloaf@lemm.ee · 25 days ago

    It might be the CPU, but it might be something else. On the old CPU, update the OS, update the BIOS, and run fwupd (or boot Windows temporarily) to update all other firmware. Then run memtest and a CPU stress test to make sure you’re not just triggering an existing hardware issue.

    If that’s all clean, put in the new CPU and run memtest and a CPU stress test again to see where the issues appear.

    • lemmyvore@feddit.nl (OP) · 25 days ago

      Everything is up to date as far as I can tell; I updated via Windows too.

      memtest ran fine for a couple of hours, but the CPU stress test hung partway through while the CPU temperature was around 75 °C.

      • FauxLiving@lemmy.world · 25 days ago

        75 °C is fine; the CPU will throttle to stay below its maximum temperature. That isn’t something that should cause instability.

        It’s POSSIBLE that this is a bug fixed by a microcode update; see here for installing it: https://wiki.archlinux.org/title/Microcode

        TL;DR:

        1. Install amd-ucode.
        2. Edit /etc/mkinitcpio.conf and add microcode to the HOOKS array, after autodetect.
        3. Regenerate the initramfs: sudo mkinitcpio -P
        4. Reboot.
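        For reference, step 2 might look roughly like this. It is an illustrative HOOKS line modeled on a recent mkinitcpio default, not a copy of anyone’s actual file; your other hooks may differ, and only the position of microcode relative to autodetect matters here. The commented-out line is the alternative order OP describes further down (microcode before autodetect, so the image carries microcode for all CPUs rather than only the current one).

```shell
# /etc/mkinitcpio.conf (excerpt); mkinitcpio reads this file as bash.
# Illustrative hook list only: keep your own hooks and mind the order.

# microcode after autodetect: only the current CPU's microcode is bundled.
HOOKS=(base udev autodetect microcode modconf kms keyboard keymap consolefont block filesystems fsck)

# microcode before autodetect: microcode for all CPUs is bundled.
#HOOKS=(base udev microcode autodetect modconf kms keyboard keymap consolefont block filesystems fsck)
```

        After editing, step 3 (sudo mkinitcpio -P) rebuilds the images with the new order.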

        If that doesn’t fix it and it crashes in Windows too, it may be a hardware problem; there isn’t much else you need to do to get a CPU working.

        • lemmyvore@feddit.nl (OP) · 25 days ago (edited)

          This sounds like my best shot, thank you.

          I’ve installed the amd-ucode package. It already adds microcode to the HOOKS array in /etc/mkinitcpio.conf and runs mkinitcpio -P, but I moved microcode before autodetect so the image bundles microcode for all CPUs, not just the current one (to have it ready when I swap), and re-ran mkinitcpio -P. I also had to re-run grub-mkconfig -o /boot/grub/grub.cfg.

          I saw the message “Early uncompressed CPIO image generation successful” pass by, and lsinitcpio --early /boot/initramfs-6.12-x86_64.img|grep micro shows kernel/x86/microcode/AuthenticAMD.bin. There’s also a /boot/amd-ucode.img, with an initrd parameter for it in grub.cfg. I’ve also confirmed that /usr/lib/firmware/amd-ucode/README lists an update for the new CPU (and, speaking of which, for the current one too).

          So, from what I understand, all I have to do is reboot and the early stage will apply the update?

          Any idea what it looks like when it applies the microcode? Will it appear in dmesg after boot, or does it happen too early in the boot process?
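          One way to check, sketched under the assumption of x86 Linux, where /proc/cpuinfo reports a microcode revision for every logical CPU: note the revision before rebooting and compare it afterwards. If the early loader applied a newer patch, the value should change (it won’t if the BIOS already shipped that revision). The function name ucode_revisions is made up for this example. Kernel messages about microcode can also be searched after boot with journalctl -k -b | grep -i microcode (or dmesg | grep -i microcode), though the exact wording varies between kernel versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct microcode revision found in a
# cpuinfo-style file. Defaults to /proc/cpuinfo; pass a path to parse a saved copy.
ucode_revisions() {
    local src="${1:-/proc/cpuinfo}"
    # Matching lines look like "microcode<TAB>: 0x<revision>"; print the value part.
    awk -F': *' '/^microcode[ \t]*:/ { print $2 }' "$src" | sort -u
}

# Normally every core reports the same revision, so this prints a single line.
if [[ -r /proc/cpuinfo ]]; then
    ucode_revisions
fi
```

          Saving the output before the reboot (for example ucode_revisions > before.txt) makes the comparison afterwards a simple diff.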