New 5900X rig suddenly running sluggish (need help identifying & solving)

JYeager11

Hi guys,

Not sure if this is an AMD-specific issue, a SAMSUNG one, or a MICROSOFT one... apologies if I picked the wrong forum; feel free to move the thread if we find the problem and it turns out to have nothing to do with AMD.


1. The new rig

With the help of the HF crew (notably pendragon1 & hititnquitit) I was able to emergency-build a pretty decent PC when my old one died on me in the middle of a pandemic parts shortage last summer.
  • AMD Ryzen 9 5900X
  • MSI MAG X570 Tomahawk WIFI
  • Noctua NH-D15S chromax.Black
  • Phanteks Eclipse P600S
  • Crucial Ballistix 32GB (2x16GB) 3600MHz C16
  • Seasonic FOCUS-GX-1000 80+ Gold
  • 1 x Samsung 980 PRO M.2 NVMe SSD 1TB <- Windows Drive
  • 2 x Samsung 870 EVO SSD 1TB <- Docs + Backup drives
  • MyBook + MyPassport <- redundant mechanical backups
  • NVIDIA GTX 750 Ti
    (this is the ONLY legacy component in the rig, due to current GPU scarcity/overpricing)
Running on Windows 10 fully updated, from a still-valid Windows 7 license.


2. The initial KERNEL_SECURITY_CHECK_FAILURE problem

Everything has been running GREAT, except for the occasional KERNEL_SECURITY_CHECK_FAILURE blue screen / reboot.

How occasional? I can go weeks without seeing one, then I can get 3 in the same day. Checked my temps via HWINFO, nothing seems faulty there. But after a few months, I had it narrowed down to the VPN : it always happens when I'm on SurfShark. And 99% of the time, it happens while I'm streaming video (it could be perfectly legitimate video, like YouTube). This issue doesn't seem to care what browser I'm using -- or even if I have my adblocker turned on or off -- but it has yet to happen when I'm NOT on the VPN (and most of the time, I'm not). There is definitely correlation there.

The one and only time I got the fatal error WITHOUT any video being streamed, I was downloading via qbittorrent. So to narrow it down even further, it appears to have something to do with receiving a continuous flow of data while on the VPN.

We never did find the exact cause or remedy, but having it narrowed down to the VPN led me to believe the SurfShark software might be at fault; and if so, that issue would fix itself when I switch to another VPN provider at the end of my current subscription. And since everyone here said not to sweat them, I just got used to the occasional interruptions : the rig boots so blazingly fast that waiting out a reboot was barely an inconvenience. 20 seconds later, I'm back in like nothing happened.

It's been this way for 6 months; and other than these rare blue screens, the rig performs admirably under all conditions. I could not say enough good things about this entire set-up, except for this issue...

...until today.


3. From occasional blue screens to permanent sluggishness

Because today, I was tempted to Google the KERNEL_SECURITY_CHECK_FAILURE for more info, which I found. I wish I hadn't, but one of the suggested fixes was running a CHKDSK /F on my Windows drive (which, as mentioned in the parts list above, is an M.2 NVMe SSD). It wouldn't do it without rebooting first, so I did. CHKDSK auto-ran at startup and then I was back in Windows (too quickly to read the CHKDSK info that flashed by first).

The next suggestion on the help page was verifying drivers via a CMD command called "verify" -- because, the page claims, this Kernel Security Check error is most often caused by drivers that simply need updating.

And this is where things get weird.

Once I typed "verify" and hit ENTER, nothing happened. The CMD window just froze. I waited a bit to see if it was just stalled... and then hit the X to close it, figuring I had probably just forgotten to make sure I had Admin privileges.

Only problem is, my computer has been sluggish ever since :cry: (it even takes longer for desktop right-click dialogs to appear). I'm also hearing the fan start and stop more than I usually do while idle (which is not at all) so I loaded up HWINFO to monitor my temperatures : CPU is now idling at 55-60C, which is about 15-20C more than it usually idles at (40C). And it's been about 2h so far, it's still idling there.

(When I say everything is sluggish, I mean it : if I reboot, the time it takes to shut down is slower, the time it stays off is longer, and the time it takes to boot back up is slower too. It's not just once Windows is loaded. I'm seeing the difference everywhere.)

Task Manager doesn't seem to reveal anything suspect at first glance, but I'm no expert. I do notice the NVIDIA Container oscillating between Medium and High in the power consumption column... while I'm doing absolutely nothing... which reminded me that I haven't updated the video drivers (of my legacy 750 Ti GPU, the ONLY legacy component in this rig) in a while. So I do that next, figuring it couldn't hurt... but it didn't help. Same slowness issues after updating gpu drivers + rebooting.

It's like something is working hard behind-the-scenes that I can't see, and it's bottlenecking everything. Could it still be trying to perform the driver verification despite 4 reboots since I executed that command? Since it knew to CHKDSK after rebooting, maybe rebooting doesn't interrupt this "verify" command. But wouldn't it be done 2h later? Because even copying large volumes of files starts slowly (like 250k/s) before speeding up to 500+ mb/s about 33% into it (the k's and mb's are not typos).
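For anyone who wants to put numbers on that slow-start-then-speed-up behaviour instead of eyeballing the Explorer copy dialog, a copy can be timed chunk by chunk. This is only a rough sketch: the function name `timed_copy` and the 4 MiB chunk size are assumptions made for the example, and OS caching will make repeat runs look faster than the drive really is.

```python
import time

def timed_copy(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst in chunks; return a list of (bytes_copied, MB_per_s), one per chunk."""
    readings = []
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            start = time.perf_counter()
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            elapsed = time.perf_counter() - start
            # Guard against a zero-length interval on very fast (cached) chunks.
            rate = len(chunk) / 1e6 / elapsed if elapsed > 0 else float("inf")
            readings.append((len(chunk), rate))
    return readings
```

Printing the returned list for a large file should show whether the early chunks really crawl before the rate climbs, or whether the whole copy is uniformly slow.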

Photoshop now takes 45 seconds to load (used to take 5s), taking forever to initialize panels and other stuff at the start. Even the thumbnails of recent files take longer to draw, on the welcome dashboard.


4. Narrowing down the cause of the sluggishness

Went back to Google for help (from a different source, in case the previous one misled me into bricking an important component) and saw a suggestion to analyze the situation via Windows 10 Performance Monitor.

And I finally got my first hint at the answer (yellow highlight by me):

report.jpg


I'm way past my depth here, but I'm going to guess all those interrupts at the top are the problem. Does anyone have any advice for me to bring things back to how they were this morning before I touched CHKDSK /F and VERIFY? (Two commands I plan to never use again after this experience.)

I can't imagine that idling this hot, after 6 months of idling at much lower temps, is normal. Or good for my rig long-term, even if I WERE to resign myself to putting up with this. (Turning off Background App Refreshing didn't help either!)

Anyways... as usual, thanks in advance, y'all saved my life on several occasions, I'm hoping you can do it again! :)

PS: Certain Google results claim that using CHKDSK /F on an SSD can brick it, but others refer to this as outdated information, saying SSDs can handle CHKDSK and CHKDSK /F just fine in 2022. Not sure what to believe anymore. If the drive is permanently damaged (although analytics tools don't seem to be suggesting that) would it be safe to simply MIRROR its contents to another identical 980 Pro I can purchase? Because I am NOT looking forward to rebuilding Windows from scratch this soon after last time.
 
Boot a Linux distro live and see how it performs to narrow it down to a hardware or a software problem.
 
What does the CPU usage look like in Task Manager?
Perfectly normal, half a dozen processes idling at less than 2% each, with everything averaging about 12% total while idle. Memory at 18%, but I don't know if it still remembers Photoshop, which I just opened and closed to crop a screenshot I just DM'd you.
 
  • Seasonic FOCUS-GX-1000 80+ Gold
Yep, and in Hybrid mode (it's an annoyingly easy button to push on the back of the case, where the PSU is exposed). I could turn Hybrid mode off and see what happens, but if that helps, it would mean this all happened because I accidentally turned on hybrid mode earlier today (which I don't remember doing, but I guess that's why they call it 'accidentally').

It's technically possible that this thing just isn't running on enough power all of a sudden.. would the symptoms be similar? As unlikely as this would be, I don't mind looking under ALL the stones.

PS: About 5 hours into this now, with several restarts, and even a 30 min period with the PC & modems shut down... nothing helped, still idling at 60C with nothing running right after a reboot, still hearing the fan more than I usually do, but nothing "stopped working" yet. Everything works, even Photoshop (albeit very slowly, too slowly to be anywhere near normal).
 
live boot linux and see if it works. With that RAM - what's SoC voltage set to? (I've ALWAYS had to bump it for Corsair to be stable at XMP on x570/Zen).
 
Linux talk is way out of my depth, but here's a snapshot I took at the bios screen with the voltages.
1642723963740.png

Note the low CPU temp compared to what HWiNFO reports once Windows is loaded (55-60C). Not sure why this is.

Download speeds seem unaffected. And copying files is inconsistent, but can still hit those top speeds. Photoshop, however, takes a MASSIVE amount of time to load up compared to yesterday. We're talking almost a full minute, when it used to take literally 5 seconds.

You may be onto something with this ram theory, but why would the ram have been affected by anything I did today?
 
I have no idea if this is any kind of clue, but now when I reboot -- in addition to it taking about 3x longer to shut down to black than it did yesterday -- I hear a "click" from the speakers at the very end of the shutdown process (before it starts up again).

There was no click until today, and it's literally every time I shut things down now. It's very reminiscent of when an audio device freezes, and you have to reboot to re-initialize your audio... the "click" you'd hear at shutdown through the speakers. But it would only happen when the audio engine froze. In THIS case, the audio engine seems just fine, I can play and hear stuff. Still clicks at shutdown, tho.
 
Did you try verify off
No, is that a thing? If it is, I'm surprised you're the first to suggest it. However, you can imagine how nervous I am about going back to CMD prompt after what happened today, so if you could really BABY STEP this for me, I'll give it a shot. (Including the proper way to retain admin privileges at CMD prompt, which might be where I f'd up the first time.)

HWiNFO also seems to have a driver repair tool at the bottom of its window that I never tried, called "Scan, Update and Repair Drivers" <- could this be helpful? I hear HWiNFO is a trusted and appreciated tool in these forums.
 
Yes, it's a thing. verify doesn't do what you think it does: once turned on, it verifies everything written to a hard drive after writing, until it's turned off. On a spinner it probably won't kill a system, but you have an NVMe drive, which writes really fast comparatively.
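To show what "verify after writing" means in general, here's a tiny read-back check. It's only a conceptual sketch of the idea, not what CMD's `verify` does internally; the helper name `write_with_verify` is made up for the example, and the read-back may well be served from the OS cache rather than the drive itself.

```python
import hashlib
import os

def write_with_verify(path, data):
    """Write data to path, then read it back and compare SHA-256 digests."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device
    with open(path, "rb") as f:
        read_back = f.read()
    # Every write now costs an extra read plus a hash comparison.
    return hashlib.sha256(read_back).digest() == hashlib.sha256(data).digest()
```

The point is just that each write picks up extra work, which is why a verify-everything mode could in principle add overhead on a drive that writes a lot.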
 
hey i just reread your comment about babystepping it out for you give me a minute, just put my kids to bed
 
Gotcha. I just Googled the steps to properly "verify off" (just in case) and in the process learned that to even turn this thing on, you have to type "verify on" and not just "verify". IIRC, I only typed "verify" and nothing happened. In theory, it should have responded with whether it's on or off, right?

Either way, I went in, typed "verify off" and it just went back to command line without confirming anything (why can't it just tell me "OK, it's off!" so I don't have to wonder) and then rebooted. No change. Same slowness. (Was worth a shot.)

hey i just reread your comment about babystepping it out for you give me a minute, just put my kids to bed
It's OK, I found it.
 
Go into Event Viewer and try to find the interrupts; try Windows Logs and then System.
I'm there, but I'm not knowledgeable enough in these matters to know what to look for or where to find it. That being said, without touching anything, I can see 1 Critical Error in the last 24h (without a doubt the Kernel Security Check Error that happens when I'm on my VPN, described at the beginning of the OP).

I also see in that same 24h span:
- 78 regular errors
- 311 warnings
- 2180 "information"
- 3,664 "audit success" (482 in the last hour)
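With counts that large, scrolling through Event Viewer by hand is hopeless, so one option is to export the System log to CSV and tally which sources produce the most errors and warnings. A minimal sketch, assuming the export has `Level` and `Source` columns; the function name and the sample rows (including the source names) are invented for illustration:

```python
import csv
import io
from collections import Counter

def noisiest_sources(csv_text, levels=("Critical", "Error", "Warning"), top=5):
    """Count (Level, Source) pairs in an exported event log and return the most frequent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        (row["Level"], row["Source"])
        for row in reader
        if row["Level"] in levels
    )
    return counts.most_common(top)

# Hypothetical sample rows, not taken from this machine's log.
sample = """Level,Date and Time,Source,Event ID
Error,1/20/2022 6:01:00 PM,ExampleDriverA,11
Error,1/20/2022 6:01:05 PM,ExampleDriverA,11
Warning,1/20/2022 6:02:00 PM,ExampleServiceB,7
Information,1/20/2022 6:03:00 PM,ExampleServiceB,1
Error,1/20/2022 6:04:00 PM,ExampleDriverA,11
"""

for (level, source), count in noisiest_sources(sample):
    print(f"{count:4d}  {level:8s}  {source}")
```

A source that shows up hundreds of times in a day is a reasonable first place to look for a misbehaving driver or device.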
 

I'm thinking Windows update downloaded some unsavory Realtek audio drivers. I would run DPC latency checker and see what kind of numbers you get.
 
I was leaning toward drivers myself. At this point I'm wondering if just reinstalling all of them wouldn't be a bad idea. I was hoping to look at the interrupt data to see which device is responsible.
 
I had an issue that turned out to be Realtek audio drivers before. The symptoms were constant CPU usage of 7% at idle - which was 1 thread at 100% on a 16-thread processor, the machine took a very long time to shut down, and I heard some occasional static through my speakers. No BSOD or noticeable performance issues, though. The process that pegged one thread at 100% was System. I forget what the thread was called in Process Explorer, but I'm pretty sure it had the word "interrupt" in it.
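The "7% total means one pegged thread" arithmetic is easy to redo for other CPUs. A quick sketch (the function name is made up; the 24-thread figure is the 5900X's 12 cores with SMT):

```python
def total_usage_from_pegged_threads(pegged, logical_threads):
    """Overall CPU % shown in Task Manager if `pegged` threads sit at 100%."""
    return 100.0 * pegged / logical_threads

print(total_usage_from_pegged_threads(1, 16))            # 6.25 on a 16-thread CPU
print(round(total_usage_from_pegged_threads(1, 24), 2))  # 4.17 on a 24-thread 5900X
```

So on a 5900X a single stuck thread would only add about 4% to the total, which is easy to miss in the summary view; the per-thread graphs are more telling.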
 
So for SoC we'll need more. Don't have an MSI board handy, but it'd normally be under overclocking or the like, where the other CPU and RAM Voltages are. Can you send a pic of the full bios? I'll walk you to it - I know where it is, just can't remember the steps till I see them.

I suspect windows is whack - and the BSODs were because of the ram not quite having enough juice (the memory controller technically). You just got unlucky on the last one, or when you tried to clean, and it ate crap and died. I'd normally suggest a reinstall, but was trying to avoid that.

Here's a test - roll ram back to stock speed for the moment. See if suddenly it's stable... although you may still need to reinstall if it's horribly slow still.
 
I'm thinking Windows update downloaded some unsavory Realtek audio drivers. I would run DPC latency checker and see what kind of numbers you get.
this is also very possible. Interrupts are literally a device or thing telling the cpu "stop, do THIS" - with that many, you're constantly interrupting (lol) the CPU to do something other than what it was GOING to do prior. Hence performance. Now the question is where those are coming from.
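To get a feel for why a flood of interrupts hurts, here's a back-of-the-envelope sketch. The rates and the 5 microseconds per interrupt are made-up numbers for illustration, not measurements from this rig, and real handling cost varies a lot by device and driver.

```python
def interrupt_cpu_fraction(interrupts_per_sec, micros_per_interrupt):
    """Fraction of one core's time spent servicing interrupts."""
    return interrupts_per_sec * micros_per_interrupt / 1_000_000

# Hypothetical interrupt rates, all at an assumed 5 us of handling each.
for rate in (1_000, 50_000, 200_000):
    frac = interrupt_cpu_fraction(rate, 5)
    print(f"{rate:>7} interrupts/s at 5 us each -> {frac:.1%} of one core")
```

Even a cheap interrupt adds up once the rate gets high enough, and that's before counting the cache and scheduling disruption each one causes.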
 
Just so we're clear, we're diagnosing 2 completely different issues, here.

  1. SINCE I BUILT THE RIG LAST SUMMER (NOT REALLY A BIG DEAL): The Kernel Security Check Error, which is a sudden BSOD mentioning the error name, a quick dump of files, followed by a forced reboot. All at once, all without possible interruption. This happens once or twice a week, always when I'm receiving data, and always when on my VPN (which is on only 20% of the time, which is why it would be a strange coincidence if the VPN had nothing to do with it, although I guess it's theoretically possible). It almost always happens when streaming something, even if it's a legit site like YouTube, and it doesn't care if the ad blocker is on or off. Nor if I'm using Edge or Firefox. The only time it happened when I wasn't streaming anything was when I was downloading something via qbittorrent one time (IIRC, no other apps were even loaded). However, the rig's speed was never affected by these occasional errors, so they were only a minor inconvenience for me; I'd be rebooted and back into whatever I was doing 20 seconds later. Told myself it will probably fix itself when my SurfShark subscription ends and I switch to another company. Or not. Like I said, given how fast and reliable the rig has otherwise been, I didn't pay this error much mind. The reboot was that fast and most programs have interrupted document recovery options now.
    --
  2. NEW FROM TODAY (AND A MUCH BIGGER DEAL): Everything, besides internet transfer speeds, seems affected and sluggish. Seems to have started when I tried a CHKDSK /F followed by a VERIFY (that apparently froze so I shut it off). Takes longer to open a window, shut down, restart.. and more peculiarly, the time it stays shut off before restarting is longer as well. It's struggling like the 12-year-old PC I just replaced with this one did. Audio seems to be working. Not sure what drivers I'm using, but it's the motherboard audio (specs in OP). EDIT: Typing "verify off" at CMD did not help.
I don't mind if we NEVER solve the BSOD problem, it's the sluggishness that started TODAY that I need to solve ASAP.

(Not sure if the distinction between the two issues was made clear enough earlier, so apologies if you already understood everything. But wouldn't it make at least some sense for a SECURITY error to somehow be VPN related? Or do you still think it's the audio?)
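On the "strange coincidence" point, the odds can be roughed out. A sketch under a big assumption: that crashes would land at random times regardless of the VPN, so each one has a 20% chance of falling in a VPN session. The function name and the crash counts tried below are just for illustration.

```python
def chance_all_on_vpn(vpn_share, crashes):
    """Probability that every crash lands in VPN time purely by chance."""
    return vpn_share ** crashes

# Try a few hypothetical crash counts with the VPN on 20% of the time.
for n in (3, 5, 10):
    print(f"{n:2d} crashes, all on VPN by luck: {chance_all_on_vpn(0.2, n):.2e}")
```

With more than a handful of crashes, pure coincidence gets very unlikely, which supports treating the VPN (or whatever network driver it installs) as a real suspect for the BSODs.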
 
So for SoC we'll need more. Don't have an MSI board handy, but it'd normally be under overclocking or the like, where the other CPU and RAM Voltages are. Can you send a pic of the full bios? I'll walk you to it - I know where it is, just can't remember the steps till I see them.
Here it is :
https://www.msi.com/blog/uefi-bios
That's exactly what I'm using. There's even a screenshot of my specs at the top of the BIOS, earlier in this thread.

Someone will REALLY have to babystep what to do from there, though, I'm embarrassed to admit. I've never played around with my ram before.

BTW, Game Boost and A-XMP (the top 2 boxes on the screen) are both off, this much I remember from memory. (I've never overclocked anything and have no plans to, value stability too much, and also have no idea what I'm doing, so...)

PS: Also more than willing to re-install all audio drivers if someone can tell me where to find reliable ones, just to scratch that one off the list.
 
If you're in a hurry: reformat your drive, reinstall Windows, reinstall all your drivers with fresh versions from the vendors, don't install your VPN software, and see what happens. It sounds like a driver issue. The problem is that the symptoms started after chkdsk /f. chkdsk /f, prior to a certain patch point, did break 980s, but I haven't heard of these specific symptoms. So really you need to rule out software as the cause, and since you have chkdsk in the equation, I would reformat just to take it out of the equation.
 
You think starting Windows over from scratch will SAVE me time? Ha! It'll take me days to be back where I was if I do that. Absolute last-resort option, I'm afraid. I was even willing to mirror onto a new 980, but that would also mirror the drivers. That's why I'd much rather fix this, than reinstall windows and everything on it. (The docs are safe and backed up already, hundreds of gigs worth; that's how I found out the sluggishness isn't actually stopping anything from working, even when you push it, it just takes way longer.)

How are we at the nuclear option already? ;)
 
Use msconfig and set it to clean boot with network, then see how it runs. If it's not better, just flip it back.
 
Getting close but try this first.
 
1642736863765.png


Ok. So you see down under voltage where it says CPU NB/SoC Voltage, Auto? Set that to 1.1 instead. Stock is 1, which is often too low for most RAM (G.Skill seems to be the only one happy there, and even then, at larger quantities, it wants to get flaky).
 

Now this is something I can do. But before I do, can I ask what those numbers mean and why you feel 1.1 would be better for my set of sticks than 1.0 is? Stable is better than quick, here. (I tried to find a reference to either 1.0 or 1.1 in the part specs, but all I find are 3600MHz and C16. Is there something about these values that suggests 1.1 is better, so I know in the future?) Is there a "for dummies" version of this answer even a doofus like me might grasp just enough to not worry about this once it's set? ;)
 
Also, I went sleuthing for "CHKDSK /F BROKE MY 980 PRO" and actually found LITERALLY that, here:
https://docs.microsoft.com/en-us/answers/questions/668840/34chkdsk-f34-39broke39-my-ssd.html

Only reason I'm still skeptical is the symptoms appear to be quite different (other than the extreme slowness and higher idle temps, everything technically works, at least so far). But if it's an issue of CHKDSK damaging the 980's filesystem, I suppose the symptoms wouldn't have to be identical across the board. What do you guys think? It's the same drive except that mine's a 1TB, and he's using his as a DOC drive (not Windows like me).

Off to bed, thanks for putting up with my ignorant noobness! (I'm trying very hard not to freak out at the idea of having to rebuild the C drive from scratch until I have to.)
 
this is also very possible. Interrupts are literally a device or thing telling the cpu "stop, do THIS" - with that many, you're constantly interrupting (lol) the CPU to do something other than what it was GOING to do prior. Hence performance. Now the question is where those are coming from.
They could be coming from just about anything. Among other things, such as the NIC telling the CPU it got a packet, interrupts are used for error handling. So just about anything that's broken/dying or has a bad driver could end up spewing them. It would be really useful to find out what the source of the excess interrupts is so you'd have an idea where to look for the problem. I haven't personally had to deal with debugging an interrupt problem like this, so I'm not sure what to suggest for a diagnostic utility. Most of my screwing around with interrupts involved performance tuning for real time applications on Linux.

The craziest problem I've heard of involving excessive numbers of interrupts and system sluggishness was in a thread on here started by a guy in Iran with an i3-10100F and an H510 (I think) mainboard. After multiple trips to a repair shop and the warranty repair depot it turned out his CPU and mainboard didn't get along. Other CPUs worked in his board and his CPU worked on other boards, but the two of them together generated a ridiculous number of interrupts. My guess is his board had a bug that made it not get along with CPUs that lack integrated graphics since all the test CPUs he mentioned had integrated graphics.
 
Now this is something I can do. But before I do, can I ask what those numbers mean and why you feel 1.1 would be better for my set of sticks than 1.0 is? Stable is better than quick, here. (I tried to find a reference to either 1.0 or 1.1 in the part specs, but all I find are 3600MHz and C16. Is there something about these values that suggests 1.1 is better, so I know in the future?) Is there a "for dummies" version of this answer even a doofus like me might grasp just enough to not worry about this once it's set? ;)
Sure!
You're running XMP speeds, one way or another - 3600MHz is not a stock RAM speed (JEDEC, the body that determines RAM specs, tops out at 3200 for DDR4). This is NORMAL - no one runs stock JEDEC RAM unless you're building a server with ECC RAM, and even THOSE finally started bumping speeds (the memory consortium is very weird). Ryzen's IMC (integrated memory controller) is... finicky. XMP itself is an Intel standard that AMD adopted, and they got it ~mostly~ right, but the controllers are still picky. Hence why QVL certification/listing is a big deal - but even then, on these processors, it gets super picky. The solution to this is to give that controller slightly more juice, which fixes any issues 99.9% of the time (until you get to extreme overclocks or extreme memory quantities, which is a different ball game again). You also want that higher memory speed - Ryzen thrives on bandwidth.
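For a sense of what the speed bump buys on paper, theoretical peak bandwidth is just transfers per second times bus width times channels. A rough sketch, assuming a dual-channel setup with the standard 64-bit (8-byte) DDR4 channel; the function name is made up, and real-world throughput lands below these peaks:

```python
def ddr_peak_bandwidth_gbs(mega_transfers_per_sec, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s for a DDR setup (1 GB = 1000 MB)."""
    return mega_transfers_per_sec * bus_bytes * channels / 1000

# A few common DDR4 speeds; the "MHz" on the box is really MT/s.
for speed in (2133, 3200, 3600):
    print(f"DDR4-{speed}: {ddr_peak_bandwidth_gbs(speed):.1f} GB/s peak (dual channel)")
```

That works out to roughly 34 GB/s at 2133 versus about 58 GB/s at 3600, which is why dropping to a lower speed is fine as a stability test but not where you'd want to stay.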

Stock IMC voltage is 1.0v or 1.05v (I've seen both numbers, my bet is 1v is the real "standard"). Standard "we're gonna just fix this" is 1.1v. You can go as high as 1.2, but generally everyone sets it to 1.1 and just leaves it. Out of the 5 Ryzen systems (3960, 3950, 1950, 2700, 3200G) I have, only ONE of them is not running the modified voltage (3200G), and that's because I haven't turned on XMP yet - it's sitting open on a desk, I just booted it to make sure the mobo/CPU were functional. The only memory I've seen that doesn't need this tends to be G.Skill - but that is only true to 32G or so (64 or 128, you're touching that voltage again). Any Ryzen overclocking or memory guide will have this very near the top of the page, if not the first step - you just set it there to avoid having an issue.

The second voltage we might touch is the actual memory voltage itself - 1.35v is often needed for these too, but that's less common a change.

This is safe and still within spec - it's just not enough at stock (trying to be energy efficient here!) to be fully reliable. When JayzTwoCents/LTT comment that "AMD takes some fiddling to get stable," this is one of the top things they're talking about. It's picky on memory. Good news? Once set, you'll never touch it again. It'll just work.

This MAY not fix your current problem, but if it's what helped trigger it, you won't hit it again. For me, I figured out how badly this was needed when I took my 3960X to 128G - it was fine for almost anything, but loved to crash on youtube videos with a BSOD. Bump IMC voltage - 100% stable now for over a year. Similar issue on the 3950X at 64G, and so on...
 
They could be coming from just about anything. Among other things, such as the NIC telling the CPU it got a packet, interrupts are used for error handling. So just about anything that's broken/dying or has a bad driver could end up spewing them. It would be really useful to find out what the source of the excess interrupts is so you'd have an idea where to look for the problem. I haven't personally had to deal with debugging an interrupt problem like this, so I'm not sure what to suggest for a diagnostic utility. Most of my screwing around with interrupts involved performance tuning for real time applications on Linux.
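One rough way to narrow down the source, sketched in Python: take two snapshots of per-source interrupt counts a few seconds apart and rank the sources by how fast each one grew. The snapshot format (a dict of source name to cumulative count), the function name, and the numbers below are all assumptions for illustration. How you actually collect the counts depends on the OS; on Linux they could be parsed out of /proc/interrupts, and on Windows you'd need some tool that exposes per-device counts.

```python
def rank_interrupt_sources(before, after, seconds):
    """Return (source, interrupts_per_second) pairs, busiest first.

    `before` and `after` map a source name to its cumulative interrupt
    count at two points in time, `seconds` apart.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rates = {}
    for source, count in after.items():
        # A source missing from the first snapshot is treated as starting at 0.
        delta = count - before.get(source, 0)
        # Clamp negative deltas (counter reset) to zero rather than ranking them.
        rates[source] = max(delta, 0) / seconds
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up counts, purely to show the shape of the data.
before = {"nic": 1_000, "nvme": 5_000, "audio": 200}
after = {"nic": 1_500, "nvme": 95_000, "audio": 210}

for source, rate in rank_interrupt_sources(before, after, seconds=10):
    print(f"{source:>6}: {rate:.0f}/s")
```

Whatever lands at the top of that list by a wide margin is the device or driver to look at first.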

The craziest problem I've heard of involving excessive numbers of interrupts and system sluggishness was in a thread on here started by a guy in Iran with an i3-10100F and an H510 (I think) mainboard. After multiple trips to a repair shop and the warranty repair depot it turned out his CPU and mainboard didn't get along. Other CPUs worked in his board and his CPU worked on other boards, but the two of them together generated a ridiculous number of interrupts. My guess is his board had a bug that made it not get along with CPUs that lack integrated graphics, since all the test CPUs he mentioned had integrated graphics.
I haven't since the pre-NT days either, so I'm drawing a blank. Something like the RealTime latency monitors might show something, but heck if I know. My gut would be rebuild from scratch, but I also know the urge to avoid that like the plague. All my modern interrupt tuning has been on ESXi :p

I've had hardware that just didn't like me, or vice versa. I went through THREE Polaris cards (RX 480 4G, 480 8G, 580 8G), and none of them were stable or worked worth a damn on my old workstation. Hard freezes. Hard crashes. Every one worked PERFECTLY for the folks I sold them to (as unknown status). The 5700XT I finally dropped in? Worked PERFECTLY. Something about Polaris and that old X370 board just did NOT play nice.
 
Also, I went sleuthing for "CHKDSK /F BROKE MY 980 PRO" and actually found LITERALLY that, here:
https://docs.microsoft.com/en-us/answers/questions/668840/34chkdsk-f34-39broke39-my-ssd.html

Only reason I'm still skeptical is the symptoms appear to be quite different (other than the extreme slowness and higher idle temps, everything technically works, at least so far). But if it's an issue of CHKDSK damaging the 980's filesystem, I suppose the symptoms wouldn't have to be identical across the board. What do you guys think? It's the same drive except that mine's a 1TB, and he's using his as a DOC drive (not Windows like me).

Off to bed, thanks for putting up with my ignorant noobness! (I'm trying very hard not to freak out at the idea of having to rebuild the C drive from scratch until I have to.)
Sadly no detail to tell us what WAS wrong. Just that something went wrong. It's hard to break the filesystem of an SSD - not hard to break NTFS though.
 
This MAY not fix your current problem, but if it's what helped trigger it, you won't hit it again. For me, I figured out how badly this was needed when I took my 3960X to 128G - it was fine for almost anything, but loved to crash on YouTube videos with a BSOD. Bump IMC voltage - 100% stable now for over a year. Similar issue on the 3950X at 64G, and so on...
Reassured that things weren't likely to blow up, I went into my BIOS (with the intention of doing exactly what you recommend) and found it was already done. SoC voltage is set to 1.112V (CPU core voltage is set to 1.476V). While I was there, I figured it could only help to take a few snapshots and present them here.

1642781811532.png


Maybe you can find something that would explain the occasional KERNEL_SECURITY_CHECK_ERRORs that I've been suffering since the rig was built last summer. (Are those 2 memory failure retry counts normal?)

The common speculation until now was audio driver issues (Realtek?), which we haven't looked into yet. Are there more recent audio drivers I can install for my MB that might help?

Do NVMe drives have firmware that could benefit from being updated? (Is updating my 980 Pro's firmware even a thing I can do, and would that help?)

PS (Reminder) : The occasional BSOD Kernel issues have been around from Day 1 and are theoretically unrelated to my (potentially) damaging my OS and/or C drive (980 Pro) yesterday via CHKDSK /F.
 
So it’s set to auto, but somehow getting to 1.1. Did you build this or buy it?
 
Sadly no detail to tell us what WAS wrong. Just that something went wrong. It's hard to break the filesystem of an SSD - not hard to break NTFS though.
I'm way out of my depth here, but didn't the link I shared last night propose a solution? Granted, it was a solution I didn't fully understand on first read (I've literally never had to go into recovery options), but he DID seemingly fix his issue without having to reformat. Are you saying there's no real chance his solution will help us here?
 
So it’s set to auto, but somehow getting to 1.1. Did you build this or buy it?
A friend built it in front of me. He does this a lot, but because it was me, he was extra slow and careful with stuff (more than he would've been otherwise, he says). I think it took us 5 hours total, from all the parts in their respective boxes. Although he's more a Linux guy, he's also done some Windows setups (but is a little less familiar with them). He runs the servers for a large IT company.
 
Ok. So they configured XMP speeds without using XMP - weird, but... ok... huh. I'm wondering what else he/she/it tweaked. Some of that doesn't look normal. It almost looks like they locked it to 3.7GHz all-core, instead of boosting/etc... Sadly, I don't have much hands-on time with modern MSI AMD boards (just by chance, not choice). Dan_D had an X570 Godlike, I think - he might remember the pages more than I do.
 