New 5900X rig suddenly running sluggish (need help identifying & solving)

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
39,767
So it’s set to auto, but somehow getting to 1.1. Did you build this or buy it?
i think some bios updates might have addressed that, maybe, as mine autos to 1.1 too...

Ok. So they configured XMP speeds without using XMP - weird, but... ok... huh. I'm wondering what else he/she/it tweaked. Some of that doesn't look normal. It almost looks like they locked it to 3.7GHz all-core, instead of boosting/etc... Sadly, I don't have much hands-on time with modern MSI AMD boards (just by chance, not choice). Dan_D had an X570 Godlike, I think - he might remember the pages more than I do.
yeah its setup odd. maybe a cmos clearing is in order...



op, is the bios up to date?! if not, do so and reset everything to optimized defaults.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
i think some bios updates might have addressed that, maybe, as mine autos to 1.1 too...


yeah its setup odd. maybe a cmos clearing is in order...



op, is the bios up to date?! if not, do so and reset everything to optimized defaults.
I'm tempted to let Dan comment, since I'm 99.9% positive he had an OCed 5950X on the Godlike - he'll know what is off from stock or not, or if there's a global setting doing something I forgot about.
 

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
39,767
I'm tempted to let Dan comment, since I'm 99.9% positive he had an OCed 5950X on the Godlike - he'll know what is off from stock or not, or if there's a global setting doing something I forgot about.
pffft, dan_d, what does he know ;)
yeah, thats a combo i havent touched.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
Ok. So they configured XMP speeds without using XMP - weird, but... ok... huh. I'm wondering what else he/she/it tweaked. Some of that doesn't look normal. It almost looks like they locked it to 3.7GHz all-core, instead of boosting/etc... Sadly, I don't have much hands-on time with modern MSI AMD boards (just by chance, not choice). Dan_D had an X570 Godlike, I think - he might remember the pages more than I do.
I went digging specifically for how to configure my exact sticks with my exact motherboard, and didn't find much, except this page confirming their compatibility. Do the numbers in the specs suggest my BIOS should be configured differently?

pendragon1 As the screenshot indicates, I'm using BIOS Ver E7C84AMS.160 (build date 05/28/2021)
Meanwhile, the BIOS page for the X570 Tomahawk suggests there have been 2 bios revisions released since : a stable one, and a beta one.
Should I be updating? And if so, to which one?

Also, why isn't anyone giving this thread I posted earlier any mind? He points to Microsoft's proposed solution in 2020 for CHKDSK /F issues (about halfway down the page) which look like :
  • Start up into the Recovery Console
  • Select Advanced options.
  • Select Command Prompt from the list of actions.
  • Once Command Prompt opens, type: chkdsk /f
  • Allow chkdsk to complete the scan, this can take a little while. Once it has completed, type: exit
Do I just scratch this off as a potential solution? (Running chkdsk /f again, but from the recovery console this time.)
Or is that more likely to make the issue worse? I'm just trying to avoid having to re-install everything from scratch on the same drive.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
We're a bit all over the place right now, so I'll try to clean things up a bit.

Everyone seems to be giving up on fixing the sluggishness issue, preferring to concentrate on the Kernel Security Check Failures that never really affected performance, and are likely ram or audio related (even if they only seem to happen while on SurfShark VPN)... but what do I do about the sluggishness? Because until yesterday - and despite those occasional KSCF BSODs - performance was top-notch. The KSCF errors were barely an inconvenience (I was usually back in whatever app I was in 20 s later like nothing happened, and that includes a full reboot).

The sluggishness, however, is new from yesterday, and began after running CHKDSK /F and VERIFY on my C drive (an M.2 NVMe 980 Pro). I went back and made sure "VERIFY" was currently set to "OFF", too, but no change. I feel like I'm back on the 12-year-old rig I replaced last summer with this one.

So while I do plan on getting my BIOS in tip-top shape ASAP, I'm more concerned about the latter issue than the former; unless one of you has reason to believe the BIOS fixes can also somehow fix this new across-the-board sluggishness that began yesterday - but did not exist until then.

* * *

My short-term worry is that Windows is relentlessly doing something that's taxing the CPU, raising its idle temp to 60C (from the usual 42C) and causing everything to run more slowly across the board as a result. I can hear the fans randomly take off while idling, when I used to only hear 'em while pushing the rig hard. Don't I risk damaging the CPU (or wearing out the paste) if I let this go on too long? (That would be a much more expensive replacement.) Nothing has failed or broken (yet), things just take much, much longer to do.

(PS: When I reboot and go into BIOS, the CPU temp there is 36C, which is about where it always is in the BIOS... suggesting the issue is likely Windows-related.)

Couldn't HWiNFO's various monitors help narrow down where the bottlenecking is occurring, or am I wasting my time with these concerns? I don't know HWiNFO very well, but it was highly recommended by HardForum users. Could it be the reading? The writing? Is it the drive, the ram, or something else? Do we give up updating the audio engine as a possible solution? Does the 980 Pro have firmware that could be updated, which might fix things? Etc.

If we're all out of solutions here, then I need to start considering... the nuclear options :(

* * *

Would this be possible? -> Getting another NVMe 1TB drive tomorrow, placing it on the motherboard alongside the current one (I haven't looked, but the X570 Tomahawk advertises 2 NVMe slots) and then somehow MIRRORING all the contents of my C drive onto the new drive. If this is somehow doable, would it not be a preferable solution to losing everything and starting over, from a time/workload perspective? I just wouldn't know where to begin mirroring a NVMe windows/c drive. (I assume it involves creating some sort of image (.iso?) and then copying all of that to the new drive?)

If there are good reasons why this would not work, then re-installing Windows from scratch on the same drive after a full reformat seems to be the final solution. Ugh. I hate my life. What would I have to do here, place the Windows media creator app on a bootable USB stick and let it do the rest automatically? (And re-enter my Win 7 license key when prompted? I think that's how we did it the first time.)

Whew! Sorry for the novel-sized recap. I'd just hate to go to the nuclear option if there was an easier fix to this all along (like some kind of rollback feature) and wanted to be thorough. :unsure:
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
I went digging specifically for how to configure my exact sticks with my exact motherboard, and didn't find much, except this page confirming their compatibility. Do the numbers in the specs suggest my BIOS should be configured differently?

pendragon1 As the screenshot indicates, I'm using BIOS Ver E7C84AMS.160 (build date 05/28/2021)
Meanwhile, the BIOS page for the X570 Tomahawk suggests there have been 2 bios revisions released since : a stable one, and a beta one.
Should I be updating? And if so, to which one?

Also, why isn't anyone giving this thread I posted earlier any mind? He points to Microsoft's proposed solution in 2020 for CHKDSK /F issues (about halfway down the page) which look like :
  • Start up into the Recovery Console
  • Select Advanced options.
  • Select Command Prompt from the list of actions.
  • Once Command Prompt opens, type: chkdsk /f
  • Allow chkdsk to complete the scan, this can take a little while. Once it has completed, type: exit
Do I just scratch this off as a potential solution? (Running chkdsk /f again, but from the recovery console this time.)
Or is that more likely to make the issue worse? I'm just trying to avoid having to re-install everything from scratch on the same drive.
All that does is run a chkdsk. It won't hurt, but the times I've seen that help in 20 years can be counted without even taking my shoes off. NTFS is damned good at handling that stuff automatically unless the drive itself actually died.

There's also no detail on what it supposedly found/fixed/did, so it's hard to figure where that might make things better. Generally, rather than poor performance, a corrupted NTFS volume would result in a hard crash or panic, or an error in an application. That being said, go ahead and do it - but this doesn't fit the symptoms of corrupted FS (which I HAVE dealt with more times than I can count).
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
We're a bit all over the place right now, so I'll try to clean things up a bit.

Everyone seems to be giving up on fixing the sluggishness issue, preferring to concentrate on the Kernel Security Check Failures that never really affected performance, and are likely ram or audio related (even if they only seem to happen while on SurfShark VPN)... but what do I do about the sluggishness? Because until yesterday - and despite those occasional KSCF BSODs - performance was top-notch. The KSCF errors were barely an inconvenience (I was usually back in whatever app I was in 20 s later like nothing happened, and that includes a full reboot).
Fix the core cause, or it'll just come back (and you're likely going to have to reinstall windows anyway afterwards either way).
The sluggishness, however, is new from yesterday, and began after running CHKDSK /F and VERIFY on my C drive (an M.2 NVMe 980 Pro). I went back and made sure "VERIFY" was currently set to "OFF", too, but no change. I feel like I'm back on the 12-year-old rig I replaced last summer with this one.

So while I do plan on getting my BIOS in tip-top shape ASAP, I'm more concerned about the latter issue than the former; unless one of you has reason to believe the BIOS fixes can also somehow fix this new across-the-board sluggishness that began yesterday - but did not exist until then.

* * *

My short-term worry is that Windows is relentlessly doing something that's taxing the CPU, raising its idle temp to 60C (from the usual 42C) and causing everything to run more slowly across the board as a result. I can hear the fans randomly take off while idling, when I used to only hear 'em while pushing the rig hard. Don't I risk damaging the CPU (or wearing out the paste) if I let this go on too long? (That would be a much more expensive replacement.) Nothing has failed or broken (yet), things just take much, much longer to do.
No. It'll be fine. :) It'll run forever at 60+c without even blinking - that's well within thermal design spec.
(PS: When I reboot and go into BIOS, the CPU temp there is 36C, which is about where it always is in the BIOS... suggesting the issue is likely Windows-related.)

Couldn't HWiNFO's various monitors help narrow down where the bottlenecking is occurring, or am I wasting my time with these concerns? I don't know HWiNFO very well, but it was highly recommended by HardForum users. Could it be the reading? The writing? Is it the drive, the ram, or something else? Do we give up updating the audio engine as a possible solution? Does the 980 Pro have firmware that could be updated, which might fix things? Etc.
Troubleshooting generally doesn't happen much at this level anymore - it's gotten both complex enough, and simple enough, that you don't tend to have to go digging into interrupt polling/etc as a troubleshooting option. One thing: Turn off the integrated audio in BIOS. That'll show if the audio driver is causing an issue, because the device will go away. If it suddenly goes back to performing well, now we know where to dig. Also disable the HDMI audio device on your video card (under device manager), that'll remove that conflict as a potential cause too.
If we're all out of solutions here, then I need to start considering... the nuclear options :(

* * *

Would this be possible? -> Getting another NVMe 1TB drive tomorrow, placing it on the motherboard alongside the current one (I haven't looked, but the X570 Tomahawk advertises 2 NVMe slots) and then somehow MIRRORING all the contents of my C drive onto the new drive. If this is somehow doable, would it not be a preferable solution to losing everything and starting over, from a time/workload perspective? I just wouldn't know where to begin mirroring a NVMe windows/c drive. (I assume it involves creating some sort of image (.iso?) and then copying all of that to the new drive?)
There's no evidence that the drive is the problem. Chkdsk looks for filesystem inconsistencies and journal issues - it, unfortunately, cannot look ~inside files~ to see if their contents are sane (because it has nothing to compare against). Which means you'd just potentially mirror bad data, wherever that is, if this is the issue. There are various ways to do the mirroring - all the ones off the top of my head are UNIX based in some form, but I'm sure there's a livecd version somewhere too.
If there are good reasons why this would not work, then re-installing Windows from scratch on the same drive after a full reformat seems to be the final solution. Ugh. I hate my life. What would I have to do here, place the Windows media creator app on a bootable USB stick and let it do the rest automatically? (And re-enter my Win 7 license key when prompted? I think that's how we did it the first time.)

Whew! Sorry for the novel-sized recap. I'd just hate to go to the nuclear option if there was an easier fix to this all along (like some kind of rollback feature) and wanted to be thorough. :unsure:
This is where I'd normally turn now. But in order, I'd do this:
1. Disable audio devices. See if performance improves - if so, driver problem, or weird esoteric hardware problem (driver - very possible. hardware - unlikely).
2. Run the chkdsk command from above. Can't hurt, likely won't help.
3. Wait to see what Dan comments on for MSI OC options, since I'm not familiar enough with those to say for certain (other than "that ain't stock").
4. Format and reinstall windows. Your process looks right, but I always just buy a crappy win10 license and tie it to an MSFT account, so I just use the latest W10 ISO to install (rufus builds the usb stick).
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
This is where I'd normally turn now. But in order, I'd do this:
1. Disable audio devices. See if performance improves - if so, driver problem, or weird esoteric hardware problem (driver - very possible. hardware - unlikely).

Done, and done (I didn't even know my 750 Ti had built-in audio until you brought it up just now)
Rebooted, audio icon in taskbar confirms changes with red X and "No Audio Output Device installed" statement.
No difference :( same sluggishness.

2. Run the chkdsk command from above. Can't hurt, likely won't help.

Are you really asking me to run a CHKDSK on my SSD after what happened yesterday? :LOL:
And discovering after the fact that using CHKDSK on SSDs is apparently a known issue, including one thread I linked to literally called "CHKDSK /F BROKE MY 980 PRO"? :nailbiting:

Although you didn't mention the /F part. Do you just want me to check and not fix? I might be brave enough for that. :)
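For what it's worth, a check-only pass is possible: chkdsk run without /f, /r or /x reads the volume and reports, but does not write fixes. Below is a minimal Python sketch of that, assuming an elevated prompt and that the drive letter is C:. The function names are made up for illustration, and the exit-code wording is a paraphrase of Microsoft's chkdsk documentation rather than a quote.

```python
import subprocess
import sys

# Exit codes as described in Microsoft's chkdsk documentation (paraphrased).
EXIT_MEANINGS = {
    0: "no errors found",
    1: "errors were found and fixed",
    2: "cleanup was performed, or skipped because /f was not given",
    3: "could not check the disk, or errors were left unfixed because /f was not given",
}


def readonly_command(drive="C:"):
    """Build a chkdsk command with no /f, /r or /x, so nothing is repaired."""
    return ["chkdsk", drive]


def describe(exit_code):
    """Translate a chkdsk exit code into a short human-readable note."""
    return EXIT_MEANINGS.get(exit_code, f"unexpected exit code {exit_code}")


# Only attempt the real scan on Windows; elsewhere this file just defines helpers.
if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(readonly_command())
    print(describe(result.returncode))
```

A result of 0 here would suggest the filesystem structures are consistent, which is separate from whether the contents of individual files are intact.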

3. Wait to see what Dan comments on for MSI OC options, since I'm not familiar enough with those to say for certain (other than "that ain't stock").

Dan to the rescue!
We love Dan!
Dan for President!

4. Format and reinstall windows. Your process looks right, but I always just buy a crappy win10 license and tie it to an MSFT account, so I just use the latest W10 ISO to install (rufus builds the usb stick).

I appreciate the alternative suggestion, but I'd feel a bit silly not using my own legit license. I also don't have my current Windows paired with any email accounts (privacy concerns, I don't like when I'm too linked-up everywhere; I don't even like being logged into my Google account when websurfing).

Re-enabling onboard audio in the BIOS now, since we've determined that had no effect on the sluggishness (but still no clue if it helped the BSODs, as those are infrequent).
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
440
You can also open Task Manager and take a look at the performance monitoring there to see what your disk(s) are doing, and which program/operation is responsible, along with the CPU. Curious if your disk is at or near 100% usage for some reason now and causing lags/sluggishness.
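If you'd rather log numbers than eyeball the Task Manager graphs, the built-in `typeperf` tool can sample the same kind of counters from a command prompt. This is a rough Python sketch that runs it and averages the samples. The counter names below are the English ones and are an assumption (they are localized on non-English Windows), and the parser assumes typeperf's usual CSV layout of a header row followed by timestamped rows.

```python
import csv
import io
import subprocess
import sys

# Assumed English counter names; adjust if your Windows uses another language.
COUNTERS = [
    r"\PhysicalDisk(_Total)\% Disk Time",
    r"\Processor(_Total)\% Processor Time",
]


def parse_typeperf(text):
    """Return sample rows (lists of floats) from typeperf CSV output."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # blank lines and one-field status messages
        try:
            samples.append([float(value) for value in row[1:]])
        except ValueError:
            continue  # header row, status text, or a sample with a missing value
    return samples


def average(samples):
    """Column-wise mean of the parsed samples; empty list if there are none."""
    if not samples:
        return []
    return [sum(column) / len(column) for column in zip(*samples)]


def sample_counters(count=10, interval=1):
    """Run typeperf for `count` samples, `interval` seconds apart (Windows only)."""
    cmd = ["typeperf", *COUNTERS, "-sc", str(count), "-si", str(interval)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_typeperf(out)


if __name__ == "__main__" and sys.platform == "win32":
    averages = average(sample_counters())
    if averages:
        print(f"avg disk time: {averages[0]:.1f}%  avg CPU: {averages[1]:.1f}%")
```

A disk figure pinned near 100% while the machine is idle would point at something hammering the drive. This only shows totals, so Resource Monitor is still the place to see which process is responsible.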
 

chameleoneel

Supreme [H]ardness
Joined
Aug 15, 2005
Messages
5,741
View attachment 434731

Ok. So you see down under voltage where it says CPU NB/SoC Voltage, Auto? Set that to 1.1 instead. Stock is 1, which is often too low for most RAM (G.Skill seems to be the only one happy there, and even then, at larger quantities, it wants to get flaky).
Yes indeed. Kernel issues like that are likely a RAM issue, before anything else. And more specifically, a RAM configuration/settings issue.

1. I wouldn't trust what "auto" is reporting for the SOC voltage. I would manually set that. I have experienced situations where a manual voltage setting improved things, even when set exactly the same as what "auto" was reporting.
2. Use XMP. Whatever is configured manually for timings by your friend, may not be playing well for the Ryzen platform.
3. Check for the command rate value. If it's set to 1t command rate, set it to 2t. I have this same exact RAM and could never get it stable at 1t on my Ryzen platform.

4. Once you get the RAM setup properly...I would just re-install windows.

5. kind of a flailing guess here but, all of those interrupts could be various processes having to re-do data, because of the corruption from your RAM.


They could be coming from just about anything. Among other things, such as the NIC telling the CPU it got a packet, interrupts are used for error handling. So just about anything that's broken/dying or has a bad driver could end up spewing them. It would be really useful to find out what the source of the excess interrupts is so you'd have an idea where to look for the problem. I haven't personally had to deal with debugging an interrupt problem like this, so I'm not sure what to suggest for a diagnostic utility. Most of my screwing around with interrupts involved performance tuning for real time applications on Linux.

The craziest problem I've heard of involving excessive numbers of interrupts and system sluggishness was in a thread on here started by a guy in Iran with an i3-10100F and an H510 (I think) mainboard. After multiple trips to a repair shop and the warranty repair depot it turned out his CPU and mainboard didn't get along. Other CPUs worked in his board and his CPU worked on other boards, but the two of them together generated a ridiculous number of interrupts. My guess is his board had a bug that made it not get along with CPUs that lack integrated graphics since all the test CPUs he mentioned had integrated graphics.
Man I remember that thread. I felt so bad for that guy!
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
Yes indeed. Kernel issues like that are likely a RAM issue, before anything else. And more specifically, a RAM configuration/settings issue.

First, hi, and thank you for your help. Solving the KSCF BSODs would be great -- especially if they're RAM related, I wouldn't want to rebuild on unstable foundations -- but as I've mentioned before, they've been happening since the rig was built 6 months ago and never affected performance : I'd usually be back in whatever app I was using 20s after the blue screen (10s to reboot, another 10s to get into my work app) thanks to how blazing fast this rig was until Thursday (6+ months in).

It wasn't until Thursday's CHKDSK /F + VERIFY that everything went massively sluggish (the boot-up, the shut-down, the time it takes to start browsers and start loading sites, even the time it stays off before starting up again after a manual reboot). The only things that seem unaffected are internet speeds and copying files from one external drive to another (when I was backing stuff up just in case).

But wait, there's more..!
Because last night, I got my first KSCF BSOD since the sluggishness began on Thursday.


And it was a scary one, because unlike the previous times, it hung at the blue screen and did not reboot on its own. It usually reboots faster than I can read the error screen (5s) but this time it hung seemingly indefinitely. I forced a hard-shutdown after waiting 10 mins. Thankfully, Windows was fine (again) albeit still sluggish. But it was otherwise like the KSCF didn't happen (like all the other times it happened in the prior 6 months).

BUT HERE'S WHERE THINGS GET INTERESTING :
It happened within 5s of my starting a large 25GB file download, AFTER GETTING ON SURFSHARK VPN FOR THE FIRST TIME SINCE THURSDAY. I hadn't touched my VPN at all since the sluggishness, specifically because based on past experience, the odds were pretty good that this app is somehow triggering the KSCF BSODs (6 months in, they have yet to happen when I'm NOT in SurfShark's virtual tunnel). But I was itching to test the theory one last time last night.

And I think we can now safely assume correlation, at least at the trigger point.

After hard-shutting down (because it hung this time) I then went back into Windows, which now takes infinitely longer to do than it did for 6 months, and started THE EXACT SAME FILE DOWNLOAD again, from the same source, using the same software, only this time while leaving the VPN out of the chain; and this went JUST FINE : took about a half-hour at my regular download speed, while surfing websites at the same time.

At the end of the day, I don't know if SurfShark VPN is triggering something on the RAM side of things, or if there's something about the app that Windows Security doesn't find kosher, or even if it's just having a hard time co-existing with something else on my machine (though I can't imagine what)... but either way, last night convinced me that I AM NEVER USING IT AGAIN, and definitely will not be installing it on any new Windows I'm forced to build (since it's looking like we're headed that way). I'll be riding out my SurfShark subscription using only their browser extension, which I recently learned is an alternative to the fully embedded Windows software.

So it's not entirely out of the realm of possibility that as long as my new OS never has the SurfShark software installed in the first place, I'll never see the KSCF errors again.

The only thing I don't understand is why I'm not seeing more reports of the SurfShark app triggering KSCFs out there. There's very little that's special about my machine, everything from the parts to the OS is very common and popular. Even the Windows license is legit. The only unusual thing about this rig is that the GPU is legacy (GTX 750 Ti) until those 3070 prices come back down, as that is most likely the weakest link in the chain. Could it be a mistake to use modern NVIDIA drivers on a card this old, even if it's listed as compatible with said drivers?


1. I wouldn't trust what "auto" is reporting for the SOC voltage. I would manually set that. I have experienced situations where a manual voltage setting improved things, even when set exactly the same as what "auto" was reporting.
2. Use XMP. Whatever is configured manually for timings by your friend, may not be playing well for the Ryzen platform.
3. Check for the command rate value. If it's set to 1t command rate, set it to 2t. I have this same exact RAM and could never get it stable at 1t on my Ryzen platform.

4. Once you get the RAM setup properly...I would just re-install windows.

I would love to do everything you're describing, but I'm a bit out of my depth in these waters. :unsure:
Is there any chance you (or someone else) could babystep the process for me a bit more? I've literally never played with my RAM before, on this rig or any previous one.
Note 1: A friend set everything up for me last August when he built the rig in front of me, based on the timings provided with the RAM; and while he's been building servers for 15 years, he's more an old school Linux type and not quite as familiar with Windows and the latest tech; which is why I suspect he stayed away from A-XMP, as I'd be surprised if he'd ever heard of it before seeing the option on-screen. I know I hadn't.

Note 2: In case it matters, I can confirm that the 2 x 16GB sticks are currently installed in the X570 Tomahawk's slots 2 and 4 (slots 1 and 3 are empty). I don't see actual numbering, but to be extra-clear : I consider Slot 1 to be the inside one, closest to the CPU.

I tried popping both sticks out and clicking them back in (yesterday), no change.

Should I try them in different slots?

(I appreciate all this help more than you guys know, I would be totally screwed without it.)
 

TheSlySyl

[H]ard|Gawd
Joined
May 30, 2018
Messages
1,958
I know I'm jumping in late, so sorry if these have already been tried. But, in no particular order.

1: Ethernet drivers - disable the Ethernet port, see how things go.

2: run these commands:
DISM.exe /Online /Cleanup-Image /Restorehealth

SFC /Scannow

See if windows has anything corrupt from all the previous stuff that's been going on. (Obviously need internet to run the online one)
Never hurts to run these every now and then anyway.

3: Raise the RAM voltage a bit and lower the RAM speed...RAM errors get fucking weird. I'd also take out and reseat the RAM, make sure every pin is making contact.

How many drives are in this computer total? The issue could be one of the Non-OS drives causing errors.
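For anyone nervous about jumping straight to the repair commands, here is a small Python sketch of running the checks in escalating order: the read-only ones first, the repairing ones only when explicitly allowed. The `plan` and `run` helpers are illustrative names, not part of any tool; the DISM and SFC switches themselves are the standard ones. It assumes an elevated (administrator) prompt.

```python
import subprocess

# Checks that only report and do not repair anything.
READ_ONLY_STEPS = [
    ["DISM.exe", "/Online", "/Cleanup-Image", "/CheckHealth"],  # reports any corruption flag already recorded
    ["DISM.exe", "/Online", "/Cleanup-Image", "/ScanHealth"],   # scans the component store, no repair
    ["sfc", "/verifyonly"],                                     # verifies system files, no repair
]

# Steps that modify the installation.
REPAIR_STEPS = [
    ["DISM.exe", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["sfc", "/scannow"],
]


def plan(allow_repair=False):
    """Return the commands to run, read-only checks always coming first."""
    return READ_ONLY_STEPS + (REPAIR_STEPS if allow_repair else [])


def run(steps):
    """Run each command in order and stop at the first non-zero exit code."""
    for cmd in steps:
        print(">", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("stopped: exit code", result.returncode)
            return False
    return True
```

Calling `run(plan())` from an elevated prompt only inspects; `run(plan(allow_repair=True))` goes on to the repair steps. Having a backup and install media ready before the second call seems like the sensible order.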
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
I know I'm jumping in late, so sorry if these have already been tried. But, in no particular order.

1: Ethernet drivers - disable the Ethernet port, see how things go.

2: run these commands:
DISM.exe /Online /Cleanup-Image /Restorehealth

I didn't feel safe jumping right to /RestoreHealth (after jumping right into CHKDSK /F on Thursday), but here's what /CheckHealth reveals :

DISM report.jpg

Does this suggest that a /RestoreHealth would be pointless, or do you still recommend taking the risk?


Very first tangible sign of data corruption

I speak of risk because going into Photoshop just now (to crop that screenshot for you) also revealed something else : my Photoshop preferences and defaults (including "recent documents") have been wiped. They were fine for the past couple of days despite the sluggishness. THIS IS THE FIRST INSTANCE OF ANY ACTUAL DATA GOING MISSING since the rig was built; so it seems significant.

EDIT: I just reloaded Photoshop to see if it would at least remember the above image, after wiping out recent doc history : and it did. It's listed (alone) in recent documents, so that recent-docs data file got rebuilt after its initial corruption.

To answer your question about the number of drives, there are 3 internal drives : one 980 Pro NVMe for the C, and two 870 Evos (regular SSD) for the docs.

* * *

Given that it was similar CMD commands (CHKDSK /F, VERIFY) that started the issues on Thursday in the first place, I think I'd feel safer continuing with the SFC /scannow AFTER preparing my Windows installation media, should something we do FUBAR the C drive.

I see Windows 11 is now an option on the Microsoft download page (it wasn't when we built the rig). If I end up having to re-install Windows (which seems increasingly likely) do I take this opportunity to fresh-install Windows 11, or do I re-install Windows 10 and upgrade to 11 later when it's my turn? (I'm currently still on the W11 upgrade waiting list, so still W10)

I have both MediaCreationTool21H2.exe and MediaCreationToolW11.exe downloaded, I'll probably create installation media for both in the meantime.

EDIT: Umm.. maybe I should stay away from Windows 11 for now after all, based on that list of issues.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
Yeah. My money is on crashes from RAM - and you finally corrupted the inside of a file (hence why things are going weird) on a crash. It's rare, but it can happen. PS losing things - the file wasn't closed, and the fall back is probably "create new file" when it's trashed.

So, to my prior points -
0. Restore bios defaults. This will clear all the settings your friend did - he did it manually, we don't really do that anymore unless we're REALLY trying to push boundaries on overclocking. You're not, so let's not.
1. Set SoC voltage manually to 1.1. Ignore the auto setting.
2. Enable XMP Profile 1.
3. Reinstall Windows. Try to see if things are stable now. Betcha they are. If not, set RAM voltage to 1.35 or 1.4v - try again. Windows installs are dead easy these days. :)
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
Yeah. My money is on crashes from RAM - and you finally corrupted the inside of a file (hence why things are going weird) on a crash. It's rare, but it can happen. PS losing things - the file wasn't closed, and the fall back is probably "create new file" when it's trashed.

So, to my prior points -
0. Restore bios defaults. This will clear all the settings your friend did - he did it manually, we don't really do that anymore unless we're REALLY trying to push boundaries on overclocking. You're not, so let's not.
1. Set SoC voltage manually to 1.1. Ignore the auto setting.
2. Enable XMP Profile 1.
3. Reinstall Windows. Try to see if things are stable now. Betcha they are. If not, set RAM voltage to 1.35 or 1.4v - try again. Windows installs are dead easy these days. :)
This all sounds like good advice, and I'm real close to feeling brave enough to do it :LOL: what's the best way to restore BIOS defaults? (When I Google the subject, I get some scary answers involving unplugging the power supply, looking for specific pins, etc.) Surely it's easier than that?

The person who helped me build the rig recommends trying a Windows Repair with my (newly created) Windows installation media before I try anything else (I didn't even know it could do that until he told me). If the repair doesn't work, he wants me to go straight to the reformatting + re-installing windows part. He especially doesn't want me updating the BIOS first (and apparently neither do you, you just want me to restore defaults) because he claims it was fine for 6 months and I risk making stuff worse. He blames the blue screens on the faulty SurfShark VPN and says just to not re-install it on the clean rebuild.

(I get that there's an excellent chance you're right and his ego isn't allowing for the possibility that he might not have configured the ram correctly, but in his defence, until I ran that CHKDSK /F + VERIFY combo on Thursday, everything was blazing fast for 6 straight months, with zero degradation over time -- only the occasional errors forcing quick reboots while receiving data on the VPN -- the sluggishness didn't show up until Thursday's Command Line stuff.)

Speaking of... I believe you also said it wouldn't hurt to run CHKDSK again from the "Recovery Console" which is what fixed that other guy's issues with his 980 Pro (after a regular CHKDSK /F almost broke his drive too). I have no idea where this Recovery Console is, is it launched through the Windows 10 Installation USB stick I created earlier, via MS' Windows Media Creation Tool?

I also thought I would share this screenshot with you. It details the progress of my backing up my Firefox user profile for the re-install. It's only 200MB in size but has approx. 7,500 files and subfolders. Copying from sluggish C: (980 Pro NVMe) windows SSD to D: (870 Evo) documents SSD. 3 days ago, this would've taken a fraction of the time it took here.

[Attachment: 1642905860289.png, screenshot of the file-copy transfer graph]

Those two peaks at the start are where it hit 4MB, then it averages out to about 300-400k the rest of the way. Meanwhile, I'm backing up from my D: drive to my old gen external usb backups at light speed compared to this.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
As an aside, I'm surprised there's no way to diagnose if the problems are RAM or SSD related. I'm (clearly) no I.T. expert, but can't we run some kind of test on both to find out? With either Windows' own tools, or HWiNFO?
 

TheSlySyl

[H]ard|Gawd
Joined
May 30, 2018
Messages
1,958
I'm also leaning towards RAM. I've never had any AM4 system that was perfectly stable with default ram settings.
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
As an aside, I'm surprised there's no way to diagnose if the problems are RAM or SSD related. I'm (clearly) no I.T. expert, but can't we run some kind of test on both to find out? With either Windows' own tools, or HWiNFO?
Chkdsk tells you if the filesystem is fine (it is, apparently). The drive itself will throw SMART errors (it isn't) if it's broken - that'll show up all over the place. That leaves file contents, and there's no way to really check those other than checksum against a known-good file of the exact same version (g'luck).
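If you do happen to have a known-good copy of a file (same exact version), the checksum comparison itself is easy to script. A minimal sketch in Python, assuming you can get at both copies; the function names are my own, not from any tool mentioned in this thread:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(suspect_path, known_good_path):
    """True when both files have byte-identical contents (same SHA-256)."""
    return sha256_of(suspect_path) == sha256_of(known_good_path)
```

A mismatch only tells you the two files differ, not which one is wrong or why, so it's only as good as your confidence in the reference copy.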

Beyond that, these things exist for enterprise workloads; consumer stuff? Eh... not as much. Samsung has a tool to report on the health of the drive - you can run it, but I'll pretty much guarantee that the drive is fine (won't hurt to check though). On the consumer side, we tend to go with "limit it down, replace the part, stride briskly onwards" - because tbh, the data isn't generally valuable enough for true integrity systems/etc.

As for testing the RAM - memtest, but with Ryzen and not-enough-volts, memtest often returns a false "it's fine" unless run for a good long time (think 24 hours+).
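Rough math on why a short memtest run can lie to you. This assumes each pass is an independent roll of the dice with some small chance of tripping the fault, and the 2% figure is made up purely for illustration, not measured on any real kit:

```python
def chance_of_catching(per_pass_hit_rate, passes):
    """Probability that at least one of `passes` independent passes flags an error."""
    return 1.0 - (1.0 - per_pass_hit_rate) ** passes


if __name__ == "__main__":
    # Hypothetical marginal kit: only 2% of passes happen to hit the bad case.
    for passes in (1, 4, 24, 100):
        print(f"{passes:>3} passes -> {chance_of_catching(0.02, passes):.1%} chance of seeing an error")
```

The point is just the shape of the curve: with a rare, intermittent fault, one clean pass tells you very little, and the odds of catching it only climb as the passes pile up.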
 

bluestang

Limp Gawd
Joined
Dec 14, 2018
Messages
440
While I agree there is a RAM stability issue causing the crashes (don't forget about setting BOOT DRAM Voltage as well), I have never seen that cause this issue with a drive like this. There would have to be an enormous amount of files corrupted causing read/write issues for this to happen and there would be errors trying to open programs and do basic things IMO.

Have you even looked at your drive's SMART data with CrystalDiskInfo or Hard Disk Sentinel to see if there is an issue with the drive itself?
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
This all sounds like good advice, and I'm real close to feeling brave enough to do it :LOL: what's the best way to restore BIOS defaults? (When I Google the subject, I get some scary answers involving unplugging the power supply, looking for specific pins, etc.) Surely it's easier than that?
Generally it's under the save/exit page in the BIOS - would be something there that says reset to default. No need to touch batteries or pins anymore!
The person who helped me build the rig recommends trying a Windows Repair with my (newly created) Windows installation media before I try anything else (I didn't even know it could do that until he told me). If the repair doesn't work, he wants me to go straight to the reformatting + re-installing windows part. He especially doesn't want me updating the BIOS first (and apparently neither do you, you just want me to restore defaults) because he claims it was fine for 6 months and I risk making stuff worse. He blames the blue screens on the faulty SurfShark VPN and says just to not re-install it on the clean rebuild.
Sure. That'll try replacing files from the WIM image on the media. I'm always sketched out about that from WAY too much history to bother - and you'll want a backup anyway, so... might as well reinstall? If it goes wrong, you're reinstalling anyway, so...

Software shouldn't cause a BSOD, unless there's something REALLY sketchy about that VPN (and that's not my expertise). Possible? Sure. But also possible that the RAM timing is whack - for one of my systems, that only showed up playing high-quality youtube vids. It's a WEIRD issue on how it shows.
(I get that there's an excellent chance you're right and his ego isn't allowing for the possibility that he might not have configured the ram correctly, but in his defence, until I ran that CHKDSK /F + VERIFY combo on Thursday, everything was blazing fast for 6 straight months, with zero degradation over time -- only the occasional errors forcing quick reboots while receiving data on the VPN -- the sluggishness didn't show up until Thursday's Command Line stuff.)
Ayup.

So the thing is, verify isn't part of the chkdsk command. You've run "verify off" as an admin user, right (windows key, type: cmd, right click, run as administrator)? Thought you had from above.

Speaking of... I believe you also said it wouldn't hurt to run CHKDSK again from the "Recovery Console" which is what fixed that other guy's issues with his 980 Pro (after a regular CHKDSK /F almost broke his drive too). I have no idea where this Recovery Console is, is it launched through the Windows 10 Installation USB stick I created earlier, via MS' Windows Media Creation Tool?
It's from the boot media of some kind, yes - haven't had to do that since Server 2012, so I'm rusty.
I also thought I would share this screenshot with you. It details the progress of my backing up my Firefox user profile for the re-install. It's only 200mb in size but has appx. 7,500 files and subfolders. Copying from sluggish C: (980 Pro NVMe) windows SSD to D: (870 Evo) documents SSD. 3 days ago, this would've taken a fraction of the time it took here.

View attachment 435623
Those two peaks at the start are where it hit 4MB, then it averages out to about 300-400k the rest of the way. Meanwhile, I'm backing up from my D: drive to my old gen external usb backups at light speed compared to this.
File ops are REALLY slow - I don't expect bandwidth on small files, but file ops are extremely slow. This is weird. Still fix the ram - otherwise we might be chasing ghosts. That's always my FIRST AMD troubleshooting step for good reason.
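If you want a number instead of a graph, here's a rough sketch in Python that times a copy of a pile of small files. The ~27 KB file size is just your 200 MB divided by ~7,500 files; the function names and the 500-file count are my own picks, so treat it as a ballpark test and not a proper benchmark:

```python
import os
import shutil
import tempfile
import time


def make_small_files(folder, count=500, size_bytes=27_000):
    """Fill `folder` with `count` files of `size_bytes` random bytes each."""
    os.makedirs(folder, exist_ok=True)
    for index in range(count):
        with open(os.path.join(folder, f"file_{index:05d}.bin"), "wb") as handle:
            handle.write(os.urandom(size_bytes))


def time_copy(source, destination):
    """Copy a folder tree and return (files_copied, seconds_elapsed)."""
    start = time.perf_counter()
    shutil.copytree(source, destination)  # destination must not exist yet
    elapsed = time.perf_counter() - start
    copied = sum(len(files) for _, _, files in os.walk(destination))
    return copied, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        src = os.path.join(scratch, "src")
        dst = os.path.join(scratch, "dst")
        make_small_files(src)
        copied, elapsed = time_copy(src, dst)
        print(f"{copied} files in {elapsed:.2f}s ({copied / elapsed:.0f} files/s)")
```

By default the scratch folder lands wherever your temp directory lives; pass `dir=` to `tempfile.TemporaryDirectory` to point it at the drive you actually want to test. Running it once on C: and once on D: should show whether the slowness follows the 980 Pro or the whole system.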
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
I'm also leaning towards RAM. I've never had any AM4 system that was perfectly stable with default ram settings.
But I thought default RAM settings is what lopoetve wants me to reset my BIOS back to. I'm so confused.

Are you saying the guy who helped me build the rig just set auto-everything? Hmm. Is that really a bad thing, though, if it's displaying all the correct values while set to auto (and has been running flawlessly outside those VPN-related KSCF errors for 6 months)? In other words, what if you guys are right about the Kernel Security Check Errors piling up and eventually breaking something that caused the sluggishness (which wouldn't explain why it began with the CHKDSK/VERIFY) but that it's the VPN software causing these errors, not the RAM? (Maybe I need to shut up with alternate theories since I am way out of my depth here.) :)

You suggested disabling Ethernet earlier, to see how performance is affected. I tried it, and it killed my internet (which is normal). Without rebooting (because this takes forever now) I then tried loading Photoshop and it was just as slow as it's been since Thursday. However, when I went back to re-enable it and saw the full list of network adapters, I saw :
- Ethernet
- Wi-Fi
- wintunshark0 <- Surfshark?
- Ethernet 2
- All network adapters

If that middle one is Surfshark, and the crashes have only ever happened while I'm on that connection, wouldn't it be worth disabling THAT (instead of Ethernet) to see what happens? (I don't dare 'til one of you I.T. gods tells me it's safe) ;)

I feel like not enough attention has been put on the fact that the Kernel errors never happened (not once) when I wasn't on Surfshark VPN receiving data. And I'm only on it 20% of the time. Meaning the other 80% was trouble-free this entire time, right up until Thursday with the CHKDSK /F + VERIFY. Since then, constant sluggishness. Everything is slow. Always. And the one time I tried getting back on the VPN since Thursday, the Kernel error returned within 5 seconds of downloading something I had no trouble downloading when I tried it again without the VPN, a minute later.

I feel like these are telling signs, but you guys don't seem to feel that way.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
So the thing is, verify isn't part of the chkdsk command. You've run "verify off" as an admin user, right (windows key, type: cmd, right click, run as administrator)? Thought you had from above.
Okay, let's recap the facts that seem to have become a little blurry again (apologies if you're sick of them at this point, but maybe some late arrivals to the discussion appreciate the rundown). :)
  • A friend of mine built this rig in front of me from parts recommended by this channel last summer. While he's more Linux than Windows, he was extra-careful with my rig because we're friends. He didn't just rush through it.
  • Once built, this thing was stable and blazing fast, booted full Windows 10 Pro in 8 seconds, Photoshop in 5, etc. AMAZING!
  • As soon as I get the thing home (same-day) I install my important Adobe + Microsoft apps, as well as SURFSHARK VPN for Windows.
  • I set the VPN software to load at startup but NEVER auto-connect to a server. I would only manually connect to the VPN, which turned out to be barely 20% of the time (it was pretty much just for porn & getting around geo-restrictions)
  • Those KERNEL_SECURITY_CHECK_ERRORS only happened when I was CONNECTED to the VPN, as well as downloading something significant (watching a HD stream qualifies). Never when connected + idle. These BSODs never cared what browser I'm on or if my adblock is enabled or not. The only consistency I found was the VPN connection. (I even got one after rebooting, reconnecting to the VPN, and loading nothing but my Torrent app to finish a download; with no audio or video streaming when it happened.)
  • This had been going on from August 2021 until Thursday (2 days ago) and never, ever affected performance in any way. I was always back in whatever I was doing less than 20s after getting the error. Barely an inconvenience. Small price to pay for the performance, I told myself.
  • I don't know what came over me Thursday, but this time I decided to see if a CHKDSK /F would help. It wouldn't run it outright (files in use I guess) so I scheduled it to do it at the reboot, which it did. Didn't take too long, text flew by too quick to read, and soon I was back in Windows. I'm sure there's a report somewhere, but I wouldn't know where to find it (if you tell me, I'll fetch it).
  • Once back in after the CHKDSK /F, I decide to try a VERIFY since I heard driver issues are often responsible for this error. But I wasn't savvy enough to type "VERIFY ON", just "VERIFY". Here, it SHOULD have told me if VERIFY was on or off (that's what VERIFY, alone, does) but it didn't. Nothing happened when I hit ENTER. I figured it would abort what it's doing if I close the window, so I did.
Sluggishness ever since.

While VERIFY has since been confirmed as OFF, what I think happened is that the CHKDSK did something non-kosher to the 980 Pro NVMe (wouldn't be the first time, Google it) and the very first symptom of it might've been the unresponsive "VERIFY" CMD that I got impatient with and closed. Not only did I get the command wrong (I thought I was turning VERIFY ON by typing just VERIFY) but I bet if I would have waited a few minutes, I would've seen a response (it would've told me if VERIFY was ON or OFF, which is what VERIFY on its own does). So it could never have been that. But the CHKDSK /F? That's still a suspect, imo.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
You can view your CHKDSK log from the Event Viewer...
https://www.minitool.com/news/chkdsk-log.html
It's scary going into this thing for the first time! =)
Most of the events are just regular "information".
However, there are 3 or 4 types of errors that seem to be recurring either daily, or on every other day.
  • Event ID: 86, CertificateServicesClient-CertEnroll
    SCEP Certificate enrollment initialization for WORKGROUP\XXXXXX via XXXXXXXX failed.
    GetCACaps: Not Found

    This one usually appears solo every couple of days, maybe less.

  • Event ID: 100, Bonjour Service
    This one always appears in groups of 5, and they always end with :
    Local Hostname DESKTOP-XXXXXXX.local already in use; will try DESKTOP-XXXXXXXX-2.local instead
    Seems to appear every couple of days or so.

  • ==> THIS MONSTER <==
    Event 2002, EapHost
    Skipping: Eap method DLL path validation failed.

    Always appear in groups of 18 (not a typo) and all at once, within 1 second of each other.
    The last time these 18 errors manifested was last night, but I can't remember if the timing (8pm) coincides with the BSOD that I purposely triggered as a test. To be sure, I kept looking for the time before that; and EUREKA : it lines up with Thursday's BSOD that prompted me to use CHKDSK /F in the first place (which spawned the current sluggishness).
Does this mean we just found the errors causing the KSCF BSODS? :woot:
A possible barrage of 18 simultaneous Event 2002's every time I download through my VPN app..?
Could it really have been the VPN software all along? And if so, why am I the only one complaining about this? Given how popular the VPN and my parts list are, shouldn't the internet be filled with complaints about this?

It's also possible that the problem isn't the SurfShark app itself, but something about my specific Windows configuration that the app doesn't play well with. The end result is the same tho : no more SurfShark app, switching to browser extension once this is over.

Also, once you get this solved/fixed then ditch that VPN for one that doesn't crash your PC as that's BS!
Right?! :) I agree! This was my first VPN experience ever (never tried one before) and 6 months later, I've already decided that I'm switching to the browser extension until my subscription runs out, regardless how this nightmare ends.

I will likely try default bios config next (tomorrow).
 

chameleoneel

Supreme [H]ardness
Joined
Aug 15, 2005
Messages
5,741
I dunno at what point you will actually accept and take some advice here?

A few of us have pretty succinctly told you that Ryzen is sensitive to RAM configuration and you need to do a couple of things to get that straightened out. It could be all of your issues. It also may not be an issue. But it's such a common problem for Ryzen that we want to rule it out first.

There very well could be a problem with your VPN. But ALSO, if you are having RAM issues, it can result in repeated problems with the same things, such that it looks like that thing is a problem.

If you have RAM issues but don't fix them and then try to reinstall windows: your windows will continue to have problems.

Your computer boots. So the easy way to load the BIOS defaults is to boot the computer into the BIOS and select the option to load defaults / load optimized defaults, or whatever your BIOS says that is closest to that wording. Clearing the CMOS with jumper pins (to reset the BIOS to defaults) is what you do if your computer won't POST and let you boot into the BIOS.

After you load and save the defaults (usually forcing a reboot), boot back into the BIOS and make the RAM voltage and timing changes we suggested (1.1 SoC voltage; change to 2T command rate if it says it's at 1T; you can try XMP or leave it off for now).

Read the manual for your motherboard to get more familiar with the BIOS and what's in it, as well as other aspects of the board. That might sound annoying to you, but even as an advanced builder myself, I look at the manual for every new motherboard I purchase. I just got a new motherboard a couple of months ago, for Intel 12 series, and I read parts of the manual to be sure about some aspects of the build and configuration.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
A few of us have pretty succinctly told you that Ryzen is sensitive to RAM configuration and you need to do a couple of things to get that straightened out. It could be all of your issues. It also may not be an issue. But it's such a common problem for Ryzen that we want to rule it out first.
I agree, I've just been hit with a lot of conflicting advice at once (not only here, but elsewhere as well) and wanted to look into as much of it as I can (for instance the SFC /scannow, resetting to BIOS defaults, and/or running CHKDSK /F from recovery console)... things I was planning to get to eventually. Unfortunately, it's 3am now so I'm done, but I hope to try at least the BIOS reset tomorrow.
 

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
39,767
(not only here, but elsewhere as well)
thats part of your confusion/problem. we here are all telling you the same thing. we dont know what they are telling you "elsewhere", causing you confusion. so pick a poison and stick to it, i would suggest us, but im biased...
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
But I thought default RAM settings is what lopoetve wants me to reset my BIOS back to. I'm so confused.
Yes, so we can set them correctly. I have no idea what your friend set - he clearly set parts of it manually, and parts of it on auto, and that isn't a sane path anymore (it used to be, things change).
Are you saying the guy who helped me build the rig just set auto-everything? Hmm. Is that really a bad thing, though, if it's displaying all the correct values while set to auto (and has been running flawlessly outside those VPN-related KSCF errors for 6 months)? In other words, what if you guys are right about the Kernel Security Check Errors piling up and eventually breaking something that caused the sluggishness (which wouldn't explain why it began with the CHKDSK/VERIFY) but that it's the VPN software causing these errors, not the RAM? (Maybe I need to shut up with alternate theories since I am way out of my depth here.) :)
With Ryzen? Yep, it's a bad thing. Ryzen is picky. Perfectly stable once you set it right, but picky before that.

RAM utilization is odd. Encapsulation (for a VPN) uses different parts of a CPU (AES offload, generally) than normal operation - you're encrypting traffic with a different algorithm (and different extensions of the CPU) than other things would. All sorts of oddities that can trigger stuff - for me, remember, it was high-quality youtube videos. That means the AV1 codec, which uses different parts of the chip and GPU than H.264 or H.265 would - they're implemented differently in hardware. Hence one crashes, and the other works fine.

Is it the RAM? Don't know, but we KNOW how absurdly common that is, so we fix the common things first. If your car shifts hard, but you've got a flat tire - well, fix the tire first - because otherwise, figuring out the shifting is both difficult, and pointless.
You suggested disabling Ethernet earlier, to see how performance is affected. I tried it, and it killed my internet (which is normal). Without rebooting (because this takes forever now) I then tried loading Photoshop and it was just as slow as it's been since Thursday. However, when I went back to re-enable it and saw the full list of network adapters, I saw :
- Ethernet
- Wi-Fi
- wintunshark0 <- Surfshark?
- Ethernet 2
- All network adapters

If that middle one is Surfshark, and the crashes have only ever happened while I'm on that connection, wouldn't it be worth disabling THAT (instead of Ethernet) to see what happens? (I don't dare 'til one of you I.T. gods tells me it's safe) ;)
Don't know how the software would handle that. That is the VPN adapter, but lord knows how the software would handle it (I use IPSEC and OpenVPN, but my use case is entirely different than yours). Disabling it won't hurt, but the software may bitch at you or just re-enable it.
I feel like not enough attention has been put on the fact that the Kernel errors never happened (not once) when I wasn't on Surfshark VPN receiving data. And I'm only on it 20% of the time. Meaning the other 80% was trouble-free this entire time, right up until Thursday with the CHKDSK /F + VERIFY. Since then, constant sluggishness. Everything is slow. Always. And the one time I tried getting back on the VPN since Thursday, the Kernel error returned within 5 seconds of downloading something I had no trouble downloading when I tried it again without the VPN, a minute later.

I feel like these are telling signs, but you guys don't seem to feel that way.
Because that uses a different set of tools than normal operation does :) SO much is offloaded to specific silicon bits these days that things show up ~weird~. At first glance, a VPN causing a crash from hardware sounds weird - till you realize that the encryption, encapsulation, pass-off and packet formation are all handled in hardware - and all require things to pass through system RAM multiple times - and goofy kit will cause one of those to fail, and garbled data can cause the OS to panic and pull the ejection handle.
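A toy illustration of the "garbled data" part, in Python. To be clear, this is NOT how Surfshark or any real VPN driver is written, and the helper names are mine; it just shows that data protected by an integrity check gets rejected outright if even one bit changes somewhere between being written to memory and being read back:

```python
import hashlib
import hmac


def seal(key, payload):
    """Return the payload with a SHA-256 HMAC tag, as a stand-in integrity check."""
    return payload, hmac.new(key, payload, hashlib.sha256).digest()


def verify(key, payload, tag):
    """True only if the payload still matches the tag it was sealed with."""
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


def flip_one_bit(data, byte_index=0, bit=0):
    """Simulate a single-bit memory error in a copy of `data`."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


if __name__ == "__main__":
    key = b"k" * 32
    payload, tag = seal(key, b"pretend this is an encrypted packet")
    print("clean copy verifies:  ", verify(key, payload, tag))
    print("one flipped bit passes:", verify(key, flip_one_bit(payload), tag))
```

In a script like this a failed check is just a `False`. When the same kind of corruption hits data the kernel depends on, the OS has fewer polite options, which is where the BSOD comes in.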
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
Okay, let's recap the facts that seem to have become a little blurry again (apologies if you're sick of them at this point, but maybe some late arrivals to the discussion appreciate the rundown). :)
  • A friend of mine built this rig in front of me from parts recommended by this channel last summer. While he's more Linux than Windows, he was extra-careful with my rig because we're friends. He didn't just rush through it.
No doubt. He just didn't know some of the stupid Ryzen tricks we've all learned the insanely hard way. Trust me, my first X370 system was the same way - there's lots of AMD "tricks" you learn - Intel tends to be more "out of the box" ready, for better or worse.
  • Once built, this thing was stable and blazing fast, booted full Windows 10 Pro in 8 seconds, Photoshop in 5, etc. AMAZING!
  • As soon as I get the thing home (same-day) I install my important Adobe + Microsoft apps, as well as SURFSHARK VPN for Windows.
  • I set the VPN software to load at startup but NEVER auto-connect to a server. I would only manually connect to the VPN, which turned out to be barely 20% of the time (it was pretty much just for porn & getting around geo-restrictions)
  • Those KERNEL_SECURITY_CHECK_ERRORS only happened when I was CONNECTED to the VPN, as well as downloading something significant (watching a HD stream qualifies). Never when connected + idle. These BSODs never cared what browser I'm on or if my adblock is enabled or not. The only consistency I found was the VPN connection. (I even got one after rebooting, reconnecting to the VPN, and loading nothing but my Torrent app to finish a download; with no audio or video streaming when it happened.)
See prior comments on offload/etc.
  • This had been going on from August 2021 until Thursday (2 days ago) and never, ever affected performance in any way. I was always back in whatever I was doing less than 20s after getting the error. Barely an inconvenience. Small price to pay for the performance, I told myself.
  • I don't know what came over me Thursday, but this time I decided to see if a CHKDSK /F would help. It wouldn't run it outright (files in use I guess) so I scheduled it to do it at the reboot, which it did. Didn't take too long, text flew by too quick to read, and soon I was back in Windows. I'm sure there's a report somewhere, but I wouldn't know where to find it (if you tell me, I'll fetch it).
  • Once back in after the CHKDSK /F, I decide to try a VERIFY since I heard driver issues are often responsible for this error. But I wasn't savvy enough to type "VERIFY ON", just "VERIFY". Here, it SHOULD have told me if VERIFY was on or off (that's what VERIFY, alone, does) but it didn't. Nothing happened when I hit ENTER. I figured it would abort what it's doing if I close the window, so I did.
I've only used the verify command once - a full journal replay on every command would make things sluggish, yes, but you've turned it off - so something is wrong. Chkdsk wouldn't actually do anything, and the reboot part is normal (has to be run on an unmounted filesystem, so it boots windows to ram, runs it, and then finishes loading). Try verify on and then verify off to see if anything changes, but we're pretty much at a reinstall here.
Sluggishness ever since.

While VERIFY has since been confirmed as OFF, what I think happened is that the CHKDSK did something non-kosher to the 980 Pro NVMe (wouldn't be the first time, Google it) and the very first symptom of it might've been the unresponsive "VERIFY" CMD that I got impatient with and closed. Not only did I get the command wrong (I thought I was turning VERIFY ON by typing just VERIFY) but I bet if I would have waited a few minutes, I would've seen a response (it would've told me if VERIFY was ON or OFF, which is what VERIFY on its own does). So it could never have been that. But the CHKDSK /F? That's still a suspect, imo.
I work in the storage industry - outside of filling the SLC cache or DRAM cache, there shouldn't be anything that Chkdsk does to the underlying media because it's only operating at the NTFS level. Now could it have found something stupid that the controller on the drive can't handle? Sure, but all the reports on Google are people with other problems - not actually killing drives, which would be a much more major issue.
Starting to think malware...
Possible, but that should show up in task manager/etc, and all we're seeing there is massive amounts of interrupts, which tells me "fucked up system, fix hardware, reinstall".
It's scary going into this thing for the first time! =)
Most of the events are just regular "information".
However, there are 3 or 4 types of errors that seem to be recurring either daily, or on every other day.
  • Event ID: 86, CertificateServicesClient-CertEnroll
    SCEP Certificate enrollment initialization for WORKGROUP\XXXXXX via XXXXXXXX failed.
    GetCACaps: Not Found

    This one usually appears solo every couple of days, maybe less.
Normal. You don't have a CA.

  • Event ID: 100, Bonjour Service
    This one always appears in groups of 5, and they always end with :
    Local Hostname DESKTOP-XXXXXXX.local already in use; will try DESKTOP-XXXXXXXX-2.local instead
    Seems to appear every couple of days or so.
Normal, bonjour is a moronic service and local broadcast does weird stuff.


  • ==> THIS MONSTER <==
    Event 2002, EapHost
    Skipping: Eap method DLL path validation failed.

    Always appear in groups of 18 (not a typo) and all at once, within 1 second of each other.
    The last time these 18 errors manifested was last night, but I can't remember if the timing (8pm) coincides with the BSOD that I purposely triggered as a test. To be sure, I kept looking for the time before that; and EUREKA : it lines up with Thursday's BSOD that prompted me to use CHKDSK /F in the first place (which spawned the current sluggishness).
DLL validation could easily be "this file is fucked, and I can't load it properly," or "I loaded this, but the data in RAM is garbage" (see a thread here?). EAP is an auth system - go back to my comments on crypto offload/etc, and we're probably at the SOFTWARE side of the crash, but still searching for the hardware.
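If you want to check the timing rather than eyeball it, here's a small Python sketch. It assumes you've copied the Event 2002 timestamps and your BSOD times out of Event Viewer by hand into lists of `datetime` values; the function names, the 1-second window, and the 5-minute tolerance are my own choices, not anything Windows defines:

```python
from datetime import timedelta


def find_bursts(timestamps, window=timedelta(seconds=1), min_count=18):
    """Group timestamps into bursts that fit inside `window`.

    Returns (first_timestamp, count) for each burst with at least `min_count` events.
    """
    bursts = []
    current = []
    for stamp in sorted(timestamps):
        # Start a new group once this event is too far from the group's first event.
        if current and stamp - current[0] > window:
            if len(current) >= min_count:
                bursts.append((current[0], len(current)))
            current = []
        current.append(stamp)
    if len(current) >= min_count:
        bursts.append((current[0], len(current)))
    return bursts


def bursts_near(bursts, crash_times, tolerance=timedelta(minutes=5)):
    """Return the bursts that start within `tolerance` of any known crash time."""
    return [
        (start, count)
        for start, count in bursts
        if any(abs(start - crash) <= tolerance for crash in crash_times)
    ]
```

If every burst lands next to a crash and there are no bursts without one, that's a much stronger link than "roughly coincides". It still wouldn't say whether the software or the hardware underneath it is at fault.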
Does this mean we just found the errors causing the KSCF BSODS? :woot:
A possible barrage of 18 simultaneous Event 2002's every time I download through my VPN app..?
See above on crypto offload again :)
Could it really have been the VPN software all along? And if so, why am I the only one complaining about this? Given how popular the VPN and my parts list are, shouldn't the internet be filled with complaints about this?
Yes, but I suspect that it's not the VPN or the parts, but Ryzen being picky and when it does crypto offload, it's, as the old saying goes, "eating a bag o' dicks".
It's also possible that the problem isn't the SurfShark app itself, but something about my specific Windows configuration that the app doesn't play well with. The end result is the same tho : no more SurfShark app, switching to browser extension once this is over.


Right?! :) I agree! This was my first VPN experience ever (never tried one before) and 6 months later, I've already decided that I'm switching to the browser extension until my subscription runs out, regardless how this nightmare ends.

I will likely try default bios config next (tomorrow).
Default bios. Set SoC voltage to 1.1. Enable XMP. Reinstall windows. Betcha it's 100% stable from there on out forever.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
You guys have been so amazingly patient with me.

In a few minutes, I'll be resetting BIOS to factory settings at the suggestion of, well... literally everyone here, it seems (and despite the very strong objections of the close friend who spent 6 hours carefully building and setting up this PC in the first place, after y'all helped me pick the parts). If this goes wrong, he won't want to help me and I won't have a functioning PC to use to reach y'all =)

But before I go ahead and put this precious childhood friendship on the line, here... chameleoneel was right, I was skipping ahead too much and should've been more methodical with the advice I was given. I can imagine how insulting that must feel, so I went back to the start of the thread and went over every suggestion to make sure I took the time to check the boxes y'all were kind enough to line up for me.

I think you'll find the results since my last post a bit more revealing.

  1. Run CHKDSK /F & Verify at CMD
    The first thing I tried after Thursday's Kernel Security Check Failure.
    --> All the sluggishness starts here <--
    CPU has been running 15C hotter when idling ever since, and the fan is always on, used to be quieter and only get this loud when pushed by Photoshop.
  2. Run Verify Off at CMD
    No improvement.
  3. Turning the PSU's Hybrid Mode OFF
    No improvement.
  4. Making sure the RAM voltage is 1.1
    It is, but group wants me to restore factory defaults anyway (not done yet)
  5. Run DISM /Online /Cleanup-image /CheckHealth at CMD
    Found nothing wrong.
  6. Using Event Viewer to find the errors - SUCCESS!
    (Well, kinda.) A string of 18 simultaneous Event 2002: EapHost errors seems to roughly coincide with the timing of the last couple of Kernel Security Check Failure BSODs, pretty safe to assume there was a KSCF BSOD every time those same 18 simultaneous errors show up. Scrolling back in time seems to indicate they've been around since before Thursday's events.

    ---------------- That's where we left off yesterday ----------------

  7. Reboot after uninstalling SurfShark VPN and all its components
    No improvement.
  8. Individually searched for Driver Updates to all my Network Adapters
    All 10 (if we include WAN Miniports) are already the best drivers available after checking online, so no improvement.
  9. Reboot in CLEAN Windows (non-MS processes disabled)
    No improvement.
  10. Reboot in SAFE MODE - IMMEDIATE IMPROVEMENT!
    Both with and without network (rebooted in both) Photoshop loads in less than 10 seconds, like it did right up until Thursday.
  11. Run SFC /SCANNOW in SAFE MODE - SUCCESS!
    Found errors and repaired them!
  12. Run DISM /Online /Cleanup-image /CheckHealth in SAFE MODE
    Found nothing wrong.
I was really hoping that SFC /SCANNOW repair would've helped, but now that I'm back in full Windows, I see it didn't (I could immediately tell at boot-up). Finally checking off more of your boxes DID reveal new findings, however. The only suggestion I haven't dared try yet is another CHKDSK /F (probably from Safe Mode). I've been afraid of it because the problems started with it. (Still worth risking, or a waste of time at this point?)

* * *

Right now, I'm getting a backup BIOS USB ready to stand by in case something goes wrong like it did for this guy. I chose 1.7, which is the next stable version after mine, released just over 1 month later. (I'm told mine was actually the first Zen3 compatible version, which might explain why the next release came as quick as it did.)

I'm also preparing my Windows installation media (USB). I now have the option to fresh-install Windows 11, instead of re-installing 10 and upgrading to 11 later. What say you experts? Wait a while longer for 11, or jump on the rare opportunity for a clean install (which I only do when absolutely forced)?


We all still want me to just restore factory defaults of my current BIOS and see what happens, correct?

-J.

EDIT: You know the first Windows screen that prompts you for your user password (if you have one) after a reboot, before you see your desktop? If I click the power icon at the bottom right and then click REBOOT from this screen, I get a "If you shut down, you and any other people using this PC could lose unsaved work" warning (after a reboot!) Even if I wait 5 full minutes after rebooting, before rebooting again, without ever getting on my desktop, I'll get that warning. THIS IS NEW BEHAVIOR. Before this sluggishness, I could reboot from this screen sans interruptions. Isn't that odd?
 

lopoetve

Fully [H]
Joined
Oct 11, 2001
Messages
32,425
You guys have been so amazingly patient with me.

In a few minutes, I'll be resetting BIOS to factory settings at the suggestion of, well... literally everyone here, it seems (and despite the very strong objections of the close friend who spent 6 hours carefully building and setting up this PC in the first place, after y'all helped me pick the parts). If this goes wrong, he won't want to help me and I won't have a functioning PC to use to reach y'all =)
He did the right thing for pre-Ryzen AMD or even Intel (although it's faster on Intel to just whack the XMP button too).
But before I go ahead and put this precious childhood friendship on the line, here... chameleoneel was right, I was skipping ahead too much and should've been more methodical with the advice I was given. I can imagine how insulting that must feel, so I went back to the start of the thread and went over every suggestion to make sure I took the time to check the boxes y'all were kind enough to line up for me.

I think you'll find the results since my last post a bit more revealing.

  1. Run CHKDSK /F & Verify at CMD
    The first thing I tried after Thursday's Kernel Security Check Failure.
    --> All the sluggishness starts here <--
    CPU has been running 15C hotter when idling ever since, and the fan is always on, used to be quieter and only get this loud when pushed by Photoshop.
  2. Run Verify Off at CMD
    No improvement.
  3. Turning the PSU's Hybrid Mode OFF
    No improvement.
  4. Making sure the RAM voltage is 1.1
    It is, but group wants me to restore factory defaults anyway (not done yet)
  5. Run DISM /Online /Cleanup-image /CheckHealth at CMD
    Found nothing wrong.
  6. Using Event Viewer to find the errors SUCCESS!
    (Well, kinda.) A string of 18 simultaneous Event 2002: EapHost errors seems to roughly coincide with the timing of the last couple of Kernel Security Check Failure BSODs, pretty safe to assume there was a KSCF BSOD every time those same 18 simultaneous errors show up. Scrolling back in time seems to indicate they've been around since before Thursday's events.

    ---------------- That's where we left off yesterday ----------------

  7. Reboot after uninstalling SurfShark VPN and all its components
    No improvement.
  8. Individually searched for Driver Updates to all my Network Adapters
    All 10 (if we include WAN Miniports) are already the best drivers available after checking online, so no improvement.
  9. Reboot in CLEAN Windows (non-MS processes disabled)
    No improvement.
  10. Reboot in SAFE MODE IMMEDIATE IMPROVEMENT!
    Both with and without network (rebooted in both) Photoshop loads in less than 10 seconds, like it did right up until Thursday.
Lots of drivers not loaded, lots of things not running. Makes sense actually. Wish we had a process or something that we could see taking the CPU, or a hardware piece generating the interrupts.
  11. Run SFC /SCANNOW in SAFE MODE SUCCESS!
    Found errors and repaired them!
This isn't uncommon, but it's also the one thing Windows has for checking the integrity of the files (content-wise). Doesn't mean it can check everything though - just some things. This is why we're suggesting the reinstall - all new!
  12. Run DISM /Online /Cleanup-image /CheckHealth in SAFE MODE
    Found nothing wrong.
You did this with the network up, right?
I was really hoping that SFC /SCANNOW repair would've helped, but now that I'm back in Full Windows, I see it didn't (I could immediately tell at boot-up). Finally checking off more of your boxes DID reveal new findings, however. The only suggestion I haven't dared try yet is another CHKDSK /F (probably from Safe Mode). I've been afraid of it because the problems started with it. (Still worth risking, or waste of time at this point?)

* * *
Won't hurt, bet it won't find anything real though.
Right now, I'm getting a backup BIOS USB ready to stand by in case something goes wrong like it did for this guy. I chose 1.7, which is the next stable version after mine, released just over 1 month later. (I'm told mine was actually the first Zen3 compatible version, which might explain why the next release came as quick as it did.)
That guy got an unbootable system as a result of XMP. Most of the time, it's just not stable in Windows :p
I'm also preparing my Windows installation media (USB). I now have the option to fresh-install Windows 11, instead of re-installing 10 and upgrading to 11 later. What say you experts? Wait a while longer for 11, or jump on the rare opportunity for a clean install (which I only do when absolutely forced)?
I'd do 10, but I run in an enterprise mindset. 11 ain't ready yet.
We all still want me to just restore factory defaults of my current BIOS and see what happens, correct?
Ayup. Although it'll likely still be slow as crap in windows until you reinstall.
-J.

EDIT: You know the first Windows screen that prompts you for your user password (if you have one) after a reboot, before you see your desktop? If I click the power icon at the bottom right and then click REBOOT from this screen, I get a "If you shut down, you and any other people using this PC could lose unsaved work" warning (after a reboot!) Even if I wait 5 full minutes after rebooting, before rebooting again, without ever getting on my desktop, I'll get that warning. THIS IS NEW BEHAVIOR. Before this sluggishness, I could reboot from this screen sans interruptions. Isn't that odd?
I ~think~ I've seen that on normal windows too, but I could be wrong. I'll check in a few.
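
On the wish above for a way to see which process is taking the CPU: a rough Python sketch that ranks processes by accumulated CPU time, using the verbose CSV output of the stock `tasklist` command. It assumes an English-language Windows install (the column names "Image Name", "PID" and "CPU Time" are localized on other installs), and the helper names are made up for this example. CPU Time is a running total per process, not a live percentage, so it only hints at who has been busiest.

```python
import csv
import io
import subprocess


def parse_cpu_time(text):
    """Convert tasklist's H:MM:SS CPU Time string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def top_cpu_processes(csv_text, limit=5):
    """Return (cpu_seconds, image_name, pid) tuples, busiest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    usage = [
        (parse_cpu_time(row["CPU Time"]), row["Image Name"], int(row["PID"]))
        for row in rows
    ]
    return sorted(usage, reverse=True)[:limit]


def read_tasklist():
    """Run `tasklist /V /FO CSV` and return its text output (Windows only)."""
    completed = subprocess.run(
        ["tasklist", "/V", "/FO", "CSV"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

On the affected machine, something like `print(top_cpu_processes(read_tasklist()))` a few minutes after a normal boot would show which processes have eaten the most CPU so far. Anything near the top that has no business being there is a candidate for what loads in normal mode but not in Safe Mode.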
 

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
39,767
ok if it acts normal in safe mode, there is something loading in normal mode causing this. try completely removing adobe and your vpn stuff.
IF you end up reloading the os its up to you but 11 is just 10 with a face lift. once youre used to using it theres very little difference. the hdr stuff in 11 is nicer though...
 

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
39,767
PMs that added to the convo/troubleshooting.

14 minutes ago
If you go back to my last update, you'll see I removed all the VPN stuff. No difference.

And Adobe Photoshop loaded just fine in Safe Mode, blazing fast too. I'm not comfortable uninstalling all my Adobe products (they're literally why the workstation exists).

Are you also suggesting that the BIOS factory reset ISN'T likely to help anymore? Because that's been causing me a lot of anxiety today =) if these new findings confirm it's something loading with Windows. Or do you feel that can still realistically fix all this?
but the adobe stuff isnt running in the background in safe mode, it may load fine but half its tasks are missing. i didnt say anything about the bios but you do need to reset it to get your settings straight. if you dont take some of the steps we keep suggesting, the help will dry up. its very, very frustrating...
 

pendragon1

Extremely [H]
Joined
Oct 7, 2000
Messages
39,767
2 minutes ago
The factory defaults will be restored, 100%, I was just making sure you weren't suggesting this was no longer useful with what I found today.

But uninstalling Adobe is a step too far. It's a 30GB suite and what I use my machine for. Or did you mean for me to simply disable all Adobe processes at startup? Because I'd be totally fine with that, if you think it's worth trying.
uninstall it, you cant disable everything. you can reinstall later. ive seen it cause goofy shit before, there was an issue with it eating cpu resources only like a year ago.
maybe even just try creating another user account, it might only be installed for yours.
 

Skull_Angel

[H]ard|Gawd
Joined
May 31, 2010
Messages
1,664
Just a heads up for OP

I use the same motherboard, but I've been out of the loop on bios updates for this board for several months ("if it ain't broke, don't fix it"), so this information may be outdated. At the time I was configuring (overclocking and stability tweaking) this motherboard it still had minor RAM training issues (auto settings not being fully stable), meaning if you experienced any memory related issues at the time it was more than likely related to RAM settings; at this point it is recommended to run programs (TestMem5 1usmus v3/Anta777 extreme profiles, Karhu, Memtest86) to check RAM stability and note which errors occur and when they occur to narrow down what setting(s) may be the cause.

The RAM you're using seems to use Micron E-die chips; this is the most important information you need to be aware of when looking to manually tweak any RAM settings because it gives you relevant search criteria in seeking help and finding baseline settings from other users and manufacturer specifications (your results may vary slightly due to the nature of micro architectures, but you'll have a good source of information to begin with).

Your friend may have spent several hours setting up this rig, but it can and usually does take much longer to properly set up a solidly-stable machine. I'm guessing not much time was spent on testing after the initial build, so a very minor problem was overlooked and ended up causing all the headache you're currently having. From my standpoint it seems like minor RAM issues eventually led to file corruption and now you're chasing your tail trying to fix the symptoms rather than the root cause. My advice would be to use a clean install on a second drive to load up on stability testing software and use it to narrow down the cause before you begin changing settings all over the place.
 

Thatguybil

Limp Gawd
Joined
Jan 21, 2017
Messages
148
Memtest64 is a fairly easy to use tool to check the ram.

Another safe way to check the ram is to set it to stock speeds (2133) and see if the errors go away.

After reading the thread I am pretty sure eventually you will need a clean install of windows.
 

JYeager11

Gawd
Joined
Nov 15, 2006
Messages
559
lopoetve Dan_D Skull_Angel John Ransom
Guys!!

😁😁
MSI_SnapShot_PostFix1a.jpg


I believe this is what we were looking to achieve, correct? Woohoo! (Although I probably shouldn't cry victory until one of you confirms.)

I didn't have to manually adjust anything. Not even the SoC voltage. I just turned XMP on and saw those numbers all go back to what they were before (when they were manually entered). This is literally the stock BIOS with XMP on, nothing else. I even updated the thing to the last stable version. It wasn't sitting right with me that I was 2 stable versions behind. So thank you to everyone who chimed in to reinforce that this needed to be done. Now that I did it, I get what everyone meant.

However, it did nothing to fix Windows 10's sluggishness since last Thursday; not that I was really expecting it to after seeing Photoshop load in a blink of an eye in Safe Mode.

Having lived with (and studied) the issue for several days now, I can say this latency (or whatever you want to call it) was remarkably consistent throughout: it never sped back up to normal, nor did it deteriorate further. The time it took to boot, load large apps and shut down never changed. It was frustratingly slow, but consistent.

Also, no files or programs were seemingly affected, except Photoshop's preferences, one time. Windows Memory Repair Tool found nothing wrong after running 2 passes at bootup. Samsung Magician reported my 980 Pro C: SSD could use a firmware update, but otherwise found nothing wrong with it (even after a diagnostics scan). SFC /scannow actually DID find and repair some corrupt files, but the sluggishness was unaffected by the repairs. DISM found nothing wrong after that. I've yet to risk another CHKDSK (tho after Googling more about it, I'd be using /R instead of /F this time) but since virtually no one here seems to expect this to help, I probably won't bother.

All roads seemingly lead to the nuclear reset option I was hoping to avoid, but I don't regret what I learned. I think I'll put off the reformat / Windows 11 install 'til tomorrow morning, when I'll have more uninterrupted time ahead of me to explore and configure it.

Any last words of advice before I go ahead and do that? Documents have all been backed up!
 