Hard Lockups/nvlddmkm EventID 14 errors since upgrading to a 3080Ti

gamerk2

Long story short: I upgraded my PC from a 1080 Ti to a 3080 Ti (specifically, a Gigabyte GeForce RTX 3080 Ti GAMING OC 12G) a little over six months ago. Since then, about once every two weeks, and *only* while watching videos on YouTube, my PC hard-locks. If I let it sit for a bit, it eventually restarts and logs an nvlddmkm Event ID 14 error in the Windows Event Viewer. I'm pretty sure the display driver is crashing and failing to recover, and I'm trying to figure out whether I have a HW issue (leaning that way) or some sort of software problem.
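For anyone wanting to pin down how often the error actually shows up, here is a rough Python sketch that builds a `wevtutil` query for nvlddmkm Event ID 14 entries in the System log. The function names and the default of 20 events are just illustrative choices, not anything from the thread, and the query only makes sense on Windows.

```python
import subprocess


def build_event_query(provider="nvlddmkm", event_id=14, max_events=20):
    """Build a wevtutil command listing recent matching System-log events.

    provider, event_id and max_events are illustrative defaults.
    """
    # XPath filter: events from the given provider with the given Event ID.
    xpath = f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        "/f:text",            # human-readable output
        f"/c:{max_events}",   # limit the number of events returned
        "/rd:true",           # newest events first
    ]


def run_event_query():
    """Run the query (Windows only) and return whatever wevtutil prints."""
    result = subprocess.run(
        build_event_query(), capture_output=True, text=True, check=False
    )
    return result.stdout or result.stderr
```

Calling `print(run_event_query())` on the affected machine should list the timestamps of past events, which makes it easier to check the "about once every two weeks" estimate against the log.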

So far, I've done the normal steps to isolate the problem: reseated the card, ran DDU, tried different drivers, and so on. I'm pretty close to just going through the RMA process, but I'm asking in case anyone has any last-ditch ideas to try.

Full specs:
OS: Win 10 x64
CPU: Intel i7 8700k
RAM: 2x16GB (forget the brand offhand)
Motherboard: ASUS ROG STRIX Z370-H GAMING*
PSU: 850W (Corsair I think)

I've yet to encounter a problem in games; not sure if that's observation bias or a hint of something else going on.

*BIOS is still on version 1101, which is quite old. I'm planning on updating this tonight; I typically leave the BIOS alone unless there are major problems.

Reference THF thread so I don't have to repeat a bunch of stuff I did: https://forums.tomshardware.com/threads/hard-lockups-since-acquiring-a-3080ti.3757639/#post-22658414
 
The last time I had similar symptoms was when I first got my 2080 Ti. Watching YouTube or anything video related would crash or lock up the PC. After all the space invaders issues I waited wayyy after launch to purchase mine, so I assumed it wasn't due to that. I replaced everything (memory, swapped CPUs, etc.). It turned out to be my PSU. I checked it last because it was a Corsair AX1200 and I assumed it wasn't the problem. I would get the same random (not always) issues, but primarily when videos were playing. I didn't play very many games, but I didn't have problems there, which made it a little confusing.
 
Still have problems when you update the bios?
Haven't done so yet; doing that tonight when I get off work. The almost three-week gap between events means that even if that does fix the issue, I won't know until *after* I RMA the card (which I intend to do in any case at this point).

Turn off hardware acceleration in the browser.
That simply would hide the problem.
 
The last time I had similar symptoms was when I first got my 2080 Ti. Watching YouTube or anything video related would crash or lock up the PC. After all the space invaders issues I waited wayyy after launch to purchase mine, so I assumed it wasn't due to that. I replaced everything (memory, swapped CPUs, etc.). It turned out to be my PSU. I checked it last because it was a Corsair AX1200 and I assumed it wasn't the problem. I would get the same random (not always) issues, but primarily when videos were playing. I didn't play very many games, but I didn't have problems there, which made it a little confusing.
It's almost certainly not my PSU; outside of the semi-random issues, the card runs fine at max power. All voltages are also well within spec.
 
Haven't done so yet; doing that tonight when I get off work. The almost three-week gap between events means that even if that does fix the issue, I won't know until *after* I RMA the card (which I intend to do in any case at this point).


That simply would hide the problem.
Well, it's an easy fix. And it could be a PSU transient power problem; you can't be positive unless you swap it out with a known-good spare.
 
Don't got a spare. And given I've got another known-good GPU that I've pushed harder than the 3080, I'm willing to bet it's not the PSU.
 
Disable HDMI Audio unless you're using it. I had the exact same symptoms on a 1080/Z170 board, that was the core of it (can't remember the exact name, not enough coffee, but effectively device latency was the problem).
 
Don't got a spare. And given I've got another known-good GPU that I've pushed harder than the 3080, I'm willing to bet it's not the PSU.

It's not the total wattage that could be the problem; it's the random power spikes from Ampere cards that can trip the overcurrent protection in the PSU. I had a known-good 850W PSU that for whatever reason could not handle an Ampere card, but worked fine with everything else. I RMA'd the PSU and had no problems after.

In fairness, it doesn't sound like that is the problem here, but I wouldn't be so quick to dismiss the PSU being a possible factor.
 
Yeah, I didn't know why, and it didn't sound logical. I just know that I swapped in an identical AX1200 and the problems disappeared. PSU and memory issues can be weird. CPU issues are the weirdest if you get them. I had one CPU that would only work with 1 stick of memory in any motherboard I put it in.
 
My guess would be PSU as well. Just because it works with other cards doesn’t mean it will play nice with Ampere.
 
Did you do a clean install of the driver when switching the video card out? Event ID 14 usually happens due to a corrupt or incorrectly installed driver.
 
As an update:

Updated to the latest motherboard BIOS (which also reset most settings back to defaults from whatever they were set to) and ran with the NVIDIA driver set to "Prefer Maximum Performance"; no issues since the last report, so one of those two changes "fixed" the problem. Now testing with the power profile set back to "Normal" to see if there's a potential PSU/GPU power ramping conflict going on. If I make it to the end of the month I'll start going through the UEFI and optimizing things a bit. [I have a sneaking suspicion ASUS's CPU timing overrides may have been at fault, but I don't want to test more than one setting at a time.]
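As a rough guide to how long a crash-free stretch has to be before it counts as evidence, here is a back-of-the-envelope Python sketch. It assumes the lockups behave like a Poisson process (independent events at a constant average rate), which is only an approximation, and it uses the "about once every two weeks" figure from the first post as the mean interval. The function names are made up for illustration.

```python
import math


def prob_no_crash(days, mean_interval_days):
    """Chance of seeing zero crashes in `days` if nothing was actually fixed.

    Assumes crashes follow a Poisson process with the given mean interval.
    """
    return math.exp(-days / mean_interval_days)


def days_needed(mean_interval_days, confidence=0.95):
    """Crash-free days needed before 'no crash' is unlikely to be pure luck."""
    return -mean_interval_days * math.log(1.0 - confidence)


# Mean interval of 14 days is taken from "about once every two weeks".
print(f"P(no crash in 30 days, nothing fixed): {prob_no_crash(30, 14):.2f}")
print(f"Crash-free days for 95% confidence:    {days_needed(14):.0f}")
```

Under those assumptions, a clean month still happens by chance roughly 12% of the time, and it takes about 42 crash-free days to reach 95% confidence. That is part of why changing one setting at a time is so slow with a fault this rare.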
 
Another thing you can do if you suspect a power issue is to cap the power limit at 80% or so and see if that eliminates it. That would make the spikes less severe.
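One possible way to try that cap from a script is through `nvidia-smi`, sketched below in Python. The 350 W default limit is a placeholder assumption, not a measured value for this card, and the helper names are invented for the example. Setting a power limit needs admin rights and may not be supported on every GeForce card or driver; the power-limit slider in a tuning utility does the same job.

```python
def query_default_limit_cmd(gpu_index=0):
    """nvidia-smi command that reports the card's default power limit in watts."""
    return [
        "nvidia-smi", "-i", str(gpu_index),
        "--query-gpu=power.default_limit",
        "--format=csv,noheader,nounits",
    ]


def capped_limit_watts(default_limit_w, percent=80.0):
    """Return the power cap in whole watts for a given percentage of default."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return int(default_limit_w * percent / 100)


def build_power_limit_cmd(limit_w, gpu_index=0):
    """nvidia-smi command that sets the power limit (run from an admin shell)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(limit_w)]


# 350 W is a hypothetical default limit used only to show the arithmetic.
limit = capped_limit_watts(350, 80)
print(" ".join(build_power_limit_cmd(limit)))
```

The sketch only prints the command rather than running it, so the real default limit can be checked first with the query command and substituted in.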
 
Considered that as well; right now I'm focused more on attempting to recreate the issue with the current UEFI version/settings to see if I can point to any specific thing. I should note I was still on like the second release version of the BIOS before I updated, so it's quite possible one of those generic "stability fixes" addressed this problem in full. [Generally, I don't touch the BIOS if everything is working.]
 
Yeah I typically update only if there's a feature I want or if it's a really early release. Usually quite a few fixes if you're on the first or second revision.
 
The last time I had similar symptoms was when I first got my 2080 Ti. Watching YouTube or anything video related would crash or lock up the PC. After all the space invaders issues I waited wayyy after launch to purchase mine, so I assumed it wasn't due to that. I replaced everything (memory, swapped CPUs, etc.). It turned out to be my PSU. I checked it last because it was a Corsair AX1200 and I assumed it wasn't the problem. I would get the same random (not always) issues, but primarily when videos were playing. I didn't play very many games, but I didn't have problems there, which made it a little confusing.
I remember the 5700 XTs exhibiting the exact same behavior for the first 6 months after launch; it came down to certain hardware combinations not playing nicely with the initial driver designs. They eventually managed to mostly nullify the issue, at least enough that you didn't see people still complaining about it across the web. It was that lower power state while watching YouTube videos that caused black screens, resets, and lockups.
 
Second update: The issue recurred with the Power Management Mode set to "Normal" rather than "Prefer Maximum Performance", which suggests there's likely a PSU/GPU issue with power ramp-up.

So here's the question: If the issue does in fact go away with "Prefer Maximum Performance", is it worth going the RMA route?
 
I vote yes simply because I think if I paid for it, I want it to work properly. Work-arounds are temp solutions in my book. If you have doubts about your HW and need any additional help I can test anything for free. I have spares of everything just about.
 
Fair point; the only minor hiccup is that my backup GPU is on loan, so I'd be without a PC (and given I'm currently in COVID recovery, that would be a downer). I also do half-WFH, so I'd need to work around that.

I'm leaning toward keeping it for now, but making sure everything I've done is documented so that if the problem recurs I can RMA without issue.

EDIT

If anyone else has any "last-ditch" ideas to try, I'd love to hear them. The only thing I could see making a difference at this point would be underclocking (relative to the factory OC).
 