I'm not much of a gamer other than Factorio, but since I built myself a fancy new computer I thought I'd try some games. I've tried Satisfactory and Rust: both exhibit the problem below, but Factorio plays fine for hours (and hours and hours and hours...).
The Problem
Playing either of the games mentioned above makes my computer shut itself off. No warnings, no messages, just OFF. Sometimes even _launching_ the game is enough to cause it. Sometimes it happens the very first time I move the cursor to look around. Sometimes, after troubleshooting (see below), it will last half an hour or so and then BLAMMO. Gone.
The Steps Taken
* I verified that all overclocking was disabled on the GPU and CPU. Retested: same result.
* I set the GPU power limit to 50% in MSI Afterburner: this held up for at least 30 minutes of play testing. The games were playable, but with vastly degraded performance.
* I monitored power consumption with a Kill-a-Watt meter: during gameplay (with the GPU power limit at 100%), consumption never exceeded 350W.
* I ran a Prime95 + FurMark torture test simultaneously, logging in HWiNFO, for up to 55 minutes without a crash. FurMark ran at 4096x2160 with an average frame rate of 109.
* Power consumption during this test peaked at 598W.
* Peak CPU temp during this torture test was 90.8°C, with an 81.5°C average. The CPU never thermal-throttled.
* Peak GPU temp during this torture test was 77.9°C, with a 75.3°C average. The GPU was flagged as power-limited, and the Utilization and Reliability Voltage performance limits each triggered at least once, although they read "No" most of the time.
* I closed almost all open applications and tasks while attempting to play the games (antivirus, Wallpaper Engine, Rainmeter, Discord, etc.).
* I am running the current GeForce Game Ready driver as of 11-1-2021
* BIOS and Windows are both fully up to date
* I swapped the 3080 Ti for my old 980 Ti in the same slot, and it launches and plays the games, albeit at a lower power draw and frame rate
* Reset the BIOS to defaults (including disabling Resizable BAR and DOCP): same result, although much more stable
* Ran DDU to purge the Nvidia driver, then reinstalled it
* Ran Windows Memory Diagnostic (no problems found)
* Reinstalled Steam
* Reinstalled games
* Ran the Superposition stress test for 30 minutes: passed
* Moved the GPU from PCIe slot 1 to PCIe slot 2
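A Kill-a-Watt measures AC draw at the wall, while the PSU's 700W rating refers to the DC power it delivers to the components, so the two numbers aren't directly comparable. Here's a rough sketch that converts the two wall readings above into an estimated DC load. The 0.92 efficiency figure is an assumption, not a measured value (check the TX-700's actual efficiency curve), and `estimate_dc_load` is just a hypothetical helper name. It also only covers the averaged readings the meter displays.

```python
# Rough headroom estimate from the Kill-a-Watt readings above.
# ASSUMPTION: ~92% PSU efficiency at these loads; the real figure varies with load.
PSU_RATED_W = 700  # Seasonic TX-700 rated (DC) output

def estimate_dc_load(wall_watts: float, efficiency: float = 0.92) -> float:
    """Wall meter reads AC input; the PSU delivers roughly input x efficiency."""
    return wall_watts * efficiency

readings = [
    ("gameplay (GPU power limit 100%)", 350),
    ("Prime95 + FurMark", 598),
]

for label, wall in readings:
    dc = estimate_dc_load(wall)
    print(f"{label}: ~{dc:.0f} W DC, {dc / PSU_RATED_W:.0%} of PSU rating")
```

With these assumptions the torture test works out to roughly 550W DC (about 79% of the rating) and gameplay to roughly 322W (about 46%).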
The Hardware
* Asus ProArt 570 motherboard
* Seasonic 700W fanless PSU (TX-700)
* Ryzen 9 5900X with be quiet! Dark Rock Pro 4 Borg Cube Coolermajig
* Crucial 64GB Ballistix DDR4-3600 2x32GB (BL2K32G36C16U4B) (note: not on the QVL [I thought it was, but I was wrong]; confirmed installed in the correct slots)
* Seagate FireCuda 4TB (Data 2)
* Seagate FireCuda 2TB (Data 1)
* Samsung 980 Pro 512GB (Boot)
* MSI GeForce RTX 3080 Ti GAMING X TRIO (who NAMES these things?!)
* Windows 10 Pro
* LG 4096x2160 monitor
I spoke with MSI on the phone, and they were fairly confident the card was OK since it passed the stress tests. I also spoke with Asus, and they're going to send me a new motherboard, but I'm not entirely convinced that's what's wrong. At the moment my spidey-sense points at the RAM: the best results I've had came after resetting the BIOS, which disabled DOCP (and Resizable BAR).
So... what have I missed? I'm going nuts.