What else can go wrong in 2020??? Ryzen TR 3960x issues

noko (Supreme [H]ardness; joined Apr 14, 2010; 7,262 messages)
I started to have stability problems with the Ryzen 3960X, which got worse over several days. The normal troubleshooting did not fix the stability: swapping the 4 sticks of TridentZ 3466 MHz 64 GB (Hynix) for 4 sticks of TridentZ 3200 MHz 32 GB (Samsung B-die), changing the video card, resetting the BIOS, flashing the BIOS twice, reseating the CPU, etc. The CPU started crashing while loading Windows at 3.8 GHz!

Once I started to separate the individual CCDs and their CCXs, I found the culprit: CCD1 CCX0 (cores 6, 7, 8) will not do 3.8 GHz without causing a crash. All the other cores are now at 4.1 GHz and being stress tested, with cores 6, 7, 8 at 2.8 GHz. This is all done in the BIOS; MSI allows each CCX to have its own multiplier. I lost Precision Boost, though. The machine was not even overclocked except for the memory (3466 MHz to 3600 MHz); it was just using Precision Boost. Only a few times did I use the Auto Overclocker in Ryzen Master, but I saw it as pointless and did not bother with it beyond a few benchmarks.
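For anyone following along: the per-CCX "ratio" in the BIOS is a multiplier on the reference clock, so with the usual 100 MHz reference a ratio of 28.00 means 2800 MHz. A minimal sketch of that conversion, assuming the reference clock is left at its 100 MHz default (the helper names are hypothetical, not part of any MSI or AMD tool):

```python
REF_CLOCK_MHZ = 100.0  # assumption: reference clock (BCLK) left at its 100 MHz default


def ratio_to_mhz(ratio: float, ref_clock_mhz: float = REF_CLOCK_MHZ) -> float:
    """Core clock implied by a BIOS multiplier, e.g. 28.00 -> 2800 MHz."""
    return ratio * ref_clock_mhz


def mhz_to_ratio(mhz: float, ref_clock_mhz: float = REF_CLOCK_MHZ) -> float:
    """Multiplier to enter for a target clock, e.g. 4100 MHz -> 41.00."""
    return mhz / ref_clock_mhz


# Settings described in this post: CCD1 CCX0 held back, the other CCXs at 4.1 GHz
for label, mhz in [("CCD1 CCX0", 2800), ("other CCXs", 4100)]:
    print(f"{label}: ratio {mhz_to_ratio(mhz):.2f}")
```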

I will RMA it with AMD; hopefully that will go smoothly. I'm supposed to get a 3090 this coming Tuesday and may not have a usable machine for it when it arrives, good grief. Unless my iTX case with the 3900X can squeeze it in. Not sure how long AMD takes to swap out a CPU; anyone have experience there?
 
This sounds an awful lot like a bad power MOSFET supplying those cores. The MOSFETs do not share a power bus; to my understanding, they have dedicated lines.

I would investigate further by trying another motherboard, or test your PSU. The CPU 12 V lines, which actually power the MOSFETs rather than the CPU directly, might be under-delivering the necessary current, which would point to bad capacitors or a bad AC/DC switching transistor in your PSU.

It's worth a shot before replacing the CPU only to find out you have the same problem.

You need to be able to test the PSU and the board; otherwise, prepare for a high chance of disappointment. That might not be the case, though. In all honesty, CPUs are tested considerably before leaving the fab as a final product. In all my years of building and IT work, I have never seen a CPU go bad over time the way a loaf of bread gets moldy. They usually just don't go bad unless something zaps them internally, like a huge surge, and most boards have self-resetting pico fuses and other means to shunt over-current/over-voltage events away from a delicate CPU.
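To put rough numbers on the point that the CPU 12 V lines feed the VRM rather than the CPU: current scales as I = P / V, so the same power means far more current on the low-voltage vcore side than on the 12 V input. A back-of-the-envelope sketch, assuming the 280 W power limit and roughly 1.3 V load vcore mentioned elsewhere in this thread, and ignoring VRM conversion losses (real 12 V input current would be somewhat higher):

```python
def current_amps(power_w: float, volts: float) -> float:
    """Current needed to deliver power_w at the given voltage (I = P / V)."""
    return power_w / volts


PACKAGE_POWER_W = 280.0  # assumption: CPU power limit quoted in this thread
EPS_RAIL_V = 12.0        # CPU (EPS) 12 V input from the PSU
VCORE_V = 1.3            # assumption: approximate all-core load vcore

print(f"12 V input: {current_amps(PACKAGE_POWER_W, EPS_RAIL_V):.1f} A")  # ~23.3 A
print(f"Vcore side: {current_amps(PACKAGE_POWER_W, VCORE_V):.1f} A")    # ~215.4 A
```

The large vcore-side current is what the motherboard VRM has to supply, which is why the board deserves as much scrutiny as the PSU.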
 
I have found the Thermaltake Dr. Power PSU tester to be very helpful in situations like these, if only to rule out the PSU as the source of the problem. It costs about $30, well worth it.
 
I spent over 12 hours thinking it was not the CPU, for the same reasons, but with CCD1 CCX0 (cores 6, 7, 8) underclocked and the other cores brought up from 3.8 GHz, where it was failing before, to 4.1 GHz, it is smooth sailing. I'm not ruling anything out completely, but my thinking is that the PSU would not affect just 3 cores and leave the other 21 OK. I was about ready to tear it out of the case and try a single RAM module and a different power supply (I have too many of those), but I'm pretty confident I've found the issue. I have not found the cause, though.

Yes, CPUs are tested, and the 3960X already has 8 cores disabled, so having one more core in a CCX fail is not impossible, improbable as it may be. The affected cores (6, 7, 8) still work; they are just only usable at much lower speeds now.
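The "8 cores disabled" figure follows from the 3960X layout: four Zen 2 CCDs, each with two CCXs of four physical cores, with three cores enabled per CCX. A small sketch of that arithmetic (the constants describe the 3960X; treat them as assumptions for any other SKU):

```python
CCDS = 4             # Zen 2 compute dies on the 3960X
CCX_PER_CCD = 2      # each Zen 2 CCD holds two core complexes
CORES_PER_CCX = 4    # physical cores per Zen 2 CCX
ENABLED_PER_CCX = 3  # 3960X: 3 of 4 cores active in each CCX

physical = CCDS * CCX_PER_CCD * CORES_PER_CCX
enabled = CCDS * CCX_PER_CCD * ENABLED_PER_CCX
print(physical, enabled, physical - enabled)  # 32 24 8
```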

This is the PSU model I have for this rig:
https://www.evga.com/Products/Product.aspx?pn=220-P2-1600-X1
 
That is pretty cool, I may pick one up.


The voltages look good in the BIOS and HWiNFO, but that does not show the quality of the voltage.
 
It also doesn't test your motherboard. Don't focus on just the PSU; the motherboard plays a hugely important role in power delivery. But good luck, I really hope you can resolve this.
Thanks. It's running well at 4.1 GHz on the good cores and 2.8 GHz on cores 6, 7, 8. I'm coming up with a plan to optimize the most-used CCX cores, i.e., clock those higher. Windows looks to be prioritizing cores 0, 1, 3 and 12, 13, 14; if I can get those CCXs higher and dial back the less-used ones to limit power and temperature, I should be good. Damn it, I have a 3090 coming in a few more days and don't want to put it in a limping rig.
 
I really doubt it's the CPU. Since your best cores only run at 4.1 GHz, something else is wrong... My 3960X does 4.4 GHz all-core, or, if I leave it on default boosting, 4.55 GHz for light threads and around 4.35 GHz in Cinebench.

What is your vcore? I would try adding some vcore.
 
I am not limited to 4.1 GHz; I have cores 12, 13, 14 at 4.4 GHz for testing. I don't want to exceed 280 W or 1.35 V while I wait for a response from AMD. Plus, the CPU degraded over about a week on one or more of the cores in that CCX. I suspect we are dealing with lower-quality dies, considering the 24-core part draws as much power as the 64-core 3990X. I currently have performance roughly equivalent to what I had before by manually clocking the most-used cores higher and the other cores at 4.1 GHz or higher, keeping to 1.35 V or less and 280 W.

I can't see it being anything other than the CPU: raise the frequency of cores 6, 7, 8 beyond about 3.4 GHz and it fails, while all other cores are fine at 4 GHz and up.
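Keeping to those self-imposed caps (1.35 V vcore, 280 W package power) under load is easy to check mechanically once readings are logged. A minimal sketch, assuming the readings have already been pulled into simple samples (the `Sample` type and `violations` helper are hypothetical; in practice the numbers would come from a HWiNFO log or similar):

```python
from dataclasses import dataclass

VCORE_CAP_V = 1.35   # self-imposed vcore ceiling from this post
POWER_CAP_W = 280.0  # self-imposed package power ceiling from this post


@dataclass
class Sample:
    vcore_v: float    # core voltage reading under load
    package_w: float  # CPU package power reading


def violations(samples: list[Sample]) -> list[int]:
    """Indices of samples that exceed either the vcore cap or the power cap."""
    return [i for i, s in enumerate(samples)
            if s.vcore_v > VCORE_CAP_V or s.package_w > POWER_CAP_W]
```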
 
Yach! I hate it when I find threads about high-end Threadrippers having issues! It makes me think twice before building my upcoming Threadripper rig. This is the second thread I've read about issues with Threadrippers. Hopefully it's just bad luck for you guys.
 
What is the likelihood that a CPU fails, and of those failures, what percentage are Threadrippers?
 
I have no idea. I don't think it's very high, but a few of us are feeling the sting. With $1500 processors, I would expect them to purr along and last a long time, especially when running in spec.
 
I have been able to mitigate the issue to a usable configuration. Quick recap: last week, while I was playing Serious Sam 4, the computer crashed. Hmm, OK. I thought nothing of it, but after several days it was crashing while on the desktop. I checked CPU temperatures and voltages, unplugged the Vive, etc. By around Sunday it had reached the point where Windows would start its initial loading and then lock up. With Precision Boost turned off it would load into Windows, but it was not stable. That is when I started the serious troubleshooting mentioned earlier in the thread. The machine was totally unusable.

Through experimentation, I found that with all cores clocked below 3000 MHz it would boot into Windows. So I went into the BIOS, lowered the frequency of each CCD/CCX to 2800 MHz one at a time, and tried to boot Windows each time. What I found was that with cores 6, 7, 8 in CCD1 CCX0 clocked to 2800 MHz, the machine got into Windows: an aha moment. This is how the BIOS setting looks; notice the 28.00 for the CCD1 CCX0 ratio (this was captured later, while testing the other cores' ratios). Note that Precision Boost will not work when manual clock ratios are set.

[Screenshot: MSI BIOS per-CCX ratio settings, CCD1 CCX0 ratio at 28.00]
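The one-CCX-at-a-time isolation described above can be written down as a loop. A sketch under stated assumptions: `boots_stably` is a hypothetical stand-in for the manual step (enter the ratios in the BIOS, try to boot and load Windows, record pass or fail), and the CCX labels and 2800 MHz fallback simply mirror this post:

```python
from typing import Callable

# CCX labels for the 3960X: 4 CCDs x 2 CCXs
CCXS = [f"CCD{d} CCX{x}" for d in range(4) for x in range(2)]


def find_culprit_ccx(
    target_mhz: int,
    safe_mhz: int,
    boots_stably: Callable[[dict[str, int]], bool],
) -> list[str]:
    """Lower one CCX at a time to safe_mhz (all others stay at target_mhz) and
    return every CCX whose down-clock alone was enough to make the system stable."""
    culprits = []
    for ccx in CCXS:
        config = {name: target_mhz for name in CCXS}
        config[ccx] = safe_mhz
        if boots_stably(config):
            culprits.append(ccx)
    return culprits
```

This only pins down a single bad CCX: if two CCXs were marginal, no single down-clock would pass and the result would be empty, which would be the cue to widen the search.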
My goal was to optimize the cores that get used most, similar to what Precision Boost does, if that was possible; I did not know how Windows decided which cores it would use. First, to take the confusion out of it, I mapped out a visual chart of the core layout with respect to each CCD (just how I work). Then I used HWiNFO and Time Spy and noted the 4 cores with the highest average power during the benchmark; those were the ones I wanted to clock higher. Also, single-thread Cinebench always picked the same cores to render with, which were cores 12 and 13:

[Image: core layout chart by CCD and CCX]
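The core layout chart boils down to a simple mapping. A sketch assuming the numbering this thread uses, where cores 0 to 23 are numbered sequentially three per CCX (so cores 6, 7, 8 land in CCD1 CCX0 and cores 12, 13, 14 in CCD2 CCX0); SMT threads and Windows' own logical-processor numbering are deliberately ignored here:

```python
CORES_PER_CCX = 3  # 3960X: 3 enabled cores per CCX (assumed sequential numbering)
CCX_PER_CCD = 2    # two CCXs per Zen 2 CCD
TOTAL_CORES = 24


def locate(core: int) -> tuple[int, int]:
    """Return (ccd, ccx) for a core index 0-23 under the assumed numbering."""
    if not 0 <= core < TOTAL_CORES:
        raise ValueError(f"core index out of range: {core}")
    ccx_global = core // CORES_PER_CCX  # which of the 8 CCXs
    return ccx_global // CCX_PER_CCD, ccx_global % CCX_PER_CCD


for core in (6, 7, 8, 12, 13):
    ccd, ccx = locate(core)
    print(f"core {core}: CCD{ccd} CCX{ccx}")
```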

Time Spy and HWiNFO data (this was just quick and dirty; I found Task Manager more useful, but this worked):

[Image: HWiNFO per-core power readings during Time Spy]
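The "top 4 cores by average power" step can also be done from a sensor log instead of eyeballing a screenshot. A sketch assuming the readings were logged to CSV (HWiNFO has a CSV logging option); the `Core N Power [W]` column pattern is a hypothetical placeholder, so match it to the headers your log actually contains, and real logs may need extra cleanup:

```python
import csv
import io
import re
from statistics import mean

# Hypothetical header pattern; adjust to the real column names in the log.
POWER_COL = re.compile(r"^Core (\d+) Power \[W\]$")


def top_cores_by_power(csv_text: str, n: int = 4) -> list[int]:
    """Average each per-core power column and return the n highest core indices."""
    readings: dict[int, list[float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col, value in row.items():
            match = POWER_COL.match(col or "")
            if match and value:
                readings.setdefault(int(match.group(1)), []).append(float(value))
    averages = {core: mean(vals) for core, vals in readings.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]
```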
So the goal was basically to find the best configuration I could get while minimizing the impact of the lower-performing cores; they are not lost, just more anemic. I ended up with the following chart, from which I have been trying to settle on final values. There are most likely many ways to do this; this is just the path I am taking while waiting for AMD to respond.

[Image: chart of final per-CCX clock speeds]
 
AMD gave the green light for the RMA; I packed it up and the processor is on its way. Now I have a new 3090 in the case without a processor to run it :(.
 
At least you'll have peace of mind when it comes back knowing that the processor isn't the issue.
 
That was quick: AMD received it today, and it went by ground. I haven't heard anything yet on what they found or whether they will honor the warranty. May have to wait until next year.
 
Friend did an RMA with AMD 2 months ago. After receiving it, AMD took a week to verify it was defective and issued a replacement. They went by serial number so he did not need to provide an invoice. Took another 3 days for the replacement (UPS ground shipping) to arrive.
 
You don't have to try that hard to cheer me up. ;) Well, it is the holidays. They did ask for an invoice, which I provided; Newegg makes it easy to print past invoices or copy the PDF. They also asked for a picture of the installed processor, which I provided. All troubleshooting done.

Now, should I rush to Best Buy and buy an available ASUS X570 TUF motherboard, pull the 3900X from one machine with a Noctua NH-D15S, and install it in the empty available case with the 3090, or wait? Buy a better motherboard online? The current iTX case with the 3900X would not fit the 3090. Or install the 3090 in the 6700K machine with the 1000 W PSU, lol -> nope. Those are some of the options I have.
 
Huh? Not sure what you mean. His system became unstable at stock and I think he pushed his 3700x too hard.
 
I was hoping for a speedier turnaround, as in less than a week, but with the holidays that may not happen; we will see. In my case I did not push this processor (Ryzen Master Auto OC for a few benchmarks was the extent of it); I saw no need or worthwhile gain, so stock it was.
 
If you updated your BIOS in the past couple of months to AGESA 1.1.0.0, I would suggest a beta BIOS with AGESA 1.1.9.0. Reports say the 1.1.0.0 revisions trigger WHEA errors at idle or near-idle on all multi-CCD Ryzen CPUs, both Zen 2 and Zen 3.
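A tiny sketch of the check being suggested: parse the AGESA version string the BIOS reports (with or without a platform-name prefix) and flag the 1.1.0.x line that the reports point at. Treating every 1.1.0.x build as affected is an assumption drawn from this post, not an AMD statement, and the helper names are hypothetical:

```python
def parse_agesa(version: str) -> tuple[int, ...]:
    """'1.1.0.0' or '<prefix> 1.1.0.0' -> (1, 1, 0, 0); uses the last whitespace-separated token."""
    return tuple(int(part) for part in version.split()[-1].split("."))


def in_reported_range(version: str) -> bool:
    """True for the 1.1.0.x releases the reports associate with idle WHEA errors."""
    return parse_agesa(version)[:3] == (1, 1, 0)


for v in ("1.1.0.0", "1.1.9.0", "1.0.0.4"):
    print(v, in_reported_range(v))
```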
 
MSI has not updated the BIOS in a while; it's still on AGESA 1.0.0.4. No love for TRX40.
 
The TUF board is a fine board. I have one in my kids' rig, and it runs the same timings I run on my Strix-E. There's not really a need to go higher unless you need water temperature sensor inputs, which require a Strix-E or Crosshair board. Beyond that, the only benefit of the Crosshair is VRM temperature sensors.
 
I am tempted, since the board could later upgrade a B350 machine and leaves upgrade options down the road. I keep hearing good things about the TUF board. It is not that expensive and would get the job done, plus I have always liked ASUS BIOSes.
 
Received replacement 3960x, brand new in box sealed. Thanks AMD!

Installed and running well so far. It appears to be better than the previous processor.
  • AIDA64: stock PB boost is 100 MHz higher than the previous processor when stress testing with AVX on all cores (4.1 GHz versus 4.0 GHz), while power reads 50 W lower: the old one would hit the 280 W CPU power limit and the new one sits around 230 W, per HWiNFO
  • Max boost speed is also 50 MHz higher: the previous one reached 4500 MHz, this one goes to 4550 MHz. This is with default BIOS settings, only XMP for memory
  • Voltages are still rather high when the CPU is not loaded (1.49 V); when all threads are loaded, as in the AIDA64 stress test, voltage goes to 1.3 V at 4.1 GHz, which is reasonable
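The comparison in that list reduces to simple deltas. A quick sketch using only the figures reported above (AIDA64 all-core AVX stress, HWiNFO package power); MHz per watt is just a crude way to compare the two chips at those two operating points, not a standard metric:

```python
old = {"clock_mhz": 4000, "power_w": 280}  # previous 3960X under the AIDA64 AVX stress
new = {"clock_mhz": 4100, "power_w": 230}  # replacement 3960X, same test

clock_gain = new["clock_mhz"] - old["clock_mhz"]  # 100 MHz
power_saved = old["power_w"] - new["power_w"]     # 50 W
print(f"+{clock_gain} MHz at -{power_saved} W")
for name, chip in (("old", old), ("new", new)):
    print(f"{name}: {chip['clock_mhz'] / chip['power_w']:.1f} MHz per W")
```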
Happy so far with 3960x

Now, the EVGA 3090 XC3 Ultra is below average at stock settings in Port Royal, and above average with the power limit maxed at 104% (yeah, some power limit there) plus a core/memory OC. It's stable overall, quieter than I expected, and stays relatively cool for the smaller HSF on it. If you have the space, the FE is the one to get, with its 114% power limit and better cooler while being $100 cheaper; that is, if you can get one. Finally I can get to playing some games with the new GPU.
 
This is why I don't want to buy a piece here and a piece there over time due to shortages. What if I buy something and it is flawed? Then I've wasted part of my warranty because I had to wait on equipment to finish a build. I hope you get your CPU purchase dilemma settled. If it were me, I would have asked for a refund. That is scary!
 