Major flaw detected with AMD 7900 XTX vapor chamber cooler

StryderxX

I've been following the news of several 7900 XTX owners seeing GPU temps hitting 110c and it looks like der8auer found that there seems to be an issue with the vapor chamber cooler that's used on AMD and AIB cards. AMD has responded to the allegations with the following:

"We are aware that a limited number of users are experiencing unexpected thermal throttling on AMD Radeon RX 7900 XTX graphics cards (reference models made by AMD). Users experiencing unexpected thermal throttling of an AMD Radeon RX 7900 XTX should contact AMD Support."
Link: https://www.tomshardware.com/news/amd-responds-to-rx-7900-xtx-hotspot-fiasco

Der8auer investigated the issue and found a potential problem with the vapor chamber.

I fast forwarded to the summary but I recommend you watch the entire video to review his investigation and conclusion.
 
I find it interesting that vertical mounted GPUs are completely fine, but if you mount it horizontally (the majority of users), it causes the 110c hotspot in some cards. That's a pretty major flaw.
 
This is seriously a problem for the 99% of users who don't water cool and expect a card at a $1k price point to function decently and work at the full output they paid for. I'd be pissed if I couldn't access all the hp in my car due to a design flaw and it was throttling behind my back.
 
Mine without a doubt had this issue. Chip ran well, but it heated up to >110c junction temp after a few minutes, then would throttle. Undervolting helped a bit, but it still wouldn't run at stock speeds.

It's been RMA'ed. I could have rigged up a waterblock, heck, I could have pieced together a better cooler from parts I have laying around. But when I pay >$1000 for something I expect it to work. Shame on AMD's QA department, catching this sort of stuff is Bush-league (and if I know QA, which I do, someone did know, but the Brass pushed it through to capitalize on holiday sales).
 
The issue certainly warrants its own thread, but of course some would prefer it to be buried deep in another thread where it can easily be missed.

No such thing, they start at $1K.

That is IF people start slowing down their purchasing of 7900 XTX units because of this news, there may be some good deals. Obviously not right now.
 
I wonder if this explains the inconsistent load and idle temperatures many reviewers talked about at launch.

If it does, it could actually be pretty good news.

It's much easier to fix a cooler than it is to fix a problematic architecture design.
 
7900xtx, not a fire risk, just not getting the performance you paid for and potential rma in near future. Would you accept being only able to use 80% of your car's hp?

The advantage for us older gamers who've been building rigs since god knows when is that we've already blown up PSUs, caused fires and whatnot, so we know to double check our connections.
 
I have to admit, I cringe when anyone calls this a "user issue".

Just about everyone building a system with traditional PCIe power connectors ALWAYS wraps them tight straight out of the connector and then zip-ties them so they go straight down and through the motherboard tray cable passthrough so that there is no slack in the cable.

This is literally standard practice.

If NVidia introduces a cable with which standard PC building practice can no longer be observed, and you have to have a loose cable flopping around over the middle of your GPU, and if you don't and instead use traditional standard practice pulling it straight down the side of the GPU, it fries shit, that's 100% a design flaw.

At the very least there should have been big red flashing warning signs that say "WARNING, IF YOU BUILD A PC THE WAY YOU HAVE ALWAYS BUILT IT, IT MAY CAUSE CABLING TO MELT". But ideally, they shouldn't have introduced a sensitive, defective-by-design connector in the first place.

I wish I could buy a 4090 with traditional PCIe power connectors instead. This new connector is the dumbest solution I have ever seen in my entire life.
That's why "user issue" is the incorrect term to use in this case; luckily there is one which much better fits the bill: "design induced user error". I gather it is common in the aviation industry, where such things are rightly the manufacturers' responsibility to correct.
 
Not sure if this is a true issue; my 6900XT would sit at the max hot spot temp under heavy loads with the reference cooler installed. It would only matter if it's actually forcing the GPU clock to run under stock boost. I bought reference, though, because I knew I would water cool this card. It ran much cooler under water, but that did not let the card overclock a whole lot more despite better thermals. Some cards ran cooler than mine but performed worse, as they could not sustain higher clock speeds. I think people might be overreacting to the hot spot temps; it is often not what is holding the card back. I think these days, with CPUs and GPUs being pushed to the max, you're going to start getting variable performance levels within the same bracket, as one chip just made the cut and another was almost good enough for the next bracket. We'll see if this actually turns into an issue.
 
I was wondering what the RDNA3 Hardware Drama would be! "Hotspot is too hot" kind of feels like a Free Space on the AMD GPU Bingo but it (seemingly, from the der8auer vid) not being a mounting deficiency or finstack heatsoak thing is interesting.

I've been looking at a bunch of AMD reference heatsink designs today and am seeing a pretty clear lineage since the HD 6970, thru GCN1-5 and RDNA1/2/3 of "just make the biggest damn vapor chamber you can and generally forget that heatpipes exist" Wonder if that's biting them in the ass here.

The vapor chamber on the 7900XT/X is the most intense I've seen, check it in the thumbnail for the vid- it takes up the entire back of the finstack! Fill port is way up by the display outputs and the end is way down by the power connectors. It's not difficult to imagine that with such an intricate and large shape there may be circulation issues in some circumstances? I look forward to seeing folks dig deeper into this.
 
I wonder if this explains the inconsistent load and idle temperatures many reviewers talked about at launch.

If it does, it could actually be pretty good news.

It's much easier to fix a cooler than it is to fix a problematic architecture design.
Even the "reviewer" at OCUK (who is in fact doing a promo to sell the cards at their store) raised an eyebrow when they saw junction temps on their sample. I've literally never seen them do anything but glowing videos of products...

 
On a side note, I would be highly amused if it turned out that these cards didn't have any issues when in a ports-up orientation, as in the Silverstone Alta F1.
 
On a side note, I would be highly amused if it turned out that these cards didn't have any issues when in a ports-up orientation, as in the Silverstone Alta F1.
I'm afraid you are not going to be amused. Here's my 7900 XTX running completely fine when horizontal:
7900xtx_horizontal_q2.png

and here's it overheating and throttling when vertical (ports up):
7900xtx_vertical_q2.png


Since mine worked fine horizontal, this seems to be a different issue than the one described by most other users. I think in my case the problem was that this particular design of vapour chamber simply doesn't work in that orientation. Apparently some of the 6000 series had the same problem. The problem that others have had where the card overheats when horizontal seems to be a manufacturing flaw, as it doesn't affect all cards.

Anyway, I RMA'd mine, and will get an AIB card instead when they are available.
 
Pretty bad. How much does that cut performance? I'm guessing it pretty much makes it run like a 7900XT?

Does AMD even have a solution for this right now? I'd be worried about doing a RMA and just getting the same problem again since it seems to be a design flaw.
 
I'd be worried about doing a RMA and just getting the same problem again since it seems to be a design flaw.
If it is a design flaw, then AMD's best strategy might be to refund owners of defective cards and sell the boards to the partners at a discount.

Not much time to come up with a new design and test it thoroughly
 
It's not that big of a deal, really. It's a manufacturing flaw in specific runs of the cooler. der8auer had several stock reference designs that did not exhibit the problem in any orientation, and had to buy flawed cards from users in the wild to test it. They need to determine which series of cards are affected and issue a recall and replace the defective coolers, but unlike the 4090 thing, it is not a safety issue - it's a performance and potential card life issue. Really, it's just something for people to pointlessly fight about in forums.

The cards throttle, and while that's not great they still work and you still get MOST of the expected performance. As far as card life is concerned, we're not talking failure in terms of only days or months here, but maybe shaving a little off of the literal years of expected life. Again, not great, but a simple recall should resolve it. The vast majority of people experiencing the issue at all are enthusiasts who 1) know what they're looking at, hence the reports of the issue to begin with, and 2) likely have another card or video out option they can use while the recall is in process.

TLDR: Could this issue destroy life or property? No, not at all. Is it an important issue that needs to be resolved ASAP regardless? Yes, absolutely. Is the sky falling? Fuck no.
 
Let's keep this on topic please. More responses here off topic than on.
 
You're orientating it wrong.
Can't tell if you are joking. He got more replies of defective units from his viewers alone than anyone saw of the entirety of the Nvidia power plug debacle. In a shorter amount of time no less.

I'm not joking. It's just not that big of a deal. Important: yes. World ending: No. The problem is not in the silicon, which would be irreparable without replacing the entire card - it's in the REFERENCE DESIGN cooler (and not even all of them), which is infinitely easier to rectify. Hell, AMD could send enthusiasts the new cooler and they could install it themselves thus minimizing downtime. Should I have to fix their fuck up? Again, no - but I'd appreciate the option. It's inconvenient, but I mean, I shouldn't have to take my car in for recalls either (and just as pertinent, not all car recalls are for safety issues).

As for the raw number of reports between these nVidia and AMD issues, it's entirely irrelevant. One affected a smaller number of users but could be a potential fire hazard, and the fix for it if it happened to you was potentially a lot more expensive AND you would DEFINITELY have to RMA the entire card. The other affects a lot more (but by no means all) users, but all they lose out on is a bit of the maximum performance potential of their card. The fix is to either RMA the entire card, or if AMD allows this as an option (and given the target audience, I think they should), install the new heatsink yourself. Neither is in reality that big of a deal. nVidia is making newer, easier-to-secure cables, and 4090 users know what to look out for to avoid the issue regardless. AMD will at the very least recall the affected cards to replace the coolers, and AMD users now know to look at the serial number or test the Junction Temp on their new card purchase to see if it will be affected by that coming recall.

It's a nothing burger, plain and simple. The nVidia problem happened. nVidia has taken steps to fix it. All done. The AMD problem is happening right now. AMD WILL take steps to fix it. End of story.
 
Pretty bad. How much does that cut performance? I'm guessing it pretty much makes it run like a 7900XT?

Does AMD even have a solution for this right now? I'd be worried about doing a RMA and just getting the same problem again since it seems to be a design flaw.
Going by how much mine throttles core clocks (from a bit over 2.5 GHz down to 2.1GHz!) under sustained 3DMark stress testing when sitting horizontally, it performs about 15-17% worse in 3DMark Time Spy (Extreme), so a 7900 XT with 4 GB more VRAM sounds about right.
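
As a rough sanity check on those figures, the relative clock drop can be worked out directly. This is only a minimal sketch using the clocks quoted above (the "bit over 2.5 GHz" is rounded to 2.5 here); the function name is made up for illustration, and real benchmark scores don't scale perfectly with core clock.

```python
def relative_drop(before: float, after: float) -> float:
    """Return the fractional reduction going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Clocks quoted in the post, in GHz (2.5 is a rounded-down assumption).
stock_clock_ghz = 2.5
throttled_clock_ghz = 2.1

clock_drop = relative_drop(stock_clock_ghz, throttled_clock_ghz)
print(f"Core clock drop: {clock_drop:.1%}")
```

That works out to about a 16% lower core clock, which lines up with the 15-17% Time Spy loss mentioned above.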

That certainly isn't what people like me pay $100 extra over the XT for, to say the least. It runs more or less within spec when sitting vertically, but I'm not about to change my entire computer case, have a massive 800D full tower sprawled out on the floor, or spend $200+ on a waterblock just to work around a factory defect.

Alas, the fact that it's inherent to the reference cooler means that until AMD has confirmed that new production batches have the problem fixed, the only way they can really solve this right now is to do a recall, or in the case of the AIB partners, work out something loosely akin to the EVGA Step-Up Program where you can switch to an upgraded model that won't have the problem just for the price difference between the two. (Or better yet, free, but I'm not realistically expecting that.)

My solution? Return the card to Micro Center and get my money back, but I'm fortunate enough to have a lengthy 30-day return period, full refund, no restocking fees or other BS from what I can discern. Not everyone is that fortunate...
 
Clearly this is an issue that just needs to be handled properly, even if it's just a very small percentage of affected users. AMD can handle this properly by just RMAing anyone whose reference unit is suspected of this behavior and use that to investigate further and retain happy customers - even when you make boutique or luxury items you're never going to be totally without issue, but how you respond to them can win or lose you a lot of good faith in your brand. One real curiosity, however, is that supposedly this is in regards to some AIBs too, but if this can be isolated in reference designs then the AIBs will be more or less forced to follow suit with RMAs or else look like morons AND possibly damage their relationship with AMD! So anyone who pulls the "oh that doesn't happen to OUR cooler" better be DAMN sure before they mouth off, no matter which way the thing is oriented!
 
Speculation from Igor Wallossek:

Screenshot_20230103-093139_Opera.jpg


https://www-igorslab-de.translate.g...l=auto&_x_tr_tl=en&_x_tr_hl=en-GB#post-200117

Via ComputerBase & VideoCardz

https://www-computerbase-de.transla..._tl=en&_x_tr_hl=en-GB#update-2023-01-02T22:07

https://videocardz.com/newz/amd-par...adeon-7900-gpus-is-affected-by-thermal-issues

AMD partner suspects a faulty batch of reference Radeon 7900 GPUs is affected by thermal issues


Igor weighs in by sharing a note from a board partner. It is reported that the undisclosed OEM suspects at least one faulty batch of Radeon 7900 series might have left the factory. The possible issue being described is insufficient coolant added to the vapor chamber.
 
AMD has a long history of mauling launches, which nVidia also has been trying diligently to do as well.

It seems like the early adopter fees are almost a 100% certainty anymore for all products….
 
Pretty bad. How much does that cut performance? I'm guessing it pretty much makes it run like a 7900XT?

Does AMD even have a solution for this right now? I'd be worried about doing a RMA and just getting the same problem again since it seems to be a design flaw.
On mine, it would hit 110c after a few minutes, then throttle to 307w and run between 2200-2400mhz core -- so performing nearly exactly like a 7900xt.
 
That was really cool to see. That said, it did not tell us anything. I still don't get why mine gets 110C junction temp and does not throttle.
He seems to think that there's not enough water in the vapor chamber which might cause the high temp issue.
 
He seems to think that there's not enough water in the vapor chamber which might cause the high temp issue.
Well, I understand that. However, cutting it open like that proves nothing. Really worth nothing more than funsies to see inside, which is cool.

Not sure what type of vacuum equipment you would need to properly measure the liquid contents, or how you would know the specified fill level either.
 
Well, I understand that. However, cutting it open like that proves nothing. Really worth nothing more than funsies to see inside, which is cool.

Not sure what type of vacuum equipment you would need to properly measure the liquid contents, or how you would know the specified fill level either.
That's what I was thinking. How exactly does he know how much liquid should be in there, and cutting it open isn't exactly precision work.

Also, I wonder if those of us that aren't getting 110C throttling are sitting on a timebomb. Do these lose liquid over time like AIOs or are they good indefinitely if it's working correctly?
 
Also, I wonder if those of us that aren't getting 110C throttling are sitting on a timebomb. Do these lose liquid over time like AIOs or are they good indefinitely if it's working correctly?
TMK, I have never seen a properly sealed vapor chamber having evaporation issues. That said, maybe that is the issue? Just a thought, not even sure that is possible. I think it is either sealed or not.

What is odd, is mine gets to 110C junction, but does not throttle. Confusing to say the least.
 
That's what I was thinking. How exactly does he know how much liquid should be in there, and cutting it open isn't exactly precision work.

Measuring the liquid inside the vapor chamber is an interesting problem. Get a good sample and get a bad sample and try to measure the liquid in both. Weigh the vapor chamber before destruction, then drill out a hole (save the drill shavings to reweigh later), heat up the vapor chamber to evaporate the liquid, and reweigh again?

edit: probably try this with several good and several bad samples to see how much the numbers vary
edit2: maybe instead of drilling and worrying about shavings it's better to just cut the chamber with metal-cutting scissors?
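
The weigh, open, dry, reweigh idea boils down to a mass difference. Here's a minimal sketch of the bookkeeping, assuming every weighing is in grams on the same scale. The function name and all the sample numbers are placeholders made up for illustration, not measurements from any real card.

```python
from statistics import mean


def estimate_liquid_mass_g(sealed_g: float, dried_g: float, shavings_g: float = 0.0) -> float:
    """Estimate working-fluid mass by difference.

    sealed_g:   intact vapor chamber, weighed before opening
    dried_g:    opened chamber after heating to evaporate the liquid
    shavings_g: metal removed while opening, weighed separately
    """
    liquid_g = sealed_g - (dried_g + shavings_g)
    if liquid_g < 0:
        raise ValueError("dried mass plus shavings exceeds sealed mass; check the weighings")
    return liquid_g


# Placeholder weighings in grams: (sealed, dried, shavings). NOT real data.
good_samples = [(1500.0, 1494.0, 0.5), (1502.0, 1496.5, 0.25)]
bad_samples = [(1498.0, 1495.0, 0.5), (1501.0, 1498.0, 0.75)]

good_fill = [estimate_liquid_mass_g(*s) for s in good_samples]
bad_fill = [estimate_liquid_mass_g(*s) for s in bad_samples]

print(f"good coolers: mean fill {mean(good_fill):.2f} g")
print(f"bad coolers:  mean fill {mean(bad_fill):.2f} g")
```

If the shavings (or any liquid lost while cutting) can't be collected reliably, that error goes straight into the estimate, which is why comparing several good and several bad samples matters more than any single number.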
 
Not sure what type of vacuum equipment you would need to properly measure the liquid contents, or how you would know the specified fill level either.
Having said that, there's probably plenty of people with the proper equipment to drill a small hole in the chamber, add some more water or whatever in, and then re-apply vacuum and seal it back up for testing.
 
A 3D x-ray diffraction scan might be able to reveal the phase change behavior spatially. I've done this before for other devices, e.g. battery cells, and these phase change coolers don't have much metal to punch through, so a reasonably high resolution scan should be possible.
 
A 3D x-ray diffraction scan might be able to reveal the phase change behavior spatially. I've done this before for other devices, e.g. battery cells, and these phase change coolers don't have much metal to punch through, so a reasonably high resolution scan should be possible.
Maybe Tech Jesus can find somebody willing to do that like he did with the 16-pin connector.
 
Having said that, there's probably plenty of people with the proper equipment to drill a small hole in the chamber, add some more water or whatever in, and then re-apply vacuum and seal it back up for testing.
How would you do that? Drilling a hole and adding water would be easy. Vacuum chambers are also readily available. I find it hard to think of a practical way to reseal it while under vacuum without a specialized machine though. I suppose you could boil the water inside to force the air out, then seal it, rather than using a vacuum chamber, but it would be hard to control the amount of water remaining inside the heatsink.
 
Edit: I didn't think this thru enough, margin too small

With a sufficient sample size, oughtn't it be possible to find a pattern based only on thermal performance and mass? If the rest of the tolerances on the heatsink are tight enough, the "not enough liquid inserted during manufacturing" theory could be somewhat validated if all the worst-performing heatsink samples are at the same end of the bell curve for mass. Dunno if that would be viable though, if the average difference is only a gram or two a pattern may not show up.
 
With a sufficient sample size, oughtn't it be possible to find a pattern based only on thermal performance and mass? If the rest of the tolerances on the heatsink are tight enough, the "not enough liquid inserted during manufacturing" theory could be somewhat validated if all the worst-performing heatsink samples are at the same end of the bell curve for mass. Dunno if that would be viable though, if the average difference is only a gram or two a pattern may not show up.
There's probably too much variation in the mass of the copper plate and fins to get a good reading on the liquid mass.
 
There's probably too much variation in the mass of the copper plate and fins to get a good reading on the liquid mass.
Makes sense, yeah. I thought about it a little more and a gram or two of difference on a base mass of like 1.5 kg(?) is a pretty minuscule margin.
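
To put a rough number on that margin, here's a small sketch. The 1.5 kg base mass is the guess from the post, and the 10 g unit-to-unit spread is purely a hypothetical value picked for illustration. The sample-size rule is a crude "difference of means should exceed two standard errors" estimate, not a proper power analysis.

```python
import math


def samples_per_group(delta_g: float, spread_sd_g: float, z: float = 2.0) -> int:
    """Rough number of heatsinks needed in each group (good and bad) so that
    a mean mass difference of delta_g exceeds z standard errors.

    The standard error of a difference of two means with n samples each is
    spread_sd_g * sqrt(2 / n); solving delta_g >= z * SE for n gives
    n >= 2 * (z * spread_sd_g / delta_g) ** 2.
    """
    if delta_g <= 0 or spread_sd_g < 0:
        raise ValueError("delta_g must be positive and spread_sd_g non-negative")
    return max(2, math.ceil(2 * (z * spread_sd_g / delta_g) ** 2))


base_mass_g = 1500.0      # the ~1.5 kg guess from the post
liquid_delta_g = 2.0      # "a gram or two" of missing liquid
assumed_spread_sd_g = 10.0  # HYPOTHETICAL spread in heatsink mass between units

fraction = liquid_delta_g / base_mass_g
needed = samples_per_group(liquid_delta_g, assumed_spread_sd_g)

print(f"{liquid_delta_g:g} g is {fraction:.2%} of the heatsink mass")
print(f"roughly {needed} samples per group under the assumed spread")
```

Under those assumptions 2 g is only about 0.13% of the total, and it would take on the order of 200 coolers in each group to see the pattern, so the weighing-only approach looks impractical unless the heatsinks are far more uniform than guessed here.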
 