Analysis of RDNA 3 vs RDNA 2 by a blogger:
https://hole-in-my-head.blogspot.com/2023/11/the-performance-uplift-of-rdna-3-over.html?m=1
Straight away, the improvement to the FP32 pipeline is obvious across almost all instruction types - though some gain more than others. What is interesting to me is the INT16 improvement in RDNA 3, which I have not seen mentioned anywhere. An additional curiosity is the lack of gains in FP64 (not that it's really that useful for gaming?) given that I've seen it said that the dual-issue FP32 pipeline can execute an FP64 instruction as long as the driver is able to identify the opportunity. So, maybe this is purely down to the way this program is written.
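To put the dual-issue point in perspective, here is a back-of-the-envelope sketch of theoretical FP32 throughput for the two cards I'm comparing. The CU counts and boost clocks are approximate public figures, and the simple peak-FLOPS formula is my own illustration, not a measurement:

```python
# Rough theoretical FP32 throughput, to show why measured ~20 % gains fall
# well short of the dual-issue ceiling. CU counts and clocks are approximate
# public figures (assumptions for illustration).

def fp32_tflops(cus, clock_ghz, lanes_per_cu=64, flops_per_lane=2, issue_width=1):
    """Peak FP32 TFLOPS = CUs * lanes * FMA (2 flops) * issue width * clock."""
    return cus * lanes_per_cu * flops_per_lane * issue_width * clock_ghz / 1000

rx_6800    = fp32_tflops(60, 2.1)                 # RDNA 2: single-issue FP32
rx_7800_xt = fp32_tflops(60, 2.4, issue_width=2)  # RDNA 3: dual-issue FP32

print(f"RX 6800:    {rx_6800:.1f} TFLOPS")          # ~16.1
print(f"RX 7800 XT: {rx_7800_xt:.1f} TFLOPS")       # ~36.9
print(f"Theoretical uplift: {rx_7800_xt / rx_6800:.2f}x")  # ~2.29x
```

On paper, dual-issue more than doubles the ceiling; the ~10-20 % uplifts seen in games are consistent with the compiler/driver only finding dual-issue opportunities some of the time.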
Cyberpunk, especially, seems to enjoy running on the RDNA 3 architecture, with around a 20 % performance increase over RDNA 2 - something not seen before when testing the RX 7600.
Starfield sees a similar 20 % increase*, likely due to some of the RDNA 2 bottlenecks outlined by Chips and Cheese.
*This was performed before the current beta update...
Alan Wake 2 also shows a good 15 % increase between the two architectures.
Finally, Metro Exodus sees an improvement of above 10 %, with the gap growing as the game becomes harder to run at higher settings. This potentially indicates that, under heavier workloads, the gap between the two architectures widens when they are given the same resources.
Speaking of operating frequencies, we see an interesting behaviour in the RDNA 3 part - at lower core (shader) clocks, the front-end clock is essentially equal to it. As the shader frequency increases, the front-end frequency moves further ahead, so that by the time the shader clock is around 2050 MHz, the front-end is at ~2300 MHz. Additionally, though I've not shown it below, at stock the front-end reaches ~2800 MHz when the shader clock is ~2400 MHz.
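The decoupled-clock behaviour described above can be sketched as a simple piecewise-linear model. The two upper points come from the measurements mentioned; the lower point, where the two clocks are still equal, and the linear ramp between points are my assumptions purely for illustration:

```python
# Toy model of RDNA 3's decoupled front-end clock. (2050, 2300) and
# (2400, 2800) are the reported operating points; (1800, 1800) - where the
# clocks are still equal - is an assumed starting point for illustration.
POINTS = [(1800, 1800), (2050, 2300), (2400, 2800)]

def front_end_mhz(shader_mhz):
    """Piecewise-linear interpolation through POINTS, clamped at both ends."""
    if shader_mhz <= POINTS[0][0]:
        return float(shader_mhz)  # below the ramp, the clocks track 1:1
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if shader_mhz <= x1:
            return y0 + (y1 - y0) * (shader_mhz - x0) / (x1 - x0)
    return float(POINTS[-1][1])  # clamp at the top reported point

print(front_end_mhz(2050))  # 2300.0
print(front_end_mhz(2400))  # 2800.0
```

Note the ramp is steeper than 1:1 (roughly 1.4 MHz of front-end clock per MHz of shader clock between the reported points), which matches the front-end "moving further ahead" as load rises.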
This seems like a power-saving feature to my eyes - there's no benefit to raising the front-end clock when the workload is light or non-existent!
The core clocks for the RX 6800 and the core vs front-end frequencies for the RX 7800 XT in Metro Exodus...
What's interesting here is that Chips and Cheese documented that the N31-based cards also use this same trick, whereas the N33-based RX 7600 actually clocked the front-end consistently lower than the shader clock, whilst also having lower latency than the N22 (RDNA 2) cards it succeeded - implying that there's some architectural improvement in how the caches are linked.
Conclusion...
In this very empirical overview, it is clear that, ignoring the increase in core (shader) frequencies, the RX 7800 XT has an architectural performance advantage over the RX 6800. This also extends to the full N31 product (the 7900 XTX) as well. However, AMD's choice to reduce the L3 cache sizes for Navi 31 and Navi 32 appears to significantly hinder their overall performance. Additionally, the choice to move that L3 cache onto chiplets has resulted in a significant increase in energy use, and an over-dependence on the bandwidth to those chiplets. It also appears to be the case that there is an overhead to fully utilising the L3 cache, with performance dropping even before that limit is reached.

I didn't mention it in this blogpost, but the RX 7600 also doesn't have N31 and N32's increased vector register file size (192 KB vs 128 KB). However, since I don't have a way to measure the effect of this on performance, I have decided to gloss over it - especially since Chips and Cheese do not appear to be overly concerned about it affecting N33's performance, given its lower CU count and on-die L3 cache.

What does appear to affect performance negatively is the choice not to clock the front-end higher on N33, and this is likely the source of a good amount of the performance uplift observed between RDNA 2 and RDNA 3.
So, where does this leave us?
From my point of view, it appears that AMD made some smart choices in RDNA 3's architectural design which are then heavily negated by the inflexibility of the chiplet design and the need to bin/segregate dies in order to build a full product stack. Moving to chiplets has also had the knock-on effect of increasing power draw (and likely heat), which pushes each design away from its ideal operating frequencies and has hindered the cards' performance. Just looking back at Metro Exodus: raising the RX 7800 XT's power limit by 15 % over stock increases performance by 4 % (though this is only 3 fps!), showing that the card is still power-limited as released and may see a bigger benefit from reducing operating voltage than RDNA 2 cards did.

Additionally, the RX 7600 appears hamstrung by the lack of an increased front-end clock - perhaps due to power considerations? - and it is the choice to decouple the front-end and shader clocks that seems to me to be the biggest contributor to RDNA 3's architectural uplift, as it is this aspect which appears to allow the other architectural improvements to the low-level caches and FP32 throughput to really shine.
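The power-limit observation above is worth a quick bit of arithmetic: if +15 % power buys +4 % performance, and that 4 % equals 3 fps, the implied stock frame rate and the efficiency cost follow directly (the calculation is mine, using only the figures quoted above):

```python
# Quick arithmetic on the Metro Exodus observation: +15 % power limit
# yields +4 % performance (3 fps). Figures taken from the text above.

baseline_fps = 3 / 0.04            # 4 % gain == 3 fps -> implied stock fps
perf_per_watt_ratio = 1.04 / 1.15  # relative efficiency at the raised limit

print(f"Implied stock frame rate: {baseline_fps:.0f} fps")        # 75 fps
print(f"Perf/W at +15 % power: {perf_per_watt_ratio:.2f}x stock") # 0.90x
```

A ~10 % drop in performance-per-watt for a 4 % gain is the classic signature of a part already operating past its efficiency sweet spot, which supports the point about undervolting headroom.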