Nvidia's RT performance gains not as expected?

Welp, this thread aged like fine milk left out in the desert sun on a brisk 120°F summer day.
Your point?
I saw this thread panning out perfectly. RT is not a magic switch that magically turns a game into a heavenly experience; it is only one aspect that may or may not improve the gaming experience. After three generations of Nvidia hardware there is no clear RT paradise, and compromises are still generally needed to use it. AMD hardware can have meaningful RT usage, even the anemic console RT hardware. Most people still don't use it, mainly because of performance issues or because it just doesn't make a meaningful difference to them.
 
I suggest you re-read the OP, then re-read the post by LukeTbk.

The OP's assumption that there were little to no RT performance gains in the 40 series is 100% wrong, whether you like it or not.
 

The performance gains are definitely there and line up with what's on paper if you play fully path-traced games like Quake/Portal RTX. However, I think a lot of us here are missing the other point being made: the higher level of RT TFLOPS has NOT resulted in a smaller performance penalty for turning on RT effects in non fully path-traced games. It seems like whether your GPU has 10 RT TFLOPS or 1000 RT TFLOPS, you are still going to take the same performance penalty for flipping on RT in a lot of games. I would like to see smaller penalties for turning on RT effects as we get higher and higher levels of RT TFLOPS, but I'm sure there are technical reasons beyond my understanding for why that isn't possible. It just kind of sucks, because imagine that even if you had an RTX 9090 with 1000 RT TFLOPS, you would still take a -50% hit to your frame rate for turning on RT.
 
However, I think a lot of us here are missing the other point being made: the higher level of RT TFLOPS has NOT resulted in a smaller performance penalty for turning on RT effects in non fully path-traced games. It seems like whether your GPU has 10 RT TFLOPS or 1000 RT TFLOPS, you are still going to take the same performance penalty for flipping on RT in a lot of games
Are we talking about the number of ms spent on RT and denoising (using an engine that lets you see a breakdown of what goes on in a frame), or about the % drop in fps?

Take a game that uses the tech quite a bit, like Metro Exodus, without being Portal RTX:

[Charts: Metro Exodus 3840x2160, without RT and with RT]

They go roughly from:
3090 Ti: 8.17 ms -> 13.96 ms a frame
4080: 7.15 ms -> 11.35 ms a frame

It may be way too simple to say that it took Y-X ms to do the ray tracing and denoising, since enabling RT may also remove part of the traditional render, and as fps goes down the CPU ms become a smaller percentage of the frame (it had the whole previous render to prepare the next frame), and so on, but just for the sake of rough talk:

The 3090 Ti added a 5.79 ms cost for ray tracing, the 4080 added 4.2 ms. That is a 27% reduction in the time taken to do the RT, or the 3090 Ti was 38% slower at it, depending on which direction you measure from. Is that a similar performance penalty, or one significantly smaller?
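
Spelled out as a quick, rough calculation (treating the RT cost as nothing more than the difference between the two frame times above):

```python
# Rough check of the added RT cost and the two ways to express the gap,
# using the approximate Metro Exodus 4K frame times quoted above.
rt_cost_3090ti = 13.96 - 8.17   # ~5.79 ms added by enabling RT
rt_cost_4080 = 11.35 - 7.15     # ~4.20 ms added by enabling RT

# 4080 relative to the 3090 Ti: how much less time it spends on the RT work
time_reduction = 1 - rt_cost_4080 / rt_cost_3090ti   # ~0.27 -> ~27% less time
# 3090 Ti relative to the 4080: how much slower it is at the same RT work
slowdown = rt_cost_3090ti / rt_cost_4080 - 1         # ~0.38 -> ~38% slower

print(f"RT cost: 3090 Ti {rt_cost_3090ti:.2f} ms, 4080 {rt_cost_4080:.2f} ms")
print(f"4080 spends {time_reduction:.0%} less time on it; 3090 Ti is {slowdown:.0%} slower")
```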

A -37% fps drop can look similar to a -42% one, but the faster you already are, the smaller the per-frame time cost needed to produce a similar percentage drop in fps.

That's why a 6800 can show a -56% drop, about the same as the 7900 XTX's -55%, even though the 7900 XTX is in a completely different, higher tier of ray tracing performance:
6800: 12.82 ms -> 29.2 ms a frame
7900 XTX: 7.05 ms -> 15.6 ms a frame

One added about 16 ms for the effect, the other roughly half of that; the performance hit in the ms budget was half.
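
Here is a small sketch of that comparison across the four cards above, showing how the "% fps drop" view hides the very different ms costs (numbers are the approximate frame times quoted earlier):

```python
# Compare the "% fps drop" view with the "added ms per frame" view,
# using the approximate Metro Exodus 4K frame times quoted above.
frame_times = {                # (ms without RT, ms with RT)
    "3090 Ti": (8.17, 13.96),
    "4080": (7.15, 11.35),
    "6800": (12.82, 29.20),
    "7900 XTX": (7.05, 15.60),
}

for gpu, (base_ms, rt_ms) in frame_times.items():
    added_ms = rt_ms - base_ms      # extra frame time the RT option costs
    fps_drop = 1 - base_ms / rt_ms  # drop in fps, as a fraction
    print(f"{gpu:9s}  +{added_ms:5.2f} ms   fps drop {fps_drop:.0%}")

# The 6800 and 7900 XTX end up with nearly the same % fps drop (~56% vs ~55%),
# yet the 6800 pays roughly twice the ms cost per frame for the same option.
```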

If we talked and thought in terms of the milliseconds per frame that adding a visual option costs, the way game engine makers do, the perception would probably be quite different from what we get by talking in percentage fps drops.

To think about it another way, imagine you play on a 120 fps OLED TV: you need to do everything in 8.33 ms. What matters is how many ms it cost you to ray trace, denoise, etc. versus the alternative, and Lovelace uses far fewer ms than Ampere did to do the same amount of RT, denoising, and so on (likewise RDNA 3 versus RDNA 2).
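
As a minimal sketch of that frame-budget framing (reusing the RT costs worked out above purely as illustration, not as measurements of an actual 120 Hz scenario):

```python
# Frame-budget view: at a fixed refresh rate, what matters is how many ms
# of the per-frame budget the RT/denoise work eats, not the % fps drop.
target_fps = 120
budget_ms = 1000 / target_fps   # 8.33 ms to do everything in one frame

for gpu, rt_cost_ms in {"3090 Ti": 5.79, "4080": 4.20}.items():
    remaining_ms = budget_ms - rt_cost_ms
    print(f"{gpu}: RT/denoise eats {rt_cost_ms:.2f} ms, "
          f"leaving {remaining_ms:.2f} ms of the {budget_ms:.2f} ms frame for everything else")
```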
 
That's a great way to look at it. Very eye-opening. So when can we get to the point where the cost of RT is negligible, where it doesn't add another 3-4 ms per frame? Is such a thing even possible, to have the cost of RT be under 1 or 2 ms?
 
I think game makers, hardware makers, and so on will always push the envelope, so 3-4 ms will never be negligible to some; that cost makes 500-1000 fps on a monitor impossible without some DLSS 3 type of boosting. But going at 120 fps instead of 170 fps (think enabling RT in Resident Evil Village on a 4080) could well become the typical cost of much heavier RT implementations this very decade, if they continue to double RT performance every two generations for a while.
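
Purely as an illustration of that cadence (assuming "double the RT performance every two generations" translates into halving the per-frame cost of a fixed RT workload, which is obviously a simplification):

```python
# Illustrative projection only: if the per-frame RT cost of a fixed workload
# halved every two GPU generations, when would a ~4 ms cost dip under 1 ms?
rt_cost_ms = 4.2      # roughly the 4080's added cost in the Metro Exodus example
generations = 0
while rt_cost_ms >= 1.0:
    generations += 2  # one halving per two generations, per the assumption above
    rt_cost_ms /= 2
print(f"~{generations} generations until that same RT workload costs {rt_cost_ms:.2f} ms per frame")
# In practice games will not hold the workload fixed; heavier RT will eat the gains.
```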

RT cores take up a relatively small portion of today's GPUs; I think only about 10% of the massive 2080 Ti die was used by the RT and tensor cores. Were it, say, 50% of a 4090-class die, who knows what would already be possible.
 
That's true. It seems like game makers just love to push things harder as we get faster hardware. Just look at Hogwarts Legacy. Although perhaps that game is just a case of really, really bad optimization, because as good as it looks, I honestly cannot say the visuals are so amazing that frame rates this low are justified. A 3090 Ti getting below 60 fps without even using RT? And the RT itself doesn't even include global illumination, yet it completely destroys even a 4090.

[Chart: Hogwarts Legacy performance at 3840x2160]
 
The "why make a 4090-level card if you cannot push 240 fps at 4K via DP 2.1" narrative will be short-lived, it looks like.
 