Hawaii matching Maxwell 2?? What's happening under Ashes of the Singularity?

So, are Epic and their Unreal Engine supporting asynchronous shaders on console, but artificially not on PC? Or not at all?

Async shaders are a pretty big thing on consoles for titles in production right now. AMD also made an Unreal Engine 3 async plugin with Square Enix for their title Thief as a proof of concept, so we know it is very possible.

Now I'm curious.

Not only on consoles, but on PC too, if the VR revolution is coming as advertised (Valve Vive and Oculus Rift on PC, Project Morpheus on PS4). On PS4, async shaders reduced latency by 6 ms per frame (10 ms per frame in the stress test) in "The Tomorrow Children", according to the developer. Latency reduction is one of the key elements of a comfortable VR experience, and on top of that it gives a performance increase:
http://gearnuke.com/tomorrow-childr...its-multiplatform-development-across-ps4xbo/#

I doubt PS4 developers making their games Project Morpheus compatible will give up the latency reduction and performance increase that async shaders provide. With DX12, async shaders can be supported on PC as well, so we might see more and more ports with async shaders when Project Morpheus hits the market.
 
IP :p

So far, however, Oxide has been quite open. They shared information about their CPU optimizations after I emailed them, and now their GPU optimizations after I emailed them again.

What a great developer :)

Hmm, it's not IP. The profiler won't show any code or anything like that; it will show how the shaders are broken up into individual parts and what is taking how long, and this is something management might not want to share at this point because they still aren't done with the game.
 
Thanks OP. Appreciate the work and insight that went into this. I wouldn't be too concerned with the characters crying bloody murder. I've been around the block a couple of times and used to subscribe to Rage3D, Anandtech, NVNews, and the hard forums back in my college days when I worked as a server. Doing the work and sharing it across the different boards was nothing new back then. In fact, quite a few of the contributors back in the day who did this went on to actually work for Nvidia and AMD (formerly ATI).

It's clear that you're into this and it shows.


As far as HyperQ goes, I know some of the features that work on the Tesla line of GPUs are disabled or cut off from the GeForce line. One exception would be GK208, which supports the features, though you need some driver modifications to make them work. I wonder how that would play into this were someone to try the test on GK208 with modifications to take advantage of the full HyperQ/parallelism feature set.
 
I think the main issue is the fact that the AWSs reside within the SMMs, so their ability to function out of order and correct themselves is limited.

Think about it. An AWS wouldn't have much of a problem doing this for the CUDA cores which reside within its SMM block. But the moment it attempted to communicate outside of its own SMM, it would hit that single shared L2 cache block. I don't think Maxwell 2 was designed with that amount of shared load on that L2 cache block.

You would need to remove the AWSs from the SMMs in order to grant them this out-of-order error-checking capability.

Perhaps create caching pools around the SMMs which are interlinked together, creating a ring. GCN does this by granting the ACEs access to various cache and memory pools, with the ACEs sitting outside of the shader engines, forming a horizontal line above them all.

An article, not long ago, discussed a 20ms latency for Maxwell2 handling Asynchronous Time Warp for VR. Perhaps we're seeing something of the sort in the Oxide comments.

As for the good old days, yeah... I miss them as well. Back then, threads such as these would have been filled with various theories. I don't know about you, but I enjoyed them.

Ps. Thank you for the kind words.
 
Final conclusion:

A GTX 980 Ti can handle both compute and graphics commands in parallel. What it cannot handle is asynchronous compute, that is to say, the ability for independent units (ACEs in GCN and AWSs in Maxwell/2) to function out of order while handling error correction.

It's quite simple if you look at the block diagrams of both architectures. The ACEs reside outside of the shader engines. They have access to the global data share cache, the L2 R/W cache pools in front of each quad of CUs, as well as the HBM/GDDR5 memory, in order to fetch commands, send commands, perform error checking or synchronize for dependencies.

The AWSs, in Maxwell/2, reside within their respective SMMs. They may have the ability to issue commands to the CUDA cores residing within their respective SMMs, but communicating or issuing commands outside of their respective SMMs would demand sharing a single L2 cache pool. That caching pool has neither the space (sizing) nor the bandwidth to function in this manner.

Therefore enabling Async Shading results in a noticeable drop in performance, so noticeable that Oxide disabled the feature and worked with NVIDIA to get the most out of Maxwell/2 through shader optimizations.

It's architectural. Maxwell/2 will NEVER have this capability.
 
I wonder if this is part of the reason Nvidia rejected Mantle, as it would have shown off these issues before DX12.

Regardless, thank you very much for taking the time to research the issue :)

Most people just want games to work and to get the most out of their hardware; it's only a vocal few fanboys that try to stop any actual discussion, so I'm glad this thread has survived.
 
Thank razor 1 as well. We needed razor 1 to bounce ideas off of and find the true culprit.

And you're welcome.
 
So, are Epic and their Unreal Engine supporting asynchronous shaders on console, but artificially not on PC? Or not at all?

Async shaders are a pretty big thing on consoles for titles in production right now. AMD also made an Unreal Engine 3 async plugin with Square Enix for their title Thief as a proof of concept, so we know it is very possible.

Now I'm curious.

Why do people find this weird? Nvidia has already done this with Assassin's Creed's DX10.1 implementation.
 
So where are the critics now that Kollock from Oxide responded and confirmed Mahigan's findings?

This is also very interesting:

Kollock said:
I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.

Our use of Async Compute, however, pales in comparison with some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?

In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so.

AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than to not.

WOMP WOMP WOMP
 
I think you can feel pretty safe that they won't gimp their engine. It's not like the first release is the end of all improvements.

On PC, considering Epic's traditional and extensive partnership with Nvidia, they could, and I wouldn't be surprised at all if they at least delay implementing it as long as they possibly can while Nvidia continues to scramble for something to help themselves. In doing so Epic would look really bad compared to what other engines would be capable of, mind you, so it would be an interesting scenario. Async shaders are a big thing on consoles right now and contribute to them being able to achieve 60 fps in some games running on just an APU! Imagine what can be done on full discrete solutions.

Not only on consoles, but on PC too, if the VR revolution is coming as advertised (Valve Vive and Oculus Rift on PC, Project Morpheus on PS4). On PS4, async shaders reduced latency by 6 ms per frame (10 ms per frame in the stress test) in "The Tomorrow Children", according to the developer. Latency reduction is one of the key elements of a comfortable VR experience, and on top of that it gives a performance increase:
http://gearnuke.com/tomorrow-childr...its-multiplatform-development-across-ps4xbo/#

I doubt PS4 developers making their games Project Morpheus compatible will give up the latency reduction and performance increase that async shaders provide. With DX12, async shaders can be supported on PC as well, so we might see more and more ports with async shaders when Project Morpheus hits the market.

Agreed, latency reduction is great for game immersion and VR. Until I learned Thief under Mantle was using async shaders I didn't know why, but DAMN, the feel of the game was awesome and totally different between DX11 and Mantle, in addition to it getting rid of all stuttering in the game. After playing Thief I'm very, very much looking forward to other async-shader-enabled games. FPS is fine, but beyond that, having reduced latency feels really great.

And yes, for VR, things like eliminating head-movement lag are paramount to enjoying it.
 
Something about AotS benchmarks doesn't make sense.

What else is giving AMD GPUs a boost in DX12 besides async compute? Even with AMD's own Mantle in Thief and BF4, which support async compute, the 290X doesn't beat a 980 and the Fury X is just behind a 980 Ti / Titan X.
 
Although Mantle could, technically, allow devs to use async compute, the devs only ever used Tier 2, which means overlapping the compute and graphics workloads. Maxwell/2 can do this as well under DX12. On GCN it is achieved by allowing the ACEs to each process up to 8 waveforms (compute commands) per cycle, plus 1 graphics command handled by the Graphics Command Processor. NVIDIA's Maxwell/2 achieves this as well with its Asynchronous Warp Schedulers, which are capable of 31 compute commands and 1 graphics command per cycle. Basically, at the CUDA core or CU level, these commands are processed "in sync", or "in order". You end up relying on dependencies (if you have a compute command which relies on the result of another compute or graphics command, you get a pause while the other command completes).

What the ACEs can do differently is act independently, "out of order", and finish a compute command without relying on the result of another compute or graphics command, freeing up resources quicker. The ACEs then check for correctness (sort of like re-issuing a correction).

It's like an extra degree of efficiency on top of the Tier 2 level of DX12. This makes GCN Tier 3 when it comes to async compute. Their architecture is truly asynchronous, down to the CUs. How? DMA engines. GCN has two, so the architecture can operate in a bindless format (independently). Maxwell 2 also has two DMA engines, but it looks like NVIDIA may have borked the implementation.

http://s29.postimg.org/hsmlfnv7b/rbt.jpg
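
To put the "overlapping compute and graphics" part in concrete terms, here's a minimal sketch (my own, not Oxide's code) of what the DX12 side looks like: the app creates a DIRECT queue for graphics and a separate COMPUTE queue, then submits to both. Everything here (device, the recorded command lists, the fence) is assumed to already exist; names are placeholders and error handling is omitted.

```cpp
// Minimal sketch (C++/D3D12): one DIRECT queue for graphics, one COMPUTE queue
// for async compute. Assumes `device` is a valid ID3D12Device*; PSOs and
// command-list recording are omitted, names are placeholders.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}

// Submitting work: the two queues are independent, so the GPU *may* overlap them.
// A fence is only needed where the graphics work depends on the compute result.
void SubmitFrame(ID3D12CommandQueue* graphicsQueue,
                 ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList* graphicsList,
                 ID3D12CommandList* computeList,
                 ID3D12Fence* fence, UINT64 fenceValue)
{
    ID3D12CommandList* c[] = { computeList };
    computeQueue->ExecuteCommandLists(1, c);
    computeQueue->Signal(fence, fenceValue);   // mark the compute work as done

    graphicsQueue->Wait(fence, fenceValue);    // only needed if graphics consumes it
    ID3D12CommandList* g[] = { graphicsList };
    graphicsQueue->ExecuteCommandLists(1, g);
}
```

The key point is that the API only expresses independent queues. Whether the compute submission actually runs concurrently with the graphics work, and how it gets mapped onto ACEs or AWSs, is entirely the hardware/driver scheduler's business, which is exactly what this thread is arguing about.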
 
Maxwell 2 was supposed to be bindless too... But that doesn't appear to be the case (it also has two DMA engines).
 
See, I based my assumption that Maxwell 2 supports async compute in a bindless format on Anandtech as well as this post by a dev:

"Here is a simple FAQ:
- Does D3D12 require new hardware?
No! The API will work fine with existing GPUs if a D3D12 driver exists for them. The actual hardware support has already been announced.
- What about the features?
Some features will go, and some features will come.
The low-level APIs will simplify access to the hardware. In the past, many new features came to the API because the driver actually hid the GPU memory from the application. So every new thing had to be implemented in the API, and then a new driver introduced support for it. Only after that could the application access the new feature. D3D12 will allow explicit access to the GPU memory, so some earlier features will not be accessible in D3D12 in their "traditional D3D11 form". But this is not a problem, because with explicit memory access all of these (and many more) can be implemented in the application. For example, tiled resources will be gone in their current form, but it is possible to write your own implementation of them.
The resource model will also be advancing, so for example Typed UAV Load will be a new feature.
- Will these new features require new hardware?
The best answer is yes and no. This is a complicated question, and hard to answer when the specs are not public. But let's say Typed UAV Load will require hardware support. The GCN-based Radeons can support it, as well as the Maxwell v2 (GM206/GM204) architecture. Maybe more NVIDIA hardware can access the feature, but I don't know, because they don't disclose what is possible with Maxwell v1/Kepler/Fermi. Intel might support it, but I'm not familiar with those iGPUs.
But many of these new features can be... I don't want to say emulated, but some workaround is possible. So even if the hardware support is not present, the actual effect might be executable on all GPUs. Of course, these workarounds will have some performance hit.

These are the most important things to know.

There are some other important things, like the binding model. I have frequently read that D3D12 is bindless. No, it's not. Bindless is only possible with AMD GCN and NV Kepler/Maxwell. D3D12 is a universal API, so bindless is not suitable for it. But this doesn't mean that the D3D12 binding model is bad. It's actually very nice.

In this PDF you can see the resource binding tiers on page 39 (if you don't want to download the file, then here is an image). This is the D3D12 binding table, and every GPU must support one of these tiers.
Most of the GPUs support the first tier or TIER1.
Maxwellv2 (GM206 and GM204) support the second tier or TIER2.
All GCN-based Radeons support the third tier or TIER3.
I expect that all future hardware will support TIER3.

One more thing. We all know that D3D12 is built for efficiency. Yep, this is true, but Microsoft only talks about batch performance. Everybody knows the advantages: it will (mostly!) eliminate the limitations on the CPU side.
There are two other features in D3D12 that will eliminate the limitations on the GPU side! These will help to speed up the rendering even when the application seems to be limited by the GPU.
These optional features are called asynchronous DMA and asynchronous compute. Simple definitions:
- Asynchronous DMA will allow data uploads without pausing the whole pipeline. It needs two active DMA engines in the GPU, so this feature is supported by all GCN-based Radeons and Maxwell v2 (GM206/GM204)-based GeForces. Most modern NVIDIA GPUs have two DMA engines, but one of these is disabled on the GeForce product line, so in the past this was a professional feature. On the GM206/GM204 GPUs the two DMA engines are not just present in the hardware but activated as well.
- Asynchronous compute allows overlapping of compute and graphics workloads. Most GPUs can use this feature, but not all hardware can execute the workloads efficiently. The GCN-based Radeons with 8 ACEs(!) are very good at this in my own tests.

I can't tell you more, because there is an embargo on some of the info.

If you want to ask which GPU is the best for D3D12 at present, then I will say go for a GCN-based Radeon (prefer GPUs with 8 ACEs) or a Maxwell v2 (GM206/GM204)-based GeForce. These are the most future-proof architectures right now, so they will support a higher resource binding tier and most of the optional D3D12 features."
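
Side note on the binding tiers mentioned above: you don't have to take anyone's word for it, the tier is queryable from the D3D12 runtime. A quick sketch, assuming you already have a created ID3D12Device (error handling mostly omitted):

```cpp
// Query the resource binding tier (TIER1/2/3) straight from the D3D12 runtime.
// Assumes `device` is a valid ID3D12Device*; this is a sketch, not production code.
#include <d3d12.h>
#include <cstdio>

void PrintBindingTier(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (SUCCEEDED(device->CheckFeatureSupport(
            D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options))))
    {
        switch (options.ResourceBindingTier)
        {
        case D3D12_RESOURCE_BINDING_TIER_1: std::printf("Resource binding TIER1\n"); break;
        case D3D12_RESOURCE_BINDING_TIER_2: std::printf("Resource binding TIER2\n"); break;
        case D3D12_RESOURCE_BINDING_TIER_3: std::printf("Resource binding TIER3\n"); break;
        }
    }
}
```

There is no equivalent cap bit for "async compute is actually fast", though; the runtime will happily accept a compute queue on any DX12 GPU, which is partly why this whole debate is being fought with benchmarks rather than spec sheets.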
 
When I saw the performance in Ashes of the Singularity I was stunned. Maxwell 2 was supposed to support this...

Ergo I began to theorize why it didn't.

I came here and made my post. I posted everywhere, hoping someone would help me shed some light on it.

Everyone was as stunned as I was. When Oxide responded, after my email, they mentioned that Nvidia can't do it. It doesn't work.

So now I'm hoping NVIDIA sees all this and clarifies the situation. It seems like their bindless implementation is broken, in hardware, so much so that Nvidia asked Oxide to turn off bindless and instead worked with Oxide to optimize as best they could.

Now I'm theorizing as to why they can't. I think it's the L2 caching limitations. It could also be an issue with their second DMA engine.
 
The whole point is to elicit a response from NVIDIA, who seem to be quiet about all this. Which again bolsters the argument that it is broken at the hardware level.
 
I don't want to be right. I want them to respond. It is in all of your interests that they do respond.

It isn't fair to all the people who bought, or are thinking on buying, a GTX 980 or 980 Ti.

I think NVIDIA wants to bury this. If we want an answer... We have to make some noise.

We shouldn't argue. We should demand a response.
 
I don't want to be right. I want them to respond. It is in all of your interests that they do respond.

It isn't fair to all the people who bought, or are thinking on buying, a GTX 980 or 980 Ti.

I think NVIDIA wants to bury this. If we want an answer... We have to make some noise.

We shouldn't argue. We should demand a response.

Yes, me too. Although I've been enjoying my GTX 970 since December, I bought it because I thought it had 4 GB of memory, and I had looked up the DX12 binding tiers, seen it had Tier 2, and compared the differences between the tiers; that was a factor in my buying decision.

Then you have Nvidia retconning the memory and perhaps now the DX12 tier level. :eek:

For a hardware vendor, "fixing it with software" isn't a good solution.
 
I think it's a driver-related issue. If you look at B3D's thread on DX12 performance, MDolenc made a small async test program and tested it on different vendors' hardware. Maxwell 2's shader performance with regard to latency is better than GCN's when doing only one type of shader, compute or graphics, which speaks to the async shaders doing their job, but when combining the two together they don't perform as well. I don't see why, hardware-wise, this combination would hurt from an architecture point of view from what we have seen so far.

And this was an assumption I was thinking about originally: the shader compiler will be very different for Maxwell 2 compared to Maxwell and Kepler because of the async shader units. I think, anyway; I'm not sure, because that is not my expertise.

This should be easily seen in engines using VXGI too, because it is a heavily compute-based global illumination system. I will have more info when UE4 4.9 comes out officially with DX12 support, since we are using VXGI (I've tried the 4.9 preview, too buggy for my liking).
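
For anyone wanting to poke at this themselves the way the B3D thread did, the gist is just timing a batch of compute work submitted on its own versus while the graphics queue is busy. A rough sketch of the timing part (not MDolenc's actual code; `computeQueue`, `computeList`, `fence` and `fenceEvent` are assumed to be set up already):

```cpp
// Rough sketch of timing a compute submission with a fence (C++/D3D12).
// Not the B3D benchmark itself; assumes the queue, command list, fence and
// Win32 event already exist. Error handling omitted.
#include <d3d12.h>
#include <windows.h>
#include <chrono>

double TimeComputeSubmission(ID3D12CommandQueue* computeQueue,
                             ID3D12CommandList* computeList,
                             ID3D12Fence* fence, UINT64& fenceValue,
                             HANDLE fenceEvent)
{
    ID3D12CommandList* lists[] = { computeList };

    auto start = std::chrono::high_resolution_clock::now();
    computeQueue->ExecuteCommandLists(1, lists);
    computeQueue->Signal(fence, ++fenceValue);

    // Block until the GPU reaches the signal, then measure wall-clock time.
    fence->SetEventOnCompletion(fenceValue, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);

    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::high_resolution_clock::now() - start;
    return elapsed.count();   // milliseconds for this submission
}
```

Run it once with the graphics queue idle and once while it's rendering a heavy frame; the difference between the two timings is essentially the latency hit everyone is arguing about.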
 
I don't want to be right. I want them to respond. It is in all of your interests that they do respond.

It isn't fair to all the people who bought, or are thinking on buying, a GTX 980 or 980 Ti.

I think NVIDIA wants to bury this. If we want an answer... We have to make some noise.

We shouldn't argue. We should demand a response.
Nvidia already said AotS doesn't represent DX12 performance overall, so there's your response. If you want a technical response to their asynchronous problems then you'll need to wait for a few more benchmarks to prove that a problem actually exists. Nvidia's current stance is that there is no problem.

But let's be honest here, what else would you expect them to say? They won't release more info until the "scandal" goes public... Same thing that happened with the 970 a few months ago.
I've never seen a benchmark as over-analyzed as this one.
 
Has this topic actually graduated to scandal? :p
I'm seeing the same names across forums and reddit now hahah.
 
Has this topic actually graduated to scandal? :p
I'm seeing the same names across forums and reddit now hahah.
We've reached the point where Nvidia has possibly lied about the specs of their GPUs, I'm calling that a scandal.
 
Nvidia already said AotS doesn't represent DX12 performance overall, so there's your response. If you want a technical response to their asynchronous problems then you'll need to wait for a few more benchmarks to prove that a problem actually exists. Nvidia's current stance is that there is no problem.

But let's be honest here, what else would you expect them to say? They won't release more info until the "scandal" goes public... Same thing that happened with the 970 a few months ago.
I've never seen a benchmark as over-analyzed as this one.



When Oxide’s Ashes of the Singularity benchmark tool was released, NVIDIA issued a statement claiming that it does not consider this particular benchmark to represent what DX12 can achieve on its hardware. However, it seems that a lot has been going on in the background. According to one of Oxide’s developers, NVIDIA was pressuring Oxide to remove certain settings from its benchmark.

As Oxide’s developer claimed, NVIDIA’s PR department put pressure on the team in order to disable certain settings in its Ashes of the Singularity benchmark. Oxide refused to do so, which basically led to NVIDIA’s statement regarding Ashes of the Singularity.

http://www.dsogaming.com/news/oxide-developer-nvidia-was-putting-pressure-on-us-to-disable-certain-settings-in-the-benchmark/

Only after they told Nvidia they were not turning off the settings.
 
Robert Hallock (from AMD)

https://www.reddit.com/r/AdvancedMi...ide_games_made_a_post_discussing_dx12/cul9auq

Oxide effectively summarized my thoughts on the matter. NVIDIA claims "full support" for DX12, but conveniently ignores that Maxwell is utterly incapable of performing asynchronous compute without heavy reliance on slow context switching.

GCN has supported async shading since its inception, and it did so because we hoped and expected that gaming would lean into these workloads heavily. Mantle, Vulkan and DX12 all do. The consoles do (with gusto). PC games are chock full of compute-driven effects.

If memory serves, GCN has higher FLOPS/mm2 than any other architecture, and GCN is once again showing its prowess when utilized with common-sense workloads that are appropriate for the design of the architecture.
 
"Common-sense" workloads - sounds like a jab at tessellation haha :p
 
Wait, this is a scandal now? When I saw the DX12 benchmark the 390x and 980 were neck and neck, just like they always have been in [H] reviews. The 390x was god awful in DX11 though.

Did I miss some benchmarks or new developments or something?
 
Wait, this is a scandal now? When I saw the DX12 benchmark the 390x and 980 were neck and neck, just like they always have been in [H] reviews. The 390x was god awful in DX11 though.

Did I miss some benchmarks or new developments or something?
Issues with the 970 broke maybe a month or two (late 2014) before the press picked it up, and things escalated from there. This is the same situation, except we're patient zero, the origin. Well not HardForum in particular, but multiple enthusiast tech boards across the web.

Eventually this news will circulate and then we'll get the information we need.
If this ends up not being complete FUD then yes, we're about to see another shitstorm for Nvidia over the coming months.
 
Issues with the 970 broke maybe a month or two (late 2014) before the press picked it up, and things escalated from there. This is the same situation, except we're patient zero, the origin. Well not HardForum in particular, but multiple enthusiast tech boards across the web.

Eventually this news will circulate and then we'll get the information we need.
If this ends up not being complete FUD then yes, we're about to see another shitstorm for Nvidia over the coming months.

Yeah, that worked out so badly for NVIDIA. :rolleyes:

The 970 issue affected a very niche proportion of the total users, basically SLI users who use 1440p and above. Those have probably died out now ever since the 980Ti was released.

Big freaking deal. NVIDIA suffered so hard because of it. Oh wait, they didn't. Because the non-issue affected a population so minute it didn't even matter. Did anybody actually bother to check the issue again after NVIDIA promised they'll deliver better memory allocation algorithms for the 970? I don't think so. After the bandwagoners stopped the drums, literally nobody cared.

From my POV it seems to me that StarDock is just trying to ride along for free press. Whoop-de-doo. Let's see if their new game will move as many units as Sins of a Solar Empire. With what I'm seeing it's gonna be even more niche than Sins was, simply because of the hardware requirements.

In other news, soon ARK will have a DX12 patch, let's wait and see whether the same pattern emerges.
 
Yeah, it was a niche problem that caused momentary outrage before it faded away.
But this problem will affect all Nvidia owners (mostly Maxwell 2.0) and could potentially affect a lot of DX12 games going forward. ~30% performance gives AMD a huge advantage.

Not so easy to sweep this one under the rug. Unfortunately we are many months, maybe a year, away from the problem becoming a reality so most people won't care. Nvidia is banking on getting Pascal out the door before anything comes of this story.
 
Oxide effectively summarized my thoughts on the matter. NVIDIA claims "full support" for DX12, but conveniently ignores that Maxwell is utterly incapable of performing asynchronous compute without heavy reliance on slow context switching.

WOW!

This is big. Bigger than I'd even thought when all this began with Nvidia publicly trying to shame Oxide.

At this point I'm also waiting on an answer to this.
 
Yeah, that worked out so badly for NVIDIA. :rolleyes:

The 970 issue affected a very niche proportion of the total users, basically SLI users who use 1440p and above. Those have probably died out now ever since the 980Ti was released.

Big freaking deal. NVIDIA suffered so hard because of it. Oh wait, they didn't. Because the non-issue affected a population so minute it didn't even matter. Did anybody actually bother to check the issue again after NVIDIA promised they'll deliver better memory allocation algorithms for the 970? I don't think so. After the bandwagoners stopped the drums, literally nobody cared.

From my POV it seems to me that StarDock is just trying to ride along for free press. Whoop-de-doo. Let's see if their new game will move as many units as Sins of a Solar Empire. With what I'm seeing it's gonna be even more niche than Sins was, simply because of the hardware requirements.

In other news, soon ARK will have a DX12 patch, let's wait and see whether the same pattern emerges.

So you are saying it's OK to lie and misrepresent your hardware because not enough people care or are informed enough to realize there are issues? Great. Well, I guess they do still sucker people into buying Titans, so it must work well for them :)
 
Yeah, it was a niche problem that caused momentary outrage before it faded away.
But this problem will affect all Nvidia owners (mostly Maxwell 2.0) and could potentially affect a lot of DX12 games going forward. ~30% performance gives AMD a huge advantage.

Not so easy to sweep this one under the rug. Unfortunately we are many months, maybe a year, away from the problem becoming a reality so most people won't care. Nvidia is banking on getting Pascal out the door before anything comes of this story.

Let's see how many games will actually take advantage of this before making this a problem.

30% performance only brings AMD to parity in most cases, which is good. I actually prefer this, because then there'll be a price war, and I'm looking to upgrade soon-ish.

I'm sure there won't be many DX12-exclusive games coming out in the next few years. NVIDIA can still rely on their strong DX11 performance to carry them through people's current hardware cycle.

Now, if Pascal has the same design, then you will have a point. But with so many unknowns, how can you classify this as a problem?
 
Maxwell 2 was supposed to be bindless too... But that doesn't appear to be the case (it also has two DMA engines).

What do you mean by "bindless"? You've used that term a lot. It generally has a specific meaning in graphics programming, and it's not the same context you're using it in here, so I'm interested in knowing what you're referring to.

When I saw the performance in Ashes of the Singularity I was stunned. Maxwell 2 was supposed to support this...

Other than the poor DX11 performance for AMD, what is so stunning about the benchmark? Doesn't it show Fury X and the 980 Ti at about the same FPS? Isn't that ordinary?
 
Heyyo,



Lol cool story bro. You make it sound like AMD has never put out a bugged driver. NVIDIA already fixed the bug in their drivers for Kepler GPUs. :p

-shill speak-

The bolded is false; I own a GTX 690 and anything past 347.52 is garbage for me on EVERY game I own.
 
It has been mentioned, but I haven't found any information on why that is the case (just a simple bug? incorrect information?). What would be interesting is if there is actually a technical reason requiring vsync.

I actually have another question as well. Does Ashes of the Singularity currently have a playable alpha? If so, is DX12 or Mantle support enabled for it? How are the results in actual gameplay?
 
Whatever Oxide is doing is not correct. Async code works fine on Maxwell 2, tested and verified at B3D. Yes, Maxwell 2 seems to take a hit with async code when doing too much of it, but for the most part it has lower latency than GCN. The only way I can see this happening is if they go over the queue limits; in other words, they are using shaders that would push past the register and cache limits of Maxwell 2, which is not necessary to do.
 
So too many out-of-sequence requests can flood the Maxwell 2 queues, while GCN can keep handling them as they come? If so, it seems this would fit with the context-switching limitation workaround Nvidia put in place.
 