NVIDIA’s Neural Texture Compression for material texture compression

Software and hardware development have to progress together; they go hand in hand, and sometimes one has to progress more than the other for a particular goal.
Games in general in the last 5 years haven't progressed worth shit. They have only progressively gotten worse, and developers have gotten bolder about releasing games in ever more broken states.

If we don't start voting with our wallets, things will get much, much worse.
 
I honestly think software development in general has gotten significantly worse. Most applications are bloated and slow these days. I'm not really sure what the cause is, but in general I think software is in a less stable, more bloated state. Hardware has been basically the only thing that's kept us going.
 
Cost overruns and the search for lowest-bidder contracting. AAA development costs have more than tripled in the past 7 years. What took one studio to do now takes two, so they outsource or contract out far more than they used to.
But this boils down to a project management issue: project management and development require two different mindsets and very different skill sets. As a whole, software developers tend to be terrible project managers, and project managers usually fail to understand and accommodate the needs of large software projects.

So ultimately, everything being “Open World” is what is driving this: building and filling a world is hard and expensive, and testing it is even more so.

This is where one of the big pushes for AI-assisted development comes in. Developers can specify how many tasks or events they want in the world, describe the type of environment, hand over a list of creature assets and such, and let the AI build it out accordingly, generating a map when done for the developers to work from. It’s complicated, and cutting back that complexity is going to be key to getting things back on track.
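To make that concrete, here is a minimal sketch of what a spec-driven generator could look like; this is purely illustrative Python, and the structure, names, and numbers are hypothetical rather than any studio's actual pipeline.

# Hypothetical sketch: scatter encounters across a map from a designer-provided spec.
import random
from dataclasses import dataclass

@dataclass
class WorldSpec:
    environment: str       # e.g. "temperate forest"
    size: tuple            # map size in world units (width, height)
    num_events: int        # how many quests/encounters to place
    creature_assets: list  # asset IDs the generator may use

def generate_layout(spec, seed=0):
    rng = random.Random(seed)
    width, height = spec.size
    events = []
    for i in range(spec.num_events):
        events.append({
            "id": f"event_{i:03d}",
            "position": (rng.uniform(0, width), rng.uniform(0, height)),
            "creature": rng.choice(spec.creature_assets),
        })
    return {"environment": spec.environment, "events": events}

spec = WorldSpec("temperate forest", (4096.0, 4096.0), 25,
                 ["wolf_pack", "bandit_camp", "forest_troll"])
layout = generate_layout(spec, seed=42)
print(len(layout["events"]), "events placed in", layout["environment"])

The real value would come from a generator that respects terrain, pacing, and difficulty, but the input shape is the point: counts, an environment type, and an asset list.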
 
I honestly think software development in general has gotten significantly worse. Most applications are bloated and slow these days. I'm not really sure what the cause is, but in general I think software is in a less stable, more bloated state. Hardware has been basically the only thing that's kept us going.
Insanely unrealistic schedules, lack of appreciation for complexity, unfortunate staffing choices.
 
Cost overruns and the search for lowest-bidder contracting. AAA development costs have more than tripled in the past 7 years. What took one studio to do now takes two, so they outsource or contract out far more than they used to.
But this boils down to a project management issue: project management and development require two different mindsets and very different skill sets. As a whole, software developers tend to be terrible project managers, and project managers usually fail to understand and accommodate the needs of large software projects.

So ultimately, everything being “Open World” is what is driving this: building and filling a world is hard and expensive, and testing it is even more so.

This is where one of the big pushes for AI-assisted development comes in. Developers can specify how many tasks or events they want in the world, describe the type of environment, hand over a list of creature assets and such, and let the AI build it out accordingly, generating a map when done for the developers to work from. It’s complicated, and cutting back that complexity is going to be key to getting things back on track.
Oh, I wasn't referring just to games; I think general software is way worse than it was 10 years ago in terms of bloat and stability. But you're definitely right about the major bottlenecks in games. Asset quality increasing exponentially is also a major bottleneck, so having AI as an assistant in model and texture production will be big. I'm not sold on Copilot for game development; Copilot X will be fine with GPT-4, though. We basically wound up shutting off standard Copilot because it interrupted our flow so much it wasn't worth using at the moment. It was also just horribly wrong, and having to go back and fix its bugs wasn't worth my time. I do still use GPT-4 for game dev currently, but people need to understand you have to KNOW how to program to get it to build complex things. You can say "Make a 2D asteroids game" and it'll probably give you very basic boilerplate for a game like that. But if you want to build anything that's expansive or has deeper systems, it's up to developers to figure out the specifics and get the AI to write it.

A good example is a milestone I just finished for our architecture game: I needed a very specific system that built custom materials and mapped the older ones to the newer ones. And it needed to do it in the editor with whatever FBXs or prefabs I selected (I did it based on the currently selected folder). Let me tell you, guiding the AI took a significant amount of time. What it did cut back on was me needing to dig through the Unity docs for specific libraries in order to guide it. Once I knew exactly which part of the Unity namespace to use and had some boilerplate, I was almost immediately able to grasp what needed to be done and how to refine it. The prompt itself is multiple pages long at this point, and I did a bunch of hand refinement and fed it back into GPT-4 so it knew what the current context was for each script. These systems are very cool, but they don't reason like a human; they're generative. I will say, though, that removing the boilerplate-writing time and most of the research time probably cut the development time required roughly in half.
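For illustration only, the bookkeeping half of that task looks something like this. The actual tool is a Unity editor script in C#; this Python sketch only shows the old-to-new mapping pass over a selected folder, and the file extensions, material names, and the material_names_for callback are placeholders.

# Illustrative only: map old material names to new ones for assets under a folder.
import os

OLD_TO_NEW = {
    "Brick_Old": "Brick_PBR_v2",        # hypothetical material names
    "Concrete_Old": "Concrete_PBR_v2",
}

def remap_materials(selected_folder, material_names_for):
    # material_names_for(path) stands in for whatever extracts material names
    # from an asset (in the real tool, Unity's editor APIs do this part).
    remapped, unmapped = {}, []
    for root, _dirs, files in os.walk(selected_folder):
        for name in files:
            if not name.lower().endswith((".fbx", ".prefab")):
                continue
            path = os.path.join(root, name)
            for mat in material_names_for(path):
                if mat in OLD_TO_NEW:
                    remapped.setdefault(path, {})[mat] = OLD_TO_NEW[mat]
                else:
                    unmapped.append((path, mat))
    return remapped, unmapped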

Sorry if that's a tangent, but I do think our toolsets improving means things will likely be done better, provided companies don't use these tools to cut down their staff. If you have lower-quality developers, even with AI you're going to get bad or worse software. Understanding all of the context of what it's writing is super important, and unless you've spent thousands of hours writing code you're not going to be capable of adequately reviewing it. I think the same thing will be true of 3D modeling, level design, world design, etc. It'll be a major productivity boost, and it's just a matter of time until these products are fully available and good enough to be integrated into everyone's workflow.
 
I know a number of guys who will use AI to assist them after they have built something: find bugs they never thought to test for, figure out why their recursive function is or isn't exiting the way they think it should, and stuff like that. But I don't know anybody who has seriously said "Hey ChatGPT, build me this ..."
I mean, I have asked it to build me a few maps with encounters and scenarios with various quest hooks for some D&D stuff, though, and most of them are kinda awesome. Way better than the shit I had planned out.
AI has a good place for the very simple, monotonous jobs that ultimately bore a human to the point where they can't focus and end up fucking it up, requiring you to put 2 or 3 people on it to ensure it actually gets done properly.
 
It's 2023, put more VRAM into the cheap hardware.
I imagine you are just trolling at this point. Looking at render farm hardware, or thinking about a Nintendo Switch 2 or 3, how is there a world where better compression quality makes no sense and has no value?

Think about when this would actually be relevant, say 2026-2030, if it catches on: yes, cheap hardware will have more VRAM; yes, NVMe drives will be a bit bigger (maybe; the stagnation has been quite something); and gigabit broadband will be more mainstream around the world. But there will probably still be a need to fit a game under a 300 GB install size and under 48 GB of VRAM, and for much bigger and better-looking games than now, i.e. assets will still be heavily compressed.
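Some rough back-of-the-envelope numbers on why compression still matters at those budgets; the per-texel rates and the four-texture material set below are assumptions for illustration, not figures from the paper.

# Back-of-the-envelope: how many 4K material sets fit in a VRAM budget
# at different storage rates. A full mip chain adds roughly one third.
GIB = 1024**3
TEXELS_4K = 4096 * 4096
MIP_FACTOR = 4 / 3
TEXTURES_PER_MATERIAL = 4   # e.g. albedo, normal, roughness/metal, AO (assumption)

rates_bytes_per_texel = {
    "uncompressed RGBA8": 4.0,
    "BC7 (1 byte per texel)": 1.0,
    "heavily compressed (illustrative 0.2)": 0.2,
}

budget_gib = 8
for label, bpt in rates_bytes_per_texel.items():
    per_material = TEXELS_4K * MIP_FACTOR * bpt * TEXTURES_PER_MATERIAL
    sets = budget_gib * GIB / per_material
    print(f"{label}: about {sets:.0f} material sets in {budget_gib} GiB")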

From what I've read NTC uses tensor cores and AMD hardware doesn't have tensor cores.
If you read a bit more, you will see that it is the game studio, when making the game assets, that uses the tensor cores, not the real-time renderer.
This is just Nvidia creating planned obsolescence, because they want to make sure that when new games do make use of more VRAM
A solution that would keep a 2070 Super and a 3070 or 3080 viable for a long time is planned obsolescence? (If that is what you think this is, that it would become widespread among AAA games fast enough to matter for pre-2021 GPUs, I am not so sure it can go that fast... something major usually becomes popular among devs 3-4 years before we see it become common in the wild, no?) But making cheap 16 GB VRAM cards would not be? I am not sure this makes sense.

How is making rapid jumps in VRAM, generation after generation, not one of the best planned-obsolescence strategies there is? Especially if making GPUs much more performant every generation is getting really hard to do.

Nvidia isn't showing you how much data is needed to get BC7 with the same image they produced with NTC,
I think they do (page 9)? The 48 MB BC gets close to the quality of the 17 MB NTC. I have the feeling you did not even read the paper? Which is ok, obviously, but it is still a bit strange to talk with such authority without taking even a quick look first.
 
It doesn't matter for games, which is why AMD doesn't include those functions on RDNA, and until about 2 years ago it didn't matter for most research either, which again is why they chose to remove it from CDNA. It is, however, absolutely critical for machine learning and AI in general, a field AMD wrote off and now has to backtrack on.
I guarantee you that any future AI work won't be done on current-generation GPUs, and that includes AMD and Nvidia. You could make the claim that since this is for AI, and future games may make good use of AI, this might be good for future games, but I really doubt it.
But this is needed. It is not just an "add more VRAM" issue, it's an "add more everything" issue: more VRAM, more bandwidth, and more memory channels (consumers are now being hampered by only having 2 channels for system RAM), plus more storage space and faster storage. Would I love to see consumer CPUs with 4 memory channels, and a 24 GB minimum for VRAM with a 64 GB, 4-channel standard? Hell yeah. But that would cut into workstation sales for everybody, and they want to protect those, so it's not going to happen. It is 100% an artificial segmentation problem, and Intel, AMD, and Nvidia don't want to disturb the waters there too much.
Just a reminder that Intel's A770 is a 16GB graphics card for $400, and I'm sure Intel wants in on that workstation market.
I mean why make anything better or even research an idea when we can just throw more hardware at it
This isn't better, but worse. This is tech meant to appease the losers who bought 8 GB cards. If the NTC texture looked indistinguishable from the uncompressed image, I'd say there's something there. The question is how much more data is needed for both BC7 and NTC to look indistinguishable from an uncompressed texture? There's a reason Nvidia showed what they showed: by the time you added enough data to make the textures look good, the difference between BC7 and NTC isn't that big. Just my tin-foil-hat take.
The way things have been going, I give it 2 to 3 more architectures before we see low-end trash AMD cards with 128 GB of VRAM, because all these fanboys keep crowing that 64 GB isn't enough.
Stop giving Nvidia a pass for putting too little VRAM on their low- to mid-range cards.
Stop giving developers a pass on a game that looks last-gen and runs like shit on the fastest GPU on the planet.
If it's running slow on the fastest GPUs on the planet, then they aren't the fastest GPUs on the planet.
Edit: To clarify, tech like this is cool and can help enhance an already awesome thing. Imagine being able to download an ultra HD texture pack mod and having this tech help it run on more than just professional-grade hardware.
A lot of community-made texture pack mods enhance not only performance but also image quality. If someone took the time to implement NTC, they can take the time to optimize their textures.
 
I guarantee you that any future AI work won't be done on current-generation GPUs, and that includes AMD and Nvidia. You could make the claim that since this is for AI, and future games may make good use of AI, this might be good for future games, but I really doubt it.
I haven't said shit about games; it's for developers, who make games, but what they use and what we need are very different things. I can't tell at this stage if you are completely misunderstanding things or just trying to troll. The paper makes it pretty clear that the algorithm uses an AI to develop a custom compression scheme on a texture-by-texture basis, then stores the decompression key within the texture so it can be decompressed by the user; the decompression uses a standard matrix multiplication method that any raster core is more than capable of doing.
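To picture the decompression side, here is a toy NumPy sketch of the kind of small matrix multiplications involved. This is not NVIDIA's actual network, data layout, or channel count; it only illustrates why ordinary matrix-multiply hardware is enough to decode a texel.

# Toy illustration only: decode one texel's material channels from a latent
# vector via two small matrix multiplications (a tiny MLP). Sizes are made up.
import numpy as np

rng = np.random.default_rng(0)
LATENT, HIDDEN, CHANNELS = 16, 32, 9   # e.g. albedo + normal + roughness/metal/AO

# The "decompression key": small weight matrices stored alongside the texture.
W1 = rng.standard_normal((LATENT, HIDDEN)).astype(np.float16)
W2 = rng.standard_normal((HIDDEN, CHANNELS)).astype(np.float16)

def decode_texel(latent_vec):
    hidden = np.maximum(latent_vec @ W1, 0)   # matmul + ReLU
    return hidden @ W2                        # matmul -> material channels

latent = rng.standard_normal(LATENT).astype(np.float16)
print(decode_texel(latent).shape)             # (9,)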

Just a reminder that Intel's A770 is a 16GB graphics card for $400, and I'm sure Nvidia wants in on that workstation market.
Intel's A770 is not a workstation card; it lacks features, drivers, and critical software compatibility. It has memory, sure, but memory alone is not what makes a workstation or server product. And I am not sure what your Nvidia comment there is; Nvidia makes up something like 90% of the workstation GPU market. Nvidia wants their Grace CPUs in there for sure, and they are going to work on that, I guarantee it, but I don't know what this is supposed to mean, especially in relation to AMD and Intel limiting their consumer CPUs to 2 memory channels, and how the lack of memory channels in consumer systems is actively bottlenecking the speed at which data can be fed from an NVMe drive into VRAM. Having 4+ memory channels is one of the key selling features of the Xeon-W and Threadripper lineups, and if they added those to consumer chips, one of the primary selling features of those very lucrative markets goes away, so they won't do it, to protect those margins. If they aren't going to give us more memory channels, we need better compression methods and new inline decompression methods to work around that limitation being placed on us.
 
I guarantee you that any future AI work won't be done on current-generation GPUs, and that includes AMD and Nvidia. You could make the claim that since this is for AI, and future games may make good use of AI, this might be good for future games, but I really doubt it.

What

We have devs using 4090s for ML work
 
An A6000 or RTX 3090 would be a better example here, as a lot of today's AI work is being done on previous-generation GPUs.

And if we are talking about using the results of machine learning day to day, one very obvious example: an RTX 2060 from 5 years ago can do the latest DLSS 2.x from last week, or:


An AMD Radeon Instinct MI25 from 2017 is apparently a really nice option for the price for doing Stable Diffusion work.
 
Stop giving Nvidia a pass for putting too little VRAM on their low- to mid-range cards.

I've done no such thing. Your comment is like me telling you to stop giving AMD a pass for releasing mid-tier cards as high-end ones, and for trash drivers.

See how silly that would make me look?
 
I honestly think software development in general has gotten significantly worse. Most applications are bloated and slow these days. I'm not really sure what the cause is, but in general I think software is in a less stable, more bloated state. Hardware has been basically the only thing that's kept us going.
I agree. There have been several times recently where I've had to fetch an older version of some software because the newer versions are a disaster or generally a waste of resources.
 
I honestly think software development in general has gotten significantly worse. Most applications are bloated and slow these days. I'm not really sure what the cause is, but in general I think software is in a less stable, more bloated state. Hardware has been basically the only thing that's kept us going.
Depends on the criteria and the point of comparison. Bloat did happen quite a bit; software's surface of interaction tended to get larger. From a single thread taking one command-line entry and producing one command-line output, it went to complex GUIs, talking to many things over the internet, getting talked to by other applications, etc...

It could be because a lot of my "formative" years, which shaped my impression of computers, happened during a very rough time in terms of stability: the transition from many OSes/APIs/MS-DOS to Win 95-98-Me, with maybe a not-so-bad Win 98 SE in there. In general I would rate the systems I had before the Windows XP service packs as nowhere near as good as now in terms of stability.

In the last 12 years, a Windows machine, of all OSes, never being rebooted between two power failures or updates has not been uncommon for me; I have often seen over 30 days of uptime on a Windows work computer used 9 hours a day. Back in the 90s and early 00s it was common for people to reinstall the OS on a much shorter cycle; people keeping a good disk image and doing it 2-4 times a year was not uncommon where I grew up. I do not remember the last time I did a format or OS reinstall because of an issue (well, on Linux, because I do not know what I am doing very much, but not on Windows).

I do not remember the last time I bought a Steam game that did not download, install itself, and launch right away on the first try without me having to do anything. Back in the day you were adjusting your memory type, reinstalling Windows, and rebooting 2-3 times until your Gravis gamepad worked; if the expression "blue screen" did not already exist, it might never come up on its own in today's world.

I remember more issues with my PS3 not being able to launch a game (or with its Wi-Fi) than with games on my computer since.
 
I imagine you are just trolling at this point,
No I'm fucking not. You have graphics cards sold today with pretty decent performance capabilities but the same VRAM as my Vega 56, released back in 2017. 8 GB of VRAM wasn't even new in 2017, as the R9 290 came with 8 GB back in 2013.
Looking at render farm hardware, or thinking about a Nintendo Switch 2 or 3,
The Steam Deck has 16 GB of memory while the Switch has 4 GB. No excuses, upgrade the RAM.
how is there a world where better compression quality makes no sense and has no value?
Firstly, it's an Nvidia product, which means nobody else will be able to use it. I'll be surprised if Nvidia pushes for sharing it, but I'm not holding my breath. More than likely we'll see a BC8 standard that does most of what Nvidia's NTC does, or AMD will fart out something similar to NTC, but you will all hate it because Hardware Unboxed will show that NTC is just slightly better. Also, the image Nvidia shows is still a lot worse than the uncompressed image. We need someone who knows how to use BC7 who can actually give their opinion.
If you read a bit more, you will see that it is the game studio, when making the game assets, that uses the tensor cores, not the real-time renderer.
I thought tensor cores do the decompression of the textures? They are literally trading compute for VRAM. It took the GPU 1.15 ms to render a 4K image with NTC textures and 0.49 ms for BC7.
I think they do (page 9)? The 48 MB BC gets close to the quality of the 17 MB NTC. I have the feeling you did not even read the paper? Which is ok, obviously, but it is still a bit strange to talk with such authority without taking even a quick look first.
I see it, but again it's Nvidia presenting this, which will be biased. You really want a third party to show the differences, not Nvidia.
 
 
I thought tensor cores do the decompression of the textures? They are literally trading compute for VRAM. It took the GPU 1.15 ms to render a 4K image with NTC textures and 0.49 ms for BC7.
In this case the tensor cores are not handling the decompression of the textures, only the compression itself. The decompression is handled via matrix multiplication using standard DX12 or Vulkan calls.
Looking into it, though, it appears that BC7 does use a bit of dedicated silicon to decompress the textures, and that bit was required for DX11 compatibility. I did not know that, and it explains why it is very fast.
(article on the topic from 2012: https://www.reedbeta.com/blog/understanding-bcn-texture-compression-formats/)
Looking at what has come out recently, though, it appears that since DX12 and Vulkan became the mainstay environments, BC7 usage has fallen off dramatically and LZ4, Deflate, and Oodle have taken its place; but LZ4 and Deflate are very CPU-heavy and do not play well with direct storage techniques, and Oodle is proprietary and now owned by Epic.
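For reference, the fixed-rate BCn numbers fall straight out of the block layout described in that article (4x4 texel blocks, 8 bytes per block for BC1, 16 bytes per block for BC7); a quick sketch of the sizes for a single 4K texture without mips:

# Fixed-rate BCn sizes come directly from the block layout.
def bcn_size_bytes(width, height, bytes_per_block):
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * bytes_per_block

w = h = 4096
print("RGBA8:", w * h * 4 / 2**20, "MiB")                 # 64.0 MiB
print("BC1  :", bcn_size_bytes(w, h, 8) / 2**20, "MiB")   # 8.0 MiB (0.5 bytes/texel)
print("BC7  :", bcn_size_bytes(w, h, 16) / 2**20, "MiB")  # 16.0 MiB (1 byte/texel)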
Here is NTC vs BC7 in a different photo; the NTC image looks better in my opinion.
[Image: NVIDIA NTC vs. BC7 comparison]
 
Depends on the criteria and the point of comparison. Bloat did happen quite a bit; software's surface of interaction tended to get larger. From a single thread taking one command-line entry and producing one command-line output, it went to complex GUIs, talking to many things over the internet, getting talked to by other applications, etc...

It could be because a lot of my "formative" years, which shaped my impression of computers, happened during a very rough time in terms of stability: the transition from many OSes/APIs/MS-DOS to Win 95-98-Me, with maybe a not-so-bad Win 98 SE in there. In general I would rate the systems I had before the Windows XP service packs as nowhere near as good as now in terms of stability.

In the last 12 years, a Windows machine, of all OSes, never being rebooted between two power failures or updates has not been uncommon for me; I have often seen over 30 days of uptime on a Windows work computer used 9 hours a day. Back in the 90s and early 00s it was common for people to reinstall the OS on a much shorter cycle; people keeping a good disk image and doing it 2-4 times a year was not uncommon where I grew up. I do not remember the last time I did a format or OS reinstall because of an issue (well, on Linux, because I do not know what I am doing very much, but not on Windows).

I do not remember the last time I bought a Steam game that did not download, install itself, and launch right away on the first try without me having to do anything. Back in the day you were adjusting your memory type, reinstalling Windows, and rebooting 2-3 times until your Gravis gamepad worked; if the expression "blue screen" did not already exist, it might never come up on its own in today's world.

I remember more issues with my PS3 not being able to launch a game (or with its Wi-Fi) than with games on my computer since.
On your fourth point, I think that's due to the market shifting into the mainstream. The average user is unwilling (or unable) to troubleshoot and wants everything to be plug and play. Back in the 90s users were expected to figure things out, and most probably didn't have easy internet access to check message boards.
 
4gb not 8.
Meant the 290X, which is still a 2013 GPU.
Both of which are virtually no time at all.
To a human, not a GPU. That's a potential performance loss.
In this case the tensor cores are not handling the decompression of the textures, only the compression itself. The decompression is handled via matrix multiplication using standard DX12 or Vulkan calls.
Looking into it, though, it appears that BC7 does use a bit of dedicated silicon to decompress the textures, and that bit was required for DX11 compatibility. I did not know that, and it explains why it is very fast.
Maybe I'm reading the article wrong, but the way they worded it, it seems like the decompression speed is 1.15-1.92 ms using tensor cores on a 4090.
 
To a human, not a GPU. That's a potential performance loss.
Yes, 1 ms is a good amount of time. Imagine that at 100 fps you have only about 10 ms to render each frame (so 1 ms is often more than 10% of your whole budget), and if you have many hundreds of textures to read, it could add up.

On page 10 of the article you can see that, for a specific example, it took 1.15 ms to 1.92 ms depending on the NTC quality level, versus 0.49 ms for BC high (using 8 channels of a 4K texture).
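Putting those timings against a frame budget is simple arithmetic; a quick sketch using the numbers quoted above:

# What slice of the frame budget does the texture decode cost take?
timings_ms = {"BC high": 0.49, "NTC fastest": 1.15, "NTC highest quality": 1.92}

for fps in (60, 100, 144):
    budget_ms = 1000.0 / fps
    shares = ", ".join(f"{name}: {t / budget_ms:.0%}" for name, t in timings_ms.items())
    print(f"{fps} fps ({budget_ms:.2f} ms per frame) -> {shares}")

At 100 fps the slowest setting is close to a fifth of the whole frame, which is why the latency-hiding point below matters.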

But GPUs are so much about parallel work and ordering optimization that it can be hard to predict the final cost of something like that. As the paper says:
Furthermore, when rendering a complex scene in a fully featured renderer, we expect the cost of our method to be partially hidden by the execution of concurrent work (e.g., ray tracing) thanks to the GPU latency hiding capabilities. The potential for latency hiding depends on various factors, such as hardware architecture, the presence of dedicated matrix-multiplication units that are otherwise under-utilized, cache sizes, and register usage. We leave investigating this for future work.

If that step largely overlaps with other work running in parallel, RT denoising for example, then maybe a lot of the cost has no impact: it ends up a bit slower, if at all, but not necessarily 2-4 times slower in practice (while it could start to show up without RT, when trying to lock 170 fps). The measured number is a worst-case scenario where you are pretty much only doing this and waiting for the result.
 
Meant the 290X, which is still a 2013 GPU.

To a human, not a GPU. That's a potential performance loss.

Maybe I'm reading the article wrong, but the way they worded it, it seems like the decompression speed is 1.15-1.92 ms using tensor cores on a 4090.
As per section 5.2:
These extensions are not restricted to any shader types, including ray tracing shaders.

They demoed it on the tensor cores, but they also discussed how much of a pain in the ass it was.

In section 6.5.2 they also touch on how the increased compute time for the decompression is masked by the latency generated by the actual moving of data between NVMe and VRAM, or by the other tasks the GPU is undertaking:
Although NTC is more expensive than traditional hardware-accelerated texture filtering, our results demonstrate that our method achieves high performance and is practical for use in real-time rendering. Furthermore, when rendering a complex scene in a fully featured renderer, we expect the cost of our method to be partially hidden by the execution of concurrent work

Then they briefly mention that there may be ways to further mask it with different techniques or future hardware changes that are outside the scope of the paper.

The potential for latency hiding depends on various factors, such as hardware architecture, the presence of dedicated matrix-multiplication units that are otherwise under-utilized, cache sizes, and register usage. We leave investigating this for future work.


But the Conclusion is the interesting bit.

By utilizing matrix multiplication intrinsics available in the off-the-shelf GPUs, we have shown that decompression of our textures introduces only a modest timing overhead as compared to simple BCx algorithms (which executes in custom hardware), possibly making our method practical in disk- and memory-constrained graphics applications.

Disk-constrained here would be a direct shot at the consoles.
Memory-constrained can be interpreted as an acknowledgment that their existing hardware, RTX 2000 series onward, is going to be memory-starved.
 
Meant the 290X, which is still a 2013 GPU.
That was 4 GB also... and 0.66 ms more per frame at 60 fps is virtually nothing (16.66 ms per frame). I leave it as an exercise for the reader just how small that truly is ;).
 