Intel’s Answer to AMD 3D V-Cache: Cache DRAM, 50% Faster & 60% More Efficient than HBM

That graph makes you think: maybe they will soon hit a wall in shrinking transistors. All the big players will reach similar diminishing returns, price-wise, when trying to go smaller, and it will become all about packaging: chiplet-to-chiplet communication, the energy lost in between, and the ability to stack things vertically.
 
Misleading title. The dangling modifiers imply that AMD's V-Cache is a similar technology to cache DRAM and HBM.

HBM is high-bandwidth, but it mostly has higher latency than other memories. Even if cache DRAM is a massive improvement over HBM, it still sits farther from the L3 than V-Cache does (over the fabric), so it will most likely have greater latency than L3. AMD's latency penalty for V-Cache L3 versus regular L3 is only ~5 clocks. Unless cache DRAM comes close to that, the two should not be directly compared.
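To put the ~5-clock figure in perspective, here is a minimal sketch that converts a cycle penalty into wall-clock time. The clock speeds used are assumptions chosen for illustration, not figures from the article.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency given in core clock cycles to nanoseconds.

    At f GHz one cycle lasts 1/f ns, so the latency is cycles / f.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


# ~5 extra clocks for V-Cache L3 (figure from the post above);
# the clock speeds below are assumed, not taken from the article.
for ghz in (4.0, 5.0):
    print(f"5 cycles at {ghz} GHz = {cycles_to_ns(5, ghz):.2f} ns")
```

Any latency quoted for cache DRAM in nanoseconds would have to be compared on that scale, which is why an off-die part is a hard match for on-die L3.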

Edit: After reading the article again, I can't really tell where the cache DRAM sits. It's premature to make any inferences, let alone comparisons and projections.
 
The cache DRAM can sit just about anywhere: you could place it around the CPU/GPU like you would HBM, or you could stack it on top of the intended chip like you would TSMC CoWoS stacked cache.
The Samsung DRAM gives you the ability to do either, and it's fab-agnostic, whereas the stacked cache is a TSMC-only process.
I have also heard of examples where it can be worked into the interposer itself, but that comes with some significant $$$$.
 
Will this run into some of the same problems Ryzen did with getting the V-Cache to sustain the same temperatures and frequencies? AMD's 5800X3D had significant frequency limits to keep temperatures and power down, and even the 7950X3D, which is a beast of a chip (initially held back, stupidly, by a lack of firmware/software optimizations), has one block of cores with extra cache and slightly lower frequencies plus one block of cores at full 7950X frequencies. If the process can't stack this DRAM more durably than TSMC's process, then I imagine it will end up in the same place. I'm sure both AMD and Intel will eventually get away from the asymmetric paradigm here, but I guess it depends on the process, its efficiency, and its performance.
 
That depends. AMD's biggest issue is the copper-based adhesive TSMC uses in its CoWoS packaging: it applies at a temperature that falls within AMD's operating range, and it has a relatively high resistance.
So AMD needs to keep the voltage in check so the copper doesn't liquefy and cause the layer to shift. AMD is still using the first generation of the process; TSMC is on its fourth generation now, but it is having significant trouble getting packaging throughput to a level that can sustain AMD's demand with the newer versions.

Intel's new Foveros Direct packaging tackles that with some sort of fluid-dynamic dark voodoo magic that pulls the chips together and pressure-bonds them. TSMC's stacked cache leaves around a 30 µm gap between the stacked chips; Intel has that down to less than 10 µm now. So it is a very different bonding approach, which is supposedly cheaper and faster to do at scale, and it seems to be impressing everybody who has seen it in action, Apple and Nvidia alike.

As a note, Intel has been showing off the tech for a while, but using specific Samsung LPDDR5X modules, namely SKU K3KL3L30CM, which let them get 64 to 128 GB on the die.
 

It will be interesting to see what it does when they release it; it would be what an APU needs to really perform.
 
We can expect their first offerings with the DDR5 in higher-end laptops and mobile workstations this Christmas, so it's not a terribly long wait.
 
I wonder if the Samsung chip will have some sort of bad firmware that causes it to wear down in a few weeks?
 
I’m more curious to see when the x86 market shifts to putting RAM on package. It would be cool to have CPUs with something like 8-16 GB of RAM on the package plus the ability to add more RAM the classic way, so you get massive latency and bandwidth gains for that first chunk on the SoC but still have slower RAM in reserve for those who need it.
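One rough way to reason about the on-package-plus-expansion idea is a weighted average of the two tiers' latencies. This is a minimal sketch, assuming accesses split cleanly between the tiers with no page-migration or OS-placement overhead; the function name and the latency figures are hypothetical placeholders, not measurements of any real part.

```python
def effective_latency_ns(on_package_hit_rate: float,
                         on_package_ns: float,
                         expansion_ns: float) -> float:
    """Average memory latency for a simple two-tier model (hypothetical).

    A fraction `on_package_hit_rate` of accesses is served by the fast
    on-package RAM; the rest go to slower, expandable RAM (e.g. DIMMs).
    """
    if not 0.0 <= on_package_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return (on_package_hit_rate * on_package_ns
            + (1.0 - on_package_hit_rate) * expansion_ns)


# Placeholder latencies in ns, chosen only to show the trend.
for hit_rate in (0.5, 0.75, 1.0):
    print(hit_rate, effective_latency_ns(hit_rate, 60.0, 100.0))
```

The more of a workload's hot data fits in the on-package tier, the closer the average gets to the fast tier's latency, which is the appeal of putting the first 8-16 GB on the SoC.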
 
When Dell was showing them off at their last trade show, the rep said, "Hopefully November, but don't hold me to that." So...
 
"Till now, boost clock hikes have been Intel’s primary response to the Ryzen V-Cache threat. Together with Samsung, it is finally working on a worthy competitor to the 3D-stacked CPUs from its archrival."
Intel has played with it on and off since the 5th Gen Core series debuted with the 5775C, where they used it as a bridge for the iGPU. The biggest hurdle for every fab that has done it has been ramping up production speed to a point where it is commercially viable: too fast and you get too many bad chips, too slow and the assembly hold-ups make them too expensive to sell for what they deliver.
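The "too fast vs. too slow" tradeoff can be sketched as a toy cost model: the hourly cost of the packaging line is spread over more units at higher speed, but yield drops as speed rises. This is a minimal illustration, assuming a made-up exponential yield curve and made-up costs; none of the names or numbers come from Intel, TSMC, or Samsung.

```python
import math


def yield_at_rate(units_per_hour: float, k: float = 0.01) -> float:
    """Hypothetical yield curve: faster bonding -> more defects.

    Assumes yield falls off exponentially with line speed; k is made up.
    """
    return math.exp(-k * units_per_hour)


def cost_per_good_unit(units_per_hour: float,
                       line_cost_per_hour: float = 1000.0,
                       material_cost: float = 50.0) -> float:
    """Cost of one working stacked chip at a given bonding rate (toy model)."""
    # Slow lines spread the hourly line cost over fewer units...
    cost_per_attempt = line_cost_per_hour / units_per_hour + material_cost
    # ...while fast lines scrap more parts, so divide by the yield.
    return cost_per_attempt / yield_at_rate(units_per_hour)


rates = range(1, 201)
best_rate = min(rates, key=cost_per_good_unit)
print(best_rate, round(cost_per_good_unit(best_rate), 2))
```

Under these assumptions the cheapest rate lands somewhere in the middle of the range rather than at either extreme, which is the shape of the problem described above: the real curve is whatever a fab's process actually delivers.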
TSMC can do it for AMD because AMD sells relatively few X3D CPUs, but at the scale Apple would need, TSMC just doesn't have the packaging space available to make it worth the expense.

Intel has been using stacked chips this way since around 2017, when they refreshed their Stratix 10 lineup.
 