Originally posted by xonik
the P4 EE has varying and usually marginal performance gains over a similarly clocked P4 3.2C.
I thought I'd already pointed out that the P4EE has a large L3, not L2, cache. Thus it's a red herring in this conversation, because L3 runs significantly slower than L2.
Looking at Barton vs. Thoroughbred comparisons, you'll see that the effect of a DOUBLED L2 cache lends a scant 2-3% performance gain at identical clocks.
Under what benchmarks? If it's an artificial benchmark, that's a huge "duh". Artificial benchmarks are designed NOT to exercise cache -- i.e., they use random memory access patterns, which prefetch algorithms can't predict and which wouldn't speed up if you had 8GB of L2 cache. If data is never used again, it doesn't matter whether it's in cache.
Show me that same result using application benchmarks.
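To make the reuse point concrete, here's a toy sketch, not a model of any real CPU. It assumes a fully associative LRU cache counted in lines; the function name `hit_rate`, the 512/1024 line counts, and both access patterns are made up for illustration. It compares a streaming pattern (nothing is ever touched twice) with a loop over a working set that fits only in the larger cache.

```python
from collections import OrderedDict

def hit_rate(accesses, cache_lines):
    """Fraction of accesses served by a fully associative LRU cache.

    Toy model: one address == one cache line, no prefetching,
    no associativity limits. All sizes are hypothetical.
    """
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)

# Streaming: every line is touched exactly once, so nothing is reused.
streaming = list(range(10_000))

# Looping: a 768-line working set swept 20 times in order.
looping = list(range(768)) * 20

for size in (512, 1024):
    print(size, hit_rate(streaming, size), hit_rate(looping, size))
```

In this toy model the streaming pattern gets no hits at either size, while the looping pattern goes from no hits at 512 lines to about 95% at 1024 lines, because the working set only fits in the larger cache. How much a doubled cache helps depends entirely on what the workload reuses, which is why the choice of benchmark matters.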
And just how is the L3 cache much slower than L2 cache when it's on-die? There is very, very little difference in latency between the two cache levels when they are on-die, to the point that they could be effectively combined. Just look at a picture of the die layout and you will see a very small difference in trace length from the cache to the ALU or FPU, leading to very small differences in latency and resultant performance. Your reasoning would have held water when L3 caches were on-chip or even discrete, but now it all falls apart.
Because L3 is the interface to system RAM. The issue isn't the speed with which it shares data with L2 and the CPU -- it's the speed with which it accesses system RAM. As long as it's the go-between for the CPU and RAM, it's going to be slower as a matter of function, if not of design.
Finally, all current model Xeons have 512 kB of L2 cache, with varying levels of that forbidden L3 cache.
Chips with larger L2 caches can be bought. At least, that used to be the case; I haven't looked in the past year or so.
And I'm not against L3 -- it's a wonderful thing. But L2 is more so.