Intel Arrow Lake to remove HyperThreading?

IIRC what will make Zen4c different from Zen4 will be a reduced instruction set, and maybe less cache?
I am not sure I get what you are saying here

Zen 4c is the same core as Zen 4, but frequency limited for better area efficiency (hence cheaper to produce per core)
 
I am not sure I get what you are saying here

Zen 4c is the same core as Zen 4, but frequency limited for better area efficiency (hence cheaper to produce per core)
No it isn’t, it does have less cache and by the very nature of changing from an 8T gate to a 6T gate it can’t be the same architecture regardless of how much AMD likes to say it is. AMD had to remove something to make it 35% smaller.
 
No it isn’t, it does have less cache and by the very nature of changing from an 8T gate to a 6T gate it can’t be the same architecture regardless of how much AMD likes to say it is. AMD had to remove something to make it 35% smaller.
You are correct about the smaller cache and lower frequencies, but it's not like Intel's E-cores, which are based on the Atom architecture and really are a different architecture
 
No it isn’t, it does have less cache and by the very nature of changing from an 8T gate to a 6T gate it can’t be the same architecture regardless of how much AMD likes to say it is. AMD had to remove something to make it 35% smaller.
L1 and L2 cache per core are the same number of bytes in Zen 4 and Zen 4c. On the mixed-design chips, all cores share one L3. On server SKUs, every chiplet has the same number of bytes of L3 (not counting X3D), so half the bytes per core.

From what they've said, most of the size difference comes from removing the buffering needed to keep clock domains small, which is only required for high clock rates. 8T -> 6T is a component of the size difference too; I'm guessing there must be some measurable perf difference from the 6T caches, but I've not seen any reporting on it, and I don't have the CPUs to test. There's also an area reduction from removing the X3D connection points.
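To put rough numbers on the "half the bytes per core" bit, here's the back-of-the-envelope math. The CCD configurations below are my assumptions from AMD's public Genoa/Bergamo material, so treat this as a sketch rather than gospel:

```python
# Rough per-core L3 math for the server parts, using the CCD configs as I
# understand them (assumptions, not official figures):
#   Genoa:   8 Zen 4 cores per CCD, one CCX sharing 32 MB of L3
#   Bergamo: 16 Zen 4c cores per CCD, two CCXs of 16 MB of L3 each
genoa_l3_mb, genoa_cores = 32, 8
bergamo_l3_mb, bergamo_cores = 2 * 16, 16

print(genoa_l3_mb / genoa_cores)      # 4.0 MB of L3 per Zen 4 core
print(bergamo_l3_mb / bergamo_cores)  # 2.0 MB of L3 per Zen 4c core -> half the bytes per core
```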
 
You are correct about the smaller cache and lower frequencies, but it's not like Intel's E-cores, which are based on the Atom architecture and really are a different architecture
Yep, those are the main differences. A benchmark like Cinebench wouldn't show much of a difference between Zen 4 and Zen 4c, because the workload is mostly math and cache doesn't help much. Clock speed does, but more cores help even more in Cinebench. That's why so many people say Zen 4c performs the same: the kinds of benchmarks they run wouldn't show the difference. In games, where logic matters more than raw math, it shows. It's still a smarter way to reduce power consumption without going so far as to make the equivalent of an Intel Atom.
 
IIRC what will make Zen4c different from Zen4 will be a reduced instruction set, and maybe less cache?
No--they're the same core, but Zen 4 is made with standard techniques that basically take up a lot of space on the die, whereas the 4c cores are (possibly hand-)optimized to be dense, like old-school design. I'm not sure, but I think the density means they generate more heat per unit area--there's a lot of dead space in the regular design that isn't in the c cores--so they reduce the max clock speed to keep them cooler.
 
Here: https://www.semianalysis.com/p/zen-4c-amds-response-to-hyperscale

Scroll down a ways and look at the various pictures. Aside from what Lakados mentioned about 8T vs 6T (transistors? RAM bits? I forget which), they squeezed out all the spare space (look at the "low density" vs "high density" pic and the two showing an 8-core Zen 4 CCD vs a 16-core Zen 4c die, and notice how the cores look very different).

"AMD created Zen 4c by taking the exact same Zen 4 Register-Transfer Level (RTL) description, which describes the logical design of the Zen 4 core IP, and implementing it with a far more compact physical design. The design rules are the same as both are on TSMC N5, yet the area difference is massive. We detail the three key techniques of device Physical Design that enables this."
 
Wasn't HyperThreading responsible for some of the recent security problems with Intel CPUs? Wouldn't surprise me if they ditched it for good.

SMT is still something I work with on IBM Power CPUs. Some workloads see decent improvements with SMT8; others prefer SMT4 or lower.
 
No--they're the same core, but Zen 4 is made with standard techniques that basically take up a lot of space on the die, whereas the 4c cores are (possibly hand-)optimized to be dense, like old-school design. I'm not sure, but I think the density means they generate more heat per unit area--there's a lot of dead space in the regular design that isn't in the c cores--so they reduce the max clock speed to keep them cooler.

I don't think it is more heat. The heat is just more concentrated in a smaller space, and thus more difficult to dissipate quickly, resulting in higher temps.
 
Wasn’t HyperThreading responsible for some of the recent security problems with Intel CPU’s? Wouldn’t surprise me if they ditched it for good.

It may have been, but most of it was due to branch prediction. They almost certainly knew those attacks were possible, but without those techniques Core 2 and subsequent gens would have been on par, performance-wise, with Bulldozer and its subsequent gens.
 
Transistors per bit in the L1 cache, I think.
After I posted I read the rest of the article I linked, and it said an 8T bit is dual-ported, meaning you can read and/or write to a bit twice in one clock cycle. The 6T stuff is technically single-ported, but you can effectively do a read followed by a write in one cycle, so it's more limited. Probably not suitable for things like the register file, therefore--I would imagine the power of fully dual-ported memory is too important.
 
I don't think it is more heat. The heat is just more concentrated in a smaller space, and thus more difficult to dissipate fast, resulting in higher temps.
Well, maybe "more heat per unit area" in high-density areas, or something like that, then. There was a quote in the article about 4 GHz speeds at 100°C in the high-density areas. In the example, they showed a bunch of low-density standard cells which approached 50% unused space.
 
Zen 4c was a reaction to the big/little designs in SoCs and seeing how efficient they can be. I suspect as AMD moves on to next-gen chips you might still see the Zen 4c stuff used to create a real big.LITTLE config
 
Zen 4c was a reaction to the big/little designs in SoCs and seeing how efficient they can be. I suspect as AMD moves on to next-gen chips you might still see the Zen 4c stuff used to create a real big.LITTLE config

Also a reaction to M1 and seeing what is possible when you set your max clock significantly lower.
 
Zen 4c was a reaction to the big/little designs in SoCs and seeing how efficient they can be. I suspect as AMD moves on to next-gen chips you might still see the Zen 4c stuff used to create a real big.LITTLE config

I really tend to think big/little core combos are the future. I suspect AMD will be doing this as well. The only thing stopping them right now is probably that they didn't have a recent small-core design they could pull off the shelf like Intel did with Atom.

I mean they have the old Zacate cores, but those are absolutely ancient at this point. So I wouldn't be surprised if they have people working on true low power cores for future hybrid releases.

I don't think this is an IP issue. Intel is hardly the first to use a big core little core approach. ARM designs have been doing it for years, and we know AMD has an ARM license, so...
 
The range where it is more efficient seems quite limited:
https://www.kitguru.net/lifestyle/m...-efficiency-gains-with-new-zen-4c-processors/

Mainly under 13 watts, which could be a big deal for laptops, but shrinking the die from 180mm² down to 140mm² (and making it much cheaper to produce) would be reason enough to do it regardless. The focus seems to have been more-performant 128-core cloud CPUs rather than fewer, less-performant cores, more than efficiency. It could be an M1 reaction, but maybe not; the effort AMD has been making all along to put more cores on a CPU could have led to this regardless.
 
Also a reaction to M1 and seeing what is possible when you set your max clock significantly lower.
Two-thirds of Apple Silicon's M-series performance comes from its extensive use of ML acceleration via fixed-function ASICs designed into the silicon; the Apple Neural Engine has as much to do with how the platform performs as the ARM cores do.

Apple is very clear that they do not consider their Neural Engine to be AI, as it is not doing any training work locally; it simply accelerates the learned patterns and algorithms Apple has worked on in-house and pushes down as updates.
 
AMD isn't reacting to Intel, or anyone else; they've all been pulling in this direction for about a decade. The only thing is that Intel went first with Windows, and it's part of the reason Windows 11 is half-baked and why they're already prepping Windows 12. We saw the same thing with Windows 8, which everyone hated, so Microsoft fucking bailed and threw everything into Windows 10.
 
I don't foresee AMD moving away from SMT in the near to mid-term; their wide architecture benefits from it too much, and they're competing well with their current product stack. Post-Rocket Lake, Intel has been adapting existing designs to counter competitors' moves in a reactionary way. A decade from now their position will look like an even bigger mess than it does presently. What I would suggest is that rentable units may comprise the biggest rearchitecting of Intel's chips since the Core 2 Duo, and may even shake their chips out of a lot of technical debt they've maintained.

Questions of whether SMT is dead are kind of silly, given AMD's relative success with it and the continued existence of niche server CPUs where 4+-way SMT implementations are routine (IBM Power, some flavors of ARM, the ghost of SPARC, &c.). It is a potential source of security vulnerabilities, but so is speculative execution, and unless you're in an embedded space that's too big a performance benefit to let go of. As different products show, there's no single magic solution to any of these problems.
 
Wasn’t HyperThreading responsible for some of the recent security problems with Intel CPU’s? Wouldn’t surprise me if they ditched it for good.
Yes, it was the Foreshadow (L1TF) exploit from the same era as Meltdown and Spectre.
The patches for it were only partial mitigations, and Intel has never truly fixed it in old or newer designs.

Using HT in the datacenter and most server environments is DOA, and simply needs to be disabled at this point in order to be fully secure.
AMD's SMT never had the exploit and can be used in any environment.

SMT is still something I work with on IBM Power CPU’s. Some workloads see decent improvements with SMT8, others prefer SMT4 or lower.
Exactly, SMT is still used in a lot of ISAs and architectures, just not fully securely on Intel's x86-64 CPUs.
It's probably fine in most home environments but it certainly isn't in the majority of enterprise environments.
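If anyone wants to see where their own box stands, recent Linux kernels expose both the reported L1TF (Foreshadow) mitigation and the current SMT state through sysfs. A quick sketch (Linux-only; output obviously depends on your CPU and kernel):

```python
# Print the kernel's reported L1TF (Foreshadow) status and whether SMT/HT
# sibling threads are currently enabled. Linux-only; the paths are the
# standard sysfs interfaces on recent kernels.
from pathlib import Path

def read_sysfs(path: str) -> str:
    p = Path(path)
    return p.read_text().strip() if p.exists() else "not exposed on this kernel"

print("L1TF:       ", read_sysfs("/sys/devices/system/cpu/vulnerabilities/l1tf"))
print("SMT control:", read_sysfs("/sys/devices/system/cpu/smt/control"))  # on / off / forceoff / notsupported
print("SMT active: ", read_sysfs("/sys/devices/system/cpu/smt/active"))   # 1 if sibling threads are running
```

Actually turning it off for good is usually a nosmt kernel parameter or a BIOS toggle rather than something you do from userspace.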
 
With their ability to pack in E-cores and the updates to the Windows scheduler to support this, I can see them totally getting away from HT. It was a cool technology, and very helpful in the early 2000s when it came out. But nowadays it probably gets in the way more than it helps (while designing chips). I support this.

I remember the first CPU I had with HT (a 3.06GHz P4, I think), when it was just 1 core / 2 threads. It was a massive improvement when doing things as simple as having multiple windows open at the same time.
Or they are limited by being two process nodes behind, so they came up with this mix-and-match craziness. Still kinda cool.
 
I like Hyper-Threading. I'd rather have more big cores with Hyper-Threading vs more little cores. Little cores only confuse the OS, but we have them on 12th, 13th, and 14th gen, so we gotta deal with them.
 
I like Hyper-Threading. I'd rather have more big cores with Hyper-Threading vs more little cores. Little cores only confuse the OS, but we have them on 12th, 13th, and 14th gen, so we gotta deal with them.
Yeah, but each E core occupies around 1/3 the die space of a P core. Cramming eight E cores in as a value add takes less space than three additional P cores would, and the power requirements to feed the little cores are even lower proportionally. It will be interesting to see what 15th gen brings.
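Taking that roughly-one-third figure at face value (it's an approximation, not an official number), the area trade works out like this:

```python
# Back-of-the-envelope die-area comparison, assuming each E-core is ~1/3 the
# area of a P-core (an approximation, not an official figure).
e_core_area = 1 / 3                 # in P-core-equivalents

eight_e_cores = 8 * e_core_area
print(round(eight_e_cores, 2))      # ~2.67 P-cores' worth of area
print(eight_e_cores < 3)            # True: eight E-cores fit in less area than three extra P-cores
```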
 
I like Hyper-Threading. I'd rather have more big cores with Hyper-Threading vs more little cores. Little cores only confuse the OS, but we have them on 12th, 13th, and 14th gen, so we gotta deal with them.
I don't know, I like the idea of a process scheduler that is content-aware. As it stands now with Hyper-Threading, a big job would go to cores 0 and 1 first, those being the main core and its sibling thread, but Intel's work with Xeons and VMs shows there is far more performance to be gained by placing that same workload on cores 0 and 2 and not using the sibling thread.

Intel has gone a step further there in making the scheduler aware of what the CPU resources are doing and smart enough to break incoming instructions down into sub-commands. In theory it could take two or more separate instructions, split them into a dozen different commands, process each where it will get done the fastest in one batch, then reassemble them on the way out. Very similar to how GPUs render scenes.

Given the changing mix of GPUs, NPUs, CPUs, and FPGAs in computer systems, letting the hardware decide how best to accelerate the job may be way better long term.
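As a concrete illustration of the "cores 0 and 2 instead of cores 0 and 1" point, on Linux you can do that placement yourself with CPU affinity. The sketch below assumes the common enumeration where logical CPUs 0/1 are SMT siblings of one physical core and 2/3 of the next; check lscpu or /proc/cpuinfo for your actual topology before trusting those numbers:

```python
# Pin two CPU-bound workers to separate physical cores (logical CPUs 0 and 2)
# instead of two SMT siblings of the same core (0 and 1). Linux-only, since it
# relies on os.sched_setaffinity. The 0/1-as-siblings layout is an assumption.
import os
from multiprocessing import Process

def spin(n: int = 50_000_000) -> None:
    total = 0
    for i in range(n):              # plain integer churn just to keep a core busy
        total += i

def worker(cpus: set) -> None:
    os.sched_setaffinity(0, cpus)   # restrict this process to the given logical CPUs
    spin()

if __name__ == "__main__":
    procs = [Process(target=worker, args=({0},)),
             Process(target=worker, args=({2},))]   # cores 0 and 2, not 0 and 1
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```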
 
Yeah, but each E core occupies around 1/3 the die space of a P core. Cramming eight E cores in as a value add takes less space than three additional P cores would, and the power requirements to feed the little cores are even lower proportionally. It will be interesting to see what 15th gen brings.
I mean, what if cutting out all the silicon that makes HT a thing allowed them to go from 8/8 P cores to just a solid 12 P cores + whatever E cores they toss in there?
 
I mean, what if cutting out all the silicon that makes HT a thing allowed them to go from 8/8 P cores to just a solid 12 P cores + whatever E cores they toss in there?
The silicon enabling SMT on contemporary Intel chips is around 5% of die size - north of trivial, but definitely not enough to toss it aside and suddenly find yourself with enough space to ramp up P-core count. There is a logic behind having some number of big cores for latency-sensitive, demanding applications and then having a gaggle of smaller cores for handling grunt work and contributing to multithreaded jobs by sheer numbers. I can understand pruning away SMT for Intel if it's going to aggravate the complexity of their rentable unit strategy.

A lot depends on the use case of the chip, too. For a non-workstation laptop, working on improving IPC for the small chips without blowing up power consumption and leaning on dedicated purpose blocks like video encode/decode could make a 2P/4E really tolerable at the low end. For gaming a distribution like 6P/8E could be entirely acceptable, and would manage to skirt the periodically observed impact SMT can have on performance by making as many of the big cores’ resources available without an impact on cache hierarchy. I’d love for someone with inside baseball knowledge of the enterprise to talk about how P/E core distribution works on Xeons that aren’t better-binned, rebadged consumer parts.
 
I’d love for someone with inside baseball knowledge of the enterprise to talk about how P/E core distribution works on Xeons that aren’t better-binned, rebadged consumer parts.
I don't have many systems doing heavy work that have the P+E core setup, but I have a few hosting VMs and they do well. The hypervisor does a really good job moving the work around across the whole lot, so if I don't set any affinities or priorities it does a better job at task scheduling than I could manage manually, by a good margin.

I was reading that the new Gen 5 Scalable Xeons do it way better. A 16/16 will subdivide out into 48 logical vCPU units before you encounter traditional over-provisioning issues. But I don't have any of them and likely won't until 2025 at the earliest.
 
Do you realize that they are not just turning off Hyperthreading but instead replacing it with an entirely different scheduler that will break the incoming instructions down into smaller parts and batch them across the registers in a much more efficient method?

Replacing it with rentable units in 17th gen CPUs, which means Arrow Lake is just going to be its own thing without hyperthreading.
 
I only use my personal home PCs for gaming. E cores are whatever aside from what the OS does, lol. At least they scale well on some games now, and I'm sure any games in the future that can leverage more cores will too. For example, Spider-Man Remastered is incredibly CPU intensive. It uses all 32 threads of my 13900KS, ALL OF THEM lol. That's when I really feel like I got my money's worth 💰 and my 1% lows are noticeably better than when I play on my 12700K rig.
A 20 P-core CPU with Hyper-Threading would be the end all be all for gaming prolly lol
 
Could also be a gradual move towards a new RISC-based ISA.

Keep the master x86 decode unit separate, have RISC-like cores, and when ready to make the move, either make the decode unit optional or just remove it :p

Intel's been decomposing x86 into RISCy micro-ops for execution since the 90s in order to be able to pipeline them. They could have exposed them as an alternate "core-x86" instruction set anytime in the last 30ish years if they wanted to.
 
Yeah, but each E core occupies around 1/3 the die space of a P core. Cramming eight E cores in as a value add takes less space than three additional P cores would, and the power requirements to feed the little cores are even lower proportionally. It will be interesting to see what 15th gen brings.
HyperThreading was originally created to make efficient use of the Pentium 4's very deep pipeline by presenting two virtual cores. If Intel is removing HT, then it's not because E cores are more efficient in terms of space, but because Intel doesn't see a reason to keep a feature they feel isn't worth the effort to implement. It could also be that Intel may be pushing HT as a Xeon server-exclusive feature. The number of threads a modern Intel chip has is already extremely excessive, so HT is probably not needed.
 
HyperThreading was originally created to make efficient use of the Pentium 4's very deep pipeline by presenting two virtual cores. If Intel is removing HT, then it's not because E cores are more efficient in terms of space, but because Intel doesn't see a reason to keep a feature they feel isn't worth the effort to implement. It could also be that Intel may be pushing HT as a Xeon server-exclusive feature. The number of threads a modern Intel chip has is already extremely excessive, so HT is probably not needed.
The potential exploits from HT attacks are more serious in server environments, though. I'm wondering if it's about thermals. We can jam so many cores onto a chip that if all of them are at maximum speed/load, the power and temperature levels go through the roof. Running more discrete cores might allow better overall throughput because they don't have to run as fast and hot to keep each thread moving; with SMT, contention means the threads can't do as much per clock cycle anyway.
 
HyperThreading was originally created to make efficient use of the Pentium 4's very deep pipeline by presenting two virtual cores. If Intel is removing HT, then it's not because E cores are more efficient in terms of space, but because Intel doesn't see a reason to keep a feature they feel isn't worth the effort to implement. It could also be that Intel may be pushing HT as a Xeon server-exclusive feature. The number of threads a modern Intel chip has is already extremely excessive, so HT is probably not needed.
Nah, HT just no longer does a good job of filling a CPU. If you break down the commands being sent, a lot of the silicon goes unused or is forced to wait while the memory channels are grabbing other bits.
But future Intel CPUs are likely to contain the Adamantine cache, which gives them a lot of options for data fetching. Pair that with DDR5's ability to subdivide its channels for smaller fetches, and Intel just has better options.
It's less a removal of Hyperthreading than an evolution of it.
 