AMD Further Unveils Zen Processor Details

A few thoughts on this benchmark, now that the dust has settled.

For one, it's telling that they're showing multi-core performance using a workstation-orientated benchmark. This leads me to think they may not have the clock speeds we're after (maybe close, but just not enough), so they're targeting the multi-threaded workload market as a lower-cost multi-core option than Intel. This naturally extends to servers if they have TDP in check.

Canned benchmarks sure have their place. But for someone like me who wants to know how well it does encoding video in a real scenario, the kind I'd actually be working with, real-world tests are the way to go.

I got Blender up and running and made a few passes on the BMW file. It does load all 16 threads to 100% during a good portion of the render; it tapers off to nearly nothing for the last couple of seconds. Handbrake doesn't usually run at more than a consistent 98% on each thread with the settings I use.

OK, there's real world. I can run Blender and Handbrake. Is my computer any faster or slower running them than a Zen? Can't say. You'd need a benchmark for that.
 
That chart raises a few questions in relation to Blender.

Is an i3 6100 dual core really as fast as an 8-core FX in a benchmark that's supposed to be super multithreaded?

In Cinebench and POV-Ray, this is the outcome:
[Image: BROADWELL-E-35.jpg]

[Image: BROADWELL-E-44.jpg]


The FX series is far ahead of the i3 6100 here, as it should be.
 
I got Blender up and running and made a few passes on the BMW file. It does load all 16 threads to 100% during a good portion of the render; it tapers off to nearly nothing for the last couple of seconds. Handbrake doesn't usually run at more than a consistent 98% on each thread with the settings I use.

OK, there's real world. I can run Blender and Handbrake. Is my computer any faster or slower running them than a Zen? Can't say. You'd need a benchmark for that.

Fair point, it's not always easy to get high utilisation. I usually find something like Premiere Pro shows differences between various configurations quite well; it's pretty effective.
Running from a ramdisk will also help minimise any subsystem issues.
 
That chart raises a few questions in relation to Blender.

Is an i3 6100 dual core really as fast as an 8-core FX in a benchmark that's supposed to be super multithreaded?

Perhaps the workload hits the core-architecture issues of the FX? Remember the Bulldozer CMT kerfuffle?
 
That chart raises a few questions in relation to Blender.

Is an i3 6100 dual core really as fast as an 8-core FX in a benchmark that's supposed to be super multithreaded?

In FP workloads? Yes.

Blender is an FP-heavy benchmark. The Bulldozer architecture was starved by having only one FP unit to service both cores of a module, which would result in stalls in FP-heavy workloads.

So in this case, half the cores are doing next to no work, in which case I'd expect the i3 to win handily.

This is also likely why AMD was keen to show off Blender, since they can "prove" they fixed one of the major problems with their last CPU lineup.
 
Perhaps the workload hits the core-architecture issues of the FX? Remember the Bulldozer CMT kerfuffle?

Yeah, I'm thinking it could be the poor cache performance of Piledriver. This is why the 6700K gets an ALMOST 20% speedup over the 4790K, which is unheard of among Haswell-to-Skylake performance improvements.

Skylake's improved cache plus aggressive prefetch is probably the reason here. Those Piledriver cores can't be kept fed, so a Core i3 outperforms them. Since everyone agrees this application loves fast cache, the results are no surprise.

Given that, it's no surprise why AMD wants to highlight their improved cache performance in a problem app like Blender.

I don't think Blender is any more FP-heavy than Cinebench (which performs fine on Piledriver, in Core i5 range like you would expect). They're both 3D rendering engines, and their operations are mixed FP/INT, which is what Piledriver does well. I just think Blender is more cache-hungry.

It still scales to multiple cores, but not if you have an unoptimized cache.
 
In FP workloads? Yes.

Blender is an FP-heavy benchmark. The Bulldozer architecture was starved by having only one FP unit to service both cores of a module, which would result in stalls in FP-heavy workloads.

So in this case, half the cores are doing next to no work, in which case I'd expect the i3 to win handily.

This is also likely why AMD was keen to show off Blender, since they can "prove" they fixed one of the major problems with their last CPU lineup.


But if I recall correctly, the FP unit in Bulldozer was one 256-bit FP unit per module, which could split itself into two 128-bit units, one per integer core, if needed.

So I guess this limitation was only a big deal if 256-bit floating-point operations were needed.

That being said, I don't have a good understanding of how often typical code needs 256-bit FP compared to 128-bit FP.
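
For what it's worth, here's a minimal sketch of the 256-bit vs. 128-bit distinction using standard AVX/SSE intrinsics. It only illustrates the operation widths, not Bulldozer's internals (compile with g++ -mavx):

```cpp
#include <immintrin.h>  // SSE/AVX intrinsics
#include <cstdio>

int main() {
    alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(32) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    alignas(32) float out[8];

    // One 256-bit AVX add: eight single-precision lanes at once.
    // An op this wide needs the full width of a Bulldozer module's FPU.
    __m256 va = _mm256_load_ps(a);
    __m256 vb = _mm256_load_ps(b);
    _mm256_store_ps(out, _mm256_add_ps(va, vb));

    // The same work as two 128-bit SSE adds, four lanes each. Ops this
    // wide can map onto the two 128-bit halves of the shared FPU,
    // one per integer core of the module.
    __m128 lo = _mm_add_ps(_mm_load_ps(a),     _mm_load_ps(b));
    __m128 hi = _mm_add_ps(_mm_load_ps(a + 4), _mm_load_ps(b + 4));
    _mm_store_ps(out,     lo);
    _mm_store_ps(out + 4, hi);

    for (float f : out) printf("%.0f ", f);  // prints 9 eight times
    printf("\n");
    return 0;
}
```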
 
Does it really matter? I mean, it could be 1% faster, or even 2% slower, or maybe 1.5% slower. It's so beside the point AMD is trying to make here; the amount is rather trivial.

I say if the 40% number is correct, Zen should be behind by at least 20% on average, if not more. There will always be outliers, however.
 
I think everyone is really hoping for Zen to be stellar so that AMD once again gives Intel a run for their money and, in turn, spurs healthy competition and innovation.

Surely you jest! I love overpaying for shit. :D
 
Have any of you taken a look at Blender CPU render benchmarks?

*graph*

Some odd results... Gulftown Xeons doing pretty well lol

I'm running a 6-core Westmere at 4.2 GHz, and the only reason I'm finally looking to upgrade is motherboard features, not CPU performance. Def got my money out of that purchase. X58 fo life haha
 
I'm running a 6-core Westmere at 4.2 GHz, and the only reason I'm finally looking to upgrade is motherboard features, not CPU performance. Def got my money out of that purchase. X58 fo life haha

The main reason I upgraded to Haswell from my i7 920 is that I lost my Thermalright Ultra 120 Extreme box containing the spanner for unmounting. Essentially, I was afraid to order the Westmere Xeon and find myself unable to remove the heatsink from the CPU socket :p

I still have my X58 P6T Deluxe sitting in my case with the TRUE strapped onto it :p Haven't managed to remove it yet, which is a huge pain because I could sell the whole system for about 200 euros.
 
I'm running a 6-core Westmere at 4.2 GHz, and the only reason I'm finally looking to upgrade is motherboard features, not CPU performance. Def got my money out of that purchase. X58 fo life haha

Same here, the only reason I upgraded was for SATA3 and M.2 support. My brother-in-law is loving Fallout 4 on my old 920 system with the GTX 780.
 
The main reason I upgraded to Haswell from my i7 920 is that I lost my Thermalright Ultra 120 Extreme box containing the spanner for unmounting. Essentially, I was afraid to order the Westmere Xeon and find myself unable to remove the heatsink from the CPU socket :p

I still have my X58 P6T Deluxe sitting in my case with the TRUE strapped onto it :p Haven't managed to remove it yet, which is a huge pain because I could sell the whole system for about 200 euros.

Same here, the only reason I upgraded was for SATA3 and M.2 support. My brother-in-law is loving Fallout 4 on my old 920 system with the GTX 780.

Well, I'm one gen ahead of you with Sandy-E, but I hear you. I really have no need for more performance.

I think X79 was one of the first generations using RoHS-compliant plastic parts on motherboards, though, and they hadn't gotten the formula quite right yet, because the plastic pieces on my Asus P9X79 motherboard are crumbling. Every time I remove a RAM stick from a slot I lose another RAM slot clip, and two of my PCIe slots no longer have the rear clip that holds long x16 cards in place. Thus far everything still works, but I fear it's only a matter of time before something breaks that has an actual impact.

So when I finally upgrade I'm upgrading for three reasons:
  • Less power use
  • More modern features
  • My motherboard is falling apart

Not only will this be the motherboard and CPU I've had longer than any other (by a very wide margin), but when I finally upgrade it, it will be my first major system upgrade done for reasons other than performance :p
 
I just ran Blenchmark on my quad Opteron 61xx ES (overclocked to 3 GHz); 69 seconds was the result. It definitely uses all 48 cores. It's showing up as "AMD Engineering Sample" :) And yes, a multi-socket setup will make something like an FX-8350 seem like a dog in raw compute.

Chances are those chips at the top are running multi-socket configurations. If that's the case, then yes, an Opteron 8425 could very well beat an 8350 if it's running multi-socketed; that model, I believe, runs in 4-socket systems and up. The same holds for those Xeons at the top: those are 2-socket processors.
This just seems like a complete and utter bullshit benchmark. Here's a listing of results: CPU benchmarks | BlenchMark
According to this particular benchmark, the 6-core 3.47 GHz Intel X5690 from five and a half years ago is 20% faster than a 10-core 3 GHz Haswell-E, and the fastest AMD chip is a 6-core K10 from 2009 (Opteron 8425), which, if you go solely by this benchmark that so many here are claiming vindicates AMD, is over 60% faster than an 8-core AMD FX-8350. Does any AMD fan really want to stand up and claim the Opteron 8425 is 60% faster than the FX-8350? Or can we all just agree this is a very strange benchmark that obviously cares more about some odd features of certain CPUs and isn't exactly indicative of any sort of real-world performance? Even if Zen can stand neck and neck with Haswell-E on this particular benchmark, the 5.5-year-old X5690 is still beating both of them soundly, and they're only about 30% faster than a 7-year-old AMD part.

What we really need is a wide range of benchmarks testing lots of different features and usage scenarios alongside this Blender benchmark; then we need to throw Blender out because it's stupid and go by everything but Blender.
 
Yeah, I'm thinking it could be the poor cache performance of Piledriver. This is why the 6700K gets an ALMOST 20% speedup over the 4790K, which is unheard of among Haswell-to-Skylake performance improvements.

Skylake's improved cache plus aggressive prefetch is probably the reason here. Those Piledriver cores can't be kept fed, so a Core i3 outperforms them. Since everyone agrees this application loves fast cache, the results are no surprise.

Given that, it's no surprise why AMD wants to highlight their improved cache performance in a problem app like Blender.

I don't think Blender is any more FP-heavy than Cinebench (which performs fine on Piledriver, in Core i5 range like you would expect). They're both 3D rendering engines, and their operations are mixed FP/INT, which is what Piledriver does well. I just think Blender is more cache-hungry.

It still scales to multiple cores, but not if you have an unoptimized cache.

This actually makes a whole lot of sense. Very well put. AMD's cache performance on Zen seems to be really top-notch, from what I've read in AnandTech's second article; they are expecting really good things from Zen when it comes to that. If they can get a 3.5 GHz base clock without turbo, I hope this will be a damn good seller.
 
This actually makes a whole lot of sense. Very well put. AMD's cache performance on Zen seems to be really top-notch, from what I've read in AnandTech's second article; they are expecting really good things from Zen when it comes to that. If they can get a 3.5 GHz base clock without turbo, I hope this will be a damn good seller.

Agreed. If they can do this, they can outperform the 6700K in certain games that use more than four threads.

The single-thread performance should be closer to Sandy Bridge, but that shouldn't be an issue for overclockers who want more threads.
 
Agreed. If they can do this, they can outperform the 6700K in certain games that use more than four threads.

The single-thread performance should be closer to Sandy Bridge, but that shouldn't be an issue for overclockers who want more threads.

Your understanding of threading is wrong.

If no INDIVIDUAL core is being given more work than it can accomplish, adding additional threads does NOTHING to increase performance. In that case, performance is dominated by single-core performance, not the number of CPU cores.

This more threads = more performance nonsense has got to stop.
 
Your understanding of threading is wrong.

If no INDIVIDUAL core is being given more work than it can accomplish, adding additional threads does NOTHING to increase performance. In that case, performance is dominated by single-core performance, not the number of CPU cores.

This more threads = more performance nonsense has got to stop.

Agreed.

There ARE cases when more threads can help, a lot, but this is the exception rather than the rule. This is why single-threaded performance is still key. If you increase single-threaded performance, you increase performance in ALL tasks. If you increase the number of cores/simultaneous threads, you increase performance only in tasks that are well threaded.
 
Agreed.

There ARE cases when more threads can help, a lot, but this is the exception rather than the rule. This is why single-threaded performance is still key. If you increase single-threaded performance, you increase performance in ALL tasks. If you increase the number of cores/simultaneous threads, you increase performance only in tasks that are well threaded.


That is the problem, isn't it?

Whether it's budget, time, laziness, or uneducated programmers, we need MORE threaded applications.

Core count isn't going away. Single-threaded performance is ultimately a dead end, isn't it? As is core frequency.
 
That is the problem, isn't it?

Whether it's budget, time, laziness, or uneducated programmers, we need MORE threaded applications.

Core count isn't going away. Single-threaded performance is ultimately a dead end, isn't it? As is core frequency.

See, that is an incorrect understanding as well.

There certainly are cases where programmers have been "lazy" and not properly threaded their code, but there is a misconception in the hardware community that all code can be multithreaded. This simply is not the case. Not even "most" code can be multithreaded. It's not just a matter of trying harder. A lot of types of code just do not lend themselves to multithreading. You either wind up with thread locks and the code just doesn't work at all, or you try to work around that with thread-lock timeouts, etc., which results in highly inefficient execution and LOWER performance than the single-threaded code had in the first place.

The most threadable tasks are the highly parallelizable ones. Rendering and encoding are good examples of this type of code (these often tend to be the types of tasks that run very well on GPUs). By and large, most of these apps are already coded to be very well threaded at this point in software development.

There are other ways of cheating around it too, by splitting your application into sub-threads. This technique is often used in game engines, as sketched below. You might have a video rendering subcomponent that is highly parallel and can be multithreaded. However, the sound engine may not be threadable at all, but it is split off from the main engine and runs in its own thread. Then you have the main game logic engine, which often is not very threadable (depending on the type of game); this may occupy another thread. Physics tasks may be split off as well into a separate subcomponent, and these tend to be pretty threadable, etc. If you continue cleverly splitting off subtasks like this, you can take advantage of more cores and offload your main core, making it less likely to become a bottleneck, but it also introduces other efficiency problems which can have negative performance implications in some cases.
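
A minimal sketch of that subsystem-per-thread pattern; the subsystem names and loop bodies are made-up placeholders, not any real engine's API (compile with g++ -pthread):

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical subsystems, each split off onto its own thread. The names
// and workloads are placeholders; the structure is the point.
std::atomic<bool> running{true};

void sound_loop() {     // not internally threadable, but easily isolated
    while (running) {
        // ... mix and submit one audio buffer ...
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}

void physics_loop() {   // fairly parallel inside; one thread here for brevity
    while (running) {
        // ... step the simulation ...
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
}

int main() {
    std::thread sound(sound_loop);
    std::thread physics(physics_loop);

    // The main thread keeps the game logic, the hard-to-thread part.
    for (int frame = 0; frame < 60; ++frame) {
        // ... update game state, issue draw calls ...
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }

    running = false;    // signal the subsystem threads to wind down
    sound.join();
    physics.join();
    printf("all subsystems stopped\n");
    return 0;
}
```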

I guess the TLDR version is: most code is not a good candidate for multithreading, and it never will be, because of computer science, not because of programmer/developer capability or laziness. The complaint that studios weren't doing their best to thread software was a legitimate one in 2005. In 2016 it simply is no longer the case. Sure, there are still isolated cases of poorly threaded code, but anyone developing code today knows the environment in which their code will be executed, and would be foolish not to make their code as threaded as they can. Most code in 2016 that is not threaded will never be threaded, simply because the types of tasks involved are not threadable, or at least are not threadable without other problems that have performance implications of their own which make them not worth threading.

So, we know that not all code can be threaded. The exact percentage of code that can be threaded is difficult to pin down, but let's have a look at Amdahl's law. It predicts how much of a performance increase you will see from adding more cores when your code is x% parallel: speedup on n cores = 1 / ((1 - x) + x/n).

[Image: AmdahlsLaw (1).jpg — speedup vs. number of cores for various parallel fractions]


Per Amdahl's law, if 50% of your code is threaded, you'll never see more than a 2x performance increase, no matter how many cores you add.

If your code is 75% threaded, you'll never see more than a 4x performance increase, no matter how many cores you add. And it's not efficient either: it takes some 128 cores just to approach that 4x level with 75% threaded code.

etc. etc.
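
If anyone wants to play with the numbers, here's a minimal sketch that just evaluates Amdahl's formula for a few parallel fractions and core counts; nothing in it is specific to any real benchmark:

```cpp
#include <cstdio>

// Amdahl's law: speedup on n cores when fraction p of the work is parallel.
double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main() {
    const double fractions[] = {0.50, 0.75, 0.90, 0.95};
    const int cores[] = {2, 4, 8, 16, 64, 128};

    printf("%8s", "p \\ n");
    for (int n : cores) printf("%8d", n);
    printf("\n");

    for (double p : fractions) {
        printf("%7.0f%%", p * 100.0);
        for (int n : cores) printf("%8.2f", speedup(p, n));
        printf("\n");
    }
    // e.g. with p = 0.75, even 128 cores only reach ~3.91x of the 4x ceiling.
    return 0;
}
```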


TLDR #2 (or the really short version):

It's not 2005 anymore. We aren't suffering from poorly threaded code. In 2016, most of what is threadable has been threaded by now. We might see small increases over time, but don't expect anything drastic. If I had to predict 20 years into the future, I'd predict that we are not going to be much more threaded than we are today, not because I don't have faith in software developers, but because SCIENCE! (Computer science, that is.)
 
That may be how it is today, but I still have an optimistic view.

I think a lot of it comes down to how applications are designed, using the "old way", and people need to be encouraged to think outside the box. Remember that programmers love to reuse code. This puts them in a box where they do not try to develop more efficient code.

It just comes down to hard work from computer engineers and someone investing lots of money. It will probably take a lot of changes on both the software and hardware sides until we get to a better place.

It may take a fundamental shift in how applications are written. Just because a sound engine today cannot be multithreaded does not mean it never will be.

TLDR:
We need more motivation/money pumped into this.
 
That may be how it is today, but I still have an optimistic view.

I think a lot of it comes down to how applications are designed, using the "old way", and people need to be encouraged to think outside the box. Remember that programmers love to reuse code. This puts them in a box where they do not try to develop more efficient code.

It just comes down to hard work from computer engineers and someone investing lots of money. It will probably take a lot of changes on both the software and hardware sides until we get to a better place.

It may take a fundamental shift in how applications are written. Just because a sound engine today cannot be multithreaded does not mean it never will be.

TLDR:
We need more motivation/money pumped into this.

So you suggest companies throw money at a problem that doesn't exist.
 
That may be how it is today, but I still have an optimistic view.

I think a lot of it comes down to how applications are designed, using the "old way", and people need to be encouraged to think outside the box. Remember that programmers love to reuse code. This puts them in a box where they do not try to develop more efficient code.

It just comes down to hard work from computer engineers and someone investing lots of money. It will probably take a lot of changes on both the software and hardware sides until we get to a better place.

It may take a fundamental shift in how applications are written. Just because a sound engine today cannot be multithreaded does not mean it never will be.

TLDR:
We need more motivation/money pumped into this.

There aren't any real alternatives. Mitosis was one example that was attempted, but you would have to accept massive inefficiency to gain a small performance boost.
 
Remember that programmers love to reuse code.

Actually, I find the opposite to be the problem.

Coders tend to enjoy tackling new projects from scratch, which often means you need rather strict and knowledgeable management oversight to prevent your team from wasting money and time on reinventing the wheel.

If you were to make a car today, you would rely heavily on the innovations that came before you; you know, stuff like how brakes work, suspension, etc. The reason we have cars as great as we have today is that, little by little, building blocks were added over many generations of cars to create something that would have been impossible to do from scratch right off the bat.

Same thing goes for code. It may be fun to write a new solution to a problem from scratch, but the wiser approach is to direct your team to look at existing code. Don't just copy and paste it, but look at it, learn from it, and improve it where needed. Don't reinvent the wheel unless necessary, because when you do, it takes more time, costs more money and the result is usually worse, because you are discarding the decades of experience that is already in the code you have.

This - of course - assumes you have good coding standards, and the existing code is well commented so you can figure out what the hell is going on. This isn't always the case in the game world.

As far as threading goes, redoing things from scratch is likely not going to help you thread stuff any better in most cases. I mean, game engines are massive, and contain very many lines of code, so there is always going to be something legacy hiding in there somewhere that could be done more efficiently, and hopefully it will be found over time, but throwing it out and starting from scratch is not the solution, because what you build then - without the benefit of existing code - is likely going to be way worse, because it hasn't gone through years of optimization, trial and error and improvement, like the mature code has.

The silver lining here is the trend of separating game engines from games. Most new games today are developed on top of game engines, made by companies that specialize in game engines. This allows the benefits in optimization and threading to be spread over a much larger usage base, and thus be improved and optimized much better over time, than if each studio had to code an engine from scratch, and they have come really far, and are in much better shape today than I would have predicted several years ago.
 
Actually, I find the opposite to be the problem.

The silver lining here is the trend of separating game engines from games. Most new games today are developed on top of game engines, made by companies that specialize in game engines. This allows the benefits in optimization and threading to be spread over a much larger usage base, and thus be improved and optimized much better over time, than if each studio had to code an engine from scratch, and they have come really far, and are in much better shape today than I would have predicted several years ago.

I do agree with you that we have better engines today because they are separated, but I do not believe the competition is strong enough. I would like to see more drive to out-innovate each other.

As said, multithreading does not work well with current practices; sadly, we will need to throw out all the code and start over, and it will be worse before it gets better.

We live in a world where alpha/beta is acceptable. This works both for us and against us. It would be a pro right now, as we could throw out all the code and start over, since everyone seems to be okay with crappy alpha code. It is a con because it means companies can just take shortcuts and not give us a quality product, and they still get paid.

I think we have gotten off topic from this posting.

I am going to try to sum this up.

You think the industry is in a good spot concerning coding performance right now.
I feel the industry is coding down to a price point and could do much better if motivated.

Does this sound about right?
 
There aren't any real alternatives. Mitosis was one example that was attempted, but you would have to accept massive inefficiency to gain a small performance boost.

I think some people don't quite grasp the fundamental realities of computing theory you are up against here.


Take a UPS truck full of small packages.


Now say that, instead of one truck, you want to use a bunch of smaller cars. In most cases this is just fine. The packages are reasonably small; you just split the load up and put some of it in each of the cars. It takes a little more sorting and organization, but it's not a huge deal. Voila, you have transportation multithreading.

Not all loads split up that nicely and cleanly, though.

Sometimes you have to transport this:

[Image: truck_pulling_big_tube.jpg]


There might be a way of sawing it up into tiny pieces and putting it in several cars, but it is never going to be the same again once you try to reassemble it on the other end.

The thing is, just as there are fundamental physics that problems in the physical world are confined to, there are also computing fundamentals. Just as you can't instantly become weightless by using an anti-gravity ray (because that would violate many laws of physics), you can't make non-threadable code threaded by "thinking differently" about it. Doing so violates the fundamentals not of current computing design, but of basic computing theory that applies to any potential design.

I know people talk about having the faith that can move mountains, but I don't think any of us actually expect it to work...
 
You think the industry is in a good spot concerning coding performance right now.
I feel the industry is coding down to a price point and could do much better if motivated.

Does this sound about right?

Not really. I feel like all the low-hanging fruit, and most of the not-so-low-hanging fruit, when it comes to threading code has been picked at this point. The threading issue has mostly been solved, but there is still a ton of atrocious code out there.

Most of it is because coders tend to have independent streaks and chafe against working in a structured, defined approach. It's sort of like having an assembly line where everyone wants to do things their own way rather than following a standardized instruction, and it often results in a mess. Big, serious, business-oriented software companies tend to have more structured approaches, but working there is often described by coders as "soul-crushing" because they have to follow standard practices, policies and procedures, and strict coding standards, including documenting their code instead of just coding, etc.

It becomes more of a bureaucracy, and programmers tend to hate that shit (just like engineers in more physical disciplines used to hate having to document their work, but now we are forced to because of ISO 9000 and subsequent standards :p ). And since game studios are often smaller, more creatively oriented organizations, they don't tend to enforce this type of work, which often results in a mess.

This contributes to bad game launches. Another thing that contributes to it is just the massive amount of hardware and software diversity in the field. You could test your code on a million systems, and still find out about an incompatibility you hadn't discovered until you launch.
 
AMD Finds Zen in Microarchitecture

"Zen will trigger a refresh across all the company’s product lines. In 1Q17, it’s scheduled to find a home in a new eight-core desktop-PC processor, code-named Summit Ridge, that is compatible with the existing AM4 socket. Slated to follow this design in 2Q17 is a 32-core server processor, code-named Naples. Last will come new notebook-PC processors in 2H17."
 
AMD Finds Zen in Microarchitecture

"Zen will trigger a refresh across all the company’s product lines. In 1Q17, it’s scheduled to find a home in a new eight-core desktop-PC processor, code-named Summit Ridge, that is compatible with the existing AM4 socket. Slated to follow this design in 2Q17 is a 32-core server processor, code-named Naples. Last will come new notebook-PC processors in 2H17."


Sounds like they are more and more de-emphasizing the "limited release in Q416" statement from the last shareholders' meeting.

That is a pity. I'm itching to see actual final silicon performance reviews.
 
That is the problem, isn't it?

Whether it's budget, time, laziness, or uneducated programmers, we need MORE threaded applications.

Core count isn't going away. Single-threaded performance is ultimately a dead end, isn't it? As is core frequency.

It is all analogous to Moore's Law: we're reaching the theoretical end of the shrink, and we have electromigration problems at small scales.

The question now really is: how many instructions can you execute during the swing of a sine wave? And IPC increases are not looking good.
 
I think a lot of it comes down to how applications are designed, using the "old way", and people need to be encouraged to think outside the box. Remember that programmers love to reuse code. This puts them in a box where they do not try to develop more efficient code.

It just comes down to hard work from computer engineers and someone investing lots of money. It will probably take a lot of changes on both the software and hardware sides until we get to a better place.

It may take a fundamental shift in how applications are written. Just because a sound engine today cannot be multithreaded does not mean it never will be.

TLDR:
We need more motivation/money pumped into this.

Money is not really the main problem here. It would be if we didn't have the hardware, but we do: we have multi-core CPUs with some shared cache and the x86-64 instruction set.
Coding experiments on that base can be, and often are, "free": a result of research, playing around, computer science.

So where to begin? We have people who develop at the kernel level, particularly on schedulers and drivers. I'd say they are an isolated group who can make existing software run more efficiently. Examples: the affinity optimizer I keep bumping into on here, or the patches for multi-core AMD processors we've seen in the past.
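
For a taste of what that kernel-adjacent tooling does, here's a minimal Linux-only sketch that pins the calling thread to one core with sched_setaffinity (a real glibc wrapper; the choice of CPU 0 is arbitrary, and this is not the actual affinity optimizer mentioned above):

```cpp
#include <sched.h>   // sched_setaffinity and the CPU_* macros (Linux/glibc)
#include <cstdio>

int main() {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);  // pin to logical CPU 0; an arbitrary choice

    // pid 0 means "the calling thread"; after this the scheduler
    // will only run it on the CPUs in the set.
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}
```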

Then you have your game/application developers. As was mentioned, they've been developing a long time now, but we're kind of peaking already.

Another, somewhat isolated group is compiler developers, and this is the group I'd place most of my hope in.
First of all, gains in this area propagate throughout all layers of a system; they can be retroactive and can hint at potential improvements in hardware.

A trivial example of what a compiler is capable of is taking advantage of large L1 caches by automatically deciding which loops should stay loops (do stuff, increment counter, jump to beginning if not equal...) and which loops should be 'unrolled' into several consecutive blocks of repetitive code.

By that measure, it should be possible to utilize other modern hardware features.
A compiler could one day be smart enough to notice patterns we're unable to see, or simply brute-force its way towards the best possible binary for a given combination of source code and platform hardware.
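
To make the unrolling example above concrete, here's a hand-written, source-level illustration of the transformation; a real compiler would apply it to the generated machine code rather than the source:

```cpp
#include <cstdio>

int main() {
    float data[1024];
    for (int i = 0; i < 1024; ++i) data[i] = 1.0f;

    // Rolled form: each iteration does an add, a counter increment,
    // a compare, and a jump back to the top.
    float sum_rolled = 0.0f;
    for (int i = 0; i < 1024; ++i)
        sum_rolled += data[i];

    // Unrolled by four: a quarter of the loop overhead, and four
    // independent accumulators the CPU can execute in parallel.
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < 1024; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    float sum_unrolled = s0 + s1 + s2 + s3;

    printf("%.0f %.0f\n", sum_rolled, sum_unrolled);  // 1024 1024
    return 0;
}
```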
 