Will dual core kill PPU sales?

LordBritish

2[H]4U
Joined
Jan 28, 2001
Messages
2,062
I recently purchased a dual-core Opteron 165 and was thinking that in many games, one of the cores is doing basically nothing. Why can't the idle core be doing physics, or whatever else needs to be done?

Why purchase a PPU when one already has unused computing reserves (almost 50% unused)?
 
VVVVVVVVVVVVVVVI said:
We will have 8 cores with a PPU. According to ATi, their GPU is 37 times faster than a high-end CPU at rendering physics.

A general-purpose CPU can only run a limited number of instructions at once, because general code is not parallel. A CPU's life is spent trying to run a single thread as quickly as possible through out-of-order execution.

Graphics and physics code, however, is highly parallel. So, you can take a ton of simple processors and add them together to make a very fast processor capable of running dozens of graphics/physics calculations at once.
This answered it for me, after going through that first page consisting of Asians, Redheads, and Blondes, the PPU's room, fvcking, no fvcking, and other "analogies"...
 
OAKsider said:
This answered it for me.
Too bad it's not actually the reason a PPU outperforms a CPU at physics calculations... It depends on the architecture of the processors, not the quantity.
 
LuminaryJanitor said:
Too bad it's not actually the reason a PPU outperforms a CPU at physics calculations... It depends on the architecture of the processors, not the quantity.

That doesn't make any sense, which is probably why you didn't bother to go beyond one line in your 'explanation'.

The reason why graphics and now physics are being separated has to do with the highly parallel nature of graphics and physics. They are inherently parallel, which is why adding more and more pipelines to GPUs increases performance, and why SLI and multi-GPU systems work.

Instruction level parallelism is very limited in x86, which is why adding more execution units doesn't help.
 
VVVVVVVVVVVVVVVI said:
Instruction level parallelism is very limited in x86, which is why adding more execution units doesn't help.
Well, if a team of developers can conceptually break apart the algorithm they are using, they could add parallelism themselves via threads or some other form of multiple threads of execution (I believe Ada calls them tasks). This is higher level than hardware instructions, though.
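As a sketch of what that decomposition looks like in practice - splitting an embarrassingly parallel physics step across worker threads - here's a minimal Python example (the `integrate` function and particle layout are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-particle update: each result depends only on that
# particle's own state, so the work splits cleanly across threads.
def integrate(particle, dt=0.1):
    x, v = particle
    return (x + v * dt, v)

particles = [(float(i), 1.0) for i in range(8)]

# Two workers, mirroring a dual-core CPU with one otherwise-idle core.
with ThreadPoolExecutor(max_workers=2) as pool:
    updated = list(pool.map(integrate, particles))
```

In CPython the two threads won't literally run the arithmetic simultaneously because of the global interpreter lock, but the decomposition is the point - the same split works with processes or native threads.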
 
Xipher said:
Well, if a team of developers can conceptually break apart the algorithm they are using, they could add parallelism themselves via threads or some other form of multiple threads of execution (I believe Ada calls them tasks). This is higher level than hardware instructions, though.

Sure, TLP is the next big thing (supposedly), and as Intel has shown with Conroe, there are still significant improvements to be made in ILP.
 
VVVVVVVVVVVVVVVI said:
That doesn't make any sense, which is probably why you didn't bother to go beyond one line in your 'explanation'.
I thought it made fine sense. Your 'explanation' jumped topics, but who am I to sensationalize a post as something it was clearly not meant to be?
 
cyks said:
I thought it made fine sense. Your 'explanation' jumped topics, but who am I to sensationalize a post as something it was clearly not meant to be?

He didn't say anything, and neither did you, so how could it make any sense?

Enough. If anyone has anything substantive to add, I'll respond; otherwise, you trolls have at it and post all the one-liners you want.
 
VVVVVVVVVVVVVVVI said:
So, you can take a ton of simple processors and add them together to make a very fast processor...

The quoted explanation suggests that the higher number of cores is the only reason a PPU is faster than a CPU at performing physics calculations. In other words, if you glued eight CPUs together they'd be equivalent to a PPU in terms of calculating physics, because they could execute the same number of instructions simultaneously. Sure, more cores will help. But it's not a complete explanation of why a PPU outperforms a CPU in this arena.

A GPU performing 37 times faster than a CPU, with the same number of cores, is a perfect example of this. It depends on what's actually in there. How the processor architecture accomplishes this (in this case through superior instruction pipelining) is irrelevant to my point, which is that it's not just the number of cores that's important. The cores themselves are different.

A final note: I wasn't attacking your quoted post. My comment was directed at OAKsider, who seemed to have taken the quote in question as a complete explanation, which it is not, and does not claim to be.

EDIT: One more thing... I don't know where you got "8 cores" from, but Googling it along with PPU gives... this thread. And a pile of pages on four-core CPUs and PS3s.
 
Likely not. But it will reduce the need to get one. The amount of rigid-body physics you can do without bottlenecking other components of your PC is actually within the capabilities of the existing CPU.
 
Sly said:
Likely not. But it will reduce the need to get one. The amount of rigid-body physics you can do without bottlenecking other components of your PC is actually within the capabilities of the existing CPU.

Careful with that - There's only one game which backs that statement up (GRAW), and you have to agree it's not the most optimized piece of software out there :(

Reality of it is that "how much physics" you can add to a game depends largely on what is implemented and how it's implemented.

A good example of this:

You could make bullet splashes scatter many more non-interactive particles - which negatively impacts the GPU's render load.

Or you could make a car fully destructible and have wheels that fly off and muller bystanders - which increases the CPU's load by altering more gamestates.

Dual or even quad-core CPUs are not going to raise the bar substantially. AGEIA has also been pushing the quantity side of their hardware - but that's half the story, and I suspect they push that because it's the easiest to visualize.

With more complex calculations you could have far more realistic physics at a cost of quantity. Havok is, let's face it, horribly unrealistic. When you tread on a can it goes flying away. With more accurate calculations it could easily be modelled to crush or be kicked and dented realistically - think of it as increasing the "resolution" of physics.

CPUs couldn't do that for an appreciable number of objects, quad core or no.
 
LuminaryJanitor said:
EDIT: One more thing... I don't know where you got "8 cores" from, but Googling it along with PPU gives... this thread. And a pile of pages on four-core CPUs and PS3s.

8 cores was from the other thread, where my post originated. Someone asked what would happen to PPUs when we had 8-core processors, implying that an 8-core CPU would be powerful enough to replace a PPU. I was just saying we would have 8 cores and a PPU, since 8 CPUs would not replace a PPU.

Other than that I think we are on the same page.
 
MrNasty said:
Careful with that - There's only one game which backs that statement up (GRAW), and you have to agree it's not the most optimized piece of software out there :(

Reality of it is that "how much physics" you can add to a game depends largely on what is implemented and how it's implemented.

A good example of this:

You could make bullet splashes scatter many more non-interactive particles - which negatively impacts the GPU's render load.

Or you could make a car fully destructible and have wheels that fly off and muller bystanders - which increases the CPU's load by altering more gamestates.

Dual or even quad-core CPUs are not going to raise the bar substantially. AGEIA has also been pushing the quantity side of their hardware - but that's half the story, and I suspect they push that because it's the easiest to visualize.

With more complex calculations you could have far more realistic physics at a cost of quantity. Havok is, let's face it, horribly unrealistic. When you tread on a can it goes flying away. With more accurate calculations it could easily be modelled to crush or be kicked and dented realistically - think of it as increasing the "resolution" of physics.

CPUs couldn't do that for an appreciable number of objects, quad core or no.

Sure, but in order to show the things you described, you'll need more complex 3D models. A standard can cannot be dented; it's a cylinder made up of elongated polygons. In order to make it deformable, you'll have to quadruple the polygon count. Same goes for the car: you'll have to make use of double-sided polygons at the very least. Breaking it apart also requires more graphical complexity in order to make a mesh more 'crushable'. If you increase the polygon count, you'll also likely increase the texture count, and modern games need appreciable graphics to sell, requiring up to 4 layers per surface (texture, normal, bump, specular).

Making clothes realistically wave in the wind using physics also requires more subdivisions in the object in order to actually make the waving worth it. While a cape can still wave realistically with minimal polygons, the number of physics handles actually needed for something like that is also lower, well within the capabilities of a CPU.
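To make the "physics handles" point concrete, here's a toy sketch (all constants invented) of one hanging strip of a cape as a mass-spring chain with Verlet integration - each extra subdivision is one more point the per-frame step has to touch, which is why a coarsely divided cape stays cheap:

```python
# One vertical strip of cloth: point i hangs i units below the pin.
def step(points, prev, gravity=-0.01):
    # Verlet integration: new = 2*current - previous + acceleration.
    new = [2 * p - q + gravity for p, q in zip(points, prev)]
    new[0] = points[0]  # top point pinned to the character
    # One relaxation pass keeping neighbours ~1 unit apart.
    for i in range(len(new) - 1):
        stretch = abs(new[i + 1] - new[i]) - 1.0
        # Pull the lower point back toward its rest distance.
        new[i + 1] += stretch if new[i + 1] < new[i] else -stretch
    return new, points

points = [float(-i) for i in range(5)]  # five physics handles
prev = list(points)
for _ in range(10):                     # ten simulation frames
    points, prev = step(points, prev)
```

A real cape would be a 2D grid of such points with diagonal constraints, but the cost scaling is the same: work per frame grows with the number of handles, not with the rendered polygon count.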

If you, say, tone down the graphics, then you've just defeated the purpose of a GPU. You would be sacrificing what is probably the most expensive component in your PC just to show off a PPU. That would be like choosing between a 7900GT and a PPU+5600 and picking the latter.

If you make a game where the full power of the PPU can be realized, you'll also be bottlenecking the rest of the system - not just bottlenecking it, but actually slowing it down.

Also, it isn't just GRAW. While CellFactor is indeed still in beta, it's already exhibiting the same issues.
 
Sly said:
Sure, but in order to show the things you described, you'll need more complex 3D models. A standard can cannot be dented; it's a cylinder made up of elongated polygons. In order to make it deformable, you'll have to quadruple the polygon count...

...While a cape can still wave realistically with minimal polygons, the number of physics handles actually needed for something like that is also lower, well within the capabilities of a CPU...

...If you say, tone down the graphics, then you've just defeated the purpose of a GPU... That would be like choosing between a 7900GT and a PPU+MX5600 and picking the latter...

Also, it isn't just GRAW. While CellFactor is indeed still in beta, it's already exhibiting the same issues.

Yeah, I remember this argument from another thread about deformable terrain. Let me clarify something: it doesn't necessarily take a quadrupling of the number of polys to increase the realism with which something interacts - if you wanted to damage it, maybe, but that depends on the extent to which you wish to damage it and the level of complexity you want to result from that damage - so it's down to the game's coder.

Sure, if you want to do truly accurate denting of a can you'll need more, but not 4 times more. A car, maybe more - how much? Depends on implementation and design, but it doesn't take much more than a small increase in rigid bodies to stop a CPU. Go and play with Garry's Mod.

Now, are you arguing that there's no point in physics modelling if you don't use the absolute highest-detail meshes? Be careful, that's not entirely true - skimping on the intermediary detail changes the end effect, but that doesn't make the physics worthless.

But how about the idea of actually leveraging useful capabilities of GPUs, such as normal and bump maps, to account for minor interactions? This, combined with a modest increase in polys, could result in amazingly interactive scenery.

It wouldn't be like hamstringing your graphics card. You're equating modelling accuracy with the number and complexity of the mesh - two different things. To put what you've said another way, you could have a game made up entirely of flat surfaces of 2 polys each and chuck in all the pixel shading, bump mapping, parallax depth maps, specular lights and soft shadows you want, hence making the most of your card - but it still won't be as immersive as an accurately meshed-out game with more polys flying around. It's time people accepted that graphics only goes so far toward improving a game's experience.

Interactive cloth rendering is far more complex than you've made it out to be (at the simplest model level it would be akin to modelling one interactive rigid body per segment - ideally each poly) - and yes, it can be done with relatively few polys, but that only proves my first point.

DX9 promised cinematic-quality visuals. We didn't get them. D3D10 is around the corner and DX9's potential hasn't even been scratched yet - what does that tell you about the importance of the DX9 feature set with regard to realism? Realistic physics does far more for it. I'd rather have more interactive physics than HDR - just personal preference.

If anything D3D10 should improve the situation by lowering the object overheads - allowing far more efficient processing of objects and making the most of the capabilities of the PPU.

As for CellFactor being "affected like GRAW" - I haven't played it yet. Have you? The only thing I've heard about it being slow is that the SLI system powering the demo struggled to keep up with it. That's from a beta - not surprising - and you haven't mentioned how the sheer number of objects being thrown around at any one time wouldn't be possible without a PPU, as per the original "multi-core CPU kills PPU" postulate at the start of the thread.
 
Do proprietary supercomputer makers like Cray, Sun, IBM, etc. build their CPUs to run parallel algorithms for clusters that do massive physics calculations? What I don't understand about all this is that the CPU is used day in and day out to process exactly these types of functions, so why can't adaptations in game code utilize the extra horsepower (core) for physics? If not for physics, then to what end will it benefit us if we need to purchase cards to offload calculations from the PC?
 
"What are CPUs even used for?"

It would seem that they translate code into 'new code' which will be distributed to cards that will actually do the calculations. Seems like another link in the chain. I just don't like the idea that:
A. There is another possible driver issue.
B. There is competition which will feed the driver issues (along with many !!!!!! flames, though I shouldn't make comments like that while semi-flaming).
C. There is more money spent on something that newer CPUs and GPUs (which did the work before) should be able to handle - the more complex algorithms.
 
placeboFx said:
Do proprietary supercomputer makers like Cray, Sun, IBM, etc. build their CPUs to run parallel algorithms for clusters that do massive physics calculations? What I don't understand about all this is that the CPU is used day in and day out to process exactly these types of functions, so why can't adaptations in game code utilize the extra horsepower (core) for physics? If not for physics, then to what end will it benefit us if we need to purchase cards to offload calculations from the PC?

Just to break it down in an overly simple manner, let's say there are two types of code.

One: general-purpose code. This code handles AI, drivers, all of the hundreds of things a computer is doing at any given time. The problem with this code is something called dependency.

Let's say you have a thread of instructions that runs in this order: A, B, C, D.

In order for B to run, A has to finish, because B is dependent on the output of A. Once B is finished, C can fetch the output of B and run. Not every instruction is dependent on the previous one, but most of the time it is.

So, having large numbers of execution units trying to run this type of code is pointless, because they can't do anything. They will just sit there, idle.
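The A-B-C-D chain above can be written out as toy code (the functions are arbitrary stand-ins) - the point is that the nesting forces one-at-a-time execution, no matter how many execution units exist:

```python
# Each step consumes the previous step's output, so B cannot start
# until A has finished, C until B, and D until C.
def a(x): return x + 1
def b(x): return x * 2
def c(x): return x - 3
def d(x): return x * x

result = d(c(b(a(5))))  # forced order: a, then b, then c, then d  -> 81
```

No scheduler, hardware or software, can overlap these four calls; the data dependency serializes them.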

However, it is possible to run more than one instruction in a thread at a time; this is called instruction-level parallelism (ILP). Unfortunately, it is very limited. A huge portion of an Athlon64 or P4 is dedicated to making ILP happen, through out-of-order execution (OOOE). Modern x86 CPUs are designed around the idea of running one thread as quickly as possible. They only have a handful of execution units, because adding more would be a waste of transistors.

The other type of code, let's call it parallel code, is for things like graphics and physics.

This code is just math functions repeated over and over, crunching polygons and pixels and so on. The operations are NOT dependent on each other.

So, we have a line of graphics code: A, B, C, D, E, F, G, H

If we have one execution unit, it will take eight cycles for that line of code to be executed. However, since the instructions are not dependent on each other, you can just add more execution units and run that code faster. This is called a vector processor.

So, if I had TWO execution units, I could run that code in four cycles.
If I had EIGHT execution units, I could run it in one cycle.

This is why adding pipelines to a GPU makes it faster.
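That counting argument reduces to one line - an idealized model that ignores dependencies and scheduling overhead, but it matches the arithmetic in the post:

```python
from math import ceil

def cycles(ops, units):
    # Independent operations pack onto the available execution units,
    # so the time is the op count divided by the unit count, rounded up.
    return ceil(ops / units)

# The eight-instruction line from the post: A through H.
assert cycles(8, 1) == 8  # one unit: eight cycles
assert cycles(8, 2) == 4  # two units: four cycles
assert cycles(8, 8) == 1  # eight units: one cycle
```

The same formula shows why the scaling flattens once the unit count exceeds the number of independent operations - more units than ops buys nothing.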

Now, there are other ways of running general-purpose code than just a really good OOOE processor. Some of the newer non-x86 processors are using TLP, or thread-level parallelism, in combination with a large number of simple in-order processors. This is good for things like servers, in which the code is naturally multi-threaded (think dozens of different requests for a webpage).

We all know that even the OOOE x86 world is going multi-threaded, but there are limits to this as well.

Now, even a P4 or an Athlon64 has a vector processor built in (SIMD... SSEx). However, there are real-estate limitations. It takes a lot of transistors to make a PPU, so adding more vector processors to a CPU in the near term just to run parallel code is not going to happen. CPU design is a compromise.
 
Well put.

Is that in the PPU sticky? It should be. It's the smack-down response to the dual core vs. PPU threads.
 
Wouldn't there have to be a significant number of PPU sales before you have to worry about anything killing them?
 