AMD™ Ryzen© Blender® Benchmark Scores™©® Thread

Got any numbers from it running on 2.78.4?

There was one in the Anandtech forum, lemme see if I can find it.

edit: Here's the post about it. But I'm confused: it's The Stilt's build; are there other 2.78.4 builds?

https://forums.anandtech.com/thread...-run-150-samples.2494600/page-5#post-38632293

 
why the hell are you talking about 256bit and different versions now? it's got nothing to do with the original test and what we were supposed to compare. or did AMD release new tests and Su issue another challenge to compare 256bit Blender that I missed? or are you just going way off base for no apparent reason?
 

AMD can't do 1-cycle 256bit, and they selected a low-IPC, high-SMT-scaling benchmark for a good reason, as more and more leaked benchmarks have shown ;)
 
rriiiight...
so looking for ways to shit on it then...

Stock 6700K using Blender with 256bit AVX2. 100 samples. 27.17 seconds!
did AMD make any claim about that? not that I saw. in the New Horizon video Su doesn't say shit about IPC, she does say Blender "scales well with cores and threads"...
 
It's quite well documented that Zen can't do single-cycle 256bit ops, if that's what you mean. Dresdenboy etc. already covered that ages ago. And the original source of the information is AMD.
 
so?! wtf does that have to do with the claim and this comparison? nothing that I can see...
you're redirecting/deflecting, whatever you want to call it, taking away from the fact that it performed as stated/claimed and shown. 256bit was never demoed and there was no challenge to test it either. you are TRYING to find problems so you can continue your crusade against AMD.
single cycle 256bit ops
that's really only used for HPC though, isn't it?
 
Not at all. It's used a lot more than people think. If it can benefit from SSE, it can benefit from AVX. You've also got compilers with autovectorization, including MS's.
https://msdn.microsoft.com/en-us/library/hh872235.aspx
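Just to make the autovectorization point concrete, a minimal sketch (my own illustration, not from that MSDN page; the function name is made up): a plain loop like this is exactly the kind of thing MSVC with /O2 /arch:AVX2, or GCC/Clang with -O2 -mavx2, will typically turn into SSE/AVX code on its own.

```cpp
#include <cstddef>

// No SIMD intrinsics anywhere: the compiler's autovectorizer widens this
// loop to SSE/AVX on its own when the right /arch or -m flags are set.
void scale_and_add(float* dst, const float* a, const float* b, float k, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] * k + b[i];   // independent per-element work, ideal for vectorization
}
```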
so what are some real-world applications then, and how does this really affect the average user? why do you really care so much? tell us why we should, in normal layman's terms.
besides, all the rumours I've seen say it supports 256 and 512bit AVX anyway. we haven't seen those demoed so I don't get what you are trying to prove without that info. regardless, as I said before, it wasn't demoed nor offered as a challenge.
 

Its already explained in the previous post.

512-bit AVX is only on Xeon Phi and Skylake-EP.

256-bit AVX is supported on Zen, but it takes 2 cycles. Haswell, Skylake, Kaby Lake and Xeon Phi do it in 1 cycle.
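Rough back-of-the-envelope of what that 1 cycle vs 2 cycles means for peak math rate (my own illustration, assuming two FMA pipes per core on both designs and FP32 FMA-heavy code; real workloads land well below these peaks):

```cpp
#include <cstdio>

int main()
{
    const int fma_units     = 2;  // assumed: two FMA pipes per core on both designs
    const int flops_per_fma = 2;  // one fused multiply-add = 2 FLOPs
    const int lanes_256     = 8;  // 8 x FP32 in a 256-bit register
    const int lanes_128     = 4;  // a 256-bit op split into two 128-bit halves

    // Peak FP32 FLOPs per clock per core
    std::printf("native 256-bit units: %d\n", fma_units * lanes_256 * flops_per_fma); // 32
    std::printf("128-bit datapath:     %d\n", fma_units * lanes_128 * flops_per_fma); // 16
}
```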

AMD's own slides also confirm it.
[AMD slides: Zen uarch overview / Zen floating point unit]


One of the issues with single-cycle 256-bit execution is that it requires a lot of cache bandwidth. And that uses power.
[Sandra memory bandwidth chart]
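And for scale on the bandwidth point, a rough sketch of the L1 traffic that full-width 256-bit execution implies (my own arithmetic, using the commonly quoted Haswell/Skylake figure of two 32-byte loads plus one 32-byte store per cycle, and an assumed 4 GHz clock):

```cpp
#include <cstdio>

int main()
{
    const double bytes_per_access = 32.0;  // 256 bits
    const double loads_per_cycle  = 2.0;   // 2 x 32 B loads per cycle
    const double stores_per_cycle = 1.0;   // 1 x 32 B store per cycle
    const double clock_ghz        = 4.0;   // assumed core clock

    const double bytes_per_cycle = (loads_per_cycle + stores_per_cycle) * bytes_per_access; // 96 B
    const double gb_per_s        = bytes_per_cycle * clock_ghz;                             // ~384 GB/s per core

    std::printf("%.0f B/cycle -> about %.0f GB/s of L1 traffic per core\n", bytes_per_cycle, gb_per_s);
}
```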
 
Just ran it in Mac OS Sierra on my Mac Pro. I'm rather surprised that Westmere holds its own so well in these sorts of tests.

Xeon X5680 3.33GHz: 51.11 sec
 
I haven't seen anything but arguing and honestly skipped a bunch. I just want a simple answer, not "go look for it". I don't see why I should care about all this, so tell me why we should care, in easy simple terms. 'cause those slides don't really mean shit to me and I don't see this two-cycles part you're hung up on. so explain it in simple terms, why should we care? does it improve encoding, make games faster? or is it only for like med research, F@H and shit that normal people don't use? instead of tech-babble make it clear and simple.

honestly to me it just seems like you are grasping at anything and everything that will take away from the good and make it look bad.
 

It improves the speed of any application using it, including games. And it's becoming more and more widely used. It became part of the Windows and DirectX SDKs with Windows 8, for example, and is used for Store apps and so on.
https://msdn.microsoft.com/en-us/library/windows/desktop/hh437833(v=vs.85).aspx
https://blogs.msdn.microsoft.com/chuckw/2015/06/03/directxmath-avx2/
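If anyone wants to see what that looks like in code, here's a tiny hypothetical DirectXMath snippet (my own example, not taken from the linked pages; needs the Windows SDK). The library compiles these vector ops down to SSE/AVX/AVX2 depending on the build settings, which is the point of the AVX2 blog post above.

```cpp
#include <DirectXMath.h>
#include <cstdio>

int main()
{
    using namespace DirectX;

    XMVECTOR position = XMVectorSet(1.0f, 2.0f, 3.0f, 0.0f);
    XMVECTOR velocity = XMVectorSet(0.5f, 0.0f, -1.0f, 0.0f);

    // position + velocity * dt, computed on a whole 4-float vector at once
    XMVECTOR moved = XMVectorMultiplyAdd(velocity, XMVectorReplicate(1.0f / 60.0f), position);

    XMFLOAT3 out;
    XMStoreFloat3(&out, moved);
    std::printf("%f %f %f\n", out.x, out.y, out.z);
}
```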
 
the only uses I've found for AVX have been research, medical, F@H and stuff like that. and the only game-related use is in PhysX. so not much normal use. it was the same in 2010, 2014 and earlier this year (threads I found while looking): AVX/AVX2 are available but not being used much and the average person shouldn't really care. the only game that I know of that has AVX is Grid 2 and it makes ZERO difference there. so still not seeing why I/we should care...

edit: there was already this AVX discussion on [H] back in 2010 and 2014, minimal uses found. but yes, IF AVX/AVX2 were used, back then AMD did not perform quite as well as Intel. but that makes sense, it's an Intel tech. just like GameWorks is NV's and does not work well on AMD.
 

What's next, x87 to SSE didn't do anything either? I get it, you don't like AVX and I know why. But that doesn't mean it's not commonly used. It's not like applications and games come with a checkbox list of whether they support SSE, AVX etc. for the consumer to see easily. Vectorization is commonly used.

http://forums.steampowered.com/forums/showthread.php?t=2321925
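And on the "no checkbox list" point: programs normally just probe the CPU at runtime and pick a code path, so the user never sees any of it. A minimal dispatch sketch (my own example; __builtin_cpu_supports is a GCC/Clang builtin, MSVC would use __cpuid instead, and the render_* functions are made up):

```cpp
#include <cstdio>

void render_scalar() { std::puts("plain scalar path"); }
void render_avx2()   { std::puts("AVX2 path");         }

int main()
{
#if defined(__GNUC__) || defined(__clang__)
    if (__builtin_cpu_supports("avx2"))
        render_avx2();
    else
#endif
        render_scalar();
}
```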
 
I actually wonder why anyone would use the CPU for rendering in Blender...
[screenshot: Blender rendering on the GPU]

^^Black Desert Online was running in the background, although minimised.
 
the only uses I've found for AVX have been research, medical, F@H and stuff like that. and the only game-related use is in PhysX. so not much normal use. it was the same in 2010, 2014 and earlier this year (threads I found while looking): AVX/AVX2 are available but not being used much and the average person shouldn't really care. the only game that I know of that has AVX is Grid 2 and it makes ZERO difference there. so still not seeing why I/we should care...

edit: there was already this AVX discussion on [H] back in 2010 and 2014, minimal uses found. but yes, IF AVX/AVX2 were used, back then AMD did not perform quite as well as Intel. but that makes sense, it's an Intel tech. just like GameWorks is NV's and does not work well on AMD.

lol, AMD's Excavator supports AVX2 and so does Zen. AMD just has a weaker implementation of it, but there's nothing fundamentally stopping AMD from executing those instructions more quickly if it invests to build the hardware for it.
 
What's next, x87 to SSE didn't do anything either? I get it, you don't like AVX and I know why. But that doesn't mean it's not commonly used. It's not like applications and games come with a checkbox list of whether they support SSE, AVX etc. for the consumer to see easily. Vectorization is commonly used.

http://forums.steampowered.com/forums/showthread.php?t=2321925
see, now you bring up something completely different to try and continue.
I don't dislike AVX, I really have no idea where it's being used and why you think it is such a big deal. it's barely used in the normal world. the only game-related uses I've seen are PhysX and Grid 2. I don't know a single program that uses it, and as ER470 just said, AMD supports it, it's just not as fast.

I actually wonder why anyone would use the CPU for rendering in Blender...
snipped pic
^^Black Desert Online was running in the background, although minimised.
was that on gpu?
 
Yes. By the way, it's about 11 times faster on my system than using the i7-3770K @ 4.4 GHz. So, even with 1000 samples it's faster than my CPU with 150 samples (quality difference would be pretty huge).
 
I actually wonder why anyone would use the CPU for rendering in Blender...

You don't say what GPU you were using. I ran it with SLI Titan X Maxwells @ 1500 and it took roughly twice as long as it did with my CPU - but I admit I don't know much about Blender and might not have had the correct settings applied.
 
My rig is in my sig and I always refer to it unless I say otherwise. For GPUs you need to increase the tile size, a lot.
 

See, I was trying to run the file without altering any parameters. And sorry about missing your rig.
 
see, now you bring up something completely different to try and continue.
I don't dislike AVX, I really have no idea where it's being used and why you think it is such a big deal. it's barely used in the normal world. the only game-related uses I've seen are PhysX and Grid 2. I don't know a single program that uses it, and as ER470 just said, AMD supports it, it's just not as fast.

was that on gpu?
x86 instructions have hit a wall for IPC (instructions per clock). SIMD gets around that by going wider: MMX (64-bit) did it for integer work, SSE (128-bit) roughly doubled the data per instruction again, and AVX (256-bit) doubled it once more. AVX2 (256-bit, Intel since Haswell, AMD since Excavator and Zen) extends the 256-bit width to integer ops. It mostly helps a limited type of calculation: things done over and over again, like in a rendering program, that can stay inside the CPU caches. AVX-512 (Xeon Phi and Skylake-EP) again doubles the width of the data and potentially doubles the speed of those calculations.
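To put numbers on the "wider register = more data per instruction" idea, a small hand-written sketch (my own illustration; assumes an AVX-capable CPU, a compiler flag like -mavx or /arch:AVX, and n being a multiple of 8): the SSE version moves 4 floats per add, the AVX version 8.

```cpp
#include <immintrin.h>

// Same work at two widths: 4 floats per instruction with SSE, 8 with AVX.
// Assumes n is a multiple of 8 and the pointers don't alias.
void add_sse(float* dst, const float* a, const float* b, int n)
{
    for (int i = 0; i < n; i += 4)
        _mm_storeu_ps(dst + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
}

void add_avx(float* dst, const float* a, const float* b, int n)
{
    for (int i = 0; i < n; i += 8)
        _mm256_storeu_ps(dst + i, _mm256_add_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i)));
}
```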

Good luck finding many programs that actually use AVX2 besides Blender. How Zen performs with them remains to be seen; since the support is there in the CPU it could be virtually as fast as Intel's implementation - we just need some benchmarks.
 
See, I was trying to run the file without altering any parameters. And sorry about missing your rig.
Since you have to change the settings anyway to be able to use GPU rendering it's better to do it properly. Tile settings that are optimal for CPU are not optimal for GPU, far from it.
 
My AMD FX8320 @ 4600 (DDR3 2100) got 150 samples in 2 min 07 sec.
The results could probably be better. I switched off all options like Cool'n'Quiet, C6 state, etc. in the BIOS.
I would like to mention that thermal throttling kicked in a couple of times and set the CPU to 1.5 GHz (CPU-Z was open) when the temperature was over 55C.
I am pretty sure that the AMD FX @ 4600 sample which showed 2 min 35 sec in this test had a very bad cooling system and throttled too much.
I had the same result on my first try, but I opened the case and added an extra 280 mm side fan. The occurrence of thermal throttling decreased. I have a V8 tower cooler.
In the Fritz chess benchmark my FX8320 @ 4700 got 14000. As one person mentioned, it is quite possible that Ryzen, which has a GPU core built into the CPU, somehow utilizes GPU resources to help the CPU in this test.
Anyway, we will wait for new tests of Ryzen in benchmarks, games and applications.
 

Attachments

  • Blender2.07.jpg
As one person mentioned, it is quite possible that Ryzen, which has a GPU core built into the CPU, somehow utilizes GPU resources to help the CPU in this test.
Anyway, we will wait for new tests of Ryzen in benchmarks, games and applications.
It's not possible since Summit Ridge doesn't have a GPU. Secondly, the Fritz chess benchmark doesn't support GPU compute (nor does it support HSA). GPU computing is not magic that just suddenly works.
 
My AMD FX8320 @ 4600 (DDR3 2100) got 150 samples in 2 min 07 sec. The results could probably be better. I switched off all options like Cool'n'Quiet, C6 state, etc. in the BIOS. I would like to mention that thermal throttling kicked in a couple of times and set the CPU to 1.5 GHz (CPU-Z was open) when the temperature was over 55C.
I am pretty sure that the AMD FX @ 4600 sample which showed 2 min 35 sec in this test had a very bad cooling system and throttled too much.
I had the same result on my first try, but I opened the case and added an extra 280 mm side fan. The occurrence of thermal throttling decreased. I have a V8 tower cooler.
As Shintai said, the official Blender build is, if The Stilt's builds are credible, extremely inefficiently compiled for Piledriver.

Link (OCN)

Link (Anandtech)
 
see, now you bring up something completely different to try and continue.
I don't dislike AVX, I really have no idea where it's being used and why you think it is such a big deal. it's barely used in the normal world. the only game-related uses I've seen are PhysX and Grid 2. I don't know a single program that uses it, and as ER470 just said, AMD supports it, it's just not as fast.

was that on gpu?


Any software that benefits from SSE benefits from AVX. AVX is a natural extension of SIMD vectorization in the pipeline by going wider, that is all it is. AVX doubles the width of the vector registers available for operations. Yeah, it does have new primitives and data manipulation techniques, but those won't come into play for older software.

Performance will never be a perfect two-fold increase because of this (for older programs), but old programs still get some benefit from it.
 
Where is the point at which it becomes more efficient to use a GPU to do vector processing rather than the CPU?

agner said:
Current processors with the AVX2 instruction set have 16 vector registers of 256 bits each. The forthcoming AVX-512 instruction set gives us 32 vector registers of 512 bits each, and we can expect future extensions to 1024 or 2048-bit vectors. But these increases in vector size are subject to diminishing returns. Few calculation tasks have enough inherent parallelism to take full advantage of the bigger vector registers. The 512-bit vector registers are connected with a set of mask registers, which have a size limitation of 64 bits. A 2048-bit vector register will be able to hold 64 single-precision floating point numbers of 32 bits each. We can assume that Intel have no plans of making vector registers bigger than 2048 bits because they would not fit the 64-bit mask registers.
 


Depends on what you are doing; GPUs are great for certain things, CPUs for others. Vector processing is not solely used for parallel processing, actually it's quite the contrary, but they can be combined in some circumstances.

Think of it this way: vector processing can work on little bits and pieces all over the place and then combine everything in one shot. Parallel processing can work on each set of problems separately, with no need to combine at the end.

GPUs in the past, before they went scalar, were vector processors too, but as compute needs increased, scalar became the better way to go since it gives better parallel processing, which is ideal for compute and graphics. For the rudimentary things a CPU does, or heavy calculations that need data from many different places, vector processing is much more useful, especially given the way software for today's CPUs is written.

This is also why you get diminishing returns on older software: when a program was written for a certain amount of register space, there is only so much "packing" the CPU can do with it. Even though the hardware might be able to pack more in, the software might not allow it, because the programmer needed certain things done in a certain way.

Now, when you have vastly different architectures, like BD and Intel's crop, where both have AVX but one (BD) is badly bottlenecked elsewhere, really the only way to write the software is to go Intel's route to get the most performance, because going BD's route would just hurt the software with no real return, since the developer still has to support older architectures and can't go wider on the cores. We don't know what Ryzen has to offer in this realm compared to Intel yet, so it's up in the air. But in general, if AMD doesn't have AVX-512, expect this to be an advantage for Intel, and not a small one, because 256-bit AVX is and was adopted well by developers, most notably in HPC, but it will trickle down.
 
Now, when you have vastly different architectures, like BD and Intel's crop, where both have AVX but one (BD) is badly bottlenecked elsewhere, really the only way to write the software is to go Intel's route to get the most performance, because going BD's route would just hurt the software with no real return, since the developer still has to support older architectures and can't go wider on the cores.
Strange, then, how the Lynnfield Blender results don't show gain from The Stilt's custom builds but more recent Intel CPUs and FX do. The latter in particular shows a really big gain. I think the biggest gain came from using Intel's own compiler.
 


As I stated before, it could be because of the different extensions, primitives, and data manipulation techniques in AVX vs SSE.
 
Yeah, it looks like the Blender devs haven't managed to leverage the instruction sets that came out with Sandy Bridge and Bulldozer.
 
It's not possible since Summit Ridge doesn't have a GPU. Secondly, the Fritz chess benchmark doesn't support GPU compute (nor does it support HSA). GPU computing is not magic that just suddenly works.

Yes, you are right, right now Ryzen is just a CPU, but AMD promises to launch Ryzen APUs (with a graphics processing unit built into the same chip).
Rumors are that Ryzen APU processors will have graphics performance comparable to the PlayStation 4.
I would just like to mention that who knows what this engineering sample of Ryzen contains under the cover. The blueprints always look nice. We need the sample in the studio.
 

In TFLOPS? Sure, it's possible. In gaming, certainly not.

The APUs will have something like 25-30GB/sec bandwidth. The PS4 got 176GB/sec.
 
Even with DDR4 the bandwidth is likely only 50-60GB/s, which is too low. AMD can bypass the limitation in two ways.

1) Use of eDRAM or an L4 cache like Iris Pro, which is ridiculously expensive.

2) Use of HBM, which is also expensive and will need a special platform with more interconnect to allow the CPU to use both HBM and DDR4.

Right now I doubt AMD will use HBM; maybe in a couple of years they will finally move to HBM on-chip.
 

eDRAM costs peanuts compared to any other solution. It's like $3 for a 128MB cache in production cost. But cost is still cost, and it doesn't help if people won't pay extra for it. Mobile SKUs are getting eDRAM left and right to save on DRAM speed and power: 7 of 13 i7, 5 of 9 i5 and 2 of 5 i3 SKL mobile SKUs have eDRAM, or 14 out of 27 ix-based SKUs.

HBM is a pipe dream. Not to mention it's on a fast track to a power crisis of its own.

DDR4 with 50-60GB/sec? You mean 3200-3600MHz, which is OC-only? By the time the APUs come out you'll see 2667MHz or so as the non-OC speed. And we all know OEMs will go even cheaper than that.
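For reference, the rough math behind those numbers (my own arithmetic; these are theoretical peaks, real effective bandwidth is noticeably lower): peak bandwidth is just bytes per transfer times transfer rate, so dual-channel DDR4 at 2 x 8 bytes versus the PS4's 256-bit (32-byte) GDDR5 bus works out like this.

```cpp
#include <cstdio>

// Theoretical peak = bus width in bytes * transfer rate (MT/s), in GB/s.
double peak_gbs(double bus_bytes, double mt_per_s)
{
    return bus_bytes * mt_per_s / 1000.0;
}

int main()
{
    std::printf("dual-channel DDR4-2667: %.1f GB/s\n", peak_gbs(16, 2667)); // ~42.7
    std::printf("dual-channel DDR4-3200: %.1f GB/s\n", peak_gbs(16, 3200)); // ~51.2
    std::printf("PS4 256-bit GDDR5-5500: %.1f GB/s\n", peak_gbs(32, 5500)); // 176
}
```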

It's a shame AMD abandoned its GDDR sideport memory. But again, a faster APU competes with AMD's own discrete GPUs, and an APU as such gets no extra value from being faster. It's much better business trying to sell people an RX460 or so.
 