Pictures Of Your Dually Rigs!

M3T4LM4N222 · Nov 20, 2014

Love the old PowerPC based Mac lol.

PornoSatan · Nov 25, 2014

b3nno said:
[IMG]http://i.imgur.com/h7lt7lJ.jpg[/IMG]
[IMG]http://i.imgur.com/NfAJozZ.jpg[/IMG]

Specs:
Fractal Design Define R4
Corsair VX550W PSU
Supermicro X8DTL-i
2x Intel Xeon X5560
2x Zalman CNPS10X Optima Heatsinks
48GB (6x8GB) 1333MHz DDR3 ECC RDIMM
IBM M1015 Raid controller (flashed to IT)
Intel Pro Dell X3959 Dual Port Gigabit NIC

Storage:
5x ST2000DL003 Seagate 2TB Barracuda Green
Crucial M500 240GB SSD
Lite-On LAT-128M2S 128GB SSD
Hitachi 160GB HDD

zpool1: raidz(5x2TB) 7.1TB usable

datastores: 240GB SSD, 128GB SSD & 160GB HDD

All datastore disks are basically drives I had lying around.

ESXi 5.5.0
FreeNAS with passthrough to M1015 & one Gbit NIC.
pfSense with passthrough to Intel Dual Port Gigabit NIC
lots of other VMs, primarily windows for testing.
Testing out SCCM2012R2, WDS/MDT, Exchange, Cisco Prime.

[IMG]http://i.imgur.com/hIXKHGo.jpg[/IMG][IMG]http://i.imgur.com/HXWNhHZ.jpg[/IMG]
Making custom power connectors saves some clutter.

But still it needs tidying up.

Thinking about expanding with another 5-disk raidz1, and perhaps some more SSDs for datastores.
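
For anyone checking the pool numbers above, here is a rough sketch of the raidz capacity arithmetic. It assumes single parity (raidz1), decimal terabytes on the drive label, and binary TiB in the reported figure. The function name is made up for illustration. The result comes out a little above the quoted 7.1TB, presumably because ZFS metadata, padding and reserved space are not modelled here.

```python
def raidz_usable_tib(disks, disk_tb, parity=1):
    """Raw data capacity of one raidz vdev in TiB, before any ZFS overhead."""
    data_disks = disks - parity                   # raidz1 gives up one disk to parity
    usable_bytes = data_disks * disk_tb * 10**12  # drive labels use decimal TB
    return usable_bytes / 2**40                   # convert to binary TiB

one_vdev = raidz_usable_tib(disks=5, disk_tb=2)
two_vdevs = 2 * one_vdev  # the planned second 5-disk raidz1 would double the pool

print(f"one 5x2TB raidz1 vdev:  {one_vdev:.2f} TiB")
print(f"two 5x2TB raidz1 vdevs: {two_vdevs:.2f} TiB")
```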

Nice. Those custom power connectors look great. What's that behind the cpu mounting bracket? Fan controller?

b3nno · Nov 25, 2014

Thank you. That's a fan controller indeed.

Salvaged a backplane, cage and P400 SAS controller from an old DL380 G5. The cage seems to fit quite well just between the PSU and drive cage. Got some more drives too, and thinking of maybe replacing the SSDs for datastores with a RAID-5 array of SAS disks and using the SSDs for host cache/ZFS somehow.

rvborgh · Dec 10, 2014

Finally finished my home PC project

Added the final two Noctuas.

Blue Fox · Dec 10, 2014

Why on earth do you have the heatsinks sideways and impeding airflow? You can ditch the fans on them too once you rotate them back to how they're supposed to be.

rvborgh · Dec 10, 2014

Hi, As this is a home PC... i run the front and mid chassis fans at very low speeds to make things useable in a desktop environment soundwise. Basically just enough to vent the case and keep case temps reasonably low.

As far as the cpu temps go the current setup is just fine.. the cores on the top two processors generally run about 2C hotter than the ones below. i cannot tell any difference between processor 4 and processor 3 (despite the fact that #4 has no exhaust fan - simply because the "roof" of this SuperMicro case does not allow room for it). i don't think core temps ever exceed much more than 40C... running at 2.1 GHz. At 3 GHz, i don't think it ever got much higher than high 40s (but that was with the previous setup with 2 dynatrons up top).

just for giggles i pointed a FLIR camera at it... and took a shot using my software.

latest shot with 4 Noctuas...

previous with 2 Noctuas below and 2 Dynatrons above.

bottom line... i'll be experimenting with what works best. The cpus run just fine and cool with the Noctuas in this current orientation... i do need to get some Enzotech copper sinks on the VRMs at those hot spots. i'll also be updating the pusher fans with the latest 2000 rpm Noctua PWM units to replace these 1600rpm NB9 stockers in the near future.

Blue Fox · Dec 10, 2014

I don't think some FLIR images and a comparison with a different type of heatsink is relevant even if it is a neat comparison. Was making more of a point that you should rotate the Noctuas 90 degrees.

rvborgh · Dec 10, 2014

Hi Fox, i definitely understand your point.

Ideally i would think you would want all 4 mounted East/West like you suggest... however... the Noctua heat sink on processor 4 cannot be oriented East/West (the roof of the SuperMicro case gets in the way). Given that fact i just tried reorienting all of them North/South to match... and surprisingly i found for my usage that it really didn't make much of a difference at all temperature wise (neither in cpu core measurements, nor using the camera) compared to when i had them flipped East/West.

Granted these processors are in no way running at 100% load for hours on end... so that probably contributes.

When i get some time i'll pull #2 Noctua and remount it East/West to show that there is virtually no difference.

For now i am going to be concentrating on taming those hot spots on the VRMs.

Blue Fox said:
I don't think some FLIR images and a comparison with a different type of heatsink is relevant even if it is a neat comparison. Was making more of a point that you should rotate the Noctuas 90 degrees.

.Gunfire · Dec 11, 2014

I've just gotta ask, what on earth are you doing at home to need 4x Opterons?

rvborgh · Dec 11, 2014

1) mainly just for the novelty/fun of it... i wanted to build a monster AMD rig and wanted to see what kind of Cinebench benchmark performance was achievable for just over $2k (ended up doing 38.26 on Cinebench 11.5, and just over 3100 on R15)..
2) workwise... i will be doing some experimentation with regards to refactoring some of my code to utilize lots and lots of cores. This machine is perfect for that kind of thing

Patriot · Dec 11, 2014

rvborgh said:
1) mainly just for the novelty/fun of it... i wanted to build a monster AMD rig and wanted to see what kind of Cinebench benchmark performance was achievable for just over $2k (ended up doing 38.26 on Cinebench 11.5, and just over 3100 on R15)..
2) workwise... i will be doing some experimentation with regards to refactoring some of my code to utilize lots and lots of cores. This machine is perfect for that kind of thing

I am disappointed in you... only 38 with 61xx es... cmon man.

rvborgh · Dec 11, 2014

Hi Patriot, i don't think i will ever match your amazing performance

Frankly, i am a bit fearful as to what might happen to the motherboard running that much current through the thing

3 GHz is fast enough for me

Patriot said:
I am disappointed in you... only 38 with 61xx es... cmon man.

Patriot · Dec 11, 2014

rvborgh said:
Hi Patriot, i don't think i will ever match your amazing performance. Frankly, i am a bit fearful as to what might happen to the motherboard running that much current through the thing. 3 GHz is fast enough for me.

Fair enough... I did toast a board in testing... but I was the third owner and it wasn't perfectly stable when I got it... so...

That and I did have it under water... so I am able to keep it cool at much higher frequencies than you.

rvborgh · Jan 8, 2015

Just thought i'd bump this thread.

Updated the pushers on my quad rig with the latest Noctua A9 PWM fans...

smangular · Feb 25, 2015

Dare I ask what is your noise level and power consumption like?
Very cool that FLIR picture although those cameras are several hundred dollars last I checked.

rvborgh · Feb 25, 2015

Running at 2.1 GHz... consumption at idle is around 360w (in Windows performance power mode). Running IntelBurnTest... consumption is just a tad under 700w.

Running at 3.0 GHz... full load power consumption (running Cinebench 11.5) is somewhere around 900 watts (but it finishes the test in 10 seconds).
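
To put those wattages in perspective, here is a quick back-of-the-envelope sketch of the energy involved. It only uses the figures quoted above (360w idle, 900w for a 10 second Cinebench run). The helper name is hypothetical and the numbers are wall-power readings as reported, not anything measured here.

```python
def energy_wh(power_watts, seconds):
    """Energy in watt-hours for a constant power draw over a duration."""
    return power_watts * seconds / 3600.0

# Idling around the clock at the quoted 360w, expressed in kWh per day.
idle_kwh_per_day = energy_wh(360, 24 * 3600) / 1000.0

# One Cinebench 11.5 run at the quoted 900w for roughly 10 seconds.
cinebench_run_wh = energy_wh(900, 10)

print(f"idle for 24h:      {idle_kwh_per_day:.2f} kWh")
print(f"one Cinebench run: {cinebench_run_wh:.1f} Wh")
```

So the full-load peak looks dramatic but costs very little per run; the idle draw is what adds up over a day.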

As far as noise goes... with a fan controller you can use it for desktop use. The main source of noise is the 4 small fans on the front of the dual redundant power supplies.

i also added an additional SuperMicro fan on the back so i can run all 4 slower for the same air draw:

this was because #1 CPU was not getting as much airflow...

For the lowest noise setup... you probably want to go with a larger case with larger fans. The SM case is more of a server case that can be quieted down. It is built like a tank though and extremely high quality.

Hope that helps.

Voxata · Feb 25, 2015

Holy crap! What in the world do you do w/that thing.

Liquid_Static · Feb 26, 2015

God there are some sweet rigs in here.

mikeblas · Feb 26, 2015

rvborgh said:
Running at 2.1 GHz... consumption at idle is around 360w (in Windows performance power mode). Running IntelBurnTest... consumption is just a tad under 700w.

Er, is this the quad machine in your sig? Isn't it kind of ... slow? You end up with 8 cores, and they're pretty slow ones. Wouldn't a modern Core i7 build be faster?

rvborgh · Feb 26, 2015

i suppose it depends on the type of apps you are talking about. Obviously anything that can take advantage of lots of cores will just decimate any i7-4770 setup. Just to give an example of that... for giggles i ran one of our apps (there is a portion of code in there that i wrote that can spread across as many cores as are available) on the 4P. The i7-4770 took 10 seconds for the operation, and the 48 core 4P setup was done in well under a second (while barely registering a blip on task manager).

Other multithreaded examples...
Cinebench R15 - 751 on the 4770, 3112 on the 4P Opteron.
Cinebench 11.5... i think the 4770 scores high 7s, 4P Opteron scores 37.23
PassMark CPUMark... score is just under 20,000.
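
As a rough sketch of what the Cinebench R15 scores above work out to, the snippet below computes the overall multithreaded speedup and the per-core score on each machine. It assumes 4 physical cores for the i7-4770 and 48 for the 4P Opteron, and it ignores Hyper-Threading, so the per-core comparison is only approximate.

```python
# Cinebench R15 multithreaded scores quoted above.
i7_score, i7_cores = 751, 4              # i7-4770: 4 physical cores (assumed)
opteron_score, opteron_cores = 3112, 48  # 4P Opteron: 4 x 12 cores

total_speedup = opteron_score / i7_score

i7_per_core = i7_score / i7_cores
opteron_per_core = opteron_score / opteron_cores
per_core_ratio = i7_per_core / opteron_per_core

print(f"4P Opteron vs i7-4770, whole machine: {total_speedup:.2f}x")
print(f"per-core score: i7 {i7_per_core:.1f}, Opteron {opteron_per_core:.1f}")
print(f"i7 core vs Opteron core: {per_core_ratio:.2f}x")
```

In other words the 4P box wins on total throughput while each individual core is noticeably slower, which is the trade-off the rest of this thread argues about.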

Core for core... Istanbul K10 at 3 GHz performs about the same as 4 GHz FX8350 Piledriver for integer stuff. Basically it's a Phenom II X48 1075T (minus turbo). for normal integer stuff that i run i don't notice it as being slower...

i am not a gamer... so i do not know how it does there. Perhaps you could suggest some gaming benchmarks? I don't have the world's best GPU in it... just a GTX 760. None of the gaming benchmarks i have run seem to really put stress on the CPUs.

i am curious what games can take advantage of lots of cores.

To say that these 4P Opteron setups are optimal for home use would be a lie. You are talking about 4 of everything, a case that can take the huge SWTX motherboard. A bit older tech. On the other hand you are running server grade SuperMicro stuff and it's rock solid. i am guessing i ran close to 1000 iterations of IntelBurnTest. i would say that is stable.

mikeblas said:
Er, is this the quad machine in your sig? Isn't it kind of ... slow? You end up with 8 cores, and they're pretty slow ones. Wouldn't a modern Core i7 build be faster?

agrikk · Feb 27, 2015

mikeblas said:
Er, is this the quad machine in your sig? Isn't it kind of ... slow? You end up with 8 cores, and they're pretty slow ones. Wouldn't a modern Core i7 build be faster?

Speed per core becomes relevant in gaming and other single core applications where a thread has only one core's worth of clock speed to play with. The i7 is ultimately a consumer CPU with an instruction set optimized accordingly.

In a multithreaded application, like video or audio rendering or full-blown multiuser database applications, you care less about clock speed than how each core plays with each other and how the instruction set within the CPU is optimized for your application.

Build the rig for what you are going to do on it.

mikeblas · Feb 27, 2015

rvborgh said:
The i7-4770 took 10 seconds for the operation, and the 48 core 4P setup was done in well under a second (while barely registering a blip on task manager).

The Magny Cours chips have 12 cores each, then? I thought it was just two each. My mistake, though not really any reason to give a condescending lecture.

agrikk said:
Speed per core becomes relevant in gaming and other single core applications where a thread has only one core's worth of clock speed to play with. The i7 is ultimately a consumer CPU with an instruction set optimized accordingly.

Speed per core is relevant when speed is relevant. 48 cores at 3.0 GHz are faster than 48 cores at 2.1 GHz. Sure, there's a far bigger delta from 8 cores to 48; but the per-core clock speed increase is still an increase.
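
A minimal sketch of that point, using Amdahl's law as a stand-in model: relative throughput is taken as clock speed times the Amdahl speedup for a given core count. The parallel fraction of 0.99 is a made-up assumption, and the model ignores memory bandwidth, NUMA and architecture differences, so it only illustrates the shape of the argument.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

def relative_throughput(cores, clock_ghz, parallel_fraction):
    """Toy model: per-core work scales with clock, core scaling follows Amdahl."""
    return clock_ghz * amdahl_speedup(cores, parallel_fraction)

p = 0.99  # hypothetical parallel fraction, not measured from any real app

for cores, clock in [(8, 2.1), (48, 2.1), (48, 3.0)]:
    print(f"{cores:2d} cores @ {clock} GHz -> {relative_throughput(cores, clock, p):.1f}")
```

Going from 8 to 48 cores is the bigger jump in this model, but the clock increase from 2.1 to 3.0 GHz still multiplies whatever the cores deliver.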

rvborgh · Feb 28, 2015

Hi Mike, i apologize. I did not intend it to be condescending and am sorry if i came across that way. When i ran that code i was just really impressed with the performance vs my 4770. i had never seen it run so fast. it was like the code was done before it even started vs the 4770 where it would max out all cores for a bit.

My other home machine is a 12 core (dual Opteron 8439SE setup @ 2.8 GHz) setup... that one seems to be about 10-15% faster than my 4770 overall and uses the same K10 Istanbul cores.

Yes, Magny Cours are a dual Istanbul in a single chip (2x6 K10, and double dual memory channels). Kind of older 2010 tech, but i like the K10.

i haven't really tried taking these chips past 3 GHz... they run there at 1.175v... but i think the power usage at say 3.3 and 1.25v would be a bit ridiculous (well over 1000w at full bore).

mikeblas said:
The Magny Cours chips have 12 cores each, then? I thought it was just two each. My mistake, though not really any reason to give a condescending lecture.
Speed per core is relevant when speed is relevant. 48 cores at 3.0 GHz are faster than 48 cores at 2.1 GHz. Sure, there's a far bigger delta from 8 cores to 48; but the per-core clock speed increase is still an increase.

Nathan_P · Feb 28, 2015

rvborgh said:
i haven't really tried taking these chips past 3 GHz... they run there at 1.175v... but i think the power usage at say 3.3 and 1.25v would be a bit ridiculous (well over 1000w at full bore).

3.8 GHz is the largest overclock I have seen on G34 hardware, and that was under water. Not sure what the power draw was but I think it was at least 1200w. You also run the risk of cooking your board as the VRMs strain under the load.

mikeblas · Feb 28, 2015

rvborgh said:
Hi Mike, i apologize. I did not intend it to be condescending and am sorry if i came across that way. When i ran that code i was just really impressed with the performance vs my 4770. i had never seen it run so fast. it was like the code was done before it even started vs the 4770 where it would max out all cores for a bit.

No worries. I haven't used AMD hardware in about five years (or, at least, I haven't concerned myself with relative AMD performance in that long), so I'm not always familiar with their parts. Running 48 cores is a very different story than running 8!

agrikk said:
and how the instruction set within the CPU is optimized for your application.

I can't help but wonder what you specifically mean by this. Can you elaborate?

Zarathustra[H] · Feb 28, 2015

rvborgh said:
i suppose it depends on the type of apps you are talking about. Obviously anything that can take advantage of lots of cores will just decimate any i7-4770 setup. Just to give an example of that... for giggles i ran one of our apps (there is a portion of code in there that i wrote that can spread across as many cores as are available) on the 4P. The i7-4770 took 10 seconds for the operation, and the 48 core 4P setup was done in well under a second (while barely registering a blip on task manager).

Other multithreaded examples...
Cinebench R15 - 751 on the 4770, 3112 on the 4P Opteron.
Cinebench 11.5... i think the 4770 scores high 7s, 4P Opteron scores 37.23
PassMark CPUMark... score is just under 20,000.

Core for core... Istanbul K10 at 3 GHz performs about the same as 4 GHz FX8350 Piledriver for integer stuff. Basically it's a Phenom II X48 1075T (minus turbo). for normal integer stuff that i run i don't notice it as being slower...

i am not a gamer... so i do not know how it does there. Perhaps you could suggest some gaming benchmarks? I don't have the world's best GPU in it... just a GTX 760. None of the gaming benchmarks i have run seem to really put stress on the CPUs.

i am curious what games can take advantage of lots of cores.

To say that these 4P Opteron setups are optimal for home use would be a lie. You are talking about 4 of everything, a case that can take the huge SWTX motherboard. A bit older tech. On the other hand you are running server grade SuperMicro stuff and it's rock solid. i am guessing i ran close to 1000 iterations of IntelBurnTest. i would say that is stable.

That is pretty impressive.

My dual Xeon L5640 (12 cores, 24 logical w HT, 2.27ghz, 2.8 turbo) got about 1075 in Cinebench R15, back when I had windows on it for testing purposes.

Low clocks and large numbers of cores are great in servers. Especially virtualized ones.

I wouldn't build something like this for use as a desktop though.

mikeblas · Feb 28, 2015

Zarathustra[H] said:
Low clocks and large numbers of cores are great in servers. Especially virtualized ones.

Say I'm using your virtualized server to host my app. You allocate my VM a core or two. Why wouldn't I want those cores to be as fast as possible?

Zarathustra[H] · Feb 28, 2015

mikeblas said:
Say I'm using your virtualized server to host my app. You allocate my VM a core or two. Why wouldn't I want those cores to be as fast as possible?

You might, if all else is equal.

But all else is rarely equal.

Most server apps are not extremely CPU intensive on their own. CPU loads are more typically added up by many parallel tasks.

Disregarding power consumption for a moment, from a pure performance perspective, a 4ghz quad core will -of course - perform better than a 2ghz quad core (assuming the same arch).

The argument is simply that for typical server tasks, 8 2ghz cores will typically be more useful and perform better than 4 4Ghz cores, despite the total core*ghz being the same. At the very least there will be little difference between the two.
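
One way to put numbers on the 8 x 2 GHz versus 4 x 4 GHz comparison is an idealized M/M/c queue (Erlang C). This is only a sketch under stated assumptions: requests arrive at random, service times are exponential, a core at twice the clock serves requests twice as fast, and the arrival rate of 6 requests per time unit is made up. It deliberately leaves out cache contention and context-switch cost, which is exactly what the argument here rests on, so treat it as a baseline and not a verdict.

```python
from math import factorial

def mean_response_time(arrival_rate, servers, service_rate):
    """Mean time in system for an M/M/c queue, via the Erlang C formula."""
    offered = arrival_rate / service_rate  # offered load in Erlangs
    utilization = offered / servers
    if utilization >= 1.0:
        raise ValueError("queue is unstable: arrivals exceed capacity")
    tail = offered**servers / factorial(servers) / (1.0 - utilization)
    head = sum(offered**k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)          # probability a request has to queue
    mean_wait = p_wait / (servers * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate  # queueing delay plus service time

arrivals = 6.0  # hypothetical load: 75% utilization on both configurations
many_slow = mean_response_time(arrivals, servers=8, service_rate=1.0)
few_fast = mean_response_time(arrivals, servers=4, service_rate=2.0)

print(f"8 slow cores: mean response {many_slow:.3f}")
print(f"4 fast cores: mean response {few_fast:.3f}")
```

Both configurations have the same total capacity in this model, and the fewer, faster cores come out ahead on mean response time. Whether the many-core box wins in practice then depends on how much the cache and time-sharing effects, which the model does not capture, cost on the faster cores.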

Typical server loads are just made up of many many small tasks, instead of one large task, and as such having many cores is more efficient, as registers and caches don't have to be cleared and refetched as often, as they would on a faster system with fewer cores, where core time sharing is more intense resulting in many cache misses.

Whenever you have to share CPU time, it results in different threads requiring different information running on the core at the same time. They will be fighting for scarce resources like cache, etc. When you split this up, you reduce this resource constriction, and things work more efficiently.

This effect happens in Desktop use as well, but is a lot less noticeable, as Desktops tend to have many fewer threads running at any given time. It was however rather noticeable when multitasking back in the early dual core days, when you could pick a really high clocked single core, or a lower clocked dual core. The dual core simply felt much more smooth when multitasking on the desktop. In games - however - the higher clocked single core would kill the lower clocked dual core.

Factor in power consumption per instruction and the many core solutions have a huge advantage in servers, where it is typically less about how fast a single machine is, and more about how much you can get done per unit of power.

There are - however - some server applications that DO benefit from fast cores, but they are pretty rare, and not representative of typical server loads.

mikeblas · Feb 28, 2015

Zarathustra[H] said:
Whenever you have to share CPU time, it results in different threads requiring different information running on the core at the same time. They will be fighting for scarce resources like cache, etc. When you split this up, you reduce this resource constriction, and things work more efficiently.

Ah, I see. I guess I have a different idea of "server apps" and "tasks". With those assumptions, then it makes sense to me.

Zarathustra[H] · Feb 28, 2015

mikeblas said:
Ah, I see. I guess I have a different idea of "server apps" and "tasks". With those assumptions, then it makes sense to me.

Interesting. What do you run on your servers?

Typical server loads - to me - are things like email, database, routing, storage, etc. in which fully utilized servers get many tiny requests from many clients.

An exception here would be the likes of gaming servers, which tend to prefer per core performance, just like the games do on clients. This usually isn't a big deal though, as most gaming servers carry pretty light loads, but this definitely wasn't the case for the Red Orchestra II dedicated server, which puts a huge load on servers.

mikeblas · Feb 28, 2015

Really, the problem is just that I don't do well with generalizations. "Servers" are a huge and broad category, so it seems absurd to draw any particular conclusion about them. "most server applications" and "typical server loads" and "in a multithreaded application" are generalizations that fail just as often as they hold. To me, anyway.

But maybe I'm weird. I'd almost never put anything on a VM, for example. Why wouldn't I write code that uses the whole server? If I'm not using the whole rig, I've done a terrible job at capacity planning my fleet, or making the architecture of my application. If I have some application that's required but thin for demand, I should be able to drop it side-by-side as another process with other processes implementing more resource-intensive parts of my system.

Databases are a good example of how generalities don't hold. "Databases" have been offered a couple times in this thread as example applications, but I think even that is too broad of a generalization.

If I'm thinking of an OLTP database, I/O is usually the governing factor instead of CPU. Database queries shouldn't be CPU-intensive; instead, they just go straight to I/O and the accretive effect you're describing happens there. Moar disk!!1! helps. More cache (that is, more memory for the database server; or more application-layer caching) can help, since that reduces the query load. But caching is hard.

A data warehouse, though, has a different pattern. It's doing sequential I/O instead of random I/O, and it's easy to get enough throughput sequentially. Aggregation in the warehouse ends up being memory- and CPU-bound.

Search might be considered a database application. Is it CPU-bound, or I/O bound? It can easily be both or neither, depending on the corpus, the complexity of the search queries, and the algorithms used to perform indexing and matching.

What else can you run on a server? Let's think about web servers. Should be I/O bound too, right? Get content from disk, and pump it out the network port. It's just I/O. But modern web sites are dynamic, not static, so we're running code to generate pages. How complicated is that code? What is its memory demand? Encryption, whether client-facing (like SSL, or signed requests, or ...) or is it application-facing (for identity, or L6 or L7 encrypted payloads, or ...)?

Any modern web server is multi-threaded, but certainly we can't say that cores matter more than clock speed in all cases; or even any majority of cases, since there are so many different architectures and applications.

What if I have a specialized application that's compute-intensive? Then, certainly, I need something that's as fast as possible. Maybe I'm not doing many requests per second; or, I scale that out. Scaling up (with more cores/box, and more clock per core) I end up with less latency even if I don't end up with faster throughput. What are compute-intensive applications? Distributed math, including cryptography; machine learning, compute nodes in Hadoop clusters, and so on.

Your VM example seems to assume I'm running a server. What if I'm just running an application? Maybe even just a remote desktop. I'm after low latency per user, not high aggregate throughput across all users.

You mention gaming servers. Indeed, they're often compute-intensive. Some are I/O intensive, though; SecondLife spends most of its time transmitting assets to the client, as does any other MMOG that is more about content than game play.

A "task" is something ambiguous; it's not a thread, it's not a process. What is it, specifically? The more cores I have, the more threads I can run concurrently. That's usually good, but it's also good to have faster execution in those threads so that work is completed quicker. They're not strongly correlated (but they're not inversely correlated!), but it's nice that I have faster memory bandwidth when I have higher clock speeds. Another weak (but positive) correlation is processor cache to clock speed. All of these things help applications that need them.

If I have all these accretive tasks, am I not spending lots of time task switching between them? I probably have more tasks than cores because clients always outnumber servers. The faster the clock speed, the faster a context switch completes. That reduces latency for each task, and overall system throughput. (Assuming a "task" is an incoming unit of work, kind of arbitrarily.)

It seems surprising to me that you're comfortable generalizing about "typical server loads". Maybe you're thinking about the place you work, or the things you've worked on; but in my experience, lots of different applications, lots of different architectures, make widely variant demands on memory, CPU, and I/O; and in different proportions.

In such a problem set, can any generalization possibly hold true without at least some scoping?

gmutale · Feb 28, 2015

mikeblas said:
Really, the problem is just that I don't do well with generalizations. "Servers" are a huge and broad category, so it seems absurd to draw any particular conclusion about them. "most server applications" and "typical server loads" and "in a multithreaded application" are generalizations that fail just as often as they hold. To me, anyway.

But maybe I'm weird. I'd almost never put anything on a VM, for example. Why wouldn't I write code that uses the whole server? If I'm not using the whole rig, I've done a terrible job at capacity planning my fleet, or making the architecture of my application. If I have some application that's required but thin for demand, I should be able to drop it side-by-side as another process with other processes implementing more resource-intensive parts of my system.

Databases are a good example of how generalities don't hold. "Databases" have been offered a couple times in this thread as example applications, but I think even that is too broad of a generalization.

If I'm thinking of an OLTP database, I/O is usually the governing factor instead of CPU. Database queries shouldn't be CPU-intensive; instead, they just go straight to I/O and the accretive effect you're describing happens there. Moar disk!!1! helps. More cache (that is, more memory for the database server; or more application-layer caching) can help, since that reduces the query load. But caching is hard.

A data warehouse, though, has a different pattern. It's doing sequential I/O instead of random I/O, and it's easy to get enough throughput sequentially. Aggregation in the warehouse ends up being memory- and CPU-bound.

Search might be considered a database application. Is it CPU-bound, or I/O bound? It can easily be both or neither, depending on the corpus, the complexity of the search queries, and the algorithms used to perform indexing and matching.

What else can you run on a server? Let's think about web servers. Should be I/O bound too, right? Get content from disk, and pump it out the network port. It's just I/O. But modern web sites are dynamic, not static, so we're running code to generate pages. How complicated is that code? What is its memory demand? Encryption, whether client-facing (like SSL, or signed requests, or ...) or is it application-facing (for identity, or L6 or L7 encrypted payloads, or ...)?

Any modern web server is multi-threaded, but certainly we can't say that cores matter more than clock speed in all cases; or even any majority of cases, since there are so many different architectures and applications.

What if I have a specialized application that's compute-intensive? Then, certainly, I need something that's as fast as possible. Maybe I'm not doing many requests per second; or, I scale that out. Scaling up (with more cores/box, and more clock per core) I end up with less latency even if I don't end up with faster throughput. What are compute-intensive applications? Distributed math, including cryptography; machine learning, compute nodes in Hadoop clusters, and so on.

Your VM example seems to assume I'm running a server. What if I'm just running an application? Maybe even just a remote desktop. I'm after low latency per user, not high aggregate throughput across all users.

You mention gaming servers. Indeed, they're often compute-intensive. Some are I/O intensive, though; SecondLife spends most of its time transmitting assets to the client, as does any other MMOG that is more about content than game play.

A "task" is something ambiguous; it's not a thread, it's not a process. What is it, specifically? The more cores I have, the more threads I can run concurrently. That's usually good, but it's also good to have faster execution in those threads so that work is completed quicker. They're not strongly correlated (but they're not inversely correlated!), but it's nice that I have faster memory bandwidth when I have higher clock speeds. Another weak (but positive) correlation is processor cache to clock speed. All of these things help applications that need them.

If I have all these accretive tasks, am I not spending lots of time task switching between them? I probably have more tasks than cores because clients always outnumber servers. The faster the clock speed, the faster a context switch completes. That reduces latency for each task, and overall system throughput. (Assuming a "task" is an incoming unit of work, kind of arbitrarily.)

It seems surprising to me that you're comfortable generalizing about "typical server loads". Maybe you're thinking about the place you work, or the things you've worked on; but in my experience, lots of different applications, lots of different architectures, make widely variant demands on memory, CPU, and I/O; and in different proportions.

In such a problem set, can any generalization possibly hold true without at least some scoping?

this thread is for pics of multi-socket rigs.. not your rants, stfu & gtfo, seriously

mikeblas · Feb 28, 2015

gmutale said:
this thread is for pics of multi-socket rigs.. not your rants, stfu & gtfo, seriously

Please be quiet. The adults are talking.

Zarathustra[H] · Feb 28, 2015

mikeblas said:
Really, the problem is just that I don't do well with generalizations. "Servers" are a huge and broad category, so it seems absurd to draw any particular conclusion about them. "most server applications" and "typical server loads" and "in a multithreaded application" are generalizations that fail just as often as they hold. To me, anyway.

But maybe I'm weird. I'd almost never put anything on a VM, for example. Why wouldn't I write code that uses the whole server? If I'm not using the whole rig, I've done a terrible job at capacity planning my fleet, or making the architecture of my application. If I have some application that's required but thin for demand, I should be able to drop it side-by-side as another process with other processes implementing more resource-intensive parts of my system.

Databases are a good example of how generalities don't hold. "Databases" have been offered a couple times in this thread as example applications, but I think even that is too broad of a generalization.

If I'm thinking of an OLTP database, I/O is usually the governing factor instead of CPU. Database queries shouldn't be CPU-intensive; instead, they just go straight to I/O and the accretive effect you're describing happens there. Moar disk!!1! helps. More cache (that is, more memory for the database server; or more application-layer caching) can help, since that reduces the query load. But caching is hard.

A data warehouse, though, has a different pattern. It's doing sequential I/O instead of random I/O, and it's easy to get enough throughput sequentially. Aggregation in the warehouse ends up being memory- and CPU-bound.

Search might be considered a database application. Is it CPU-bound, or I/O bound? It can easily be both or neither, depending on the corpus, the complexity of the search queries, and the algorithms used to perform indexing and matching.

What else can you run on a server? Let's think about web servers. Should be I/O bound too, right? Get content from disk, and pump it out the network port. It's just I/O. But modern web sites are dynamic, not static, so we're running code to generate pages. How complicated is that code? What is its memory demand? Encryption, whether client-facing (like SSL, or signed requests, or ...) or is it application-facing (for identity, or L6 or L7 encrypted payloads, or ...)?

Any modern web server is multi-threaded, but certainly we can't say that cores matter more than clock speed in all cases; or even in a majority of cases, since there are so many different architectures and applications.

What if I have a specialized application that's compute-intensive? Then, certainly, I need something that's as fast as possible. Maybe I'm not doing many requests per second; or, I scale that out. Scaling up (with more cores/box, and more clock per core) I end up with lower latency even if I don't end up with higher throughput. What are compute-intensive applications? Distributed math, including cryptography; machine learning; compute nodes in Hadoop clusters; and so on.

Your VM example seems to assume I'm running a server. What if I'm just running an application? Maybe even just a remote desktop. I'm after low latency per user, not high aggregate throughput across all users.

You mention gaming servers. Indeed, they're often compute-intensive. Some are I/O intensive, though; SecondLife spends most of its time transmitting assets to the client, as does any other MMOG that is more about content than game play.

A "task" is something ambiguous; it's not a thread, it's not a process. What is it, specifically? The more cores I have, the more threads I can run concurrently. That's usually good, but it's also good to have faster execution in those threads so that work is completed quicker. Memory bandwidth and clock speed aren't strongly correlated (but they're not inversely correlated!), and it's nice to get more memory bandwidth along with higher clock speeds. Another weak (but positive) correlation is processor cache to clock speed. All of these things help applications that need them.

If I have all these accretive tasks, am I not spending lots of time task switching between them? I probably have more tasks than cores because clients always outnumber servers. The faster the clock speed, the faster a context switch completes. That reduces latency for each task, and improves overall system throughput. (Assuming a "task" is an incoming unit of work, kind of arbitrarily.)

It seems surprising to me that you're comfortable generalizing about "typical server loads". Maybe you're thinking about the place you work, or the things you've worked on; but in my experience, lots of different applications, lots of different architectures, make widely variant demands on memory, CPU, and I/O; and in different proportions.

In such a problem set, can any generalization possibly hold true without at least some scoping?

Actually, good point.

The MySQL database as used by my MythBuntu install (not that this by any means is a typical server application) tends to pin one CPU core when doing updates, like populating TV listings.

Everything else that runs on my server tends to spread nicely over the cores it gets assigned.

So, it definitely depends.

For my applications, my 2.27GHz cores (turbo to 2.8) are able to handle the loads from my individual guests. Nothing I have would benefit from faster cores. Having more cores, however, would allow me to run more guests without pushing my consolidation ratio above 1, which results in reduced performance.
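A quick sketch of that arithmetic, assuming "consolidation ratio" here means vCPUs assigned to guests divided by physical cores. The guest list and core count below are hypothetical, not the actual layout of this box.

```python
from typing import Sequence


def consolidation_ratio(guest_vcpus: Sequence[int], physical_cores: int) -> float:
    """vCPUs assigned across all guests, divided by physical cores (assumed definition)."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(guest_vcpus) / physical_cores


def spare_vcpus(guest_vcpus: Sequence[int], physical_cores: int, max_ratio: float = 1.0) -> int:
    """How many more vCPUs can be handed out before exceeding max_ratio."""
    return int(physical_cores * max_ratio) - sum(guest_vcpus)


if __name__ == "__main__":
    guests = [2, 2, 4, 2, 4]  # hypothetical vCPU counts per guest
    cores = 16                # hypothetical physical core count
    print(f"ratio: {consolidation_ratio(guests, cores):.2f}")
    print(f"vCPUs left before going over 1:1: {spare_vcpus(guests, cores)}")
```

With these made-up numbers there's headroom for a couple more small guests; once the spare count hits zero, the only way to add guests at 1:1 is more cores, not more clock.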

I think it is at least safe to say that on average, fast per core speeds matter more for desktops than they do on servers, but there are exceptions to everything.

mikeblas · Feb 28, 2015

Zarathustra[H];1041456157 said:
I think it is at least safe to say that on average, fast per core speeds matter more for desktops than they do on servers, but there are exceptions to everything.

I hope you don't think I'm being difficult, but I just don't understand what you mean. What does "matter more" mean?

If I increase a desktop clock 33%, the CPU-bound parts of the applications it runs will go about 33% faster.

If I increase a server clock 33%, the CPU-bound parts of the application it is running will run about 33% faster.

Why does 33% faster "matter more" for the desktop than the server?
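For what it's worth, the "CPU-bound parts" qualifier can be put into numbers. This is a sketch of the usual Amdahl-style estimate, assuming the CPU-bound fraction scales linearly with clock and everything else (I/O waits and so on) doesn't change at all. The fractions are invented for illustration.

```python
def overall_speedup(cpu_bound_fraction: float, clock_gain: float) -> float:
    """Whole-application speedup when only the CPU-bound part gets faster.

    cpu_bound_fraction: share of run time spent CPU-bound (0..1).
    clock_gain: new clock / old clock, e.g. 1.33 for a 33% increase.
    Assumes CPU-bound time scales as 1/clock_gain and the rest is unchanged.
    """
    if not 0.0 <= cpu_bound_fraction <= 1.0:
        raise ValueError("cpu_bound_fraction must be between 0 and 1")
    if clock_gain <= 0:
        raise ValueError("clock_gain must be positive")
    new_time = (1.0 - cpu_bound_fraction) + cpu_bound_fraction / clock_gain
    return 1.0 / new_time


if __name__ == "__main__":
    # Same 33% clock bump, different (hypothetical) CPU-bound fractions.
    for fraction in (1.0, 0.5, 0.1):
        print(f"{fraction:.0%} CPU-bound: {overall_speedup(fraction, 1.33):.2f}x overall")
```

The formula doesn't care whether the box is a desktop or a server. What changes the answer is how much of the workload is CPU-bound in the first place.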

Zarathustra[H] · Feb 28, 2015

mikeblas said:
I hope you don't think I'm being difficult, but I just don't understand what you mean. What does "matter more" mean?

If I increase a desktop clock 33%, the CPU-bound parts of the applications it runs will go about 33% faster.

If I increase a server clock 33%, the CPU-bound parts of the application it is running will run about 33% faster.

Why does 33% faster "matter more" for the desktop than the server?

If your applications run well enough on a lower clocked part, moving to a higher clocked part isn't going to buy you much. Again, we have established that there are exceptions, but in the server market lower clocked parts often do the job.

mikeblas · Mar 1, 2015

That doesn't seem like a very useful observation. If something is fast enough, we can't expect making it faster to be beneficial.

smangular · Mar 1, 2015

rvborgh said:
Running at 2.1 GHz... consumption at idle is around 360w (in Windows performance power mode). Running IntelBurnTest... consumption is just a tad under 700w.

Running at 3.0 GHz... full load power consumption (running Cinebench 11.5) is somewhere around 900 watts (but it finishes the test in 10 seconds).

As far as noise goes... with a fan controller you can use it for desktop use. The main source of noise is the 4 small fans on the front of the dual redundant power supplies.

....

Thanks, that is quite a bit of power but not completely crazy. I was looking at the $99 sale on the 1356 DP Motherboards but didn't see great deals on 1356 CPUs to use.
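To turn those wattage figures into energy, here's a small Python sketch. It assumes the draw stays constant at the quoted numbers for the whole period, which is a simplification; the function name is just for illustration.

```python
def energy_wh(watts: float, seconds: float) -> float:
    """Energy in watt-hours for a constant power draw over a duration."""
    if watts < 0 or seconds < 0:
        raise ValueError("watts and seconds must be non-negative")
    return watts * seconds / 3600.0


if __name__ == "__main__":
    # Quoted figures: ~900 W for a 10-second Cinebench run, ~360 W at idle.
    cinebench = energy_wh(900, 10)
    idle_day = energy_wh(360, 24 * 3600)
    print(f"Cinebench run: {cinebench:.1f} Wh")
    print(f"Idle for a day: {idle_day / 1000:.2f} kWh")
```

The full-load burst is tiny in energy terms because it's over so quickly. The idle draw is what adds up if the box stays on around the clock.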

ToddW2 · Mar 2, 2015

Just got this one together yesterday.

2x E5-2620 (v1), 128GB RAM, on-board LSI RAID & cache, 2x 10gig NIC

Will be an ESXi host for home stuff, misc dev work, learning, etc.

Unsure on SSD Array as of yet for local storage. 5x5TB RED RAID6 (Arriving tomorrow or Wed.) for NAS/shares.
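Rough capacity math for that array, as a Python sketch. RAID6 gives up two drives' worth of space to parity, so usable space is (drives - 2) x drive size. The TB-to-TiB conversion is included because the OS will usually report the smaller binary figure. Filesystem and controller overhead are not accounted for, so treat the result as an upper bound.

```python
def raid6_usable_tb(drive_count: int, drive_size_tb: float) -> float:
    """Usable capacity of a RAID6 array in decimal TB (two drives' worth of parity)."""
    if drive_count < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return (drive_count - 2) * drive_size_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


if __name__ == "__main__":
    usable = raid6_usable_tb(5, 5)  # the 5x5TB array from the post
    print(f"{usable:.0f} TB usable, about {tb_to_tib(usable):.1f} TiB before filesystem overhead")
```

For 5x5TB that works out to 15 TB, or roughly 13.6 TiB, with any two drives able to fail without losing the array.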

For the $ I'm REALLY liking the 4U Rosewill w/ 12 hot swap bays. Gutted the inside (not much really), replacing the 3x 120mm mid fans with 3000rpm Noctua industrial PWM fans. They're silent 99% of the time, but if you're running high load 3000rpm is nice, especially when 50% of the fan is blocked by the stupid SATA board for the hot swaps. If I wasn't running this specific system someplace where sound mattered, I'd keep all the Rosewill fans; they're not bad at all. In fact, most people could use this as a desktop without changing anything... could even remove the rear fans (the loudest) and I bet it would keep most systems cool. (I may try this, although the rear fans funnel air nicely around the CPUs.)

SpeedyVV · Mar 8, 2015

Here is my dual E5-2699 v3 box being built and bench tested

Some early benchies:

Folding only seems to be running on half the threads or only on one CPU :-(

Working on a MacPro G5 case mod with side window and custom paint job.

