What SMP solution? Xeon or AMD?

The Real Zardoz

Ok, I'm considering either the Tyan Thunder i7525 or the Tyan K8WE. I'll be mounting this in a Lian Li V2000B. I have a strong preference toward the Xeons due to price; however:

1. Can I mount a Xeon board in my case?
2. How can I cool the processors quietly without modifications to the case? If I can't cool the procs quietly, then the Xeons are out. (I cool my system with a WhisperRock IV at present.)
3. I do a lot of RAW photo manipulation and imaging. I also do a lot of bandwidth intensive network stuff - so bus bandwidth (to NICs/disk controllers) is important to me. Which of the two solutions is going to win here?
4. I know the Opterons are going to be better for games - but by how much?
5. How is Hyperthreading going to help me?

If I go with Xeons, I am also considering the Gigabyte GA-9ITDW - is anyone familiar with this board?

Thanks
 
First off, I would NEVER buy a Gigabyte board. Their quality and reliability are strongly lacking, despite what you might read on Tom's Hardware. (I am convinced Gigabyte sends the mobo reviewers fluff girls on a regular basis.)

The Xeon is cheaper, yes. LV Xeons on an Asus board, overclocked, are the cheapest solution. But don't kid yourself: depending on which Opterons you go with, they will crush the Xeon at pretty much every turn. There is simply no comparison. For games, the benches would show the same gap you see between A64s and Pentium 4s.

HT will be a benefit, but not so much on dual-processor systems. It's mostly noticeable on single-CPU machines like standard P4s.

Here are some good benches to look at.

http://www.techreport.com/reviews/2005q4/opteron-254/index.x?pg=1

As for cooling, Opterons will run cooler, no doubt there. You won't have to modify the cooling in your case for the Opterons; you might possibly need to add more cooling for Xeons. But in either case no case modification will be necessary.
 
Forget Xeons now. Let's get the facts straight.

-2nd Xeon adds peak 40%. 2nd Opteron adds average 80%. (Based on multithreaded apps.)
-Xeon is 100W+ TDP, Opteron is 90W-95W TDP
-Xeon is slower, period. HypeThreading is still just hype.
-Opteron is, very surprisingly, more reliable. (This is more a thermal thing, I suspect.)
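A quick way to sanity-check per-CPU scaling figures like these is Amdahl's law. The sketch below is my framing, not the poster's: it inverts the law to show what parallel fraction each of the 40%/80% claims implies. (The real bottleneck being argued in this thread is bus contention, which Amdahl's law only roughly approximates.)

```python
def implied_parallel_fraction(speedup, n):
    """Invert Amdahl's law: given a measured speedup on n CPUs,
    return the parallel fraction p that would produce it."""
    return n * (speedup - 1) / (speedup * (n - 1))

def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n CPUs for parallel fraction p."""
    return 1 / ((1 - p) + p / n)

# "2nd Xeon adds peak 40%"     -> speedup 1.4x on 2 CPUs
# "2nd Opteron adds average 80%" -> speedup 1.8x on 2 CPUs
xeon_p = implied_parallel_fraction(1.4, 2)
opteron_p = implied_parallel_fraction(1.8, 2)
print(f"Xeon implied parallel fraction:    {xeon_p:.2f}")   # ~0.57
print(f"Opteron implied parallel fraction: {opteron_p:.2f}") # ~0.89
```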

Now, addressing each of your applications individually...
E or later Opterons will meet or beat Xeons on applications that don't prevent SSE3 use. Some are compiled by idiots though, and will not use SSE2/SSE3 on Opterons even when present.
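As a side note on the SSE2/SSE3 point: on Linux, a program can see which SIMD extensions the CPU advertises in the flags line of /proc/cpuinfo (SSE3 shows up as "pni"). A minimal sketch, using a canned sample string rather than reading the real file:

```python
# Hypothetical /proc/cpuinfo excerpt; a real check would read the file.
SAMPLE_CPUINFO = """\
model name : AMD Opteron(tm) Processor 252
flags      : fpu vme de pse tsc msr pae mce cx8 sse sse2 pni syscall lm 3dnowext 3dnow
"""

def cpu_flags(cpuinfo_text):
    """Collect the feature flags advertised in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.split(":")[0].strip() == "flags":
            flags.update(line.split(":", 1)[1].split())
    return flags

flags = cpu_flags(SAMPLE_CPUINFO)
# Linux reports SSE3 as "pni" (Prescott New Instructions)
print("SSE2:", "sse2" in flags)  # True
print("SSE3:", "pni" in flags)   # True
```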
Forget the WhisperRock (which is really not that good anyway) for either. I recommend sticking with the AMD stock coolers, since most boards finally have Cool'n'Quiet working. (Except Tyan; patience...)
Bandwidth means you really have no choice except AMD. Effective bandwidth on a Xeon board drops by a minimum of 40% per CPU. That's memory -and- bus: both CPUs go through one northbridge, which bottlenecks like all hell across the board. HyperTransport can operate in coherent and non-coherent modes, meaning it can basically switch between CPUs to give each full bandwidth as needed. Regardless of that, HyperTransport has significant protocol overhead while Intel's bus has none.
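The shared-bus argument can be made concrete with era-typical peak numbers. The figures below are standard spec-sheet values for an 800 MHz FSB Xeon pair and dual-channel DDR400 Opterons (my assumptions, not measurements from this thread):

```python
# Era-typical peak numbers (assumptions): a dual Xeon shares one
# 800 MHz front-side bus (~6.4 GB/s total), while each Socket 940
# Opteron has its own dual-channel DDR400 controller (~6.4 GB/s each).
XEON_SHARED_FSB_GBS = 6.4
OPTERON_PER_CPU_GBS = 6.4

def xeon_per_cpu(n_cpus):
    """Each Xeon's share of the single front-side bus."""
    return XEON_SHARED_FSB_GBS / n_cpus

def opteron_per_cpu(n_cpus):
    """Each Opteron keeps its own memory controller's full bandwidth."""
    return OPTERON_PER_CPU_GBS

print("Xeon x2:   ", xeon_per_cpu(2), "GB/s per CPU")    # 3.2
print("Opteron x2:", opteron_per_cpu(2), "GB/s per CPU") # 6.4
```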

That said, unless you absolutely need dual 16x slots, and even then, I recommend against the Tyan K8WE. These boards have been too problematic: too many DOAs, nasty BIOS bugs, etcetera. Typical Tyan, they were severely rushed. (Just like the K7X 1.0 with its broken USB.) Iwill and Supermicro have much better SLI offerings available, though only Supermicro has a dual 16x available (the H8DCE). Iwill has a 16x/2x8x SLI board available, and will likely announce a dual 16x at some point soon, since they spend a lot of time getting boards right instead of just getting them to market. Between the two, I currently prefer the H8DCE, but will likely move to Iwill if/when theirs becomes available. (REALiZM 800s require dual 16x and will not work at 8x.)
 
AreEss said:
Forget Xeons now. Let's get the facts straight.

-2nd Xeon adds peak 40%. 2nd Opteron adds average 80%. (Based on multithreaded apps.)
-Xeon is 100W+ TDP, Opteron is 90W-95W TDP
-Xeon is slower, period. HypeThreading is still just hype.
-Opteron is, very surprisingly, more reliable. (This is more a thermal thing, I suspect.)

Now, addressing each of your applications individually...
E or later Opterons will meet or beat Xeons on applications that don't prevent SSE3 use. Some are compiled by idiots though, and will not use SSE2/SSE3 on Opterons even when present.
Forget the WhisperRock (which is really not that good anyway) for either. I recommend sticking with the AMD stock coolers, since most boards finally have Cool'n'Quiet working. (Except Tyan; patience...)
Bandwidth means you really have no choice except AMD. Effective bandwidth on a Xeon board drops by a minimum of 40% per CPU. That's memory -and- bus: both CPUs go through one northbridge, which bottlenecks like all hell across the board. HyperTransport can operate in coherent and non-coherent modes, meaning it can basically switch between CPUs to give each full bandwidth as needed. Regardless of that, HyperTransport has significant protocol overhead while Intel's bus has none.

That said, unless you absolutely need dual 16x slots, and even then, I recommend against the Tyan K8WE. These boards have been too problematic: too many DOAs, nasty BIOS bugs, etcetera. Typical Tyan, they were severely rushed. (Just like the K7X 1.0 with its broken USB.) Iwill and Supermicro have much better SLI offerings available, though only Supermicro has a dual 16x available (the H8DCE). Iwill has a 16x/2x8x SLI board available, and will likely announce a dual 16x at some point soon, since they spend a lot of time getting boards right instead of just getting them to market. Between the two, I currently prefer the H8DCE, but will likely move to Iwill if/when theirs becomes available. (REALiZM 800s require dual 16x and will not work at 8x.)

Good information, and I agree with most of it. The part I don't agree with is the bashing of the Tyan K8WE. I've had zero issues with it. No failures or stability problems.

Of course, I can also say the SuperMicro solution looks to be a good alternative. I feel the K8WE is a better solution, offering PCI-X instead of PCI-Express slots; there are simply more options available on the PCI-X bus for RAID controllers and Gig-E adapters. Although I do like the idea of PCI-Express and eight SATA ports instead of the four on the K8WE.

I'd evaluate your needs and get the appropriate board for those needs based on their different feature sets.
 
AreEss said:
Forget the WhisperRock (which is really not that good anyway) for either. I recommend sticking with the AMD stock coolers, since most boards finally have Cool'n'Quiet working. (Except Tyan; patience...)

Cool'n'Quiet works on Tyan boards... the K8WE at least... but not with C0 stepping CPUs (which are not officially supported by Tyan on those boards). It works fine on the newer ones. (The AMD Tech Tour 246s didn't work, but the new 254s that Sir and I have do.)
 
Marduk said:
Cool'n'Quiet works on Tyan boards...K8WE at least...but not C0 stepping CPUs(which are not officially supported by Tyan on those boards). It works fine on the newer ones. (the AMD Tech Tour 246's didn't work, but the new 254s that Sir and I have do).

That's probably true. I've never used Cool'n'Quiet on my rig before.
 
I enabled it, but I don't know if it's actually doing me any good.
Basically makes the thing act like a laptop....CPU frequency jumping around all the time...
 
Sir-Fragalot said:
Good information, and I agree with most of it. The part I don't agree with is the bashing of the Tyan K8WE. I've had zero issues with it. No failures or stability problems.

That's because you've seen what, one or two, whereas I've seen more than ten now. Of those ten, two were DOA, three failed in testing, and all had issues with supposedly supported cards (Quadros and REALiZMs).
Not to mention that most BIOS versions had serious problems with various things: incorrect voltage reads, incorrect temperatures, ACPI not working at all, etcetera.

Marduk said:
I enabled it, but I don't know if it's actually doing me any good.
Basically makes the thing act like a laptop....CPU frequency jumping around all the time...

*sigh* Which sounds like it's still broken, no surprise there.
 
I do some heavy database work and frankly have been disappointed with my Dual Xeon setup at the office. I have been begging my boss to let me go to Opterons after bringing in a dual Opteron rig and showing him the performance difference which was about 35%.
 
AreEss said:
*sigh* Which sounds like it's still broken, no surprise there.


Actually, that's exactly what Cool'n'Quiet (technically, it's PowerNow!) does. My A64 w/ MSI nForce3 system did the same thing.

I'm just not sure if I'm actually gaining anything by it.
 
Marduk said:
Actually, that's exactly what Cool'n'Quiet(technically, it's PowerNow!) does. My A64 w/ MSI NForce3 system did the same thing.

I'm just not sure if I'm actually gaining anything by it.

Really, that's not exactly what it's supposed to do. The problem is that everyone's being overaggressive and going backwards: they throttle clock AND fan together, then ramp both back up. You're supposed to throttle the fan at full clock, then ramp the fan up and the clock down. It's supposed to behave fairly stably; you shouldn't see the clock changing very frequently at all unless there's a cooling problem.
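The control policy being described, fan first and clock only as a last resort, can be sketched as a tiny governor. All the thresholds below are made up for illustration:

```python
def cooling_response(temp_c, target_c=55, fan_max=100):
    """Sketch of the policy described above: respond to heat by ramping
    the fan first, and only drop the CPU clock once the fan is maxed out.
    Returns (fan_percent, clock_percent). Thresholds are illustrative.
    """
    if temp_c <= target_c:
        return 30, 100                      # idle fan, full clock
    over = temp_c - target_c
    fan = min(fan_max, 30 + over * 10)      # +10% fan per degree over target
    if fan < fan_max:
        return fan, 100                     # fan alone handles it: full clock
    # fan is maxed out: now start shedding clock
    excess = over - (fan_max - 30) / 10
    clock = max(50, 100 - excess * 10)
    return fan, clock

print(cooling_response(50))  # (30, 100)  cool: idle fan, full clock
print(cooling_response(58))  # (60, 100)  warm: faster fan, still full clock
print(cooling_response(65))  # (100, 70.0) hot: fan maxed, clock drops
```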
 
AreEss said:
That's because you've seen what, one or two, whereas I've seen more than ten now. Of those ten, two were DOA, three failed in testing, and all had issues with supposedly supported cards (Quadros and REALiZMs).
Not to mention that most BIOS versions had serious problems with various things: incorrect voltage reads, incorrect temperatures, ACPI not working at all, etcetera.



*sigh* Which sounds like it's still broken, no surprise there.

I am not saying you aren't right in your experiences with the K8WE; I just don't share them. As we've discussed before, Marduk and I have similar configurations, and ours have been relatively problem-free. You are building machines with different needs and hardware configurations than we are using, so there is no doubt that you could have very different experiences with the Tyan K8WE than we have. Many of the enthusiasts on this forum looking at the K8WE will likely be looking at configurations more like what Marduk and I have, so the K8WE might be a good fit for some.

As for the BIOS versions, mine worked fine from day 1. The newer BIOSes added features that, I'll agree, should have been enabled in the first revision. I only had one actual problem, and that is the dreaded FFFF MAC address for the NICs with some nVidia cards. That was addressed in a BIOS update pretty quickly, and there was an immediate workaround for it that left me with one operational NIC. I never saw any ACPI issues, and I'll check out Cool'n'Quiet tonight on my rig and see what happens.
 
I can't speak about ramping up and down... as I don't have the PWM (4-pin) fans. I'm using run-of-the-mill 120mm low speed/noise fans...



The only gripes I have with the board are:

1. The first one had a bad NIC. It was from the initial run and still had the 6-pin SSI power connector. The RMA was hassle-free through the vendor.

2. Neither of the 2 that I had came with SLI bridges, but Tyan sent one out to me. They come with them now.

3. BIOS: the serial port text disappears on me if I go through each option, and has for the past two BIOSes. However, I can still open that page of the BIOS regardless of the text disappearing.

4. BIOS: I have yet to be able to successfully boot off a USB Floppy, however I have the old style floppy drive available as well.

5. BIOS: Cannot flash in Windows XP x64 Edition. This is not Tyan's fault, as the Award flashing program apparently does not work in x64 yet.

6. One IDE channel: This is my biggest disappointment... I'd like another IDE channel in order to set both of my burners as Master (which would improve disc-to-disc recording performance), and/or to use a 20 GB IDE drive I have sitting around.

Yes, it can be quirky. But there's a veritable army of enthusiasts with this board who can help people out if need be. It doesn't suffer from a lack of decent SLI like the Iwill (no 2050 daughter chipset), or from the lack of official support like the Supermicro (which can only be found as the A+ OEM board, and which is a PITA to access from SM's website: search for H8DCE, get an SLI adapter as the result, then click the link within that page).


I would prefer the Supermicro's additional SATA, and additional IDE port, as well as the inclusion of 2 4x PCI-e slots.

However, I'd rather not be limited to 32-bit PCI for my 64-bit PCI RAID 5 card... and I am not yet prepared to ditch this array for a PCI-E based one.

The K8WE is an established board, and the vast majority of users are having good luck with it.
 
Marduk said:
I can't speak about ramping up and down... as I don't have the PWM (4-pin) fans. I'm using run-of-the-mill 120mm low speed/noise fans...



The only gripes I have with the board are:

1. First one had a bad NIC. It was of the initial run, and still had the 6 pin SSI power connector. RMA was hassle-free through the vendor.

2. Neither of the 2 that I had came with SLI bridges, but Tyan sent one out to me. They come with them now.

3. BIOS: the Serial port text disappears on me if I go through each option, and has for the past 2 BIOS's. However, I can still open up that page of the BIOS, regardless of the text disappearing.

4. BIOS: I have yet to be able to successfully boot off a USB Floppy, however I have the old style floppy drive available as well.

5. BIOS: Cannot flash in Windows XP x64 Edition. This is not Tyan's fault, as the Award flashing program apparently does not work in x64 yet.

6. One IDE channel: This is my biggest disappointment... I'd like another IDE channel in order to set both of my burners as Master (which would improve disc-to-disc recording performance), and/or to use a 20 GB IDE drive I have sitting around.

Yes, it can be quirky. But there's a veritable army of enthusiasts with this board who can help people out if need be. It doesn't suffer from a lack of decent SLI like the Iwill (no 2050 daughter chipset), or from the lack of official support like the Supermicro (which can only be found as the A+ OEM board, and which is a PITA to access from SM's website: search for H8DCE, get an SLI adapter as the result, then click the link within that page).


I would prefer the Supermicro's additional SATA, and additional IDE port, as well as the inclusion of 2 4x PCI-e slots.

However, I'd rather not be limited to 32bit PCI for my 64 bit PCI RAID 5 card....and I am not yet prepared to ditch this array for a PCI-e based one.

The K8WE is an established board, and the vast majority of users are having good luck with it.

I didn't have that many issues with my K8WE. Still I agree with your points.
 
If you're looking for bang for your buck, get a pair of OC'ed LV Xeons. If you're looking for the best performance (at a reasonable price, unless you want dual dual-core ;) ), go with dual Opterons.
 
Let's say you're buying a $12,000 workstation.
Is having to flash your BIOS every month acceptable? How about that flash breaking things, with no easy fix? How about that flash working, only to find that now your NICs don't work? On a $12,000 machine.
Hint: that is totally unacceptable, period. Those kinds of BIOS bugs never should have made it out. Every last one listed. Tyan has not tested and does not test the BIOS on the K8WE prior to release. And that's only counting the things you folks see, not the things people like me see, which are MUCH worse.
Now how the hell am I supposed to sell a customer a workstation that basically needs to be fixed every month? Answer: I sure as hell can't. And this isn't the first board with these kinds of moronic problems. Try the K8QS Pro. The DIMM sockets can't open fully because they're physically too close together. Sloppy engineering there. Then a BIOS update actually rendered the ZCR nonfunctional because of bugs introduced by Tyan. Then, after promising me the board would take dual-cores, they told me it wouldn't, and that I'd have to replace a $2,500 motherboard. After I'd passed this promise on to customers. They were not half as pissed as I was, especially since the K8QS Pro has only gotten worse with every revision. Look at what the VE1.05 BIOS supposedly fixes. After the 1.01 fiasco ("why won't it find a standard or USB floppy? Why does the IDE controller go into PERR with anything attached when using ZCR?") I told Tyan exactly where they could put that 13x16 monster.

So no, I am not, under any circumstances, no matter what any of you kids say, going to let anyone say that it is acceptable for a $500 motherboard to have bugs like this. Not without a severe tongue-lashing.
For $500, it'd better be gold fucking plated, or it had goddamn better well work without fault. No BIOS Russian roulette, no repeated reintroductions of supposedly fixed bugs. If they list it as a feature, it had goddamn better well work, and if it doesn't, they'll be giving me every last penny back and I won't be paying a restock fee for a defective product. It's that simple.
 
3. I do a lot of RAW photo manipulation and imaging. I also do a lot of bandwidth intensive network stuff - so bus bandwidth (to NICs/disk controllers) is important to me. Which of the two solutions is going to win here?

I am reading this and taking a stab that you are transferring stuff over the network? If that's the case, then you really need to look at getting gigabit gear that supports jumbo frames. SMC makes some inexpensive jumbo-frame-supporting hardware (an 8-port switch under $100). Enabling 9K-sized frames will bring you to enlightenment :D

Here is a quote that better explains it:
Transfers weren't necessarily faster (the large CD ISO images copied across in about 15 seconds with or without jumbo frames); however, the difference in CPU utilization with jumbo frames enabled was awesome. With standard frame sizes (1500 bytes), the NIC has to interrupt the CPU much more often, so when transferring files the CPU works at about 90-100 percent until the file is copied. When you turn on jumbo frames (9000 bytes), far fewer interrupts are needed and only about 50 percent usage is required. When you have to copy large files or do scheduled backups in the background, you'll be able to use your computer at the same time without any noticeable slowdown. This is a HUGE advantage.
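To put rough numbers on the quoted effect: fewer, larger frames mean far fewer frames to process for the same payload. This is my arithmetic, ignoring interrupt coalescing and protocol headers:

```python
def frames_needed(transfer_bytes, mtu_bytes):
    """Rough count of Ethernet frames (and, absent interrupt coalescing,
    a proxy for per-frame interrupt work) needed to move a payload."""
    return -(-transfer_bytes // mtu_bytes)  # ceiling division

iso = 700 * 1024 * 1024  # a 700 MB CD image
std = frames_needed(iso, 1500)
jumbo = frames_needed(iso, 9000)
print(f"1500-byte frames: {std:,}")
print(f"9000-byte frames: {jumbo:,}")
print(f"reduction: {std / jumbo:.1f}x")  # ~6x fewer frames
```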

That said, neither Xeons nor Opterons will help you much with the network aspect of it. Granted, maybe one can process the requests a bit faster than the other, but it probably wouldn't be too noticeable.

A few quick clarifications:

Iwill and Supermicro have much better offerings available with SLI, though only Supermicro has a dual 16x available (the H8DCE.)

The Supermicro has two PCI Express graphics slots; however, when both are occupied you get an 8x/8x, not a 16x/16x, scenario. AreEss may know that already, but his post wasn't quite clear. The trick to the Tyan's dual 16x slots is the fact that if you only use one processor, you lose the ability to access the second slot (among other things). The second processor's HyperTransport links allow for the connection to the other onboard chipset. No second CPU, no dual 16x slots.
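For scale, here is what the 8x-versus-16x question amounts to in raw PCIe 1.x bandwidth, using the standard 250 MB/s-per-lane-per-direction figure (2.5 GT/s with 8b/10b encoding):

```python
# PCI Express 1.x: 2.5 GT/s per lane, 8b/10b encoding
# -> 250 MB/s per lane, per direction.
MB_PER_LANE = 250

def slot_bandwidth(lanes):
    """Peak one-direction bandwidth of a PCIe 1.x slot, in MB/s."""
    return lanes * MB_PER_LANE

print("x16:", slot_bandwidth(16), "MB/s")  # 4000 MB/s
print("x8: ", slot_bandwidth(8), "MB/s")   # 2000 MB/s
```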

or from the lack of official support like the Supermicro (which can only be found as the A+ OEM board, and which is a PITA to access from SM's website: search for H8DCE, get an SLI adapter as the result, then click the link within that page).

Not entirely true, although you do have a valid point about actually getting to the part of Supermicro's website that details their Opteron offerings. They are no longer OEM, and are now shipped as a retail Supermicro product. And while their site may be a PITA, Supermicro tech support will handle any issues you have (even for the ones purchased as OEM).


In the end, I build a TON of Xeon hardware through work, yet I run Opterons myself. I don't think Intel is money well spent in the high-end workstation/server market. They are just behind the curve technologically (and don't think they don't know this), and they are losing ground more quickly in that aspect of the industry than anywhere else. I won't repost what has already been posted, but Opterons are just that much better in pretty much every sense of the word. So my vote is: go Opteron. As for mobo choices, the two most mentioned boards are also the ones I would recommend: the Supermicro H8DCE and the Tyan K8WE. Both support NUMA; both are exceptional boards. I would say base your decision on two things: your need for PCI-X (K8WE) or not (H8DCE), and whether or not you want to deal with a somewhat finicky-natured mobo (K8WE).

Sorry, I have experience using both boards, and although I am dying without my PCI-X, I have just had better results with the H8DCE. But like anything else, your experience may be vastly different than mine, and there is a HUGE support base for the K8WE, as it has been out there for quite some time, while the H8DCE is relatively new.
 
hardwarephreak said:
The Supermicro has two PCI Express graphics slots; however, when both are occupied you get an 8x/8x, not a 16x/16x, scenario. AreEss may know that already, but his post wasn't quite clear. The trick to the Tyan's dual 16x slots is the fact that if you only use one processor, you lose the ability to access the second slot (among other things). The second processor's HyperTransport links allow for the connection to the other onboard chipset. No second CPU, no dual 16x slots.

Wow, this is so utterly wrong.

Both are dual full 16x. Why? Because of the chipset: nForce Pro 2200 and nForce Pro 2050. The nForce Pro 2200 and 2050 interconnect independently of the processors, and the 2050 provides the additional PCI Express lanes. This has absolutely nothing whatsoever to do with CPUs. Foxconn makes a single-processor 2200/2050 board that's dual 16x as well (Socket 940, too).
Whoever told you it's CPU dependent needs to be shot for being flat out fucking stupid, and then have the design guides for the nForcePro shoved into the bullethole.

Sorry, I have experience using both boards, and although I am dying without my PCI-X, I have just had better results with the H8DCE. But like anything else, your experience may be vastly different than mine, and there is a HUGE support base for the K8WE, as it has been out there for quite some time, while the H8DCE is relatively new.

The H8DCE is a mixed bag. You lose PCI-X (e.g. no real SATA RAID options), but you gain 4x open-ended slots, enabling abuse of the Ultra320-2E. The H8DCE also does not have PEG issues with dual REALiZM 800s, unlike the K8WE. Regardless, it's far more stable, period. Voltage is significantly cleaner, especially on the PCI slots. Timing is noticeably more stable. PEG-method works flawlessly with Quadros and REALiZMs. (Haven't tested 500s yet; not fully comfortable with their cooling yet either.)
Let's just say the H8DCE crushed the K8WE in functional tests, and then added insult to injury by bitchslapping it in real world speed testing and benchmarks. As long as you don't need PCI-X and need dual 16x or 8x-available, I can't recommend anything else.
 
AreEss said:
Because of the chipset. nForcePro2200 and nForcePro2050. The nForcePro2200 and 2050 interconnect independent of processors, and the 2050 provides the additional PCI-Express lanes. This has absolutely nothing whatsoever to do with CPUs. Foxconn makes a single processor 2200/2050 board that's dual 16x as well (Socket 940 as well.)
Whoever told you it's CPU dependent needs to be shot for being flat out fucking stupid, and then have the design guides for the nForcePro shoved into the bullethole.

I can tell you where that comes from. The Tyan K8WE manual states specifically that both processors must be installed for the second PEG slot to work, and it states this unconditionally. I have never tried it, so I don't know if this is a fault of the K8WE's design, or if it applies to all nForce Pro 2200/2050 based boards with dual 940-pin sockets.

Your logic concerning the Foxconn board of course, shows that this might not be the case. Which suggests the Tyan K8WE is unique in that requirement.

Regardless, I wouldn't say someone is flat-out fucking stupid for saying that. Not when a motherboard manual states such a requirement for a board with that chipset.

As far as the SuperMicro being a mixed bag, I'd agree. It may or may not suit an individual's specific needs. In my case, the Tyan K8WE or the SuperMicro would both work just fine for me. However, when I bought the K8WE there was simply no other solution that met the criteria I was looking for. If I were buying a board today, I would seriously look at the SuperMicro instead of the K8WE, mainly because I don't need PCI-X and I would like additional SATA ports.

EDIT: Just found this article on the K8WE vs. the H8DCE. The reviewers state that, on both boards, the 2200 chipset is tied to one processor and the 2050 is tied to the other.
http://www.digit-life.com/articles2/cpu/amd-cmp-vs-smp.html

They don't have any direct benches of the Supermicro vs. the Tyan board, but they report comparable performance. Their testing methods are strange, to say the least.
 
Whoever told you it's CPU dependent needs to be shot for being flat out fucking stupid, and then have the design guides for the nForcePro shoved into the bullethole.

That's because you've seen what, one or two, whereas I've seen more than ten now.


Wow, for someone who has seen 10 or more K8WEs, I would have expected you to at least read the FUCKING MANUAL ONCE.

As for my understanding of the H8DCE, I have only seen one, and for some reason, despite the chipset diagram in the manual, I had always thought they dropped down to 8x when you put a second card in... I want to say that arose from a discussion over on the 2CPU forums, when a debate started up about placing an 8x RAID controller in the second x16 graphics slot. Someone said it operated at 8x, so I just assumed (wrongly) that I must have misread what I had seen prior... shit happens... and I would consider that acceptable, since I don't need @work-type knowledge of the board.

However, like I originally stated... for someone who needs @work knowledge and has gone through so many K8WEs, I find your lack of understanding quite funny, and I wonder if your clients or coworkers would think the same... :rolleyes:
 
AreEss said:
Forget Xeons now. Let's get the facts straight.

-2nd Xeon adds peak 40%. 2nd Opteron adds average 80%. (Based on multithreaded apps.)
This simply isn't true.
 
mikeblas said:
This simply isn't true.

I don't think those numbers are right either. It is true that as you add processors on the Xeon platform, you crowd the FSB of the motherboard, and as a result each added CPU becomes "less efficient," as there is less bandwidth for each CPU to work from. Opterons don't have the same faults.
 
Sir-Fragalot said:
I don't think those numbers are right either. It is true that as you add processors on the Xeon platform, you crowd the FSB of the motherboard, and as a result each added CPU becomes "less efficient," as there is less bandwidth for each CPU to work from. Opterons don't have the same faults.
Assuming the Xeon is in an SMP platform, sure. There are NUMA platforms for the Intel processors. The Opteron doesn't have the same problem -- it has a different one.
 
mikeblas said:
Assuming the Xeon is in an SMP platform, sure. There are NUMA platforms for the Intel processors. The Opteron doesn't have the same problem -- it has a different one.

Any NUMA platform for the Xeon wouldn't be a mainstream one. I've never seen such a thing in the workstation market. Super-high-end servers? Possibly. I don't go looking at those very often.

I am still fuzzy on this. My dual Opteron rig is an SMP rig with NUMA. NUMA is about memory access; it has nothing to do with SMP itself. Why are the two getting confused as though they are different technologies that do the same thing? They are not. The Opteron supports NUMA due to the way its memory controllers are configured. The Xeon can't, because the motherboard chipsets Intel has put out don't support it.
 
Sir-Fragalot said:
Any NUMA platform for the Xeon wouldn't be a mainstream one. I've never seen such a thing in the workstation market. Superhigh end servers? Possibly. I don't go looking at those very often.
Well, they're made by mainstream companies -- HP and Unisys, most notably; IBM too, I think -- but yeah, they're in the advanced server market. The higher-end machines all run Itanium chips instead of Xeons.

NUMA helps with scalability at large numbers of processors, so it only makes sense that the additional engineering was done for machines that support large numbers of processors... high-end servers.
 
mikeblas said:
Well, they're made by mainstream companies -- HP and Unisys, most notably; IBM too, I think -- but yeah, they're in the advanced server market. The higher-end machines all run Itanium chips instead of Xeons.

NUMA helps with scalability with large number of processors, so it only makes sense that the additional engineering was done for machines that support large numbers of processors... high-end servers.

I know what the hell NUMA is. But it isn't an alternative to SMP; it's not a competing technology that does the same thing. I just don't know why everyone is calling Opterons NUMA instead of SMP. Many Opteron motherboards do not support NUMA at all. I've got two Opteron systems and both of them support NUMA; both are SMP, and only one of them runs in NUMA mode.

NUMA isn't something that replaces SMP. Rather it is a feature of Opteron (And other) SMP systems.
 
Yeah, I have seen a good bit of NUMA vs SMP stuff showing up...

NUMA is the ability to keep the memory used by processes running on CPU2 in CPU2's own memory banks, so CPU2 doesn't have to go "all the way" over to CPU1's RAM banks (and vice versa). It's just a technology that helps reduce one of the inefficiencies inherent in some SMP configs.

Also, it should be noted that not only do the motherboards have to support it (as Sir-Fragalot pointed out), but the OS has to as well.
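A toy model of why locality (and hence motherboard and OS support) matters: average memory latency is just a weighted blend of local and remote access times. The latency numbers below are illustrative, not measured:

```python
def avg_latency(local_ns, remote_ns, local_fraction):
    """Expected memory latency in a two-node NUMA box, given the
    fraction of accesses that hit the local node."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

LOCAL, REMOTE = 60, 110  # ns; made-up but plausible for a dual Opteron
for frac in (0.5, 0.9, 1.0):
    print(f"{frac:.0%} local -> {avg_latency(LOCAL, REMOTE, frac):.0f} ns")
```

Without NUMA awareness in the OS, allocations land on either node about half the time (the 50% row); with good placement and affinity, latency approaches the pure-local figure.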

Intel Xeon NUMA Stuffs:
There are three NUMA chipsets relevant to Windows environments on which Itanium 2 server systems are based: one based on the Intel E8870 chipset, another from HP for the Integrity and Superdome lines, and a third for NEC systems. Unisys and IBM each have a NUMA chipset for the Intel Xeon MP processors; Unisys uses the Intel E8870 chipset for their Itanium systems. Each of the above NUMA systems supports four processors per cell.
 
mikeblas said:
Assuming the Xeon is in an SMP platform, sure. There are NUMA platforms for the Intel processors. The Opteron doesn't have the same problem -- it has a different one.

Would you like to see the numbers? They're easily repeatable over and over and over again.

There are no 2-way fully-independent Xeons; only Hurricane, which is 4-way minimum, and it's four per node. Opterons scale at 80% per CPU without ccNUMA and Affinity, presuming 2, tapering off slightly at 4 and more at 8. With ccNUMA and Affinity, scaling averages above 80%.

hardwarephreak said:
NUMA is the ability to keep the memory used by processes running on CPU2 in CPU2's own memory banks, so CPU2 doesn't have to go "all the way" over to CPU1's RAM banks (and vice versa). It's just a technology that helps reduce one of the inefficiencies inherent in some SMP configs.

NUMA has absolutely nothing to do with processor allocation. Absolutely nothing. NUMA is Non-Uniform Memory Access. Go read a book and stop spreading this garbage. Not to mention that Intel's supposed NUMA is pointless except when you've got MPICH or similar involved. Opterons are cache-coherent non-uniform memory access (ccNUMA).

ccNUMA defines areas of memory which are directly related to each CPU in an Opteron system. This must then be coupled with Affinity in order to actually see any benefit whatsoever. Just because you have it turned on doesn't mean it does a goddamn thing. Unless the OS supports both ccNUMA and Affinity, all your 'performance increases' are psychosomatic or BIOS bugs.
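The "must be coupled with Affinity" point can be made concrete: the OS has to be told not to migrate a process away from the node holding its memory. A minimal Linux-only sketch (the node-to-CPU map here is a made-up example; a real one comes from the BIOS/ACPI tables via the OS):

```python
# Pin the current process to one node's CPUs so the scheduler can't move it
# away from its local memory. os.sched_setaffinity wraps sched_setaffinity(2),
# so this is Linux-only. NODE_CPUS is a hypothetical mapping for a 2-way box.
import os

NODE_CPUS = {0: {0}, 1: {1}}  # assumed: CPU 0 on node 0, CPU 1 on node 1

def pin_to_node(node):
    os.sched_setaffinity(0, NODE_CPUS[node])  # pid 0 = the calling process

pin_to_node(0)
print(os.sched_getaffinity(0))  # now restricted to node 0's CPU set
```

Without a pin like this, ccNUMA placement buys nothing: the scheduler is free to run the process on the far CPU, turning every access into a remote one.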
 
AreEss said:
Would you like to see the numbers? They're easily repeatable over and over and over again.

There are no 2-way fully-independent Xeons. Only Hurricane, which is 4-way minimum, and it's four-per. Opterons scale 80% per without ccNUMA and Affinity, presuming 2, and taper off slightly at 4, and more at 8. With ccNUMA and Affinity, scaling is above 80% average.



NUMA has absolutely nothing to do with processor allocation. Absolutely nothing. NUMA is Non-Uniform Memory Access. Go read a book and stop spreading this garbage. Not to mention that Intel supposed-NUMA is pointless except when you've got MPICH or similar involved. Opterons are cache coherent non-uniform memory access (ccNUMA).

ccNUMA defines areas of memory which are directly related to each CPU in an Opteron system. This must then be coupled with Affinity in order to actually see any benefit whatsoever. Just because you have it turned on doesn't mean it does a goddamn thing. Unless the OS supports both ccNUMA and Affinity, all your 'performance increases' are psychosomatic or BIOS bugs.

Wow, who knew we'd ever completely agree on something. :eek:
 
hardwarephreak said:
NUMA is the ability to keep the processes that are being processed by CPU2 stored in CPU2's memory banks. That way CPU2 doesn't have to go "all the way" over to CPU1's RAM banks (and vice versa). It's just a technology that helps reduce one of the inefficiencies inherent in some SMP configs.
I'm afraid your definition isn't correct. Non-local access happens in NUMA systems all of the time.

hardwarephreak said:
Also, it should be noted that not only do the motherboards have to support it (as Sir-Fragalot pointed out), but the OS has to as well.
This is kind of a "tree falls in the forest" assertion, isn't it? If the OS doesn't support NUMA, all that happens is that NUMA exists on the motherboard and no applications can take advantage of it. Are you saying that an OS which doesn't support NUMA causes the processors to interleave all memory accesses? Even with interleaved accesses, I'd contend that NUMA is still there; some memory access is local, and some is remote.

AreEss said:
Would you like to see the numbers? They're easily repeatable over and over and over again.
Which numbers? I guess it doesn't matter--I love measuring performance, so bring 'em on!

SirFragAlot said:
But it isn't different than SMP in that it's not like a competing technology that does the same thing. I just don't know why everyone is calling Opterons NUMA instead of SMP. Many Opteron motherboards do not support NUMA at all. I've got two Opteron systems and both of them support NUMA. Both are SMP, and only one of them runs in NUMA mode.

NUMA isn't something that replaces SMP. Rather it is a feature of Opteron (And other) SMP systems.
Indeed, NUMA doesn't replace SMP and I don't think anyone here has made that assertion. But NUMA is certainly a substantially different architecture than uniform memory access. As such, people call Opteron systems NUMA because that's an interesting and important feature of their design.

I'm not sure there are "many" models of Opteron motherboards that "don't support NUMA at all". One board that doesn't give the second processor its own memory is the Tyan S2875. But this machine is still NUMA, because CPU1 accesses memory locally while CPU2 has to access memory remotely. Memory access by CPU1 is measurably faster than memory access by CPU2. On a machine with a shared memory bus, both processors get an equidistant path to system memory.

What multi-processor Opteron boards provide a shared and equal memory bus for both processors?
 
mikeblas said:
I'm afraid your definition isn't correct. Non-local access happens in NUMA systems all of the time.

This is kind of a "tree falls in the forest" assertion, isn't it? If the OS doesn't support NUMA, all that happens is that NUMA exists on the motherboard and no applications can take advantage of it. Are you saying that an OS which doesn't support NUMA causes the processors to interleave all memory accesses? Even with interleaved accesses, I'd contend that NUMA is still there; some memory access is local, and some is remote.

Which numbers? I guess it doesn't matter--I love measuring performance, so bring 'em on!

Indeed, NUMA doesn't replace SMP and I don't think anyone here has made that assertion. But NUMA is certainly a substantially different architecture than uniform memory access. As such, people call Opteron systems NUMA because that's an interesting and important feature of their design.

I'm not sure there are "many" models of Opteron motherboards that "don't support NUMA at all". One board that doesn't give the second processor its own memory is the Tyan S2875. But this machine is still NUMA, because CPU1 accesses memory locally while CPU2 has to access memory remotely. Memory access by CPU1 is measurably faster than memory access by CPU2. On a machine with a shared memory bus, both processors get an equidistant path to system memory.

What multi-processor Opteron boards provide a shared and equal memory bus for both processors?

I thought NUMA requires each processor to have its own dedicated memory? If it doesn't, it's not true support for NUMA. Which is why NUMA requires at least 4 modules to work, and they must be split evenly between CPUs in order to work. I know that on my Tyan K8WE (S2895), CPU 1 can access any and all memory in CPU2's memory slots. CPU 2 can access memory in CPU1's memory banks also. When I first got the system built, I only had 2 512MB memory modules installed. They were in CPU1's memory slots. NUMA was disabled according to Sandra. CPU 2 obviously could access this memory, but since NUMA needs 4 modules, it didn't work in NUMA mode.

When I installed 4 1GB modules, NUMA was then active and I could benchmark with Sandra and verify my results consistently. NUMA is definitely working.

When I added the two 512MB modules and split them among the two CPUs, they weren't recognized. When I added the 512s to CPU1, the system wouldn't see that memory at all. When I added the two modules to CPU2, NUMA remained enabled, and both chips could access it.
 
Sir-Fragalot said:
I thought NUMA requires each processor to have its own dedicated memory?
No, it doesn't. NUMA stands for "Non uniform memory access". On that S2875 board, memory access isn't uniform. That makes it NUMA, not memory local to the node.

Sir-Fragalot said:
If it doesn't, it's not true support for NUMA.
It's not clear to me what you mean by "true" in this context. For a board like the S2875, you're not getting any of the benefits of NUMA because you can't let each processor work independently on its own memory. You're only paying the disadvantage: the second processor is always paying the remote memory fee, and always distracting the first processor by causing it to sniff memory requests that go by in order to implement ccNUMA.

Sir-Fragalot said:
Which is why NUMA requires at least 4 modules to work and they must be split evenly between CPUs in order to work.
They don't need to be split evenly between CPUs to work. Boards like the ASUS K8N-DL have more memory on one node than on the other; an equal amount of memory per node isn't a requirement.

Sir-Fragalot said:
I know that on my Tyan K8WE (S2895), CPU 1 can access any and all memory in CPU2's memory slots. CPU 2 can access memory in CPU1's memory banks also.
Sure. This is called remote memory access. When it happens, the local CPU is taking resources from the remote CPU in order to access that memory; otherwise, there's no cache coherency and the platform is extremely difficult to write code for.

Sir-Fragalot said:
When I first got the system built, I only had 2 512MB memory modules installed. They were in CPU1's memory slots. NUMA was disabled according to Sandra. CPU 2 obviously could access this memory, but since NUMA needs 4 modules, it didn't work in NUMA mode.
NUMA being "enabled" or "disabled" is really a misnomer. What I expect happened is that Windows or the BIOS told Sandra that only one node had any memory associated with it.

In your tests, you never measured how fast memory access was for CPU2 compared to CPU1. I'm sure you would have found that CPU2 had measurably slower access to memory than CPU1 did.
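A measurement like the one mikeblas suggests usually times a dependent pointer chase, so every load has to wait for the previous one. A rough sketch (run it pinned to each CPU in turn, e.g. with taskset, to compare CPU1 against CPU2; Python adds interpreter overhead, so treat the numbers as relative only):

```python
# Crude latency probe: walk a randomized cycle through a buffer so each
# access depends on the last and the CPU can't overlap the loads.
import random
import time

def chase_ns_per_access(n=1 << 15, rounds=8):
    order = list(range(n))
    random.shuffle(order)
    nxt = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b          # link the shuffled slots into one long cycle
    i = 0
    t0 = time.perf_counter()
    for _ in range(n * rounds):
        i = nxt[i]          # each step depends on the previous: no overlap
    return (time.perf_counter() - t0) / (n * rounds) * 1e9

print(round(chase_ns_per_access()), "ns/access (interpreter overhead included)")
```

On a board like the S2875 you'd expect the CPU2-pinned run to come out measurably slower than the CPU1-pinned run.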
 
mikeblas said:
No, it doesn't. NUMA stands for "Non uniform memory access". On that S2875 board, memory access isn't uniform. That makes it NUMA, not memory local to the node.

It's not clear to me what you mean by "true" in this context. For a board like the S2875, you're not getting any of the benefits of NUMA because you can't let each processor work independently on its own memory. You're only paying the disadvantage: the second processor is always paying the remote memory fee, and always distracting the first processor by causing it to sniff memory requests that go by in order to implement ccNUMA.

I didn't word that right. I meant that a board without dedicated memory for each processor can't use NUMA without four modules installed.

They don't need to be split evenly between CPUs to work. Boards like the ASUS K8N-DL have more memory on one node than on the other; an equal amount of memory per node isn't a requirement.

I realize that it doesn't need to be totally symmetrical. Obviously it works in my machine with six modules. However, it does require four modules, even on the Asus K8N-DL, which has six DIMM sockets. With all six, it can run NUMA.

mikeblas said:
Sure. This is called remote memory access. When it happens, the local CPU is taking resources from the remote CPU in order to access that memory; otherwise, there's no cache coherency and the platform is extremely difficult to write code for.

NUMA being "enabled" or "disabled" is really a misnomer. What I expect happened is that Windows or the BIOS told Sandra that only one node had any memory associated with it.

You've got a point. That is likely the case. In one or two instances where Sandra said that it wasn't enabled, a restart corrected the issue. Oddly though, I remember seeing my Tyan S2882-D based system having a NUMA disable and enable option in the BIOS. Or something like that. I just built it yesterday, I'd have to go back and look at it again.

mikeblas said:
In your tests, you never measured how fast memory access was for CPU2 compared to CPU1. I'm sure you would have found that CPU2 had measurably slower access to memory than CPU1 did.

I did not check CPU memory access speeds. I would figure that since the remote CPU has to go through the link between the CPUs and then through the other CPU's link to the memory, it would naturally be slower.
 
AreEss said:
Would you like to see the numbers? They're easily repeatable over and over and over again.

There are no 2-way fully-independent Xeons. Only Hurricane, which is 4-way minimum, and it's four-per. Opterons scale 80% per without ccNUMA and Affinity, presuming 2, and taper off slightly at 4, and more at 8. With ccNUMA and Affinity, scaling is above 80% average.



NUMA has absolutely nothing to do with processor allocation. Absolutely nothing. NUMA is Non-Uniform Memory Access. Go read a book and stop spreading this garbage. Not to mention that Intel supposed-NUMA is pointless except when you've got MPICH or similar involved. Opterons are cache coherent non-uniform memory access (ccNUMA).

ccNUMA defines areas of memory which are directly related to each CPU in an Opteron system. This must then be coupled with Affinity in order to actually see any benefit whatsoever. Just because you have it turned on doesn't mean it does a goddamn thing. Unless the OS supports both ccNUMA and Affinity, all your 'performance increases' are psychosomatic or BIOS bugs.


Never said anything about processor allocation. I said (albeit in layman's terms) that it tries (through use of a supported OS) to keep the processes that, say, CPU1 is accessing/processing in CPU1's memory banks... which depending on the situation can actually decrease performance.

It confirms the necessity of not only NUMA-aware OS, but also of specially optimized multi-threaded applications, where each thread independently allocates memory for its data and works with its memory area. Otherwise (single-threaded applications and multi-threaded ones, which don't care about the right data allocation in memory as far as NUMA is concerned) we should expect memory performance to decrease.

But the correct memory access organization is the key notion here. NUMA platforms must be supported both by OS (at least the operating system and applications should be able to "see" memory of all processors as a whole memory block) and by applications. The latest versions of Windows XP (SP2) and Windows Server 2003 fully support NUMA systems (Physical Address Extension must be enabled in 32bit versions (/PAE in boot.ini), which is fortunately enabled by default in AMD64 platforms, as it's required by Data Execution Prevention). What concerns applications, it first of all means that a program shouldn't deploy its data in the memory of one processor and then access it from the other processor. The effect of sticking to this recommendation or failing to comply with it will be reviewed now.

Good read: Also talks about utilizing forms of NUMA without having to have matching banks of RAM modules as well

http://www.digit-life.com/articles2/cpu/rmma-numa.html
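The article's "each thread independently allocates memory for its data" advice relies on first-touch placement: a page lands on the node of the CPU that first writes it. A pure-Python sketch of the pattern (illustration only; Python's GIL means real NUMA gains need native threads or multiprocessing):

```python
# Each worker allocates *and* first-touches its own buffer, so on a
# first-touch NUMA OS the data tends to land on that worker's node,
# instead of one thread allocating everything on a single node.
import threading

results = {}

def worker(tid, n=100_000):
    data = [0] * n            # allocated and first-touched by this thread
    for i in range(n):
        data[i] = i
    results[tid] = sum(data)  # work against the (hopefully local) buffer

threads = [threading.Thread(target=worker, args=(t,)) for t in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The anti-pattern the article warns about is the mirror image: one thread builds the whole data set, then other threads on the far node hammer it remotely.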
 
Thanks for the flood of info :) I picked up a pair of 250 series Opterons (CG stepping) on ebay - if I can get cool'n'quiet running on these, I'll be laughing. AU$800 for the pair. I think I got an awesome deal.

The K8WE sounds like the only option for me given that the point of building this box in the first place was to get PCI-X slots for my RAIDcore etc. Not sure what to do about the lack of ATA though - since I have two optical drives and three HDDs that all use standard ATA... But I plan on replacing some of those with SATA drives + 15K SCSI anyway.

I wasn't looking at WhisperRock coolers - I was considering the one which has support for S940 and still claims 21dBa. That said, I'll give the AMD coolers a shot first - unfortunately the pair of procs I bought are OEM procs but they come with X2 4400 coolers. I don't know whether these are suitable.
 
The Real Zardoz said:
Thanks for the flood of info :) I picked up a pair of 250 series Opterons (CG stepping) on ebay - if I can get cool'n'quiet running on these, I'll be laughing. AU$800 for the pair. I think I got an awesome deal.

The K8WE sounds like the only option for me given that the point of building this box in the first place was to get PCI-X slots for my RAIDcore etc. Not sure what to do about the lack of ATA though - since I have two optical drives and three HDDs that all use standard ATA... But I plan on replacing some of those with SATA drives + 15K SCSI anyway.

I wasn't looking at WhisperRock coolers - I was considering the one which has support for S940 and still claims 21dBa. That said, I'll give the AMD coolers a shot first - unfortunately the pair of procs I bought are OEM procs but they come with X2 4400 coolers. I don't know whether these are suitable.

Socket 939 and Socket 940 coolers use the same mounting hardware. You should be fine.
 
hardwarephreak said:
Good read: Also talks about utilizing forms of NUMA without having to have matching banks of RAM modules as well
It's a marginal read. The stuff at digit-life is always very interesting, but never carefully edited, and as such has some problems. In the text you quoted, one example is that it says NUMA is fully supported by 32-bit Windows XP. It is not.

A bigger problem is that the article is fundamentally flawed: it reaches the conclusion that NUMA is "a more perfect memory organization", but it hasn't compared apples to apples. Unfortunately, such a comparison is strictly impossible, but this guy doesn't make much of an effort to get close to eliminating the involved variables. Fortunately, the error generally leaves the reader favoring a system that's more performant, anyway.

hardwarephreak said:
Never said anything about processor allocation, I said (albiet in laymans terms) that it tries (through use of a supported OS) to keep process that, say CPU1 is accessing/processing in CPU1s memory banks..which depending on the situation can actually decrease performance.
The process of assigning work to a processor within an operating system is called scheduling. Some people call scheduling "processor allocation" because it's the act of allocating processor resources to the demands of the process.

SirFrags said:
Oddly though, I remember seeing my Tyan S2882-D based system having a NUMA disable and enable option in the BIOS. Or something like that. I just built it yesterday, I'd have to go back and look at it again.
I have an S2882-NotD (thanks, Tyan!). It has "Bank Interleaving" and "Node Interleaving" settings in the BIOS. I've been playing with these a bit; before I upgraded BIOSes, they didn't disable NUMA reporting in the OS from the GetNuma*() APIs. After BIOS 3.0, IIRC, the settings do cause the OS to not detect NUMA.

I still wouldn't say they disable NUMA, though. I've had a hard time getting a definition of "Bank Interleaving". "Node Interleaving" appears to mean that every other page of memory is on the other node. So if "Node Interleaving" is disabled, four whole DIMM slots of memory are on CPU1, and four whole DIMMs are on CPU2. If it is enabled, one page of memory is from DIMM bank 1 on CPU1, while the next page is from DIMM bank 2.

While I've investigated the "Node Interleaving" feature with some experiments, there's no documentation for it. (Hmm -- I should ask one of the AMD guys I've met. Tyan hasn't been responsive.) It appears to be interleaved on 4k pages, but I'm just guessing from the profile of memory access timing tests I've written.

Regardless, this setting seems something awful. It means that an application will almost always pay the cross-node, remote access penalty instead of paying the penalty only when writing to memory on the other node. Large blocks of memory will span nodes, and application performance is noticeably (10%, in my tests) slower.

NUMA-aware programs are few and far between, so it's not common for software to fully realize NUMA's benefits. Enabling this setting makes it so none of the software you'll run sees the benefit of the architecture!
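The always-pay penalty can be sketched with a toy model of node interleaving (both the 4 KiB granularity and the latencies are assumptions for illustration, not measurements):

```python
# Toy model: with interleaving off, a buffer's pages all sit on one home
# node; with it on, pages alternate between the two nodes on a 4 KiB
# granularity, so half of any large buffer is always remote.
PAGE = 4096
LOCAL_NS, REMOTE_NS = 60, 100  # hypothetical access costs

def page_node(addr, interleaved, home_node=0):
    return (addr // PAGE) % 2 if interleaved else home_node

def avg_cost(cpu_node, interleaved, pages=1024):
    total = sum(LOCAL_NS if page_node(p * PAGE, interleaved) == cpu_node
                else REMOTE_NS for p in range(pages))
    return total / pages

print(avg_cost(0, interleaved=False))  # 60.0 -- every page is local
print(avg_cost(0, interleaved=True))   # 80.0 -- half the pages are remote
```

In the model the interleaved average sits between the local and remote costs, which matches the post's point: you trade best-case local access for a guaranteed partial penalty on every large buffer.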
 
Whoops.

Haven't received my processors yet but I discovered that the CG stepping doesn't appear to support PowerNow -or- SSE3. Lack of SSE3 will be a pain in the ass for some things. Was under the impression that PowerNow had made it to the Opteron but it seems that's only available in the newer E4 stepping.

Now, how on earth do I get my hands on the E4 stepping?
 
The Real Zardoz said:
Whoops.

Haven't received my processors yet but I discovered that the CG stepping doesn't appear to support PowerNow -or- SSE3. Lack of SSE3 will be a pain in the ass for some things. Was under the impression that PowerNow had made it to the Opteron but it seems that's only available in the newer E4 stepping.

Now, how on earth do I get my hands on the E4 stepping?

Yeah I have some CG stepping Opteron 246's in my Windows 2003 Server box, and it is true that they don't support PowerNow or SSE3.

Newegg and other retailers should be able to tell you what stepping the chips are before you buy. Any 90nm CPU will be an E4 stepping or greater. Just look for Troy core CPUs or 90nm CPUs.
 