Hyperthreading and F@H

Astroman

Okay I'm sure this has been discussed before but I can't find it.

I am back into folding as much as possible again, and I was doing it in only one instance.

However, the other night I got a wild hair and set up 2 instances with machine ID 1 and 2 to make use of the hyperthreaded p4 in my box.

I also added the -advmethods and -local tags to my shortcuts, not sure why I need the -local tag but I added it because it seemed to be the thing to do when running 2 instances.

Anyhow, I'm wondering if anyone has determined if running 2 instances actually increases total production or if it causes so much internal conflict over system resources that it slows your production down?

Is there a way for me to find out and test this? My folding has been rather sporadic, so I haven't learned anything by looking at my production graphs.

Also, if 2 instances using HT is ideal, how much CPU usage should I set each instance to for the best results?

Thanks guys!

Fold on
 
It improves production by about 20%. However, the second instance will not go idle when you run other intensive applications, because the OS thinks that it's running on a second CPU, so only the first instance will go idle. So you need to kill both instances when you play games and such.
 
I ran some HT tests and posted them here:
http://forum.folding-community.org/viewtopic.php?t=5789

ChelseaOilman
 
From what I hear, the "-local" switch is really only necessary if you're running the graphical client, but it doesn't hurt to have it. Along with the "-advmethods" switch, it is very important to include the "-forceasm" switch if you're using the beta v3.25 client or the "-forcesse" switch if you're using the v4.0 client. Those switches are the most important ones to enable the maximum production potential of your machine. Without them, the Intel SSE optimizations will not be utilized, and your production will be about a third of what it can be, hyperthreading or not.

Look to ChelseaOilman's link for HT info and benches. I always set CPU usage to 100 on all my machines.

 
This is an issue that has been gnawing at me for a while now. Arkaine23 is the only one I’ve ever heard come anywhere near having the same concerns that I have, and even then I don’t think he’s revealed the full extent of the problem.

Originally posted by Arkaine23
It improves production by about 20%. However, the second instance will not go idle when you run other intensive applications, because the OS thinks that it's running on a second CPU, so only the first instance will go idle. So you need to kill both instances when you play games and such.

The problem is that this scenario applies even if you only run one client.

Running two instances of Folding@Home
Pentium 4 w/ HT tech enabled:
F@H1: 50% CPU
F@H2: 50% CPU

-Start game

F@H1: 0% CPU
F@H2: 50% CPU
Game: 50% CPU

Result: Game only has access to a max of half total processor power.
This is the scenario he described above, but let’s run through what would happen if you ran only one client.

Pentium4 w/ HT tech enabled
F@H1: 100% CPU

-Start Game

F@H1: 50% CPU
Game: 50% CPU

Result: Game only has access to a max of half total processor power.

Notes:
-Before the game is started, the client is running at 100% CPU utilization. This would only be reported as 50%, but as long as the 2nd virtual CPU remains idle, the first virtual processor has access to 100% of the CPU's resources.
-When the game is launched, rather than forcing the Folding@Home client to stop its usage of the processor, it happily launches on the second “idle” processor, thus creating the exact same effect as the first scenario.

Conclusion: running even a single instance of Folding@Home on a Pentium 4 with hyperthreading tech enabled will significantly reduce performance.

Please, for God's sake, someone prove me wrong.

The fact seems to be that the Achilles heel of hyperthreading is the fact that it virtually nullifies thread priorities. On a hyperthreading system, a thread set to idle priority and a thread set to the highest priority will share the CPU resources 50/50. This is simply not a tolerable situation.
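
The scenarios in this post can be written down as a toy model. This is only a sketch of the hypothesis above, not a measurement: it assumes the scheduler hands each logical CPU to the highest-priority runnable threads and that busy logical CPUs split the one physical core evenly. The function name and priority numbers are made up for illustration.

```python
def core_share(threads, logical_cpus):
    """Return {name: fraction of the physical core} under the toy model.

    threads: list of (name, priority) pairs, higher number = higher priority.
    Assumption: the top `logical_cpus` threads by priority each get a logical
    CPU, and busy logical CPUs split the single physical core evenly.
    """
    ranked = sorted(threads, key=lambda t: t[1], reverse=True)
    running = ranked[:logical_cpus]
    share = 1.0 / len(running) if running else 0.0
    result = {name: 0.0 for name, _ in threads}
    for name, _ in running:
        result[name] = share
    return result


IDLE, NORMAL = 0, 8  # illustrative priority values only

# HT off: one logical CPU, so the idle-priority client yields to the game.
print(core_share([("fah1", IDLE), ("game", NORMAL)], logical_cpus=1))
# HT on, one client: both threads get a logical CPU, so the game gets half.
print(core_share([("fah1", IDLE), ("game", NORMAL)], logical_cpus=2))
# HT on, two clients: one client goes idle, the other keeps half the core.
print(core_share([("fah1", IDLE), ("fah2", IDLE), ("game", NORMAL)], logical_cpus=2))
```

Under these assumptions the game ends up with half the core whenever HT is on, whether one or two clients are running. A real split is unlikely to be exactly 50/50.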

I can see why Intel released the technology despite its shortcomings. From their perspective looking upon the 95% of the situations the processor is going to be used in, most fall under two categories:

You were already trying to run more than one app at one time:
Without HT: Threads share cpu time of single processor (by switching between tasks rapidly)
With HT: Two threads can run simultaneously.

Result: Performance increase

Or

You are just trying to run one single CPU intensive task:
Without HT: Task gets 100% CPU time
With HT: Task gets 100% CPU time as 2nd virtual processor remains idle.

Result: Performance remains the same.

Conclusion: It is only in the smallest percentage of situations (read: all of us) where hyperthreading decreases performance.

Now this is obviously still the case when dealing with dual xeon situations, but it goes without saying that having 4 virtual processors does a great deal to complicate the situation. If you only have one instance of folding@home running, and you launch a game, does the game start on one of the virtual processors on the other physical processor, or the other virtual processor on the same physical processor the folding@home client is running on? If you run two instances of folding at home, would the two instances ever occupy virtual processors on the same physical processor (I’m guessing no), even if you launched a game (3rd cpu intensive app)?

Sigh, so many questions.

I’m about .02 seconds from cleaning F@H off all my hyperthreading boxes and never looking back. I suppose a solution on the P4 system would simply be to disable hyperthreading, but I draw the line at having to dumb down my computer to get a client to work properly.
 
I ALWAYS reboot my system clean before gaming. This clears all IE crap and whatever else may be stuck in my memory cache. I've noticed large reductions in lag from doing this and my frag rates have risen accordingly.

That said, I have to manually open each F@H client by double-clicking an icon. I have not set them up to run as a SERVICE under WinXP. Therefore, they do not cause me any concern, because they are disabled upon reboot, and only after I have finished a gaming session do I turn them back on.

Doesn't bother me a bit.

I'm willing to sacrifice the effort of a couple clicks of the mouse in the name of humanity, I don't even have to break a sweat doing it.

Anyhow, it would seem my production has taken a HUGE leap since using the -advmethods tag and 2 instances with HT.

The only question I have is about the -forcesse tag


When I open each client it always benchmarks then starts the first unit and after it finishes the first unit says "extra SSE boost OK"

I assume that to mean SSE is being used even without the tag. Also, someone in another thread mentioned that the manual states using this tag forces the SSE boost on AMD systems, which would normally be disabled... Since I have Intel are we sure this applies?

Thanks.
 
Originally posted by GotNoRice
I suppose a solution on the P4 system would simply be to disable hyperthreading, but I draw the line at having to dumb down my computer to get a client to work properly.
As long as you install WinXP Pro with HT enabled, all you have to do is enable or disable HT in the BIOS when you restart the computer. Of course you need a motherboard with the HT setting in the BIOS. If the game or program you're using doesn't take advantage of HT, you're not losing any performance. I don't believe that many programs take advantage of HT. I don't see it as that big a deal to just change back and forth as needed. Works for me. ;)

ChelseaOilman
 
Originally posted by APOLLO
From what I hear, the "-local" switch is really only necessary if you're running the graphical client, but it doesn't hurt to have it.

I tried removing the -local tag. When doing this both instances try to run on the same machine ID, in other words, it doesn't use the HT.

I put the argument back in and it is using HT again... That said, I guess you need the -local tag regardless.
 
GotNoRice,

I wouldn't have any DC client running when playing a game or using any other CPU-intensive app. Games will run better without F@H running in the background whether you have a HT enabled CPU or not, Intel or AMD. You can turn off HT from the BIOS as ChelseaOilman explained, or simply turn off the client(s) for the duration of the game. I wouldn't be overly concerned about a few hours a week not folding on one machine if I was employing multiple machines, as it appears you are. It won't hurt your stats all that much and it certainly won't hurt the science. It's a little extreme to remove F@H from all your HT boxes. That will hurt your stats and the science. It would be far better to just disable HT, if it's causing you so much trouble.

Astroman,

I never heard of HT not being utilized without the -local tag, as long as you have HT enabled in the BIOS and separate IDs for each client. Then again, I always use that tag on all my machines with multiple clients, so I don't know for sure, and you may be right.

Yes, the F@H client will automatically use the SSE registers of your P4 CPU. It might seem superfluous to add in the -forcesse tag to your shortcut, but according to another thread, if there is some problem with a WU and it causes a crash, the client might not run with any optimizations when it restarts. I don't know if this becomes a permanent state, but I am not taking any chances.
 
I really don't think HT is that big of a deal. Can you guys actually notice anything when it's on or off? The way I see it, it's just some Intel marketing BS.

FOLD ON!!!
 
Originally posted by pduan87
I really don't think HT is that big of a deal. Can you guys actually notice anything when it's on or off? The way I see it, it's just some Intel marketing BS.
It seems like you didn't look at the HT tests I did. There clearly is an advantage. At least as far as F@H goes. The link is in my earlier post.

ChelseaOilman
 
Originally posted by ChelseaOilman
As long as you install WinXP Pro with HT enabled, all you have to do is enable or disable HT in the BIOS when you restart the computer. Of course you need a motherboard with the HT setting in the BIOS. If the game or program you're using doesn't take advantage of HT, you're not losing any performance. I don't believe that many programs take advantage of HT. I don't see it as that big a deal to just change back and forth as needed. Works for me. ;)

Restart my computer every time I run a CPU-intensive app/game? That just doesn't seem like a reasonable solution. I don't think I've restarted my computer in a month.

Originally posted by APOLLO
I wouldn't have any DC client running when playing a game or using any other CPU-intensive app. Games will run better without F@H running in the background whether you have a HT enabled CPU or not

Yes, I know that, however with thread priorities functioning as they are supposed to, you would most likely be talking about a ~5% loss in performance instead of one approaching 50%.

I’m not so much concerned about my personal computers, but rather computers that I’ve borged. I’ve made many promises to those whose computers I’ve borged that the client will not significantly reduce the performance of their computer. As a man of my word, I am obligated to remove the client if in fact hyperthreading causes such severe reductions in performance as I hypothesized in my earlier post.

Originally posted by pduan87
I really don't think HT is that big of a deal. Can you guys actually notice anything when it's on or off? The way I see it, it's just some Intel marketing BS.

2cpu.com recently did a hyperthreading benchmark review, and the results were quite favorable. It is certainly not “Intel marketing BS”, as there is a significant improvement in almost every single test, some even showing the P4 with HT enabled approaching the performance of the dual Xeon system it was benched against.

http://www.2cpu.com/articles/ht_explored/index.html
 
Originally posted by GotNoRice:

Yes, I know that, however with thread priorities functioning as they are supposed to, you would most likely be talking about a ~5% loss in performance instead of one approaching 50%.

I’m not so much concerned about my personal computers, but rather computers that I’ve borged. I’ve made many promises to those whose computers I’ve borged that the client will not significantly reduce the performance of their computer. As a man of my word, I am obligated to remove the client if in fact hyperthreading causes such severe reductions in performance as I hypothesized in my earlier post.
Concerning borged machines that are not accessible, it had occurred to me right before I saw your response that the owners of the systems might not appreciate the reduction in performance. In situations where the computer that is running the client is borged and belongs to someone else, you are 100% right. If what you are stating regarding HT issues and thread priorities is indeed correct, and I believe it is, I would not recommend anyone borg HT machines that see heavy usage.

Sorry if I seemed a little insensitive to your predicament before. My main machine is a dual Barton 3200+ and my HT systems are used solely for DC. I wasn't aware of the issues involving HT and the F@H client in environments where other apps need priority. The only other solution I can offer is to reduce CPU usage by selecting "advanced" when installing the program, and choosing a number less than 100 in the settings. I don't know how much this will help, but it's a last resort if you're still interested in running your clients on borged machines with HT.
 
I mainly run AMD duallies, no Intel systems here.

I have noticed that when running a CPU-intensive app/game on my main dual system, sometimes both cores will slow down and other times one core will basically stop while the other runs at full speed.
There seems to be no rhyme or reason to which one (or both) slows down.
I think it depends on where the core is in the thread loop when the call comes for CPU cycles.

I would think that an Intel system would work the same.

The only way I can think of to test it would be to copy a new protein to both cores.
Start them up together.
Run an app/game for a few hours.
As soon as you go back to idle see if both cores have slowed down or only one.
Repeat a few times with different apps/games.

Luck........:D
 
I don't think anyone is saying that HT is marketing BS per se. We are simply calling it out for its limitations.

Does it improve multitasking performance by faking the OS into thinking there are 2 CPUs? Yes.

Does it make a single CPU system perform better with multi-threaded apps? Yes.

Is it as good as a true SMP machine? No.

Does it sometimes cause a reduction in performance? Yes.


Specifically for folding and other DC programs: Do these fail to idle the way they were intended to when run on an HT system? Yes. Does that reduce performance? Yes.

The performance reduction is not usually a full 50%, but it's enough that your best bet is to disable HT or turn off DC programs when doing something else that's intensive. In Linux, I have a Perl script that forces my folding client to go idle whenever an app needs more than 50% of my CPU power. I use it on duallies and single-CPU systems. Perhaps someone who knows more Perl than me could port it to Windows, or perhaps it will work as is if running Cygwin....
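
The Perl script itself is not posted in this thread. As a rough sketch of the same idea in Python, assuming a POSIX system: pause the client with SIGSTOP when other work wants more than the threshold, and resume it with SIGCONT afterwards. The 50% threshold comes from the post; the function names are hypothetical, and how you measure the other apps' CPU demand is left out on purpose.

```python
import os
import signal


def should_pause(other_cpu_fraction, threshold=0.5):
    """True when non-folding work wants more than `threshold` of the CPU."""
    return other_cpu_fraction > threshold


def apply_decision(client_pid, other_cpu_fraction, paused):
    """Pause or resume the client if needed; return the new paused state.

    SIGSTOP/SIGCONT are POSIX signals, so this will not work on native
    Windows (an assumption of this sketch, not of the original script).
    """
    want_pause = should_pause(other_cpu_fraction)
    if want_pause and not paused:
        os.kill(client_pid, signal.SIGSTOP)   # freeze the folding client
    elif not want_pause and paused:
        os.kill(client_pid, signal.SIGCONT)   # let it fold again
    return want_pause
```

You would call `apply_decision` from a loop that sleeps a few seconds between checks, passing in the client's PID and whatever load figure you trust.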
 
When running 2 clients on my XP Home Edition, if I do not use the -local tag both clients suddenly have the same machine ID, and no matter what I do I can't get them to have different machine IDs.

As soon as I put the -local tag back, the machine IDs go back to 1 and 2 when I restart the clients, and HT is made use of.

Obviously running 2 clients on the same machine ID will not make use of HT.

That's what I was trying to say.
 
Yes, you must set affinity. One instance will be machine ID 1, and the other will be 2.
 
Originally posted by Astroman:

Yes, you must set affinity. One instance will be machine ID 1, and the other will be 2.
From what I understand, HT is just like SMP when it comes to F@H; there shouldn't be a need to set processor affinity. I have two HT machines and neither one is configured with processor affinity on their respective clients. I have 100% CPU usage for each logical processor on both machines. Is there something wrong with my setup?
 
Originally posted by APOLLO
From what I understand, HT is just like SMP when it comes to F@H; there shouldn't be a need to set processor affinity. I have two HT machines and neither one is configured with processor affinity on their respective clients. I have 100% CPU usage for each logical processor on both machines. Is there something wrong with my setup?
Yes, you obviously have put in the wrong variable for the username :) it should have been gemniii :)
 
Maybe I'm wrong. I said that because I remember reading it the first time I set up F@H on my HT machine, about 6 months ago.

I seem to remember some problem I had when I didn't set affinity the first time, but I don't remember what it was or if I ever truly had the problem.

Guess I'll screw with it a little and see what happens..
 
Originally posted by ChelseaOilman
It seems like you didn't look at the HT tests I did. There clearly is an advantage. At least as far as F@H goes. The link is in my earlier post.

ChelseaOilman


I agree with your testing, as I did the same thing last week when I started folding for the first time...I was running Gromacs on both but drew the exact same protein for each, so I thought it was a great comparison.....

I saw roughly the same 28-30%.....


On a side note, I ran the SETI work bench at AnandTech a few weeks back and got this...(contact Mechbgon @ AT for the bench if you want it)

p4 [email protected]

1 instance = 2 hours 7 minutes
2 instances = 2 hours 59 minutes = 1 per 1 hour 29.5 minutes

In this case 2 instances run one after the other would have taken 4hr14min, but together they only took 2hr59min, for about a +40% increase....
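
The arithmetic behind that figure, as a small check using only the times quoted above (the function names are just for illustration):

```python
def minutes(hours, mins):
    """Convert an hours/minutes pair to total minutes."""
    return hours * 60 + mins


def ht_throughput_gain(single_run_min, dual_run_min, instances=2):
    """Fractional throughput gain from running `instances` work units at once.

    Compares running the units back to back (single_run_min each) against
    the wall-clock time of running them simultaneously (dual_run_min).
    """
    sequential = single_run_min * instances
    return sequential / dual_run_min - 1.0


single = minutes(2, 7)    # 1 instance: 2 hours 7 minutes
dual = minutes(2, 59)     # 2 instances together: 2 hours 59 minutes

print(f"back to back: {single * 2} min, together: {dual} min")
print(f"per unit with HT: {dual / 2:.1f} min")
print(f"throughput gain: {ht_throughput_gain(single, dual):.0%}")
```

With these numbers the gain works out to roughly 42%, which matches the "about +40%" in the post.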


As a comparison

Barton 2500+@3200+ 1 instance = 2 hours 32 minutes (mine)

AMD64 [email protected] 1 instance = 2 hours 10 minutes (other)
AMD64 [email protected] 1 instance = 2 hours 4 minutes (other)

P4 [email protected] 2 instance = 3 hr 26 min =1 per 1 hr 43 min (mine)


So even without HT the P4 was a comparable performer per PR rating, since the 3000+ @ 2.3GHz would likely be rated more like a 3400+....With HT it is all by itself....
 
MACHINE ID and AFFINITY are different critters...

Originally posted by APOLLO
From what I understand, HT is just like SMP when it comes to F@H; there shouldn't be a need to set processor affinity. I have two HT machines and neither one is configured with processor affinity on their respective clients. I have 100% CPU usage for each logical processor on both machines. Is there something wrong with my setup?

if you have each of the clients (two per machine) in their own directory, (fah1 and fah2 for example)
and the config file machine id set for 1 in one dir, and 2 for the other

you are set and good to go... you do not need to set affinity... you do need to set machine ID (it's different, just a part of the folding program and how Stanford sends each box a different WU)
hope that is of some use. sharp.


**********this SUCKS. I just read this thread after adding a friend's 2.6 HT box LAST NIGHT.
Somebody please tell me this has been tested, not just a good theory, but "yes, with a game, FPS dropped to XX instead of XX and the folding core did not stop production..." for both one and two instances on an HT machine.
If this is the case, I need to get rid of folding. %$^%$ idle should mean i d l e. &#@$.
 
Sharp, I have confirmed that gaming is not the best situation with folding, as HT seems to control the priority of threads on its own...

If I run with HT off, the game will get all of the CPU cycles, and in about 2-3 hours I may see only a 5% or smaller drop in fps, but in that time I may get only a few frames done when in some instances I could be 60% done with a WU if it were running by itself....

If I run with HT on, the game can take more like a 15-20% hit in fps (UT2003) and time per frame may increase 25%....In the end, if you have a lot of headroom this can be a great situation, as 3 hours later you may be 45% done with a WU and still got to game for 3 hours....

However, I know gamers have to have every drop of FPS....So that is why I say gamers may not like HT...

I confirmed this in many applications other than FH...I tried MPEG-2 encoding, rendering, capturing DV files, SETI, NAV2003, etc., and gaming took more of a hit than most gamers like...About similar to the numbers above....Changing priority with HT enabled in the OS has little effect, and in some instances it actually negatively affected the app I was trying to help....



I figure I run Fraps and make sure my lowest fps never drops below the 30s at 1280x1024 with all settings high, and I am fine...I am happy knowing I am still getting WUs done and getting to use the computer...I don't even have a fancy card...testing my buddy's 9600XT I had more than enough fps to fold and game at more normal settings and never experienced a hiccup....


I never tried this with 2 instances of FH because logic just points to that as not being a good idea...however, with 1 instance running 24/7 and the 2nd LOGICAL CPU still able to handle most of the apps on my computer as if it were still a 2.4-3.0GHz P4 (app depending), I am fine. I turn on 2 instances at night or if I am going to be gone all day...
 
In most instances you will be impressed with how responsive the system stays and how good the performance is while running FH alongside most applications.....

There are a few, however, where people do not like the drop in performance in the one app...To those I say don't multitask then...Multitasking is about doing several things at once, because even in the case of gaming it is still more efficient to do that than to turn off FH altogether and get no frames done while gaming...the matter is whether or not you can play at the level you desire, and I mean play.....

Another app I have found that does not play well (I have tested many, and this is the only one I have found so far that did this)...

TMPGenc version 2.52....

If running with multiprocessors enabled in the environmental setting menu, you will experience the most unwillingness to share CPU cycles of any test I have seen.....The encoding got done in 5.5% more time than in single app mode, but the FH client got virtually no cycles, and in the 1 hour test I ran I completed 3 frames when normally 20 would have been done....


If you change that setting to off but still have the OS enabled for HT, the app plays much better with FH....The encoding took 45% longer but the time per frame only increased 35%, unlike the 600% above...In the hour test I got 15 frames done versus 20...
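
For anyone checking the percentages, they follow from the frame counts in the two one-hour tests. A quick sketch (the function name is made up; it assumes both runs cover the same wall-clock time):

```python
def time_per_frame_increase(frames_alone, frames_shared):
    """Fractional increase in time per frame over the same wall-clock period.

    frames_alone: frames the client finishes when it has the CPU to itself.
    frames_shared: frames it finishes while the other app is running.
    """
    return frames_alone / frames_shared - 1.0


# Multiprocessor setting on: 3 frames in the hour instead of 20.
print(f"setting on:  {time_per_frame_increase(20, 3):.0%}")
# Multiprocessor setting off: 15 frames in the hour instead of 20.
print(f"setting off: {time_per_frame_increase(20, 15):.0%}")
```

That gives roughly 567% and 33%, so the "600%" and "35%" in the post are rounded figures.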
 