true distributed computing?

Dytralis

I'm about to set up four servers as a FAH farm. Has anyone ever thought of or found a way for one machine to act as the master server and then let other machines do the processing?

Kind of like having all the machines act as one big one.
 
That would be a Beowulf cluster, which is not supported by F@H. I was actually just looking at this very thing this morning.

http://wiki.openssi.org/go/Main_Page

Interesting idea, but I still don't think the F@H client would work on it the way it is now.

 
It would be great if it worked. With the bonus system, having multiple PCs working together on a single unit would boost your PPD more than linearly. I almost wish I had a few old SMP machines lying around to set one of these up and see what kind of configuration options are available.

 
Supposedly there is a lot of data traffic between SMP threads, making a F@H cluster unrealistic due to the high latency of a network interface (milliseconds versus nanoseconds). I looked into it a while ago. The problem is that with a molecular dynamics (MD) simulation like F@H the process as a whole isn't very parallel. This is why I proposed using an FPGA instead, which would have all the bandwidth and FPUs one needs and could even be optimized for a particular core.
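To put some very rough numbers on that, here's a back-of-envelope calculation of how much of each MD timestep would be eaten by communication. The compute time per step and exchanges per step are assumptions for illustration only, not measured F@H values:

```python
# Back-of-envelope: fraction of each MD timestep lost to communication.
# All numbers below are illustrative assumptions, not measured F@H values.

compute_per_step = 1e-3        # seconds of actual math per timestep per node (assumed)
exchanges_per_step = 2         # coordinate/force exchanges per timestep (assumed)

latencies = {
    "shared memory (SMP)": 50e-9,   # tens of nanoseconds
    "gigabit Ethernet":    100e-6,  # ~100 microseconds
    "InfiniBand":          1e-6,    # ~1 microsecond
}

for link, lat in latencies.items():
    comm = exchanges_per_step * lat
    overhead = comm / (compute_per_step + comm)
    print(f"{link:22s} ~{overhead:6.2%} of each step spent waiting on the wire")
```

Under those assumptions Ethernet burns a double-digit percentage of every timestep just on latency, while shared memory is a rounding error.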
 
Supposedly there is a lot of data traffic between SMP threads, making a F@H cluster unrealistic due to the high latency of a network interface (milliseconds versus nanoseconds). I looked into it a while ago. The problem is that with a molecular dynamics (MD) simulation like F@H the process as a whole isn't very parallel.
Isn't Stanford looking into a cluster client? I was under the impression one would be available at some point.
 
It is called (or will be called) Folding@Cluster if I remember correctly, which I would guess would be a completely different project.

 
How would you be able to get enough bandwidth between the machines for them to communicate effectively on a single WU?
 
How would you be able to get enough bandwidth between the machines for them to communicate effectively on a single WU?

That, I believe, is the whole problem. The regular SMP client may very well see all of the cores in your cluster, but if there really is a lot of cross-thread dependency, then that cross-machine communication will be the bottleneck.

Don't get me wrong...I would still like to see someone try it. I just don't think it would work.

 
How would you be able to get enough bandwidth between the machines for them to communicate effectively on a single WU?

You'll need a high-speed, low-latency network such as InfiniBand connecting the nodes. Last I heard, they'll start working on a client as soon as someone gives them access to a machine worth the development time.

The real problem here is cost. A low-end InfiniBand switch will run you $3,500, and the NICs for each node start at around $500. You'll see setups like this on the Top500 supercomputer list; the work those clusters are doing has to have it. But for folding, it's just not worth the cost. Think how many more folding rigs you could build with that kind of money. And as I said, those were the absolute cheapest parts I saw, so who knows how well they perform.
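Just to make that concrete: the switch and NIC prices below are the figures quoted above, while the per-rig price is a made-up placeholder for a budget folding box.

```python
# Rough cost comparison: minimal InfiniBand fabric vs. plain folding rigs.
# Switch and NIC prices are the figures quoted above; the per-rig cost
# is an assumed placeholder, not a real build quote.

ib_switch = 3500
ib_nic = 500
nodes = 4

fabric_cost = ib_switch + ib_nic * nodes      # 3500 + 4*500 = 5500
assumed_rig_cost = 900                        # hypothetical budget folding box

extra_rigs = fabric_cost // assumed_rig_cost
print(f"Interconnect alone: ${fabric_cost}")
print(f"That money buys roughly {extra_rigs} more standalone folding rigs")
```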
 
Isn't Stanford looking into a cluster client? I was under the impression one would be available at some point.

It is called (or will be called) Folding@Cluster if I remember correctly, which I would guess would be a completely different project.

Well, currently the Pande Group is moving away from MPI and going to a threads-based core (aka A3). So the chances of seeing a cluster-based client for the masses are slim to none.

That said, if someone came to them and said "I have a supercomputer," then they might spend the time to develop a client for some A2 WUs.

How would you be able to get enough bandwidth between the machines for them to communicate effectively on a single WU?

Exactly.

You would need fiber or InfiniBand.
Although four bonded gigabit Ethernet links might do OK.
 
Well, currently the Pande Group is moving away from MPI and going to a threads-based core (aka A3). So the chances of seeing a cluster-based client for the masses are slim to none.

That said, if someone came to them and said "I have a supercomputer," then they might spend the time to develop a client for some A2 WUs.

One of the developers said that they definitely would, but nobody has taken them up on it.

You would need fiber or InfiniBand.
Although four bonded gigabit Ethernet links might do OK.

Gigabit Ethernet is probably fast enough in terms of throughput; it's the latency that kills Ethernet in this regard. Channel bonding (or EtherChannel, if you speak Cisco) would likely make the latency worse, not better. Then there's the price (see my post above yours).
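A simple latency-plus-bandwidth transfer model shows why bonding doesn't help the small, frequent messages MD generates: at best it only shrinks the bandwidth term. The message size, latency, and link speed below are assumed round numbers, not measurements.

```python
# Simple transfer-time model: time = latency + size / bandwidth.
# Bonding multiplies bandwidth but leaves per-message latency untouched.
# Message size, latency, and link speed are assumed round numbers.

def transfer_time(size_bytes, latency_s, bandwidth_bps):
    return latency_s + size_bytes * 8 / bandwidth_bps

small_msg = 4 * 1024          # a small per-step coordinate exchange (assumed)
latency = 100e-6              # ~100 us typical gigabit Ethernet latency
gbe = 1e9                     # one gigabit link

single = transfer_time(small_msg, latency, gbe)
bonded = transfer_time(small_msg, latency, 4 * gbe)   # 4 links bonded

print(f"single link: {single*1e6:.1f} us, 4x bonded: {bonded*1e6:.1f} us")
# Both are dominated by the 100 us latency term, so bonding barely moves it.
```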
 
The question that comes to my mind is how the cluster would actually handle FAH. From reading up on the Beowulf/Kerrighed projects, they don't support distributed threading. Assuming four servers with sixteen combined cores (in my situation), I wouldn't be able to run one FAH process across them and have, say, -bigadv running.

So, let's say I built 4 headless units to PXE boot. At that point, from my understanding, the FAH process would have to be distributed/assigned to each processor (from each node) individually.

Am I wrong in my understanding? And if I'm not, would there be any advantage to that, other than the reduced amount of power used due to not having four hard drives connected to the headless units?
 
InfiniBand has just a tad over 1 us of latency: http://en.wikipedia.org/wiki/Infiniband#Latency

With Ethernet you're lucky to get even remotely near 100 us. Inter-core latency is on the scale of 10 ns. This means that even InfiniBand would be 100x slower than a direct inter-core link.

There's a reason why clusters are used for embarrassingly parallel calculations. I really fail to see how F@H with its MD algorithm could be adapted to work with this.
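Using the latency figures quoted in that post, the slowdown ratios work out as follows:

```python
# Slowdown ratios using the latency figures quoted above.
inter_core = 10e-9     # ~10 ns between cores on one die
infiniband = 1e-6      # ~1 us
ethernet = 100e-6      # ~100 us if you're lucky

print(f"InfiniBand vs inter-core: {infiniband / inter_core:,.0f}x slower")   # ~100x
print(f"Ethernet   vs inter-core: {ethernet / inter_core:,.0f}x slower")     # ~10,000x
print(f"Ethernet   vs InfiniBand: {ethernet / infiniband:,.0f}x slower")     # ~100x
```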
 
There's a reason why clusters are used for embarrassingly parallel calculations. I really fail to see how F@H with its MD algorithm could be adapted to work with this.
It's obviously possible in some sense: if you look at the current Folding@home "cluster," the latency between most of the individual machines runs into the hundreds of milliseconds, since it's over the internet. That latency is so high that it's prohibitive to let the computers communicate with each other at all; each one operates independently of the rest for a certain amount of time and then sends its results back on its own. Since it's clearly possible to split proteins into smaller work units for individual computers, it may (should?) be possible to split the work up even further.
 
I can see this working under 2 conditions:

  1. WUs can be divided into "chunks" that are tagged and can be processed by different computers. This would be done in order to keep the latency cost of shipping numerous small pieces around to a minimum.
  2. There is a single management computer that distributes these "chunks" to the different folding computers. When chunks are done folding, they are sent back to the management computer and recombined before being sent back to Stanford.
This would result in -bigadv units taking only hours to be processed, even by not-so-stellar computers. Stanford would also release a benchmarking program so the management computer could determine the appropriate chunk size based on the processing power of each "node" and the latency of the connections between them. A rough sketch of what that flow could look like is below.
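Purely as a thought experiment, here's roughly what that split/fold/recombine flow might look like. This is not how Folding@home actually divides work: the "physics" is a dummy calculation and local worker processes stand in for separate folding boxes.

```python
# Toy sketch of the master/worker idea described above. NOT how F@H
# actually splits work; the "fold_chunk" math is a dummy stand-in, and
# multiprocessing workers stand in for separate machines on the network.
from multiprocessing import Pool

def fold_chunk(chunk):
    """Worker side: pretend to simulate one tagged chunk; returns (tag, result)."""
    tag, values = chunk
    result = [v * 0.5 + 1.0 for v in values]   # placeholder "physics"
    return tag, result

def split_work_unit(values, n_chunks):
    """Master side: tag and divide a work unit into chunks."""
    size = (len(values) + n_chunks - 1) // n_chunks
    return [(i, values[i * size:(i + 1) * size]) for i in range(n_chunks)]

def recombine(tagged_results):
    """Master side: reassemble results in tag order before 'uploading'."""
    combined = []
    for _, partial in sorted(tagged_results):
        combined.extend(partial)
    return combined

if __name__ == "__main__":
    work_unit = list(range(1000))              # fake WU data
    chunks = split_work_unit(work_unit, n_chunks=4)
    with Pool(processes=4) as pool:            # four "folding computers"
        results = pool.map(fold_chunk, chunks)
    finished = recombine(results)
    print(f"Recombined {len(finished)} values from {len(chunks)} chunks")
```

Of course, the hard part this glosses over is that in real MD every timestep depends on the previous one, so the chunks can't just run independently; that per-step dependency is exactly the cross-node communication everyone above is worried about.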
 
So, let's say I built 4 headless units to PXE boot. At that point, from my understanding, the FAH process would have to be distributed/assigned to each processor (from each node) individually.

Am I wrong in my understanding? And if I'm not, would there be any advantage to that, other than the reduced amount of power used due to not having four hard drives connected to the headless units?

Correct. PXE booting just loads the OS from the "master" system. Each node runs its own separate work units. The advantage is the power savings / cost of not having a hard drive in each node.

Also, the master system will save the progress of each node, so if a node is restarted it will resume where it left off.

Notfred has instructions posted on his site on how to PXE boot using Windows & Linux.
 
Stanford has run F@H on supercomputer clusters before, and if you have access to a supercomputing setup they will personally work with you on setting it up, but they are looking for 100+ CPU clusters. Setups with 3-4 nodes aren't worth the effort and don't have the network infrastructure to support F@H.

F@H is already set up to break large projects down into work units, which is what we run now.
 
Stanford has run F@H on supercomputer clusters before, and if you have access to a supercomputing setup they will personally work with you on setting it up, but they are looking for 100+ CPU clusters. Setups with 3-4 nodes aren't worth the effort and don't have the network infrastructure to support F@H.

F@H is already set up to break large projects down into work units, which is what we run now.

Is there any detailed information available on how exactly this work is split up? I'd think that within a single protein all steps but the first one depend on the preceding steps, thus making it impossible to split up. The WUs being sent out seem to be complete proteins.
 
Is there any detailed information available on how exactly this work is split up?

Each CPU core would run a thread, with high-speed networking connecting all the systems.
Think -smp 100 or higher. Instead of MPICH running over the local network stack, it would run over a real network, which is what it was really designed for in the first place.
 
Each CPU core would run a thread, with high-speed networking connecting all the systems.
Think -smp 100 or higher. Instead of MPICH running over the local network stack, it would run over a real network, which is what it was really designed for in the first place.

But how is network latency dealt with?
 