Infrastructure Upgrade

Modder man

Our current hosts are running a bit close to capacity, and we are in the process of getting quotes for upgrades. We have 270 Windows 7 desktops running 2GB of RAM each, and we continually get low-memory errors on the machines. I think the machines should really all be allocated 4GB of RAM. The thought at this point is to scale out and add a 5th host, but I'm not sure that a 5th host adds enough RAM to fully get us out of our current problem.



 
A couple of things. What is the planned growth rate for the next 2-3 years for Win 7 desktops? Is it safe to leave storage out of this conversation for the moment?

Also, you may want to blank out the host names for security.
 
The problem is our company has just been purchased. I don't know that we are going to be able to upgrade the environment from a future perspective, as we are not even sure that they will be keeping our existing infrastructure. As they come in, we are hoping to let them know what our current pain points are so we're able to address them. We possess 300 VDI licenses, and as of now I can't project that we will grow past that. We have a VNX running the storage, and for now I will assume our storage is solid for the sake of this conversation.
 
Can you just upgrade the memory in your current hosts? That's probably the most cost-effective option if all you need is memory.
 
That was discussed; my boss mentioned he thought it would be better to scale out than scale up. An extra host gives us more expandability in the future, as well as being more reliable from a failover perspective if all else fails.
 
Can you just upgrade the memory in your current hosts? That's probably the most cost-effective option if all you need is memory.

Exactly. I think you just need to scale up rather than out. CPU power is not in contention from what I see.
 
That was my thought; more RAM gets us out of our current issues. I was told to think bigger and longer term. The thought process from management was that these Gen8 hosts may be hard to find in a year or two, so it's better to expand the number of hosts now. They want to keep all the hosts the same, so that is the only option in their mind. Is there really any disadvantage to just adding another host if they can get that approved? On a side note, one more host puts us at ~1TB of RAM. Unless my math is wrong (very possible, I don't really know how to size an environment properly), I think we should really be at ~1.2TB or more to allow each machine at least 4GB of RAM.
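To put numbers on that, here's the back-of-the-envelope math as a quick Python sketch. The 192GB-per-host figure is an assumption worked backwards from the "~1TB with a 5th host" estimate (5 x 192GB = 960GB), so substitute your real host specs:

```python
# Rough VDI RAM sizing sketch. Per-host RAM is an assumed figure
# inferred from the "~1TB with 5 hosts" estimate (5 x 192GB = 960GB).
hosts = 5
ram_per_host_gb = 192          # assumed current Gen8 config
desktops = 300                 # VDI licenses owned
ram_per_desktop_gb = 4         # target allocation per Win7 VM

total_host_ram = hosts * ram_per_host_gb          # 960 GB (~1 TB)
total_guest_ram = desktops * ram_per_desktop_gb   # 1200 GB (~1.2 TB)

print(f"Host RAM with a 5th box:   {total_host_ram} GB")
print(f"Guest RAM at 4GB each:     {total_guest_ram} GB")
print(f"Shortfall:                 {total_guest_ram - total_host_ram} GB")
```

So even with a 5th identical host you'd still be ~240GB short of giving every desktop 4GB, before counting any hypervisor overhead.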
 
Buy another host and upgrade the RAM in the current ones? :p

It's difficult to really suggest a plan without knowing the budget. I can see the thought/appeal behind having all the same hardware, even if I don't necessarily agree with it.
 
What I've found with VDI is it's almost always RAM (or disk IOPS) where you're falling short. You'd have 3+1 for the current load if you up the RAM in each of those boxes. When you add more users, add another box with the same amount of RAM you've upgraded to, to scale out.
 
We are definitely falling short on RAM. There isn't a budget per se; we are just pushing upper management to make the right decision to fix our pain points.
 
I think we're buying G8s with 384GB in them right now. If they want to buy more, go that route, but I'd bring the others up to the same specs; otherwise you're just going to end up with the original 4 boxes being underutilized due to the lack of RAM.
 
When sizing for RAM requirements, should I assume that with 300 machines at 4GB each I need 1.2TB or more? Or, since the machines won't all be at 100% at the same time, go for less?
 
What is "bigger and longer term" if you don't have any plans to grow the user count and, as far as we can tell, nothing new will be added? (You haven't stated that new software, etc. will be added to the desktops.)
 
VMware says you shouldn't overcommit memory for VDI deployments.

http://www.vmware.com/files/pdf/view/Server-Storage-Sizing-Guide-Windows-7-TN.pdf

Memory

A typical Windows 7 64-bit enterprise deployment requires 2 vCPUs and 4GB RAM. (For a Windows 7 32-bit virtual machine guest with 2 vCPUs, a minimum of 2GB RAM is recommended.) The native OS alone is approximately 400MB. The goal is to allocate enough memory to hold the set of applications and data while keeping the memory overcommit ratio as low as possible. This prevents Windows from writing data to the paging file because there is not enough RAM available in the guest OS.

As a guideline, for balance between performance and memory utilization, the virtual machine should have approximately 25 percent more RAM allocated than the maximum active load on the virtual machine. This allocation prevents Windows from writing data to its paging file and keeps the active working set (applications and data) for the virtual machine in RAM instead of in virtual memory space.

Server and Storage Sizing Guide for Windows 7 Desktops in a Virtual Desktop Infrastructure

Memory should not be oversubscribed in a VDI deployment. There should be sufficient RAM in the host, plus 25 percent for vSphere and swap overhead, and potentially more if 3D is being used. For more 3D-overhead numbers, see Storage Considerations for VMware Horizon View 5.2.

You'll want to have more RAM in the host than the total amount of RAM allocated to the guests you have powered on at any given time. According to this doc, it should be 25% more.
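Running this environment's numbers through that 25% guideline works out like this (a sketch; the 300-desktop and 4GB figures come from earlier in the thread):

```python
# Apply the doc's "don't overcommit, add 25% for vSphere/swap overhead"
# guideline to this environment's numbers.
desktops = 300
ram_per_desktop_gb = 4
overhead_factor = 1.25   # 25% on top of allocated guest RAM

guest_ram_gb = desktops * ram_per_desktop_gb           # 1200 GB
required_host_ram_gb = guest_ram_gb * overhead_factor  # 1500 GB

for host_count in (4, 5, 6):
    per_host = required_host_ram_gb / host_count
    print(f"{host_count} hosts -> {per_host:.0f} GB RAM per host")
```

By that math you'd want roughly 1.5TB of host RAM total, however you slice it across boxes.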
 
The BIGGER/LONGER-term picture is that all of the machines are currently allocated 2GB of RAM each; I would really like to see them at 4GB. Win7 does not behave all that well with 2GB of RAM.
 
Now we are getting somewhere. I think ND40 and I are tracking to the same point. I would get denser with my RAM (scale up), add a new host with similar specs (if possible), add the new host to the cluster, and take one of the current hosts and make it a cold spare. At that point you could give each VM 4GB of RAM, you are adding some "new" life to the cluster, testing the new server out for any issues while still in warranty (try to burn it in on a bench first), and you have a cold spare. I would keep RAM usage to about 75-80% on the hosts; that way, if a host dies and for some reason the cold spare won't respond, the other hosts can handle the load until you get the down host back online. Keep in mind any hypervisor costs, etc. too.
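One way to sanity-check that 75-80% ceiling is a quick N-1 calculation (a sketch only; the 384GB host size is borrowed from the G8 spec mentioned earlier, and the four-active-hosts count assumes one box is pulled out as the cold spare):

```python
# N-1 failover check: if one active host dies (and the cold spare is
# unavailable), can the survivors carry the full guest load while
# staying under the usage ceiling? 384GB/host is an assumed spec.
def survives_host_loss(hosts, ram_per_host_gb, guest_ram_gb, ceiling):
    """True if N-1 hosts can hold all guest RAM under the usage ceiling."""
    usable = (hosts - 1) * ram_per_host_gb * ceiling
    return guest_ram_gb <= usable

guest_ram_gb = 300 * 4   # 300 desktops at 4GB each

# 4 active 384GB hosts: losing one leaves 3 x 384 x 0.8 = ~922GB -> too tight
print(survives_host_loss(4, 384, guest_ram_gb, 0.80))
# 5 active 384GB hosts: losing one leaves 4 x 384 x 0.8 = ~1229GB -> fits
print(survives_host_loss(5, 384, guest_ram_gb, 0.80))
```

Under those assumed specs, four active hosts don't quite survive a host loss at the 80% ceiling, which is an argument for keeping the spare warm or going denser on RAM.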
 
Thank you much for the discussion, guys. ND40, I looked over the documentation you provided from VMware. It looks like they suggest no more than 48 VMs per 16 physical cores (assuming 2 vCPUs each) to avoid CPU contention; if that's the case, I am already in violation of that, correct?

We have (or should have) 300 VDI machines with at least 2 vCPUs and 4GB of RAM each; some have/need even more.

That said, it looks like I should have 6 of these hosts with ~250GB of RAM each.
 
CPU contention really depends on workload. The doc also quotes as many as 10, so that's 80 machines per box using that number. If you're not running into contention issues now, then I'd say you're fine with your current 75 per host.
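Translating those consolidation ratios into per-host headcounts is simple arithmetic (a sketch; the 8-cores-per-host figure is an assumption inferred from the "80 machines per box" number above, so swap in your real core count):

```python
# VMs-per-host capacity from a VMs-per-core consolidation ratio.
cores_per_host = 8                # assumed; implied by "10 -> 80 per box"
current_vms_per_host = 300 // 4   # 75, with 300 desktops on four hosts

for vms_per_core in (3, 6, 10):   # conservative through aggressive ratios
    capacity = vms_per_core * cores_per_host
    status = "fits" if current_vms_per_host <= capacity else "over"
    print(f"{vms_per_core} VMs/core -> {capacity} per host ({status})")
```

So 75 per host only fits at the aggressive end of the range, which is why actual contention measurements matter more than the rule of thumb.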
 
I'm not sure I really know how to determine whether I am having contention issues or not. Everything I know I've picked up learning as I go; I have never been formally trained.
 
This is one of our regular problem children. Most often the issue is low-memory errors, but we do get some complaints of machines being unresponsive as well.

 
I'd look into the wait times on that one and track down what's causing it; it doesn't look like CPU contention, though. With low memory it could be paging to disk, but you'd have to delve deeper into it.
 
That particular machine has 6GB of RAM; I suppose I should have mentioned that. This is one of our Excel power users. I also looked at my boss's machine, and the graph looked about the same. The two machines I looked at were on two separate hosts. In chasing this down, do I need to be looking at more backend stuff, or should this be explored from the user's machine?
 
You'll just need to follow the KB, or call and open a ticket with VMware so they can do it with you.


A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing in the same LUN, volume, or array depending on the scale of the storage performance issue.

A high %WAIT value can also be triggered by latency to any device in the virtual machine configuration. This can include, but is not limited to, serial pass-through devices, parallel pass-through devices, and USB devices. If the device suddenly stops functioning or responding, it can result in these symptoms. A common cause for a high %WAIT value is ISO files left mounted in the virtual machine that are accidentally deleted or moved to an alternate location. For more information, see Deleting a datastore from the Datastore inventory results in the error: device or resource busy (1015791).

If there does not appear to be any backing storage or networking infrastructure issue, it may be pertinent to crash the virtual machine to collect additional diagnostic information.


If %WAIT is relatively high and the virtual machine is unresponsive, but there are no backing storage or networking infrastructure problems, this indicates that the virtual machine may be blocked on some stuck operation. For more information, see Crashing a virtual machine on ESX/ESXi to collect diagnostic information (2005715).
 
Sorry, didn't mean to be that guy asking for hand-holding. Looking at that graph, I wasn't sure what it should mean. After reading the KB and looking at the graph, I asked the SAN guy about it and he looked at me like I had 4 eyes and said there was no way the storage could be at fault here.. :rolleyes:
 
Yeah, it's really environment dependent and sometimes you need to go down the rabbit hole to troubleshoot it. For that excel user, you could always throw more vRAM at their guest and see if it helps, just in case it is paging.
 