If Your IT Budget Allowed, Would You Thick Provision Everything?

Just curious what others think. I was talking to a buddy of mine recently and his company is a VMware and 3PAR shop. He said that because they have a high IT budget and storage is cheap, they thick provision everything and deduplication is turned off.

Would you do the same?
 
Possibly. Certainly takes KISS to an extreme, but if you can afford it...
 
Deduplication has its place.

I would rather go thick -- better safe than sorry -- than over-provision storage on a server and have someone screw it all up and take down every VM on that host.
 
I would rather go thick -- better safe than sorry -- than over-provision storage on a server and have someone screw it all up and take down every VM on that host.

True, but that is kind of the point in having some type of monitoring in place. :D
 
Thin provisioning sucks. Plain and simple.

After having to manually increase the volume size on multiple thin-provisioned volumes because they didn't auto-grow, and because it was a huge pain to keep track of exactly what space I actually had available, I now thick provision everything.
 
Thin provisioning has its place but I'd feel a lot more comfortable thin provisioning my VMDKs but thick provisioning the LUNs.

Thin on thin is just asking for trouble unless you're committed to carefully and diligently monitoring everything, including the monitoring itself.
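
To put a rough shape on what "monitoring the monitoring" can mean, here's a minimal sketch in plain Python with made-up numbers -- not tied to any particular array or tool. The idea is to alert on low free space, but also to alert when the capacity data itself has gone stale, because a dead poller looks exactly like a healthy datastore.

```python
from datetime import datetime, timedelta

now = datetime.now()
# Hypothetical capacity samples as a monitoring tool might export them:
# (datastore, capacity in GB, free GB, timestamp of last successful poll)
samples = [
    ("datastore-01", 3072, 410, now - timedelta(minutes=5)),
    ("datastore-02", 3072, 95,  now - timedelta(minutes=5)),
    ("datastore-03", 3072, 700, now - timedelta(days=3)),   # poller died days ago
]

FREE_PCT_WARN = 15                    # warn when free space drops below 15%
MAX_SAMPLE_AGE = timedelta(hours=1)   # ...and warn when the data itself is stale

for name, capacity_gb, free_gb, polled in samples:
    free_pct = 100.0 * free_gb / capacity_gb
    if free_pct < FREE_PCT_WARN:
        print(f"ALERT {name}: only {free_pct:.1f}% free")
    if now - polled > MAX_SAMPLE_AGE:
        # the "monitoring the monitoring" part: silence is not good news
        print(f"ALERT {name}: last sample is {now - polled} old; check the poller")
```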
 
Thin provisioning has its place but I'd feel a lot more comfortable thin provisioning my VMDKs but thick provisioning the LUNs.

Thin on thin is just asking for trouble unless you're committed to carefully and diligently monitoring everything, including the monitoring itself.

Yeah, that is true. When it comes to LUNs, I can't think of too many reasons why you would need to thin provision them anyway. But for VMDKs it drives me crazy when administrators thick provision all their Windows VMs with 100 GB C: volumes that really only take up about 20 GB of space. Then again, using the awesome Tintri cloning feature to deploy VMs works like a charm to free up space. :D
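
Just to put hypothetical numbers on how much that 100 GB vs. 20 GB gap adds up across a fleet (figures invented for illustration):

```python
# Rough back-of-the-envelope for the 100 GB / 20 GB case above.
# All numbers are hypothetical.
vm_count = 200
provisioned_gb_per_vm = 100   # thick C: drive as provisioned
used_gb_per_vm = 20           # space actually written

thick_total = vm_count * provisioned_gb_per_vm   # what thick provisioning consumes
thin_total = vm_count * used_gb_per_vm           # what thin provisioning would consume
print(f"Thick: {thick_total / 1024:.1f} TB, thin: {thin_total / 1024:.1f} TB, "
      f"reclaimable: {(thick_total - thin_total) / 1024:.1f} TB")
# Thick: 19.5 TB, thin: 3.9 TB, reclaimable: 15.6 TB
```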
 
That's weird. We have 3PAR and we thick provision every VM but let the 3PAR do thin volumes.

Works great
 
I thick everything because spending money is like pulling teeth. I have to allocate all my space up front because if I don't, it won't be there when I need it.
 
Anytime I strike up this conversation with people, it seems almost religious :)

For me, I basically just make a point to avoid thin on thin. I don't particularly care if one side or the other is thin, so long as an eye is kept on the over provisioning. I've been in shops that thick across the board, thin on LUN and others that thin on the VMDKs. Never had an issue in any of those situations.
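
If you want a cheap way to keep that eye on the VMDK side, something like this pyVmomi sketch will inventory which disks are thin versus thick. The vCenter hostname and account are placeholders, and it assumes the pyvmomi package and read access to vCenter:

```python
import getpass
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab-only shortcut; use verified certs in production
si = SmartConnect(host="vcenter.example.local",            # placeholder vCenter
                  user="monitor@vsphere.local",            # placeholder account
                  pwd=getpass.getpass("vCenter password: "),
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.config is None:          # skip VMs with no readable config
            continue
        for dev in vm.config.hardware.device:
            if isinstance(dev, vim.vm.device.VirtualDisk):
                thin = getattr(dev.backing, "thinProvisioned", False)
                size_gb = dev.capacityInKB / (1024 * 1024)
                print(f"{vm.name}\t{dev.deviceInfo.label}\t"
                      f"{size_gb:.0f} GB\t{'thin' if thin else 'thick'}")
finally:
    Disconnect(si)
```

From there you can feed the output into whatever monitoring you already trust.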

One thing I hate, and it's not particularly on topic, is in-guest iSCSI or RDMs. Ugh. I understand the use cases for them, but each time I've run into them, it's been a PITA and a solution looking for a problem.
 
We thick provision everything on just over 1k VMs. The risk inherent to thin provisioning just isn't worth it to us. It does create some work when volumes need to be expanded, but that doesn't happen often enough for it to be a hassle.
 
We're thin on thin everywhere. I don't think the monitoring is that difficult, but there were some definite issues and catch-22s. It's a matter of learning HOW to do it and what to watch for, especially around thresholds and reaction times. Just a single site can have multiple enterprise-class arrays. I cannot imagine we'd ever switch back; it saves way too much money. It translates to millions of dollars. Now that it has been about four years of doing it, we'd have an extremely hard time justifying going back -- despite the issues, senior leadership was on board and willing to deal with the problems, and now we're very good at it.
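
Just to sketch the thresholds-and-reaction-times part with invented numbers (the real math depends on your growth pattern and how long procurement takes): the question is whether the alert fires early enough for you to actually do something before the pool fills.

```python
# Hypothetical pool: does the alert threshold leave enough reaction time?
capacity_tb = 500.0
used_tb = 380.0
growth_tb_per_day = 1.5       # observed average growth
alert_threshold_pct = 80.0    # alert fires when usage crosses this
reaction_days = 30            # time needed to purchase and install more disk

alert_at_tb = capacity_tb * alert_threshold_pct / 100.0
days_until_alert = max(0.0, (alert_at_tb - used_tb) / growth_tb_per_day)
days_until_full = (capacity_tb - used_tb) / growth_tb_per_day
warning_window = days_until_full - days_until_alert

print(f"Alert in ~{days_until_alert:.0f} days, full in ~{days_until_full:.0f} days, "
      f"window after alert: ~{warning_window:.0f} days")
if warning_window < reaction_days:
    print("Threshold is too high for this growth rate -- lower it or pre-order disk.")
```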
 
We're thin on thin everywhere. I don't think the monitoring is that difficult, but there were some definite issues and catch-22s. It's a matter of learning HOW to do it and what to watch for, especially around thresholds and reaction times. Just a single site can have multiple enterprise-class arrays. I cannot imagine we'd ever switch back; it saves way too much money. It translates to millions of dollars. Now that it has been about four years of doing it, we'd have an extremely hard time justifying going back -- despite the issues, senior leadership was on board and willing to deal with the problems, and now we're very good at it.

That's awesome. If you can do it right, it can save a lot of money.

I've seen a lot of clients "just do it" with thin on thin and then have to call in vendor support or consultants to fix things when shit hits the fan, because they didn't realize a VMDK or LUN had grown so big it filled up the entire datastore or storage pool.
 
No performance issues with high-I/O VMs (e.g. databases)? Seems like thin on thin would delay a write too much?

There's a small I/O penalty on initial zeroing, but that's gotten very efficient for block storage - it almost doesn't exist on NFS. The growth mechanism is also very efficient these days. All from the VMware side, of course - array implementations of thin LUNs may have their own impacts.
 
Do you guys have one comprehensive monitoring tool that you use to look at the guests, hosts, and SANs for scenarios like this, or a bunch of different tools?
 
Do you guys have one comprehensive monitoring tool that you use to look at the guests, hosts, and SANs for scenarios like this, or a bunch of different tools?

That depends on a lot of things, including what hypervisor you use, what SAN choices you've made, what software you own, etc.
 
There's a small I/O penalty on initial zeroing, but that's gotten very efficient for block storage - it almost doesn't exist on NFS. The growth mechanism is also very efficient these days. All from the VMware side, of course - array implementations of thin LUNs may have their own impacts.

Write delay is really the biggest concern in our environment. The culture is such that application lag is unacceptable. When instantaneous response is the expectation for datacenter-hosted applications, anything less than that is perceived as infinitely slower.

Of course it's an unreasonable expectation, but hey, it's healthcare and the providers enjoy God-like status.
 
Do you guys have one comprehensive monitoring tool that you use to look at the guests, hosts, and SANs for scenarios like this, or a bunch of different tools?

We do. I am not a huge fan of the tool, LogicMonitor, but it integrates with just about everything.
 
One thing I hate, and it's not particularly on topic, is in-guest iSCSI or RDMs. Ugh. I understand the use cases for them, but each time I've run into them, it's been a PITA and a solution looking for a problem.

I use it for Direct SAN Access with Veeam. How else would you do this without in-guest iSCSI?
 
Do you guys have one comprehensive monitoring tool that you use to look at the guests, hosts, and SANs for scenarios like this, or a bunch of different tools?

So I'll have to bow out of the conversations related to how our storage guys handle their pool provisioning thresholds and the tools they use to monitor such things, but I understand they order disk for the pools at ~80% of capacity. I think that is just a general threshold and is actually a bit fuzzy. For instance, 80% on a huge pool isn't the same as 80% on a much smaller one. There are some loose rules they follow related to overcommitment as well.
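
That fuzziness is easy to illustrate: a fixed 80% trigger leaves very different absolute headroom depending on pool size, which is presumably why you'd pair the percentage with some absolute floor. Illustrative numbers only:

```python
# Fixed-percentage triggers leave very different absolute headroom.
pools_tb = {"big-pool": 1000.0, "small-pool": 50.0}
ORDER_AT_PCT = 80.0      # "order more disk" trigger from the post
MIN_FREE_TB = 40.0       # hypothetical absolute floor to combine with it

for name, capacity in pools_tb.items():
    headroom_at_trigger = capacity * (100.0 - ORDER_AT_PCT) / 100.0
    # Effective trigger: whichever condition is hit first
    trigger_used_tb = min(capacity * ORDER_AT_PCT / 100.0, capacity - MIN_FREE_TB)
    print(f"{name}: 80% trigger leaves {headroom_at_trigger:.0f} TB free; "
          f"with a {MIN_FREE_TB:.0f} TB floor, order at {trigger_used_tb:.0f} TB used")
```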

On the vSphere side, Storage DRS handles out-of-space conditions well once you get the percentages adjusted correctly, so that your datastores have enough free space to leave enough time for an svMotion to finish once it's triggered. We have datastores from multiple arrays in the same SDRS cluster. We use 3 TB as a standard, which is a touch too big, but don't forget that there is a lot of transient space use from VADP backups. Snapshots have to be handled. We regularly eyeball all the Storage DRS clusters (we have a lot) and look at the "general" free space across all of them. We keep 2-3 datastores just idle as "unused" per cluster, ready to be added.

We have rules for east/west growth... i.e. each Storage DRS cluster can only hold 32 datastores. We only expand to 20 before no more provisioning can be done on that cluster and we start another. We overcommit the 20, but we'd never overcommit the maximum of 32! Otherwise you end up in a situation where you cannot easily grow it east/west. You can handle that, but not quickly in the middle of the night when it needs to automatically save your shorts. We also only provision VMDKs at 500 GB max, which gives SDRS a lot of freedom if moves are required. For now we have the VM:Disk affinity default rule in place, but as we migrate from our HDS arrays to 3PAR we plan on changing that for an even better balance/growth pattern (for all except the vCD environments).
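
Here's a rough sketch of how those east/west growth rules could be written down as a sanity check. The limits come from the post above; the cluster data is made up and this isn't hitting any vSphere API:

```python
# Encode the growth rules from the post as a simple sanity check.
MAX_DATASTORES = 32       # hard ceiling per SDRS cluster
PROVISION_CAP = 20        # stop provisioning here and start a new cluster
SPARE_DATASTORES = 2      # keep 2-3 idle datastores ready to add

# Hypothetical SDRS clusters: datastores in use, idle spares, overcommitted or not
clusters = {
    "sdrs-prod-01": {"in_use": 20, "spares": 3, "overcommitted": True},
    "sdrs-prod-02": {"in_use": 27, "spares": 1, "overcommitted": True},
}

for name, c in clusters.items():
    if c["in_use"] >= PROVISION_CAP:
        print(f"{name}: at/over the {PROVISION_CAP}-datastore cap; provision into a new cluster")
    if c["spares"] < SPARE_DATASTORES:
        print(f"{name}: only {c['spares']} spare datastore(s); add more before you need them")
    if c["overcommitted"] and c["in_use"] >= MAX_DATASTORES - SPARE_DATASTORES:
        print(f"{name}: overcommitted with no room left to grow east/west")
```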

Related to an earlier question about databases -- that *MIGHT* be an exception to the rule. It's always business requirements, risks that can be absorbed, and then trade-offs on what you can afford. Start from the perfect world and move backwards. Thin on thin works great for our general population, but storage that must be replicated, super-high-performance or transactional databases, or some 'thing' that doesn't fit the mold might get what you might call "dedicated" datastores just for them. Those are the rare cases.

If you're a provider like us -- we have a tier that 99.9% of things fall within. Latency is no higher than X and IOPS is at least X, and both are pretty easy to achieve. Being that we're also government (that's all I can say), those things are very loosely taken care of. Given budget issues all the time and contracts... the squeaky wheel gets the grease. If we had to proactively deliver certain levels of performance -- for vSphere anyway -- vCOps is a great tool. You can get it to do just about anything with a bit of sweat and blood. :)
 
Thanks for the insight. I'm not at that level with virtualization yet, but I hope to be.

 
Thin provisioning has its place but I'd feel a lot more comfortable thin provisioning my VMDKs but thick provisioning the LUNs.

Thin on thin is just asking for trouble unless you're committed to carefully and diligently monitoring everything, including the monitoring itself.


x2.

But to add, it also depends on the environment you are in. In my last environment, I was never sure I would get the budget I needed to grow out the arrays. I didn't want to be in a situation where I could run out of array space but didn't have the budget to add more. So I thick provisioned all the LUNs. If the VMware admins thin provisioned the space on their side and they tanked something, then it was their problem and they had to deal with the fallout. In my current gig, I thin provision everything but I don't over-provision my pools, unless I'm in a situation where I need to. I am also able to upgrade my environment, sometimes on a quarterly basis but usually on a semi-annual basis.
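
That middle ground -- thin LUNs but no pool overcommit -- boils down to one invariant: everything carved out of a pool has to fit within its physical capacity. A trivial sketch with invented numbers:

```python
# Thin-provision the LUNs, but refuse to overcommit the pool itself.
pool_capacity_tb = 200.0
lun_sizes_tb = [40.0, 40.0, 60.0, 30.0]   # sizes presented to the VMware admins

requested_tb = 25.0                        # hypothetical new LUN request
allocated_tb = sum(lun_sizes_tb)

if allocated_tb + requested_tb <= pool_capacity_tb:
    print(f"OK: {allocated_tb + requested_tb:.0f} of {pool_capacity_tb:.0f} TB allocated")
else:
    print("Denied: this LUN would overcommit the pool; buy disk or shrink the request")
```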

So, like everything in IT... it depends.
 