MS SQL Server in VMware Cluster

ziorus

Hi everyone,

We currently have an old SQL cluster (just 2 nodes) on physical servers, which we will be moving to a VMware cluster infrastructure. My question: since VMware is already clustered, does it make sense to cluster SQL again?
 
Yes. I'm guessing you want the ability to keep the SQL cluster up while performing updates and such on individual VMs, or in case one goes down, gets corrupted, etc. Basically, all the reasons you're presently running as a cluster.

You can also set up anti-affinity rules (depending on VMware subscription level?) to force the SQL Server instances to run on separate hosts. So if one host goes down, you still have an instance running on another.
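If you end up scripting that rule instead of clicking through the DRS UI, here's a rough sketch using pyVmomi. The vCenter hostname, credentials, cluster name, and VM names are placeholders, and the inventory lookup helper is deliberately simplified; treat it as a starting point, not a finished tool.

```python
# Hedged sketch: create a DRS anti-affinity rule so two SQL node VMs never
# land on the same ESXi host. Hostnames, credentials, and object names are
# placeholders -- adapt to your environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_by_name(content, vimtype, name):
    """Simplified inventory lookup by name (first match wins)."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.DestroyView()

# Older pyVmomi accepts sslContext; newer releases offer disableSslCertValidation.
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    cluster = find_by_name(content, vim.ClusterComputeResource, "Prod-Cluster")
    sql_vms = [find_by_name(content, vim.VirtualMachine, n) for n in ("SQLNODE1", "SQLNODE2")]

    # Anti-affinity rule: listed VMs must run on different hosts.
    rule = vim.cluster.AntiAffinityRuleSpec(name="sql-nodes-separate", enabled=True,
                                            mandatory=True, vm=sql_vms)
    spec = vim.cluster.ConfigSpecEx(rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
    task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    print("Submitted DRS rule task:", task)
finally:
    Disconnect(si)
```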
 
Yes, VMware won't provide application redundancy... so you still need to set up the cluster/AlwaysOn.

Also, while I know you can technically virtualize MS SQL, I would exercise caution: analyze your current I/O performance and compare it to what you will get in the VMware cluster. SANs won't match the performance of direct-attached SAS drives, which could be a critical factor in DB performance.
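For real before/after numbers you'd want a proper tool like fio or diskspd against the actual SQL volumes. Purely to illustrate what "baseline your I/O first" means, here's a toy random-read latency sampler; the file path and sizes are placeholders, and without direct I/O it will partly measure the OS cache, so don't treat its output as a benchmark.

```python
# Toy sketch only: sample random-read latency against a large pre-created test
# file on the volume you care about. Real benchmarking should use fio/diskspd.
import os, random, statistics, time

PATH = r"D:\sqltest\testfile.dat"   # placeholder: multi-GB file on the SQL volume
BLOCK = 8 * 1024                    # 8 KB, SQL Server page size
SAMPLES = 2000

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY | getattr(os, "O_BINARY", 0))
latencies_ms = []
try:
    for _ in range(SAMPLES):
        offset = random.randrange(0, size - BLOCK)
        start = time.perf_counter()
        os.lseek(fd, offset, os.SEEK_SET)
        os.read(fd, BLOCK)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
finally:
    os.close(fd)

latencies_ms.sort()
print(f"avg {statistics.mean(latencies_ms):.2f} ms, "
      f"p95 {latencies_ms[int(len(latencies_ms) * 0.95)]:.2f} ms")
```

Run the same sampler (or better, the same fio job file) on the physical cluster and on a test VM at the provider before committing.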
 
Many thanks for your answers. The cluster types would include the standard SQL Cluster and also AlwaysOn, no mirroring.

I am also worried about the performance, but this push is above my pay grade. I have already included that in my risk assessment for this idea/plan. It gets worse: we actually have no clue how the infrastructure of the service provider we are moving to is set up and configured. I prefer actual physical machines when it comes to SQL...
 
We virtualize SQL clusters, but our architecture team is above my pay grade also 🤣

See attached for your reading pleasure
 

Attachments

  • sql-server-on-vmware-best-practices-guide.pdf
    5.3 MB
Make sure you do AlwaysOn. Standard clustering in SQL on VMware is no bueno.
 
I recently moved a physical SQL cluster into a VMware cluster. The main task was to use the FCI approach in VMware. We also decided to rework the storage architecture and move from the old SAN box to something new. Since we needed iSCSI storage to create the SQL cluster, we tried different SDS solutions (DataCore, StarWind, HPE StoreVirtual) and decided to use StarWind VSAN Free for iSCSI HA storage sharing (quite simple to set up, and it allows two-node storage replication in an active-active manner - https://www.starwindsoftware.com/starwind-virtual-san). So I have pretty much the same setup, except the SDS software is now in charge instead of classic SAN storage.
 
Also, while I know you can technically virtualize MS SQL, I would exercise caution: analyze your current I/O performance and compare it to what you will get in the VMware cluster. SANs won't match the performance of direct-attached SAS drives, which could be a critical factor in DB performance.

That is an incorrect statement. If they have an all-flash array with sub-millisecond query times and IOPS in the millions, or even just hundreds of thousands, that is normally just fine for most databases. It all depends on the solution, but a blanket statement that SANs are slower than direct-attached SAS drives is wholly incorrect. Cheap SANs are. Good SANs with a solid connection, like dedicated Fibre Channel HBAs, are more than capable of supplying the I/O for an entire host of SQL servers in a VMware cluster.

But to answer your question: if you want 100% uptime even while applying OS and SQL patches and such, you still want your cluster. Since a SAN is involved, you can set up your Availability Group with a SAN-based SMB share to act as the file share witness. That way there is less chance of it going down on you compared to a dedicated VM hosting your witness, which will go down during maintenance and such.
 
Doesn't VMware have its own vSAN solution which can be used for SQL FCI?

VMware's vSAN isn't that great for these types of deployments in my experience. The level of fault tolerance is lower, and the likelihood of losing data is, in my experience, worse. A solid Unity or similar Fibre Channel-attached storage will provide ample performance unless the databases are insanely large/busy.
 
Please take note of MSSQL licensing. To get full AlwaysOn AG features you need Enterprise, and the list price of an Enterprise 2-core pack is around $14k. So in a lot of cases, although it's nice to have, it's not the best solution (unless you depend on Enterprise features like TDE, online indexing, readable secondaries, etc.). FCI on shared storage works just fine and only uses MSSQL Standard ($3-4k per 2-core pack). Also, VMware and Hyper-V do support shared storage - and if for some reason it's not possible to implement at the hypervisor level, you can virtualize this in a hyperconverged configuration with Storage Spaces Direct.

This can of course go a lot deeper, but from a total solution perspective, it's best to always incorporate licensing/cost/TCO.
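To make the cost gap concrete, here is some back-of-the-envelope math using the list prices quoted above (~$14k per Enterprise 2-core pack, ~$3.5k per Standard 2-core pack). Real quotes, Software Assurance, and virtualization rights will differ, so treat the figures as placeholders.

```python
# Rough Enterprise vs. Standard edition cost per licensed SQL node, using the
# list prices mentioned above. SQL Server is licensed per core in 2-core packs
# with a 4-core minimum per VM/instance; SA and passive-node rights not modeled.
ENTERPRISE_2CORE = 14_000
STANDARD_2CORE = 3_500

def license_cost(cores, pack_price, min_cores=4):
    licensed = max(cores, min_cores)       # enforce 4-core minimum
    packs = (licensed + 1) // 2            # round up to whole 2-core packs
    return packs * pack_price

for cores in (8, 16):
    ent = license_cost(cores, ENTERPRISE_2CORE)
    std = license_cost(cores, STANDARD_2CORE)
    print(f"{cores} cores: Enterprise ~${ent:,} vs Standard ~${std:,} "
          f"(delta ~${ent - std:,}) per licensed node")
```

At 16 cores per node that's roughly a $84k-per-node difference, which is usually the deciding factor between AG-on-Enterprise and FCI-on-Standard.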
 
Please take note of MSSQL licensing. To get full AlwaysOn AG features you need Enterprise, and the list price of an Enterprise 2-core pack is around $14k. So in a lot of cases, although it's nice to have, it's not the best solution (unless you depend on Enterprise features like TDE, online indexing, readable secondaries, etc.). FCI on shared storage works just fine and only uses MSSQL Standard ($3-4k per 2-core pack). Also, VMware and Hyper-V do support shared storage - and if for some reason it's not possible to implement at the hypervisor level, you can virtualize this in a hyperconverged configuration with Storage Spaces Direct.

This can of course go a lot deeper, but from a total solution perspective, it's best to always incorporate licensing/cost/TCO.
Yeah, SQL licensing is a bitch. Not to mention if you need more than two SQL servers in an AG, you need Enterprise anyway.
 
VMware's vSAN isn't that great for these types of deployments in my experience. The level of fault tolerance is lower, and the likelihood of losing data is, in my experience, worse. A solid Unity or similar Fibre Channel-attached storage will provide ample performance unless the databases are insanely large/busy.
Grimlakin, vSAN is perfectly fine for these types of deployments and is fully supported. Newer versions of ESXi with vSAN also no longer require RDM disks for Windows failover clustering to work, and it can be fully run on a vSAN cluster, so AlwaysOn is not always needed (preferred for sure, but not strictly necessary if you just need to P2V).

The level of fault tolerance is however you design it. If you have a single SAN, you have a single point of failure, versus a 3-node vSAN cluster. You scale out with vSAN, not up, to get more redundancy and performance.

So if you are losing data and redundancy more often - you designed it wrong from the start. That is no fault of VMware or vSAN.
 
Grimlakin, vSAN is perfectly fine for these types of deployments and is fully supported. Newer versions of ESXi with vSAN also no longer require RDM disks for Windows failover clustering to work, and it can be fully run on a vSAN cluster, so AlwaysOn is not always needed (preferred for sure, but not strictly necessary if you just need to P2V).

The level of fault tolerance is however you design it. If you have a single SAN, you have a single point of failure, versus a 3-node vSAN cluster. You scale out with vSAN, not up, to get more redundancy and performance.

So if you are losing data and redundancy more often - you designed it wrong from the start. That is no fault of VMware or vSAN.

No, but the cost of doing a vSAN cluster to maintain redundancy in a medium-size deployment is very high, and the risk is much higher. Let me put it to you this way...

Let's say you need just 30 TB of capacity for your SAN and you want it high speed. To get that with a vSAN cluster you will need 6 nodes at 6 terabytes each. And any time you had to take down a node for repair, you would put your SAN into a vulnerable state. Heaven forbid you lost another node... Unless, of course, you bought even MORE storage per node, or more nodes, to keep your fault tolerance.

Whereas I can get a 32-terabyte SAN with 20 2 TB disks. Run RAID 5 to get my 30 TB of storage space (with 3 of the 20 disks as hot spares, that leaves 17 in the group; N-1 means roughly 16 disks' worth of usable space, minus a bit lost to overhead), have 3 hot spares, dual SPs for redundancy purposes, AND I can upgrade firmware on my SPs and disks and never take an outage to do it. I can also run dual HBAs through a pair of fibre switches to get to my SAN and have path redundancy for every device. Plus, when I HAVE TO reboot my ESXi hosts, I also don't lose disks or migrate data around to recover my VMs. All in all it is a better solution, at least in my mind, to maintain and keep everything running. Rough math on both sides is sketched below.
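This is why "even MORE storage per node" comes into play: with vSAN's default FTT=1 mirroring policy every object is stored twice, so 6 nodes x 6 TB raw only nets roughly 18 TB usable, while the 20-disk array with 3 hot spares in a single RAID 5 group lands around 32 TB. A quick sketch of that arithmetic (approximations only; slack space, dedupe/compression, and RAID-6/erasure-coding policies all shift the numbers):

```python
# Rough capacity math for the 30 TB example above. Assumes vSAN FTT=1 with
# mirroring (2x raw) and one RAID 5 group on the array (N-1 usable), ignoring
# slack space, dedupe/compression, and filesystem overhead.
USABLE_NEEDED_TB = 30

# vSAN: 6 nodes x 6 TB raw; FTT=1 mirroring keeps 2 copies of every object.
vsan_nodes, vsan_tb_per_node = 6, 6
vsan_raw = vsan_nodes * vsan_tb_per_node
vsan_usable = vsan_raw / 2

# Traditional array: 20 x 2 TB disks, 3 hot spares, the rest in one RAID 5 group.
array_disks, disk_tb, hot_spares = 20, 2, 3
group_disks = array_disks - hot_spares
raid5_usable = (group_disks - 1) * disk_tb   # one disk's worth lost to parity

print(f"vSAN:  {vsan_raw} TB raw -> ~{vsan_usable:.0f} TB usable (need {USABLE_NEEDED_TB})")
print(f"Array: {array_disks * disk_tb} TB raw -> ~{raid5_usable:.0f} TB usable (need {USABLE_NEEDED_TB})")
```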

Is it more expensive due to all of the fiber... maybe. I guess then it's all about the value of the data to the enterprise.

Not to mention I've had ESXi hosts go down FAR more often than I've ever had a storage array go down... I don't want to jinx anything, but you know what I'm saying. We've all had ESXi hosts crash over memory module issues, or some funky bit gets thrown and the host crashes. Thankfully, out of my smallish footprint of 32 ESXi hosts, I've only ever had one fully crash, because they are built with redundant everything.
 
Depends on your storage solution. If it's a high-end flash-based SAN going Fibre Channel to your hardware, it'll work (how well is a different story). If you're using a NAS for iSCSI, or using vSAN... the performance is going to suck.

Also, if you MS-cluster your SQL servers, you will need to use RDM drives, which are their own special flavor of headache, again depending on what type of storage.

If you have a functional system now and management is just looking to consolidate old hardware... tell them to take a hike. What savings are there to gain from reduced performance and the massive headaches of doing all this? $20 a month in electricity?
 
...

Also, if you MS-cluster your SQL servers, you will need to use RDM drives, which are their own special flavor of headache, again depending on what type of storage.
No longer the case. vSAN supports it natively, as does FC storage for VMDKs.
See Vladan's post here: https://4sysops.com/archives/vmware-vsphere-7-clustered-vmdk/

Shared VMDK requirements:

  • A Fibre Channel array only.
  • The array must support ATS and SCSI-3 PR type Write Exclusive–All Registrants (WEAR).
  • The datastore must be formatted with VMFS 6 (VMFS 5 is not supported).
  • VMDKs must be Eager Zeroed Thick (no thin-provisioned VMDKs).
  • If you have DRS configured in your environment, you must create an anti-affinity rule so that the VMs run on separate hosts.
  • vCenter Server 7.0 and higher.
  • Snapshots, cloning, and Storage vMotion are not supported (no backup of nodes is possible, because backup software uses snapshots).
  • Fault tolerance (FT), hot changes to the VM's virtual hardware, and hot expansion of clustered disks are not supported.
  • vMotion is supported, but only between hosts that meet the same requirements.
I am seeing fewer clients use this approach and more using the AlwaysOn method, due to the limitations around snapshotting with it. To be fair, though, if you were using the RDM approach you also had this limitation and were doing native SQL maintenance plans anyway.
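Since snapshot-based backup is off the table with clustered VMDKs, you fall back on native SQL backups. A minimal sketch of driving one from Python with pyodbc is below; the connection string, database name, and backup path are placeholders, and most shops would just schedule this as an Agent job or maintenance plan instead.

```python
# Minimal sketch: run a native SQL Server full backup via pyodbc, since VM
# snapshots aren't available for clustered-VMDK nodes. Connection string,
# database name, and target path are placeholders. autocommit is required
# because BACKUP DATABASE cannot run inside a user transaction.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql-listener.example.local;"
    "DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,
)
cursor = conn.cursor()
cursor.execute(
    "BACKUP DATABASE [AppDB] "
    "TO DISK = N'\\\\backupshare\\sql\\AppDB_full.bak' "
    "WITH COMPRESSION, CHECKSUM, INIT;"
)
# BACKUP emits progress messages as extra result sets; drain them so the
# script doesn't disconnect before the backup actually finishes.
while cursor.nextset():
    pass
conn.close()
print("Backup finished")
```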
 
No longer the case. vSAN supports it natively, as does FC storage for VMDKs.
See Vladan's post here: https://4sysops.com/archives/vmware-vsphere-7-clustered-vmdk/

Shared VMDK requirements:

  • A Fibre Channel array only.
  • The array must support ATS and SCSI-3 PR type Write Exclusive–All Registrants (WEAR).
  • The datastore must be formatted with VMFS 6 (VMFS 5 is not supported).
  • VMDKs must be Eager Zeroed Thick (no thin-provisioned VMDKs).
  • If you have DRS configured in your environment, you must create an anti-affinity rule so that the VMs run on separate hosts.
  • vCenter Server 7.0 and higher.
  • Snapshots, cloning, and Storage vMotion are not supported (no backup of nodes is possible, because backup software uses snapshots).
  • Fault tolerance (FT), hot changes to the VM's virtual hardware, and hot expansion of clustered disks are not supported.
  • vMotion is supported, but only between hosts that meet the same requirements.
I am seeing fewer clients use this approach and more using the AlwaysOn method, due to the limitations around snapshotting with it. To be fair, though, if you were using the RDM approach you also had this limitation and were doing native SQL maintenance plans anyway.

Good to know. However, personally I don't know anyone that uses vSAN in a large-scale enterprise environment... The no-snapshot thing is a bummer, and our backup method also employs snaps.
 
Good to know. However, personally I don't know anyone that uses vSAN in a large-scale enterprise environment... The no-snapshot thing is a bummer, and our backup method also employs snaps.

I've sold and implemented quite a few large-scale vSAN deployments on the enterprise side - up to a couple of hundred nodes, including large Oracle RAC/SQL. AAG tends to be the preferred method for lots of reasons these days (and it's the direction MSFT would like you to go), but either works - AAG is easier to do node backups with, though. Hot-Add is also becoming the preferred backup method for VMs, so you don't need the storage snaps either.
 
Good to know. However, personally I don't know anyone that uses vSAN in a large-scale enterprise environment... The no-snapshot thing is a bummer, and our backup method also employs snaps.
Burticus, I deployed 32 VxRail nodes across 4 clusters, just under $3 million in hardware, with one cluster specifically for SQL-only VMs. "If you're using a NAS for iSCSI, or using vSAN... the performance is going to suck." - this is not true either; if anything, performance is better, as you eliminate latency by using local disks vs. going out to an array of some sort. Also, if performance sucks, it means you did not spec it out properly for the workloads. These are all-flash nodes.

As for snapshots, at the VM level, yes - this client uses Dell EMC Avamar, which does snapshots; storage-level snapshots are not needed like they used to be. You may want to read up on more recent vSAN documentation; plenty has changed in the vSAN space...

https://blogs.vmware.com/virtualblocks/2019/12/04/closer-look-vsan-snapshots/
 
Burticus, I deployed 32 VxRail nodes across 4 clusters, just under $3 million in hardware, with one cluster specifically for SQL-only VMs. "If you're using a NAS for iSCSI, or using vSAN... the performance is going to suck." - this is not true either; if anything, performance is better, as you eliminate latency by using local disks vs. going out to an array of some sort. Also, if performance sucks, it means you did not spec it out properly for the workloads. These are all-flash nodes.

As for snapshots, at the VM level, yes - this client uses Dell EMC Avamar, which does snapshots; storage-level snapshots are not needed like they used to be. You may want to read up on more recent vSAN documentation; plenty has changed in the vSAN space...

https://blogs.vmware.com/virtualblocks/2019/12/04/closer-look-vsan-snapshots/
This is the only way to do it, and IMO it's still a PITA compared to AlwaysOn.
 