Keeping It Up - Windows Server Failover Clustering for HA Applications
September 07, 2016

Anonymous | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with Windows Server Failover Clustering

We use Windows Server Failover Clustering for two primary reasons: high availability and simpler systems maintenance. Our failover clustering allows critical applications to continue with only a minor interruption in service if a needed system resource fails. It also allows systems administrators to fail over an application to a passive node in order to perform scheduled or unscheduled maintenance on the other node, and then fail back if necessary, all with minimal interruption of critical business applications such as Microsoft SQL Server and BMC's Control-M Workload Automation.
  • Windows Failover Clustering is well suited to keeping critical applications online with only a brief outage in services during the actual failover. In some cases, it will disconnect user applications during the failover. That isn't a good thing, but it is better than taking the entire application down for a longer period of time to shut down one server and bring another online.
  • Windows Failover Clustering can be easily configured to manage individual cluster resources. For example, we use BMC Control-M/Enterprise and Control-M Server. Our gateway resources for distributed systems and mainframe (z/OS) are managed well as individual resources within the cluster, allowing us to take a single resource offline when necessary without having to take the entire cluster down.
  • When used in combination with Microsoft PowerShell (now also available on Linux systems), it provides a tremendous ability to monitor, query, report on, configure and deploy systems in high availability (HA) infrastructures.
  • The disconnection of services or users -- brief though it may be -- is a drawback to a seamless failover. The failover process is generally quick, and in many cases invisible to the business end user community, but given the variety of applications and how they interact with Windows Failover Clustering, sometimes there is a brief outage (seconds) that does NOT go unnoticed.
  • Windows Server Failover Clustering in a Hyper-V environment can be a little tricky if the Hyper-V infrastructure is not properly configured at the cluster level for affinity. If you are considering using Windows Failover Clustering in combination with Hyper-V, be sure to set your affinity rules (anti-affinity, strictly speaking) so that both nodes of a guest cluster are never on the same host.
  • Error reporting is quite detailed, if you know where to look. What appears in the Critical Events list for a cluster, and even in the Windows Event Logs, can lead one to think that Microsoft overlooked that critical area. You have to dig deeper into the Windows logs -- not just the usual three of Application, System and Security -- to get meaningful and helpful detailed error data.
  • Windows Server Failover Clustering has enabled us to adhere better to SLAs while still keeping company data resources properly protected. For example, we can patch the operating system, repair corrupted antivirus definitions, and the like without a prolonged outage.
  • Windows Server Failover Clustering also allows us to be more proactive in the area of system resources. If we see from our server monitoring that disk capacity is growing, we can take a node down, add resources to it (disk, CPU, memory) and then bring it back online -- all without the end users being aware that it was being done. In other words, no outage. SLAs remain high and IT management is happier.
  • Using Windows Server Failover Clustering on Hyper-V hosts enabled us to SIGNIFICANTLY reduce the cost of licensing Microsoft SQL Server, and by that I mean over $100,000 annually.
Several years ago we began using DoubleTake to cover our highly critical application, Control-M/Enterprise and Control-M/Server. We configured it to perform an automatic failover in the event of a critical failure. In that scenario, the mirrored system that came online assumed the full identity of the original server. It also resulted in a short outage window, but at least the application and its data were not lost, and service was restored quickly. One downside was that it did not scale well from a licensing perspective when used on many servers. The major downside -- other than cost -- was that if a system failed and DoubleTake performed a full system failover, the old server had to be completely rebuilt from scratch.
Windows Server Failover Clustering works very well for applications that can sustain a short disconnect when failing over. It works, and works well, in providing HA for single-node applications, meaning an active/passive setup. It is not a load balancing solution. Use NLB for that. Another area where it works well is in combination with Hyper-V. We set our Hyper-V hosts up as clusters, and those clusters also host clusters for SQL Server and other enterprise class applications like BMC's Control-M/Enterprise and Control-M/Server.
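
The placement advice for guest clusters on Hyper-V (never let both nodes of one cluster sit on the same host) can be sketched as a simple audit. This is only an illustration in Python, not how Windows enforces affinity rules: the function name, the VM names and the host names are all made up for the example, and the inventory is assumed to come from whatever reporting you already have.

```python
from collections import defaultdict


def find_affinity_violations(placement, guest_clusters):
    """Return {cluster: {host: [vms]}} for every host that currently
    holds more than one node of the same guest cluster."""
    violations = {}
    for cluster, vms in guest_clusters.items():
        by_host = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        # Keep only hosts carrying two or more nodes of this cluster.
        crowded = {host: sorted(names)
                   for host, names in by_host.items() if len(names) > 1}
        if crowded:
            violations[cluster] = crowded
    return violations


# Hypothetical inventory: which Hyper-V host each guest VM is running on.
placement = {
    "SQL-NODE1": "HV-HOST-A",
    "SQL-NODE2": "HV-HOST-A",  # same host as SQL-NODE1: a violation
    "CTM-NODE1": "HV-HOST-A",
    "CTM-NODE2": "HV-HOST-B",
}
# Hypothetical guest clusters and their member VMs.
guest_clusters = {
    "SQL-CLUSTER": ["SQL-NODE1", "SQL-NODE2"],
    "CTM-CLUSTER": ["CTM-NODE1", "CTM-NODE2"],
}

violations = find_affinity_violations(placement, guest_clusters)
print(violations)
```

In this made-up inventory only the SQL guest cluster is flagged, because a single host failure would take out both of its nodes at once.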

Using Windows Server Failover Clustering

  • Business Intelligence
  • Database Administration
  • Production Control
  • Product and Procurement
  • PeopleSoft HR
  • PeopleSoft Finance
  • Core Services
5 - Supporting Windows Server Failover Clustering requires the expertise of a trained Windows Administrator, preferably someone with certification as an MCSE or MCITP. Windows Server Failover Clustering will not be well or properly supported by someone who does not have a depth of knowledge of both Microsoft Windows Server and Windows Server Failover Clustering.
  • Microsoft SQL Server - ALL of our important databases run on Windows Server Failover Clustering in order to provide HA.
  • BMC Control-M/Enterprise and Control-M/Server. This enterprise class workload automation product is extremely critical to our business. Windows Server Failover Clustering provides us with the ability to meet SLAs for this application.
  • We are investigating ways to eliminate the need to install individual instances of Control-M modules on client servers by having them linked back to clustered module servers.
It has proven its value to us both for maintaining SLAs and providing the ability to perform much needed and regular systems maintenance without taking applications offline for more than a few seconds.

Using Windows Server Failover Clustering

With adequate knowledge, it is pretty easy to work with and manage a Windows Server Failover Cluster. It can, however, be very confusing to the neophyte in combination with Hyper-V. For example, it takes time to learn when to use the Hyper-V Manager and when to use the Failover Cluster Manager.
Like to use
Well integrated
Requires technical support
Slow to learn
Lots to learn
  • Until you have the knowledge of how clustering works, and particularly how Windows clustering works, you will only end up banging your head. It is critical that a neophyte to Windows Server Failover Clustering learn and understand how it all works before embarking on a project as complicated as this can be.
  • Setting up the initial cluster can be very tricky. It isn't a case of just accepting the defaults and clicking on the "Next" button. You have to know what you're doing. For example, you have to create a cluster resource with its own IP address, separate from those of the nodes: an IP for node 1, an IP for node 2, and an IP for the cluster. I would also suggest using a CNAME in DNS that points to the cluster name. That way, no matter which node is the active node, you can still get to it.
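
The addressing point above (one IP per node plus a separate IP for the cluster itself) is easy to sanity-check before you start the wizard. Here is a minimal sketch in Python using the standard ipaddress module. The function name and the addresses are hypothetical, and the check assumes a simple single-subnet cluster; it only validates a plan on paper and does not touch Windows or DNS.

```python
import ipaddress


def check_cluster_address_plan(subnet, node_ips, cluster_ip):
    """Return a list of problems found in a planned set of addresses.

    Assumes every node and the cluster address share one subnet."""
    network = ipaddress.ip_network(subnet)
    addresses = {name: ipaddress.ip_address(ip) for name, ip in node_ips.items()}
    addresses["cluster"] = ipaddress.ip_address(cluster_ip)

    problems = []
    seen = {}  # address -> first name that claimed it
    for name, addr in addresses.items():
        if addr not in network:
            problems.append(f"{name} address {addr} is outside {network}")
        if addr in seen:
            problems.append(f"{name} reuses {addr}, already assigned to {seen[addr]}")
        else:
            seen[addr] = name
    return problems


# Hypothetical plan using documentation-range addresses.
nodes = {"node1": "192.0.2.11", "node2": "192.0.2.12"}

good_plan = check_cluster_address_plan("192.0.2.0/24", nodes, "192.0.2.10")
bad_plan = check_cluster_address_plan("192.0.2.0/24", nodes, "192.0.2.11")

print(good_plan)
print(bad_plan)
```

The first plan comes back clean, while the second is flagged because the cluster address collides with node 1, which is exactly the mistake of treating a node's IP as the cluster's IP.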