It is no secret that HPC Server 2008 will offer the option to make the head node of an HPC cluster highly available. This feature is not in beta 1, but it is being developed. It will rely on the fail-over mechanisms provided by Server 2008 (Enterprise edition or higher), so I thought I’d mention some highlights in this area too.
High-availability clusters are difficult to set up and troubleshoot on several platforms. With Windows Server 2003 we made progress in simplifying them, but significant limitations remain:
- You need a configuration that is fully and specifically certified as a cluster in order to obtain support when things go wrong.
- There is very limited support for geo-clusters, because of limitations in intra-cluster communications and in the cluster quorum models, and because the cluster has no awareness of storage location. Also, geo-clusters require yet another level of certification.
- Writing cluster-aware applications is not easy. It requires knowledge of cluster-specific APIs in order to produce “resources” usable by the cluster software. Scripting generic application fail-over is supported, but limited in functionality.
- Troubleshooting from cluster logs requires very deep knowledge, because the messages in them are cryptic.