Monitoring and managing a large-scale cluster often requires advanced tooling. System administrators need tools that let them manage heterogeneous compute nodes, check cluster status at a glance, identify deviance, correlate node and job information, track changes, and integrate with existing IT infrastructure. The Windows HPC Server 2008 admin console addresses all of these needs with an integrated solution, according to Wenming Ye, Microsoft technical evangelist.
Windows HPC Server 2008 Program Manager Rae Wang has put together a screencast of the monitoring and management capabilities of Windows HPC Server 2008, available via TechNet Edge. Also on TechNet Edge, Microsoft is offering a second screencast, this one focused exclusively on Windows HPC Server diagnostics. Generally speaking, the diagnostic process for Microsoft's supercomputing platform involves continuously monitoring for performance degradation, as well as validating clusters after deployment or configuration changes. The diagnostics capabilities will also help admins troubleshoot failures.
Windows HPC Server 2008 ships with 16 built-in diagnostic tests to make the job easier for sysadmins. The tests fall into three categories: infrastructure, configuration report, and performance. Infrastructure tests cover the scheduler, system services, connectivity, and Service-Oriented Architecture (the WCF broker model). Configuration report tests cover applications, network, software updates, and system services. Finally, two MPIPingPong tests measure cluster performance in terms of latency and bandwidth.
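To give a rough idea of what a ping-pong test measures, here is a minimal sketch in Python. It is not the MPIPingPong implementation and does not use MPI: it bounces a message between two endpoints of a local socket pair on a single machine, then derives latency from a small message and bandwidth from a large one. The function names (`ping_pong`, `one_way_latency_us`, `bandwidth_mb_per_s`) and the message sizes are hypothetical and chosen for illustration only.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return fewer)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def ping_pong(message_size, rounds=50):
    """Bounce a message between two local endpoints.

    Returns the mean round-trip time in seconds. A real cluster test
    would run the two endpoints on different nodes; this sketch keeps
    both on one machine so it can run anywhere.
    """
    a, b = socket.socketpair()
    payload = b"x" * message_size

    def echo():
        # The "pong" side: send back whatever arrives, once per round.
        for _ in range(rounds):
            b.sendall(recv_exact(b, message_size))

    worker = threading.Thread(target=echo)
    worker.start()
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)            # ping
        recv_exact(a, message_size)   # wait for the pong
    elapsed = time.perf_counter() - start
    worker.join()
    a.close()
    b.close()
    return elapsed / rounds


def one_way_latency_us(round_trip_s):
    """Half the round-trip time of a small message, in microseconds."""
    return round_trip_s / 2 * 1e6


def bandwidth_mb_per_s(message_size, round_trip_s):
    """Bytes moved one way divided by one-way time, in MB/s (1 MB = 10**6 bytes)."""
    return message_size * 2 / round_trip_s / 1e6


if __name__ == "__main__":
    small_rtt = ping_pong(message_size=1)
    large_rtt = ping_pong(message_size=1_000_000, rounds=10)
    print(f"latency:   {one_way_latency_us(small_rtt):.1f} us")
    print(f"bandwidth: {bandwidth_mb_per_s(1_000_000, large_rtt):.1f} MB/s")
```

The small message makes the transfer time negligible, so the round trip is dominated by latency; the large message does the opposite, so the same timing loop gives a bandwidth estimate. The numbers it prints reflect the local machine only, not a cluster interconnect.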