Windows: Track Down Elusive Network Problems

You've probably seen it happen many times: your machine can't communicate with other machines and you don't know why. Your management system sits on one segment of a routed network, connected to other network segments by a router such as Microsoft Internet Security and Acceleration (ISA) Server or another hardware device. When you attempt to manage 10, 20, or even 100 systems, you don't encounter any problems. But when you attempt to manage 500 systems, your computer is unable to communicate on the network except with the machines with which it already has open connections. You cannot communicate with any other systems and you cannot get onto the Internet, yet no one else on the network, including on your own segment, is experiencing the problem. Where would you look first?

Diagnosing the Problem

The most common assumption in this situation is that the management software is faulty. Many proactive management tools connect to and manage your systems, but sometimes the tools themselves cause the problem you're trying to track down, because a proactive management tool can spawn thousands of connections to your devices in the name of better management. By default, Windows® keeps these connections open for two minutes even when they are idle, unless the tool, application, or service keeps them alive longer. This means that even though your management system has not spoken to any machines in two minutes, you may still have more than 1,000 connections open. (You can view open connections by running NETSTAT at a command prompt. The NETSTAT command shows all open, pending, and closing connections to and from your system, along with their status. Descriptions of the status values can be found in RFC 793 at tools.ietf.org/html/rfc793.)
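With more than a thousand connections, reading raw NETSTAT output by eye is impractical, so it can help to tally the connections by state. The following is a minimal sketch in Python, not part of the original article. It assumes the column layout that Windows `netstat -an` prints for TCP rows (protocol, local address, foreign address, state). The function names `count_tcp_states` and `report` are illustrative.

```python
import subprocess
from collections import Counter


def count_tcp_states(netstat_output):
    """Tally TCP connections by state from Windows `netstat -an` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed TCP row layout: TCP <local> <foreign> <STATE>
        # UDP rows carry no state column and are skipped, as are headers.
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states


def report():
    """Run netstat and print one line per connection state."""
    # -a: all connections and listening ports; -n: numeric addresses
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    for state, count in count_tcp_states(result.stdout).most_common():
        print(f"{state:<15}{count}")
```

Calling `report()` on the management system while the problem is occurring prints a count per state. A large number of connections in a single state is a hint about where to look next.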

To rule out malfunctioning management software, you can create a batch file that establishes connections to the remote systems. If the same problem occurs while running the batch file, you'll know that the management software and its threads are not to blame. Here's an example of the required batch file's contents:

Net use \\system01\ipc$
Net use \\system02\ipc$
Net use ....
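Typing hundreds of these lines by hand is tedious, so the batch file above could be generated from a list of machine names. The following is a minimal sketch in Python, not something the original article provides. The helper names `net_use_lines` and `write_batch_file` are hypothetical, and the caller supplies the host names.

```python
def net_use_lines(hosts):
    """Return one 'net use' command per host, targeting its IPC$ share."""
    # Raw f-string: the leading double backslash and the backslash
    # before ipc$ are written to the output literally.
    return [rf"net use \\{host}\ipc$" for host in hosts]


def write_batch_file(path, hosts):
    """Write the generated commands to a batch file with CRLF line endings."""
    with open(path, "w", newline="\r\n") as f:
        f.write("\n".join(net_use_lines(hosts)) + "\n")
```

For example, `write_batch_file("connect.cmd", ["system01", "system02"])` writes a two-line file equivalent to the first two lines of the example above. The file name `connect.cmd` is only an illustration.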

If the management program in question implemented its own networking and authentication stack, that stack could be the culprit. But agentless solutions, like most of these management packages, use the operating system's networking and authentication stacks to perform network operations, and the batch file uses those same stacks. If the batch file launches just as many network connections without causing the failure, the issue is not a result of the program's use of the operating system's networking and authentication stacks.
