Almost everyone who has ever used Windows has either heard of or experienced a bugcheck - the infamous "Blue Screen of Death." A system may bugcheck for different reasons, but the bottom line is that the operating system has experienced a catastrophic fault that prevents it from continuing to run. We're going to cover some basic information about why a server may crash, explain how to configure and capture crash dumps, and review some basic debugging of a crash dump.
Before we get started, however, remember that there is a difference between a bugcheck and an application crash. A bugcheck is a kernel-mode crash, whereas an application crash is a user-mode event. We covered the differences between kernel- and user-mode memory in our Memory Management 101 post several months ago. So what are some common reasons why you may experience a bugcheck?
- A device driver or operating system function that runs in kernel mode hits an exception that it does not know how to handle (an unhandled exception). Examples include trying to write to memory to which it does not have access, or trying to read an address that is not mapped and is therefore invalid.
- A kernel support routine that results in a reschedule is called while the Interrupt Request Level (IRQL) is at Deferred Procedure Call (DPC) / dispatch level or higher. An IRQL is the priority ranking of an interrupt, and the IRQL at which a piece of kernel-mode code executes determines its hardware priority. A DPC is a mechanism that lets a processor that is executing a critical task defer less critical work until later, when the IRQL drops below dispatch level.
- A device driver or operating system function explicitly crashes the system because it detects corruption or some other condition indicating that the system cannot continue to function without risking data corruption.
- Faulty hardware may also cause a bugcheck.
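
To make the IRQL rule in the second bullet a little more concrete, here is a minimal user-mode sketch in Python. It is a toy model only, not how the Windows kernel is implemented: the `ToyProcessor` class, its methods, and the `SimulatedBugcheck` exception are hypothetical names we made up for illustration. The only details borrowed from Windows are the names and numeric values of the three lowest IRQLs. The sketch shows two things: a wait that would force a reschedule is fine below dispatch level but fatal at or above it, and a queued DPC does not run until the IRQL drops below dispatch level.

```python
from enum import IntEnum


class Irql(IntEnum):
    # The three lowest IRQLs; names and values follow the Windows definitions.
    PASSIVE_LEVEL = 0
    APC_LEVEL = 1
    DISPATCH_LEVEL = 2


class SimulatedBugcheck(Exception):
    """Hypothetical stand-in for the system halting; not a Windows API."""


class ToyProcessor:
    """Toy model of a single processor's IRQL and DPC queue (illustrative only)."""

    def __init__(self):
        self.irql = Irql.PASSIVE_LEVEL
        self.dpc_queue = []

    def raise_irql(self, new_irql):
        # Return the previous level so a caller could restore it later.
        old_irql = self.irql
        self.irql = new_irql
        return old_irql

    def queue_dpc(self, routine):
        # Less critical work is deferred rather than run immediately.
        self.dpc_queue.append(routine)

    def lower_irql(self, new_irql):
        if new_irql < Irql.DISPATCH_LEVEL:
            # Dropping below dispatch level: drain the deferred work first,
            # running each routine at DISPATCH_LEVEL in this model.
            self.irql = Irql.DISPATCH_LEVEL
            while self.dpc_queue:
                self.dpc_queue.pop(0)()
        self.irql = new_irql

    def wait_for_object(self, signaled):
        if signaled:
            return "satisfied"  # no reschedule is needed
        if self.irql >= Irql.DISPATCH_LEVEL:
            # Blocking here would require a reschedule, which this model
            # treats as fatal at dispatch level or higher.
            raise SimulatedBugcheck(
                f"wait would reschedule at IRQL {self.irql.name}"
            )
        return "blocked"  # the thread would simply be rescheduled


if __name__ == "__main__":
    cpu = ToyProcessor()
    print(cpu.wait_for_object(signaled=False))   # allowed at PASSIVE_LEVEL

    cpu.raise_irql(Irql.DISPATCH_LEVEL)
    cpu.queue_dpc(lambda: print("DPC ran at", cpu.irql.name))
    try:
        cpu.wait_for_object(signaled=False)      # not allowed at DISPATCH_LEVEL
    except SimulatedBugcheck as exc:
        print("simulated bugcheck:", exc)

    cpu.lower_irql(Irql.PASSIVE_LEVEL)           # the queued DPC runs now
```

In a real driver the equivalent mistake would be something like waiting on an unsignaled object from a DPC routine, and the result is a system-wide stop rather than a catchable exception; the model only captures the ordering rule, not the mechanics.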