Measuring the performance of an operating system is a tricky thing. At the same time, it’s the right and necessary thing to do, because performance is one of many criteria important to customers. Part of the trick is timing the tests within the product cycle so that the results are as meaningful as possible for customers, which helps them make a better decision using the full array of available information. As one example, about a year ago we commissioned a firm called Principled Technologies to conduct a study comparing Windows XP SP2 to Windows Vista RTM. That study found that the performance of the two operating systems was within the same range for many tasks that home and business users frequently perform under real-world conditions.
My point is that we waited to conduct these benchmarking tests until Windows Vista had reached the RTM milestone in the product cycle, as this allowed us to provide our customers the most meaningful data available at the time: the data most likely to directly affect their decision to upgrade to Windows Vista. We do a whole range of performance tests at every stage of the OS development process, but, as a general rule, we avoid sharing benchmark tests of software that hasn’t gone RTM (i.e., final code). This explains why we have not to date published any findings of benchmark tests (nor commissioned anyone to do so) on performance improvements brought about by Windows Vista SP1. Publishing benchmarks of Windows Vista SP1 now wouldn’t be a worthwhile exercise for our customers, as the code is still in development and, as far as benchmarking is concerned, remains a moving target.
Aside from that point, let me also emphasize that there are a variety of ways to benchmark the performance of a PC. Different techniques can yield different results. Some benchmark techniques simply test PC hardware performance by running a series of tasks at superhuman speed. Such tests tend to exaggerate small differences between test platforms and consequently are used less frequently nowadays, having been replaced by benchmarks that run tasks at human speeds with realistic waits and data entry. Benchmarks that run at superhuman speeds often deliver results that don’t tell the whole story. In fact, we made deliberate choices during the development of Windows Vista to focus on real-world scenarios affecting user experience, rather than on improving microsecond operations imperceptible to the user. In addition, many operations in Windows can require additional processing time for work that benefits the customer; this can include security, reliability or application compatibility checks conducted when a program launches. These operations may add microseconds to an individual application’s launch that, under real usage, aren’t perceivable to the human eye. When thousands of such operations are strung together through automation, those few microseconds can have a cumulative effect on the benchmark result, causing performance to appear much better or worse than expected.
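To make the cumulative effect concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is a made-up assumption chosen purely for illustration (the per-operation overhead and both operation counts are hypothetical, not measurements of Windows Vista or any other system); the point is only how a tiny per-operation cost scales with the number of operations a script strings together.

```python
# Illustrative arithmetic only: all figures below are hypothetical
# assumptions, not measurements of any real operating system.

PER_OPERATION_OVERHEAD_US = 50    # assumed extra microseconds per operation
AUTOMATED_OPERATIONS = 100_000    # assumed operations in a scripted benchmark run
HUMAN_PACED_OPERATIONS = 200      # assumed operations a person performs in a session


def cumulative_overhead_seconds(per_operation_us: float, operations: int) -> float:
    """Total added time in seconds for a fixed per-operation overhead."""
    return per_operation_us * operations / 1_000_000


automated_total = cumulative_overhead_seconds(
    PER_OPERATION_OVERHEAD_US, AUTOMATED_OPERATIONS
)
human_total = cumulative_overhead_seconds(
    PER_OPERATION_OVERHEAD_US, HUMAN_PACED_OPERATIONS
)

print(f"Scripted run:    {automated_total:.2f} s of added time")
print(f"Human-paced run: {human_total:.2f} s of added time")
```

Under these assumed figures, the same per-operation cost adds several seconds to the scripted run but only about a hundredth of a second to the human-paced session, spread across actions that are each individually too brief to notice.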
I’ve included below a video we captured depicting a “benchmark test” running a window-open, window-close routine at accelerated speed. You can see that it isn’t representative of real-world user behavior and hence isn’t an accurate gauge of the actual end-user experience. Further, tests like these only measure a very small set of Windows capabilities and so aren’t representative of the user’s overall day-to-day experience of working with Windows and running applications.