Firmware Enhancements for PCs Running Windows 7

September 21, 2009

Abstract

To meet the expectations of Windows® 7 users, the firmware of PCs running Windows 7 must both be fast and present an attractive boot UI with improved graphics resolution. This paper documents the investigation and proof of concept that enable “fast and pretty” firmware. The document provides background, methods, and data.

This information applies to the following operating systems:
Windows 7

References and resources discussed here are listed at the end of this paper.

The current version of this paper is maintained on the Web at:

Disclaimer: This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.

© 2009 Microsoft Corporation. All rights reserved.

Microsoft, Windows, Windows Server, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Document History

Date / Change
September 21, 2009 / First publication

Contents

Background

Proof of Concept

Boot Performance Improvements

Overview of the Boot Process

High-Level Boot Improvement Timing Results

BIOS Boot Performance Improvements

SEC Phase

PEI Phase

DXE Phase

BDS Phase

Other Performance Improvements

User Experience Improvements

Visual Improvements

Boot Graphics Timing

Boot Graphics Improvements

Performance Impacts of Visual Improvements

Fan Noise

Windows Access to Firmware Settings

Problem Overview

Windows Interfaces to Firmware

Accessing Firmware from Windows

Appendix 1: Cost Impacts

BOM Impacts

Engineering Process Impacts

Appendix 2: UEFI-Only Firmware

Appendix 3: References

Background

The boot user experience (UX) for Windows Vista® rated poorly compared to other products because the experience was slow, fragmented, and included multiple graphics-mode transitions with a low-resolution boot progress indicator. One of the vision areas of Windows® 7 was to improve upon that experience by shortening and refining the graphics experience that users see during the interval between when the system power is turned on and when the logon screen is displayed.

To meet this goal, Windows 7 enhances the boot UX by initializing the graphics device in high resolution, which eliminates most of the mode transitions. Also, Windows 7 uses a high-resolution background with some animation to indicate smooth, continuous progress while the Windows 7 kernel is initialized. The boot speed of Windows 7 was also improved by other new features that tuned the bootloader and prefetcher.

Nevertheless, both the speed and the appearance of the boot experience remain dependent on firmware, and because Windows 7 itself boots faster, the percentage of the boot experience that depends on firmware has increased. Apple Macintosh computers demonstrated that modern Unified Extensible Firmware Interface (UEFI) firmware could be optimized to deliver fast Power-On Self-Test (POST) times with a streamlined user interface (UI). This has increased expectations for PCs running Windows 7.

However, modern PC manufacturers have challenges that do not apply to Apple Macintosh, most specifically the need to manage a large number of configurations and price points, sometimes at narrow profit margins. The impact of these requirements on performance and appearance goals is addressed in this paper.

This paper documents the creation of proofs of concept to investigate these topics and describes the results that were achieved and how those results can be reproduced.

Proof of Concept

This paper assumes some familiarity with Advanced Configuration and Power Interface (ACPI)–compliant and UEFI-compliant basic input/output system (BIOS) technology and concepts. See the “Appendix 3: References” section for more details on ACPI and UEFI.

In 2008, Microsoft® worked closely with the Hewlett-Packard (HP) Consumer Notebook Division to tune several DV3 (“Diablo”) and DV4 (“Blade”) configurations running Windows Vista with Service Pack 1. As a result, we had access to many units and were knowledgeable about Blade firmware, hardware, and drivers. Based on that knowledge, we selected the Blade configuration (typically DV4-1145go, although we also used other configurations) as the target for the creation of proofs of concept. We also selected Insyde Software Corporation as a prototyping partner. HP’s original design manufacturers (ODMs) had licensed InsydeH2O firmware to create the DV4 design.

Microsoft and Insyde implemented time measurements of UEFI firmware runtimes by using Insyde’s benchmarking subroutines. We also measured circuit timing and embedded controller firmware timing with external instruments. The latter measurements were critical because the Montevina chipset used in Blade optionally supports a slower, lower-cost circuit configuration that significantly increased Blade’s time from power on to UEFI initialization, although we did not know this at the beginning of the project.

Boot Performance Improvements

Overview of the Boot Process

The boot process consists of four sections: Pre-BIOS, BIOS POST, Boot Loader, and Operating System Boot. This paper is concerned with only the first two sections, and primarily with BIOS POST.

Blade’s InsydeH2O BIOS is based on UEFI, and consists of four main phases:

  • SEC. The Security (SEC) phase brings the system from CPU reset and makes temporary RAM available for stack and data storage.
  • PEI. The Pre-EFI Initialization (PEI) phase finishes initializing the CPU, makes permanent RAM (such as normal DRAM) available, and then determines the boot mode (such as normal boot, ACPI “S3” resume from sleep, or ACPI “S4” resume from hibernation).
  • DXE. The Driver Execution Environment (DXE) phase initializes the rest of the system hardware (HW).
  • BDS. The Boot Device Selection (BDS) phase selects a boot device and then boots an operating system from it.

High-Level Boot Improvement Timing Results

Phase / Initial Timing (sec) / Final Timing (sec) / Improvement (sec) / Improvement (%)
Non-BIOS Circuit & Embedded Controller / 2.0 / 0.5 / 1.5 / 75%
Total BIOS / 9.8 / 6.1 / 3.7 / 38%
BIOS SEC / 0.3 / 0.2 / 0.1 / 35%
BIOS PEI / 3.4 / 2.8 / 0.6 / 19%
BIOS DXE / 0.8 / 0.8 / 0.0 / 4%
BIOS BDS / 5.3 / 2.4 / 2.9 / 55%
Total Duration / 11.8 / 6.6 / 5.2 / 44%
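
The two improvement columns follow directly from the initial and final timings. The short Python sketch below is an illustration added for this paper, not the measurement tooling used in the project, and the helper name `improvement` is hypothetical. It recomputes the Total Duration and BIOS BDS rows. For rows with small values, such as BIOS SEC, recomputing from the rounded table entries gives slightly different percentages than the table shows, presumably because the published figures were derived from unrounded measurements.

```python
def improvement(initial_s, final_s):
    """Return (seconds saved, percent saved) for one boot phase."""
    saved = initial_s - final_s
    return saved, 100.0 * saved / initial_s


# Values copied from the "Total Duration" and "BIOS BDS" rows of the table.
total_saved, total_pct = improvement(11.8, 6.6)
bds_saved, bds_pct = improvement(5.3, 2.4)

print(f"Total duration: {total_saved:.1f} s saved ({total_pct:.0f}%)")
print(f"BIOS BDS:       {bds_saved:.1f} s saved ({bds_pct:.0f}%)")
```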

BIOS Boot Performance Improvements

Each phase offered opportunity for performance tuning, but some phases were simpler or more significant to tune than others.

SEC Phase

  • Code decompression. Code is largely compressed on ROM and is decompressed during POST. Compression and decompression are ideally asymmetric, because longer compression times are done only once at compile time, whereas faster decompression times benefit every boot.
  • InsydeH2O firmware already has an optimized decompression algorithm, so no further improvement was seen in this area of performance. Although no change was required for our proof of concept, this topic is listed here for completeness.
  • Cache. L1 cache as RAM is the fastest RAM available in the system, so early initialization of cache and use of cache for stacks and variable storage is critical for fast firmware initialization. Writing to NVRAM during SEC is strongly discouraged.
  • InsydeH2O firmware already made extensive use of system cache memory during SEC, so no further improvement was seen in this area of performance. Although no change was required for our proof of concept, this topic is listed here for completeness.
  • Throttling. CPU vendors provide mechanisms (such as Intel SpeedStep and AMD PowerNow!) to adjust CPU voltage and throttle CPU clocking in order to manage power consumption and system heat. It is common for original design manufacturers (ODMs) to adjust these mechanisms to a low level during firmware initialization when adapting firmware to a new chipset or new chassis thermal design. It is critical that the settings be reset as high as possible and as early as possible during SEC after the chipset is known to be stable and the thermal characteristics of the chassis are established. Although the operating system will take control of these settings after boot, needlessly low settings during firmware initialization can have a significant negative impact on boot performance.
  • InsydeH2O firmware set SpeedStep to a high level early in SEC, so no further improvement was seen in this area of performance. Although no change was required for our proof of concept, this topic is listed here for completeness.
  • Microcode. Most ODMs manage many designs simultaneously for their original equipment manufacturer (OEM) partners. These designs may have significant overlap. As a result, ODMs may seek to manage configuration complexity by including a small number of firmware images that dynamically adapt to the specific hardware configuration. Typically this involves maintaining a large number of CPU microcode variations within the ROM and forcing firmware to search for the correct microcode during SEC. The number of CPU microcode variations must be kept as small as possible and the search algorithm must be optimized.
  • Our prototype contained only a single microcode version. The time savings was about 100 msec.
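
The microcode guidance above has two parts: keep the table small, and make the search cheap. The following Python sketch models both under stated assumptions. The table layout, the CPU signature values, and the patch payloads are made-up placeholders, and real firmware would implement the lookup in C or assembly against the vendor's microcode header format. The point is only that a linear scan costs more as the table grows, whereas an index prepared ahead of time (for example, when the ROM image is built) does not.

```python
# Hypothetical ROM table of (CPU signature, microcode patch) pairs.
# The signatures and payloads are placeholders, not real vendor data.
MICROCODE_TABLE = [
    (0x10676, b"patch-A"),
    (0x1067A, b"patch-B"),
    (0x106A5, b"patch-C"),
]


def find_microcode_linear(table, cpu_signature):
    """Scan entries in order; return (patch, entries examined)."""
    examined = 0
    for signature, patch in table:
        examined += 1
        if signature == cpu_signature:
            return patch, examined
    return None, examined


def build_index(table):
    """Build a signature-to-patch map once, ahead of the lookup."""
    return {signature: patch for signature, patch in table}


def find_microcode_indexed(index, cpu_signature):
    """Look up the patch without scanning the table."""
    return index.get(cpu_signature)


patch, examined = find_microcode_linear(MICROCODE_TABLE, 0x106A5)
print(patch, "found after examining", examined, "entries")
print(find_microcode_indexed(build_index(MICROCODE_TABLE), 0x106A5))
```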

PEI Phase

  • Memory initialization. In a typical power-on POST, the system memory is dynamically detected, tested for errors, and then cleared to zero. Dynamic detection is performed in case the user has physically changed the memory configuration since the last boot. Memory testing and clearing is performed mainly for historical reasons and is not required at every boot for modern memory technology.
  • This project skipped the memory testing and did not clear memory to zero. Memory testing can take as long as 10 seconds per gigabyte of DRAM. However, the original Blade firmware was not performing a complete memory test so we were only able to achieve a 500-msec improvement. Note that a comprehensive system memory test can also be run in the BIOS setup menu environment, giving a user the ability to run such a check on demand without impact on every boot.
  • This project skipped dynamic detection of memory configuration by hard coding memory DIMM attributes. This might not be practical for many real-world system designs. However, whenever RAM is soldered to the system board, rather than populated into DIMMs, this method should be used to improve performance. In our case the time savings was 100 msec.

DXE Phase

  • USB topology exploration. In a typical POST, the USB host controllers are reset by firmware and their endpoint devices detected. Some of these endpoint devices can be USB hubs, which also need to have their endpoint devices discovered. Some of the endpoint devices that are USB hubs may have additional devices attached. Each USB hub requires a reset operation and a detection of the endpoint devices. This can take three or more seconds for each device, depending on the speed of the device.
  • This project supported discovery of USB hubs only on the motherboard. No USB devices or hubs plugged into the USB ports were discovered. Nested hubs on the motherboard were not discovered.
  • Blade hardware did not include nested USB hubs on the motherboard, so the restriction above did not apply. In order to support this methodology, the system design should always avoid nested USB hubs on the motherboard.
  • Laptops theoretically could avoid the restriction above because they always include an integrated pointing device and an integrated keyboard. Desktop designers would need to include keyboard and mouse detection but may still elect to eliminate discovery of external hubs and devices connected to external hubs. Likewise, laptop designers may decide to provide discovery of devices plugged into docking stations, but must be careful not to affect typical laptop performance in support of the docking station scenario.
  • To support BitLocker (a volume encryption feature available in some versions of Windows Vista and Windows 7 that requires the ability to load an encryption key from a USB thumb drive at boot time), the Windows Logo Program requires enumeration of the topology.
  • USB topology exploration also affects boot device selection if booting from USB is enabled. DXE can check whether USB boot is selected, and then explore the full topology only if USB boot is enabled. See the “BDS Phase” section later in this document for more information.
  • The time savings for USB setup was only 55 msec. Given the many tradeoffs listed above, limiting USB topology exploration is not a good choice for most OEM designs.
  • Other HW configuration. In addition to CPU microcode, other firmware support for hardware variations must be held in ROM and applied during firmware initialization. Unlike CPU microcode, these variant initializations change frequently. In order to maintain a single BIOS image, these configurations are typically held in an external, electrically erasable programmable read-only memory (EEPROM) and accessed through a serial interface. It is common for firmware to treat the EEPROM as if it were RAM and to access variables in it as needed throughout POST.
  • Since the EEPROM access is much slower than RAM, configuration data should be read all at once, and then cached in memory.
  • The time savings for HW configuration was 100 msec.
  • Multi-threading. All modern CPUs support multi-threaded operation. Wherever operations may be performed in parallel, they should be written to run as multiple threads. Typical firmware is single threaded.
  • Our prototype did not investigate the impact of multi-threading, but subsequent investigations will do so.
  • Note that synchronization primitives are not available and UEFI protocols are not guaranteed to run properly across multiple threads or processors (“MP-safe”), so the burden of building multi-threaded firmware would largely fall on firmware writers to redesign their code in implementation-specific ways until the UEFI Platform Initialization Specification is modified to support this.
  • On the other hand, UEFI drivers are required to be re-entrant, so parallelization during DXE should be practical and provide the best return on engineering effort.
  • PS/2 devices. Many chipsets include support for a pair of PS/2 devices, intended for a keyboard and a pointing device. However, PS/2 initialization is slower than USB initialization.
  • Blade included PS/2 devices for the internal keyboard matrix and for the touchpad. This investigation did not implement a keyboard or touchpad without PS/2 support. The time savings for such a system is estimated to be about 50 msec.
  • Although PS/2 devices have slow initialization, USB devices that support PS/2 are disruptive to developers because the emulation is usually driven by system management interrupts (SMIs), which interfere with proper operation of debuggers. The ideal implementation is one in which PS/2 support in a USB device is implemented by native UEFI drivers without any dependency on System Management Mode (SMM).
  • Option ROMs. Video, Preboot Execution Environment (PXE) Universal Network Device Interface (UNDI), and storage controller devices often include their own ROM with BIOS extensions, usually in option ROM form. Option ROMs are a de facto standard but in practice vary widely in form and performance. For best performance, PXE UNDI, SATA, and video drivers should be provided as UEFI drivers, not as option ROMs.
  • Blade included option ROMs for video and SATA. This investigation did not implement native UEFI video or SATA ROM. The time savings for a native UEFI system is estimated to be between 0.5 sec and 1.0 sec.
  • Excess SATA channels. Chipsets often offer more channels of SATA than are utilized in the final design. Because ODMs often use the same BIOS image on many systems, these channels may be erroneously initialized during DXE, and then enumerated to Windows, leading to further delays during boot as Windows configures this redundant hardware. Excess hardware should not be initialized or enumerated to Windows.
  • Blade did not include excess initialized SATA channels. If it had, the time savings is estimated to be about 800 msec in firmware and 2.0 sec in Windows boot time.
  • Blade has its optical disk drive (ODD) on SATA port 4/5. This was done by design because the firmware takes longer to enumerate than if the ODD were on SATA port 2/3 or on port 0/1.
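
To illustrate the EEPROM guidance in the “Other HW configuration” item above, the following Python sketch contrasts per-variable EEPROM access with a single bulk read that is then served from RAM. It is a model only: the class names are hypothetical, the EEPROM is simulated by a byte string, and a transaction counter stands in for the serial-bus latency that real firmware would incur.

```python
class SlowEeprom:
    """Stand-in for a serial EEPROM; counts bus transactions."""

    def __init__(self, contents):
        self._contents = bytes(contents)
        self.transactions = 0

    def read(self, offset, length):
        # Each call models one slow serial transfer.
        self.transactions += 1
        return self._contents[offset:offset + length]


class CachedConfig:
    """Read the whole configuration block once, then serve reads from RAM."""

    def __init__(self, eeprom, size):
        self._shadow = eeprom.read(0, size)  # single bulk read

    def read(self, offset, length):
        return self._shadow[offset:offset + length]


eeprom = SlowEeprom(range(64))
config = CachedConfig(eeprom, 64)

# Four separate configuration lookups, all satisfied from the RAM copy.
values = [config.read(offset, 4) for offset in (0, 16, 32, 48)]
print("EEPROM transactions:", eeprom.transactions)
```

In this model the four lookups cost one EEPROM transaction instead of four. The saving in real firmware depends on how often POST code touches the configuration data.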

BDS Phase

  • HDD as single boot device. Normally during POST, each bootable device is dynamically detected and the media examined to see whether it is bootable. DVD/CD devices have long latency times and can take 2–4 seconds to spin up the rotating media.
  • Our prototype attempted only to improve time booting from the primary hard disk drive (HDD). The internal HDD was first in the boot order by default. The time savings was 2.9 sec.
  • Some OEMs prefer to set the DVD drive as the default boot device because users who are unable to boot from a DVD might not know how to change the default boot settings and might generate a support call. To avoid this, our prototype includes a predefined hotkey that provides the option to boot from DVD. If the user presses the hotkey during POST, the DVD is selected as the boot device. The hotkey should be displayed as early as possible and the time-out value kept short, or the hotkey feature itself will contribute boot delay.
  • Booting from USB requires that the USB boot device be detected during DXE, which is a timing tradeoff. We suggest that detection of a USB boot device be performed only if USB boot is selected by the user.
  • During all on and off transitions (such as power on, resume from S3 sleep, and resume from S4 hibernate), boot media should be spun up as early as possible during initialization to reduce I/O latencies.
  • SSD to replace HDD.
  • All on and off transitions can benefit from solid-state memory to store the system volume, rather than a rotating storage device, to avoid spin-up latency.
  • We did not test this impact. The degree of improvement depends on other factors (bus speed and bus enumeration cost, for instance), so system timing analysis is required to determine the best boot device selection. Obviously, some price points cannot support SSDs in place of HDDs.
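
The boot device policy described in the “HDD as single boot device” item can be summarized as: probe the internal HDD first, probe the DVD drive first only when the user presses the hotkey, and skip USB detection unless USB boot is enabled. The Python sketch below models that policy. The function name, the device labels, and the `bootable` set are hypothetical and are used only for illustration; they are not taken from the prototype's firmware.

```python
def select_boot_device(boot_order, bootable,
                       hotkey_pressed=False, usb_boot_enabled=False):
    """Return (selected device or None, list of devices that were probed).

    boot_order: device labels in default priority order.
    bootable:   set of labels that actually hold bootable media.
    """
    if hotkey_pressed:
        # The user asked for DVD boot, so accept the optical spin-up delay.
        order = ["dvd"] + [d for d in boot_order if d != "dvd"]
    else:
        order = list(boot_order)

    probed = []
    for device in order:
        if device == "usb" and not usb_boot_enabled:
            continue  # skip USB detection unless the user selected USB boot
        probed.append(device)
        if device in bootable:
            return device, probed  # stop before probing slower devices
    return None, probed


order = ["hdd", "dvd", "usb"]
print(select_boot_device(order, {"hdd", "dvd"}))
print(select_boot_device(order, {"hdd", "dvd"}, hotkey_pressed=True))
```

In the default case only the HDD is probed, so the DVD drive is never spun up, which is where the prototype's BDS savings came from.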

Other Performance Improvements

  • Compiler optimizations. Carefully examine the impacts of any compiler optimizations when building a production BIOS. This gives you the most control over timing and code size.
  • Montevina embedded controller circuit options. Typically, a separate microcontroller integrated circuit (IC) performs keyboard matrix scanning and decoding, monitors the laptop battery and charger, checks the initial thermal state, and can also implement other features such as the power button, lid button, and CD drive. The specification of this controller is generally proprietary to an ODM. The features of the controller and its firmware are not exposed to the BIOS firmware or to the operating system.
  • Some embedded controllers (ECs), such as the Renesas H8, maintain electrically programmable read-only memory (EPROM) on chip, but others require external ROM storage for their microcontroller instructions.
  • For ECs with external ROM, Montevina offers two options, shown in figures 1 and 2, below.