Microsoft Windows 2000 Server

Windows 2000 Chkdsk Management

Microsoft Product Support Services White Paper

Written by Elden Christensen

Additional contributions by Terence Hosken, Wim van Wieren

Published in January 2003

Abstract

This white paper contains information about the Chkdsk command-line utility. The Chkdsk utility creates and displays a status report for the disk and lists and corrects errors on the disk. The paper presents general information about how to use Chkdsk and specific information about how to use Chkdsk with Microsoft Windows 2000 Server Cluster.

Some material in this white paper is adapted from Microsoft Knowledge Base articles and Microsoft TechNet articles (including material from other Microsoft books, resource kits, and other references).

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2002 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Microsoft Press, Win32, Windows, Windows NT, and the Windows logo are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Contents

Introduction

About Chkdsk

What Are My Options

NTFS and File System Consistency

NTFS Transaction Log Recoverability

Cluster Remapping

Chkdsk Phases

First Pass: Files and Folders

Second Pass: Indexes

Third Pass: Security Descriptors

Fourth Pass: Sectors

Running Chkdsk on a Stand-alone Server

Run Chkdsk in Read-Only Mode

Run Chkdsk in Repair Mode

Chkdsk Syntax and Optional Command-Line Switches

Using the /c and /i Command-Line Switches to Run an Abbreviated Chkdsk

The /c Command-Line Switch

The /i Command-Line Switch

Autochk

Prevent Autochk from Running

Chkntfs

Chkdsk and Server Clusters

How to Run Chkdsk on a Server Cluster

If Handles Remain Open, or If the Cluster Contains a Single Shared Drive

Cluster Resource: Enhanced Private Properties

Chkdsk Status Codes in the Cluster Log

Capturing Chkdsk Results for Server Clusters

Differences in Windows NT 4.0

Differences in Windows 2000 Versions Earlier than SP1

Differences in Windows 2000 Advanced Server SP1 and Windows 2000 Datacenter Server

Windows 2000 Chkdsk Improvements

Chkdsk Performance Improvements

How Long Will Chkdsk Take to Run?

Variable 1: The “Indexes” Phase

Variable 2: The Condition of the Volume

Variable 3: Hardware Issues

Variable 4: The Chkdsk Settings

About Running Chkdsk in Read-only Mode

Conclusion

For More Information

Appendix

How to Automate Chkdsk

FSUtil.exe

Introduction

As the options for data storage consolidation evolve, Microsoft strives to make sure that the Microsoft® Windows® 2000 operating system continues to be highly reliable. For example, with the shift to Storage Area Networks (SANs), Windows 2000 addresses new concerns with faster recovery and offers new features, such as Active Directory® directory service, that increase scalability and improve manageability.

This white paper describes some of the improvements that Microsoft has made in the Windows 2000 Chkdsk utility and describes ways to manage corrupted volumes. It also describes considerations for running Chkdsk on a server cluster in Microsoft Windows 2000 Advanced Server or Microsoft Windows 2000 Datacenter Server.

Microsoft has significantly improved Chkdsk performance in Windows 2000 and continues to improve its performance to address the challenge of new I/O hardware technology that puts more and more data on “single,” very large, growing volumes. Microsoft also has enhanced the NTFS file system to minimize failures.

To complement these improvements, organizations that use Windows 2000 must apply “best practice” operational management and must develop recovery procedures and disaster recovery processes to minimize system outages of all types. By applying these best practices, you can drive the recovery process, instead of being a victim of system failures.

About Chkdsk

Chkdsk is a command-line utility that verifies the logical integrity of a file system on Windows 2000. NTFS, which maintains the integrity of all NTFS volumes, automatically runs Chkdsk the first time that Windows 2000 mounts an NTFS volume after the computer is restarted following a failure. You can also manually run Chkdsk or schedule Chkdsk to be run if you suspect there may be file system corruption.

Chkdsk examines all the metadata on a volume, compares it to the transaction logs that are maintained by NTFS, and if it finds logical inconsistencies, it takes actions to repair file system data. Metadata is “data about data.” It is the file system overhead, so to speak, that NTFS uses to keep track of everything about all the files on the volume. For example, metadata tells NTFS which allocation units make up the data for a particular file, which allocation units are free, and which allocation units contain bad sectors.
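As a rough illustration of what a “logical inconsistency” in allocation metadata can look like, the following Python sketch cross-checks a toy allocation bitmap against the allocation units that each file claims. The data layout, the function name find_allocation_inconsistencies, and the three categories that it reports are assumptions made for this example only. They are not the actual data structures or algorithm that Chkdsk uses.

```python
from collections import Counter


def find_allocation_inconsistencies(bitmap, files):
    """Cross-check a toy allocation bitmap against per-file unit lists.

    bitmap: list of bools; True means the allocation unit is marked in use.
    files:  dict that maps a file name to the allocation units it claims.
    Both structures are hypothetical simplifications for illustration.
    """
    # Count how many files claim each allocation unit.
    claims = Counter(unit for units in files.values() for unit in units)

    # Units that more than one file claims.
    cross_linked = sorted(u for u, n in claims.items() if n > 1)
    # Units that a file claims but that the bitmap marks as free.
    claimed_but_free = sorted(u for u in claims if not bitmap[u])
    # Units that the bitmap marks as in use but that no file claims.
    marked_but_unclaimed = sorted(
        u for u, used in enumerate(bitmap) if used and u not in claims
    )
    return {
        "cross_linked": cross_linked,
        "claimed_but_free": claimed_but_free,
        "marked_but_unclaimed": marked_but_unclaimed,
    }


if __name__ == "__main__":
    bitmap = [True, True, True, False, True, False]
    files = {"a.txt": [0, 1], "b.txt": [1, 3]}
    print(find_allocation_inconsistencies(bitmap, files))
```

In this sample data, unit 1 is claimed by two files, unit 3 is claimed by a file but marked free, and units 2 and 4 are marked in use although no file claims them.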

If Chkdsk runs at a time other than during the startup process, the code that actually performs the verification resides in utility dynamic-link libraries (DLLs), such as Untfs.dll and Ufat.dll. The verification routines that Chkdsk runs are the same ones that are run when Windows Explorer or Disk Administrator verifies a volume through its graphical user interface (GUI). If Chkdsk runs during the startup process, the binary module that contains the verification code is Autochk.exe.

Autochk is an integrated Windows 2000 command-line utility that runs early enough in the system startup process that it does not have the benefit of virtual memory or other Win32® application programming interface (API) services. Autochk generates the same kind of textual output that Chkdsk does, except that in addition to displaying this output on the screen during the startup process, Autochk also logs an event to the Application event log for the system. This event contains as much textual output as can fit into the event log's data buffer.

Because Autochk and the verification code in the utility DLLs that are used by Chkdsk are based on the same source code, this white paper will sometimes refer to Autochk and Chkdsk collectively as Chkdsk.

In Microsoft Windows NT® 4.0 Service Pack 4 (SP4) and in Windows 2000, Microsoft added two new command-line switches, /i and /c, to Chkdsk. These options are valid only when the destination drive has the NTFS file format. Each option directs Chkdsk to bypass certain actions, which reduces the time it takes Chkdsk to run. The /c option directs Chkdsk to skip the checking of cycles in the folder structure, and the /i option directs Chkdsk to perform a less vigorous check of index entries.

These command-line switches are intended for users with exceptionally large volumes who require flexibility in managing system downtime. Because the use of the /c and /i options can result in a volume remaining corrupted after Chkdsk has completed, it is a good idea to use these options only in situations in which system downtime must be kept to an absolute minimum.
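For administrators who script their maintenance tasks, the following Python sketch shows one possible way to assemble a Chkdsk command line that includes these switches. The function name build_chkdsk_command and its keyword arguments are hypothetical. The switches themselves (/f for repair mode, /c, and /i) are the ones that Chkdsk accepts. The sketch only builds and prints the command and does not run it.

```python
def build_chkdsk_command(volume, fix=False, skip_cycle_check=False,
                         light_index_check=False):
    """Return the Chkdsk argument list for the requested options.

    The function and keyword names are hypothetical helpers for this
    example; only the switches come from Chkdsk itself.
    """
    args = ["chkdsk", volume]
    if fix:
        args.append("/f")  # repair mode: fix the errors that are found
    if skip_cycle_check:
        args.append("/c")  # NTFS only: skip checking of cycles in the folder structure
    if light_index_check:
        args.append("/i")  # NTFS only: less vigorous check of index entries
    return args


if __name__ == "__main__":
    # Print the command instead of running it. Running Chkdsk in repair
    # mode requires administrative rights and exclusive access to the volume.
    print(" ".join(build_chkdsk_command("D:", fix=True,
                                        skip_cycle_check=True,
                                        light_index_check=True)))
```

Keeping the abbreviated options off by default reflects the guidance above: a full Chkdsk remains the typical choice, and /c and /i are opt-in for cases in which downtime must be kept to a minimum.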

To understand when it is appropriate to use these command-line switches, it is important to understand some of the internal NTFS data structures, the kinds of corruption that can happen, what actions Chkdsk takes when it verifies a volume, and what the potential consequences are if you circumvent the typical Chkdsk verification steps.

What Are My Options

When disk corruption is detected on a volume, you can respond in any of four ways:

  • Run a Full Chkdsk
    This option repairs all file system data and restores all user data that can be recovered by means of an automated process. The drawback to this option is that a full Chkdsk can require several hours of downtime for a mission-critical server at an inopportune time. However, in terms of data recovery, this is the recommended course of action.
  • Run an Abbreviated Chkdsk
    By using some combination of the /c and /i command-line switches, you can repair the severe kinds of corruption that can grow into bigger problems in much less time than a full Chkdsk requires. However, this option does not repair all the corruption that might exist. A full Chkdsk is still required at some future time to guarantee that all the data that can be recovered will be recovered.
  • Do Nothing
    For a mission-critical server that is expected to be online 24 hours a day, this is frequently the necessary choice. The drawback to this option is that relatively minor corruption can grow into major corruption if it is not repaired as soon as possible after it is detected. Therefore, consider this option only when keeping a system up is more important than the integrity of the data that is stored on the corrupted volume. Keep in mind that all data on the corrupted volume is “at risk” until Chkdsk is run.
  • Format the Partition and Restore from Tape
    When Chkdsk is run against a volume, Chkdsk may not correctly recover 100 percent of the data if there is extreme corruption. If you have a high-speed tape backup solution and a known last-good backup, it may be just as fast or faster to reformat the partition and then restore the data from tape. This is a rare scenario. Use this option only in extreme situations with careful consideration.

How long will Chkdsk take to run? This frequently asked question has no quick answer. For information about the factors that affect the length of time that Chkdsk takes to run, see “How Long Will Chkdsk Take to Run?” later in this white paper.

NTFS and File System Consistency

To better understand Chkdsk and its command-line switches, it is important to understand the basics of some of the internal NTFS data structures. NTFS is a recoverable file system that maintains volume consistency by using logging techniques. If the operating system stops responding (crashes or hangs), NTFS restores consistency by running a recovery procedure that accesses information that is stored in a log file. NTFS does not guarantee protection of user file data. If the system crashes while a program is writing a user file, the file can be lost or corrupted, and you may need a file system checker.

A file system's correctness and validity may have to be verified if there is serious corruption of a metadata file or corruption of user data. Its correctness and validity can be checked by using a file system's check command. In Windows NT and Windows 2000, this command is Chkdsk. Chkdsk can repair any problems it finds in the file system and alert you if there are any unrepairable issues.

When you format an NTFS volume, the format program creates a set of files that contain the data that is used to implement the file system structure. NTFS reserves the first 16 records in the Master File Table (MFT) for information about these files, which are called metadata files. Metadata is data that is stored on a volume in support of the file system format management. Typically, it is not made accessible to applications. Metadata includes the data that defines the placement of files and directories on a volume. In NTFS, all data that is stored on a volume is contained in files, including the data structures that are used to locate and retrieve files, bootstrap data, and the bitmap that records the allocation state of the whole volume. Metadata file names start with a dollar sign (for example, $Bitmap) and are hidden.

NTFS recovers metadata after a crash by using standard transaction logging and recovery techniques. If an I/O failure occurs when the operating system is writing data to the disk, NTFS restores consistency by running a recovery procedure that accesses information that is stored in a log file. The NTFS recovery procedure is exact, guaranteeing that the volume is restored to a consistent state.

NTFS does not protect user data (that is, the contents of files) through the use of a transaction log, the way it protects metadata. NTFS does not guarantee the integrity of user data after an instance of disk corruption, even if you immediately run a full Chkdsk operation. Chkdsk may not be able to recover some files, and some files that Chkdsk does recover may still be internally corrupted. It remains vitally important that you protect mission-critical data by making periodic backups or by using some other robust method of data recovery.

NTFS maintains the integrity of all NTFS volumes by automatically running Chkdsk and performing disk recovery operations the first time that Windows 2000 mounts an NTFS volume after the computer is restarted following a failure.

NTFS views each I/O operation that modifies a metadata file on the NTFS volume as a transaction and manages each one as an integral unit. After the transaction is started, the transaction is either completed or, if an I/O operation failure occurs, rolled back (that is, the NTFS volume is returned to the state it was in before the transaction was started).

To make sure that a transaction can be completed or rolled back, NTFS records the suboperations of the transaction in a log file before it performs them on the volume. After NTFS updates the volume, it commits the transaction by recording in the log file that the whole transaction is complete. Both the log file entries and the volume updates are buffered by the system’s file cache.

After a transaction is committed, NTFS makes sure that the whole transaction appears on the volume, even if the I/O operation fails because of a system shutdown or crash. During recovery operations that occur the next time the volume is mounted, NTFS redoes each committed transaction that it finds in the log file. Then NTFS locates the transactions in the log file that were not committed at the time of the system failure and undoes each transaction suboperation that is recorded in the log file. In this way, incomplete modifications to the volume metadata are prohibited.
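The redo and undo passes that this recovery performs can be pictured with a small Python sketch. The log record layout, the dictionary that stands in for the volume, and the function name recover are all assumptions made for this illustration. The sketch shows the general log-based recovery technique and is not the NTFS on-disk log format or the NTFS implementation.

```python
def recover(volume, log):
    """Replay a toy transaction log against a toy volume (a dict).

    Each log record is a tuple in one of two hypothetical forms:
      ("update", txid, key, old_value, new_value)  -- redo and undo information
      ("commit", txid)                              -- the transaction is complete
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo pass: reapply every suboperation of each committed transaction,
    # in log order, in case the update never reached the disk.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _, new_value = rec
            volume[key] = new_value

    # Undo pass: walk the log backward and reverse every suboperation of
    # each transaction that was not committed at the time of the failure.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old_value, _ = rec
            if old_value is None:
                volume.pop(key, None)   # the key did not exist before
            else:
                volume[key] = old_value
    return volume


if __name__ == "__main__":
    # Transaction 1 was committed, but its update never reached the disk.
    # Transaction 2 reached the disk but was never committed.
    volume = {"a": 1, "b": 99}
    log = [
        ("update", 1, "a", 1, 2),
        ("commit", 1),
        ("update", 2, "b", 5, 99),
    ]
    print(recover(volume, log))
```

After recovery in this example, the committed change to "a" is present and the uncommitted change to "b" has been reversed.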

Important: NTFS uses transaction logging and recovery to guarantee that the volume structure is not corrupted. For this reason, all metadata files remain accessible after a system failure. However, user data can be lost because of a system failure, a bad sector, or a simple delete operation that is initiated by a user. NTFS does not implement transactional logging to protect user data. Regular backups of data are highly recommended.

NTFS Transaction Log Recoverability

Each file on an NTFS volume is listed as a record in the MFT. The first record in the table describes the MFT itself, and the second record describes a special file that mirrors the first few entries of the primary MFT. If the first MFT record is corrupted, NTFS uses the second record to find the MFT mirror file that is stored at the end of the logical disk, in which the first record is the same as the first record of the MFT. The boot sector records the locations of both the MFT and MFT mirror files. The MFT mirror file does not contain 100 percent of the whole MFT. Instead, it contains only the first few critical entries that Windows must have to mount the volume.

The third record in the MFT is the log file that records all file transaction information. NTFS and the Log File Service use the DATA attribute of the log file to implement file system recoverability. The Log File Service is a component of the Windows NT Executive. Because the log file is a system file, it can be found early in the startup process and used to recover the disk volume if the volume is found to be corrupted. When a user updates a file, the Log File Service records all redo and undo information for the transaction. For recoverability, redo information allows NTFS to roll the transaction forward (repeating the transaction), and undo allows NTFS to roll the transaction back if an error occurs.
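The fallback from the MFT to the MFT mirror can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the dictionary keys that stand in for the boot sector fields, the read_record callback, and the use of ValueError to signal a corrupted record. It illustrates the idea of a mirrored first record only and does not reflect how NTFS actually reads the disk.

```python
def read_first_mft_record(boot_sector, read_record):
    """Return the first MFT record, falling back to the MFT mirror.

    boot_sector: dict with the hypothetical keys "mft_location" and
                 "mft_mirror_location".
    read_record: callable(location, index) that returns a record or raises
                 ValueError if the record is unreadable or corrupted.
    """
    try:
        return read_record(boot_sector["mft_location"], 0)
    except ValueError:
        # The first record of the mirror is the same as the first record
        # of the MFT, so it can stand in for the damaged copy.
        return read_record(boot_sector["mft_mirror_location"], 0)
```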

Committing data to the disk involves the following steps:

  1. NTFS writes a log file record that notes the volume update that it intends to make.
  2. NTFS calls the cache manager to flush the log file record to disk.
  3. NTFS writes the volume update to the cache, modifying the cached metadata.
  4. The cache manager flushes the modified metadata to disk, updating the volume structure.
  5. NTFS writes a log file record that flags the transaction as having been completed.
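The ordering of these five steps is the important part: the log record reaches the disk before the metadata does. The following Python sketch models that ordering with a toy class. The class name ToyVolume, its attributes, and the trace labels are assumptions made for this illustration, and the flush steps are only simulated. It is not the NTFS or cache manager implementation.

```python
class ToyVolume:
    """Toy model of the commit sequence; all names are hypothetical."""

    def __init__(self):
        self.log = []     # log file records, in the order they are written
        self.cache = {}   # cached metadata
        self.disk = {}    # metadata on disk
        self.trace = []   # the order in which the steps happened

    def commit_update(self, txid, key, new_value):
        old_value = self.disk.get(key)

        # Step 1: write a log record that notes the intended update.
        self.log.append(("update", txid, key, old_value, new_value))
        self.trace.append("log intent")

        # Step 2: flush the log record to disk (simulated) before the
        # volume itself is touched.
        self.trace.append("flush log")

        # Step 3: write the volume update to the cache.
        self.cache[key] = new_value
        self.trace.append("update cache")

        # Step 4: flush the modified metadata to disk.
        self.disk[key] = self.cache[key]
        self.trace.append("flush metadata")

        # Step 5: write a log record that flags the transaction as completed.
        self.log.append(("commit", txid))
        self.trace.append("log completion")


if __name__ == "__main__":
    volume = ToyVolume()
    volume.commit_update(1, "$Bitmap", "unit 42 in use")
    print(volume.trace)
    print(volume.log)
```

Because the intent record is written first, a failure at any later step still leaves enough information in the log to redo or undo the update.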

If a transaction is completed successfully, NTFS commits the file update to disk. If the transaction is not completed, NTFS ends or rolls back the transaction according to the undo information. If NTFS detects an error in the transaction, it rolls back the transaction. If NTFS cannot guarantee that a transaction completed successfully, it rolls the transaction back. Incomplete modifications to the volume are not permitted.