Paper Title (Use Style: Paper Title) s1

Use of distributed or Parallel Computing to Crack Windows Password Hashes

Shishir Jha (Author)

Department of Computer Science

Hood College

Frederick Maryland, USA

Abstract — Cracking windows password hashes has been an inherently single process based algorithm that requires extensive computing resource and time. Since use of parallel or distributed computing can address the need of computing resource by delegating the processing to multiple nodes, this paper is an attempt to see if password cracking be modified to take the advantage of multiple nodes and find the potential speed up.

Keywords-LM Hash, parallel, distributed, dictionary attack, windows password

I. Introduction

Passwords have been used for centuries as a method for challenging the credentials of someone attempting to enter a secured compound or accessing a private gathering. Computers are no different and require one form or other of passwords to let only certified personnel access the information in it. The trend of using password in computers started in sixties and seventies when a need for more structured method of processing the access control emerged in the computing world.[1]

The first password schemes relied simply on a flat file located on disk or in memory which contained the user names and passwords. These password files were typically locked by the operating system however some systems allowed any user with the appropriate privileges to access the file.

When Windows 95 was released, user account information was placed in a .pwl file or password list file and the password file was encrypted using the password and an RC4 encryption algorithm. When a user entered his or her credentials, the password was encrypted and the checksum of the .pwl file was compared against the checksum derived from the user credentials. By the time Windows XP was released, the passwords were moved to a SAM (Security Account Manager) file. The SAM file was encrypted with the SYSKEY. The passwords were still not placed in the file directly but rather a hashed version of the password was saved. The Hashing algorithm was based on the MD5 hashing. Access to the computer required the user to enter a password and comparing the hashed output with the contents of the SAM file.

Hashes in windows SAM file are computed using either the LM has method or the NTLM Hash method. Although it is based on DES encryption, LM Hash is not a true one-way function. Because of way the LM Hash function is implemented, there are several weaknesses in its implementation which allows careful programmers[2] to use different algorithms to get the password from the MD5 hashes.

II. Backgound

A. LM Hash

Though NTLM has primarily replaced the LM Hash in all the application protocols used to authenticate remote users and to provide session security when requested by the application, LM is still used in vast majority of Windows machine that are not part of any remote domain and Active Directory based networks.[3] However, Windows before the introduction of Vista still compute and store the LM hash by default for compatibility with previous generation clients that still use 16 bit applications which requires LM Hash based authentication. This paper is an attempt to attack this security hole present in the Windows architecture and gain access to users authentication credential.

To better understand the premise of the problem that this paper is attempting to solve, it is necessary to first understand how LM hash encrypts the user credentials. The general steps in LM Hash computing are as follows: [2]

1. The user entered ASCII password is converted to UPPERCASE

2. This password is then made 14 byte long by null padding and split into two 7-byte halves

3. These values are used to create two DES keys, one from each half. This generates a 64 bit DES Keys

4. Each of these keys is used to DES-encrypt the constant ASCII string, resulting in two 8-byte cipher text values.

5. These two cipher text values produce a 16 byte long cipher text which is result of concatenating the two 8-byte values.

The hashes produced by both NTLM and LM are stored in the System32 directory of the Windows installation in a SAM file which itself is encrypted by using a different file.

B. Different ways of extracting the password

Though LM Hash uses a DES encryption key, because of the encryption methodology used by the LM, the encryption is not a true one way function and with some widely known hash attack algorithms and fair bit of time and computing resource, the hash can be reversed to the user credentials. Using simple math it can be shown that total number of passwords to be cracked is ideally 295 but because of the different steps like converting the characters to uppercase and splitting the whole password into two the real problem domain for any hash attack algorithm is reduced to 243.

The most common form of cracking method used is Dictionary attack. Ideally, it does nothing more than comparing the hash function obtained from the host computer with that in pre-existing dictionary of commonly used passwords. This way of cracking method is a long attempt and usually works for weak and commonly used passwords. The ability of the algorithm to crack the password is directly dependent upon how diverse and big the dictionary is.

The next technique used in cracking of hashes is the brute force method. Essentially, a password generator, generating all possible of combination of words until a possible match is found. Though bound to work, the time complexity increases with password length and set of different characters and symbol used to generate the password.

The third method and now widely used method is a cracking utility by Zhu Shuanglei called Rainbow Table Method. His tool is based on Philippe Oeshslin’s faster time-memory trade off technique. This method proposes a new way of pre-calculating the data which reduces by two the number of calculation necessary for cryptanalysis [4]. For use in cracking passwords, this method is almost equivalent to brute-force attack but Rainbow Table uses pre-calculated chains of words stored in the table. Though the cracking speed decreases by fold of 100’s if not 1000’s using this method, the main pitfall of this method is the time investment required to build the tables. However, thanks to the internet and wide availability of tools and resources under open license there are numerous libraries available for free in the internet which provides users with pre-build tables.

Identify applicable sponsor/s here. (sponsors)

III. Present Work

There has been wide spread work in this field as recovering passwords for both legitimate and illegitimate purposes has been a necessity ever since the start of its use. Though there are freely available tools that will attempt at cracking the password using one of the numerous methods mentioned above, there is no guarantee of results. This is primarily because of the possible time complexity and lack of proper utilization of the computing resource available in today’s multi core and high speed interfaced computers.

To fill in the gap left by these free tools there are commercially available tools in the market that use specialized algorithms, pre hashed tables, rainbow tables to crack the password. Further more, there are online web sites that offer services to crack your password, provided you can give them the hash function from the SAM file. These online sites use massive collection of rainbow tables ranging anywhere from 60 to 160 GB to facilitate its user with cracking of their purpose.

Inherently almost all the programs available in the internet that are downloadable are single processor based programs and not optimized to take benefit of either multiprocessor environment. Though in reality the general public might not have access to a hugely parallel system or a distributed computing for their daily use, the fact that these systems cannot even use the processing power of general multicore processors in use today puts these programs at a disadvantage. Because of this and other reasons out of scope of this paper, acceptance of use of such programs has not grown over the year. From the presence of different commercial tools and websites available in the internet it is evident that need of a tool which can run faster and utilize the available computing resources in general users computer exists.

It is however worth mentioning that there are projects which have implemented crack tools in various parallel programming paradigms that utilize the multiple cores and other recent advancement in computing resources available, not much of material is available for review.

Furthermore, it has to be noted that there are numerous implementation of crack tools that utilize the massively parallel capability of today’s graphics card to do calculation for cracking hashes generated by different programs. Primarily programmed using CUDA and similar interface, this kind of crack tool claim huge improvement in cracking time over traditional single and SMP based computers. But because using such programs mandates presence of compatible graphics card in users’ computer, it is not always feasible for using such programs.

Since cracking password is among many different families of present problems that can utilize parallel computing techniques to speedup their performance, this paper and the final project is an attempt at measuring how much of difference can using parallel and distributed computing can make over traditional methods of cracking a password.

IV. APPLYING PARALLEL COMPUTING TECHNIQUES

Applying parallel computing technique to a problem that is repeated time and over which can be broken into smaller pieces seems to be a pretty intuitive thing to do. All crack tools essentially have an approach where the algorithms serially attack the hash in hand with the ones that are either present in a table, dictionary or are generated depending upon the constraints set by the user. As the data set that the algorithm is working on is usually mutually exclusive or can be made exclusive to each other, there is a distinct possibility that using parallel computing or distributed computing in which data set that the algorithm acts upon can divided will speed up the process.

Though there are numerous constraints on how the algorithms works in some of the method explained above, Brute force attack and dictionary attack work on set of data and constraints that are pre-defined and not dependent upon pre-formulated chains as in Rainbow attack. Hence, by dividing and distributing either the dictionary in dictionary attack or the random password generator constraint in the brute force attack, theoretically, a good speed up can be achieved by using parallel or distributed computing. Since the overall hash calculation and matching will be done in the cluster or individual processor there is minimum communication and setup overhead which compared to the processing time necessary should have insignificant effect on overall computing time. This means that there will be less communication overhead resulting in a better speed up.

The attempt here will be to parallelize or distribute the dictionary attack algorithm for cracking the hash file using MPI or OpenMP based parallel computing approach by distributing the hash file to multiple nodes and with pre-assigned dictionaries to perform the algorithm. On success the node will reply to the root node to indicate that matching hash has been found and performance time will be noted.

V. Expected result and conclusion

Though the expected speedup of the algorithm is ideally number of processes spawned by the MPI program, this is true just for worst case scenario. However, even for a general case, this simulation of parallel computing should yield some substantial speed up data that can be used to measure the viability of using parallel processes to speed up programs that have inherently been designed for single processor based computers. As we know that result of any parallel computing needs a sizeable amount of data for producing any legible data. Though the data in the test might be slightly limited because the processing is being distributed among more than one node, the overall result should produce data that can be used legibly for drawing conclusions for the work being done.

References

[1] M Naylor, S Jha, S K Mehta, “Hacking Windows Password,” unpublished.

[2] Microsoft Knowledgebase, www.microsoft.com

[3] Brian Wilson, “LM & MD5 Hash Security & Cracking,”

[4] PhilippeOechslin, “Making a Faster Cryptanalytic Time-Memory Trade-Oﬀ, ”